# The Josephus Problem: Part 1: The Problem

Posted in Computer Science

This is Part 1 of an N-part series.

## The Josephus Problem and Variations

The following problem, Cat and Mice, is Puzzle 88 in Boris Kordemsky's The Moscow Puzzles.

Purrer has decided to take a nap. He dreams he is encircled by $$n$$ mice: $$n-1$$ gray and 1 white. He hears is owner saying, "Purrer, you are to eat each $$m$$-th mouse, keeping the same direction. The last mouse you eat must be the white one.

This problem presents a scenario that is nearly identical to a classic problem that has a more grisly backstory: the Josephus problem.

Donald Knuth presents a more unsettling version of this problem, so I'll just let his (rather gleeful) description of the original problem suffice (except - what's this "we" stuff?)

From Section 1.3.2 of The Art of Computer Programming (Vol. 1), Exercise 22 presents the Josephus Problem:

22. (The Josephus Problem.) There are $$n$$ men arranged in a circle. Beginning with a particular position, we count around the circle and brutally execute every $$m^{th}$$ man (the circle closing as men are decapitated). For example, the execution order when $$n = 8, m = 4$$ is $$54613872$$: the first man is fifth to go, the second man is fourth, etc. Write a program which prints outs the order of execution when $$n = 24, m = 11$$. Try to design a clever algorithm that works at high speed when $$n$$ and $$m$$ are large.

### The Sushi Boat Variation

You are at the sushi boat restaurant, where plates of sushi in tiny boats float by in front of you.

There are $$n$$ plates of sushi, each on a sushi boat. Each plate of sushi is labeled $$1 \dots n$$ and arranged in order on the boats.

Beginning at plate 1, you count $$m$$ plates of sushi, stopping at the $$m^{th}$$ boat and taking the plate of sushi off the boat to eat it.

In what order will the sushi plates be stacked when you are done?

Which plate of sushi will be eaten last?

### More Backstory

More background on the Josephus problem and its various solutions is given in a letter to the editor from the Fibonacci Quarterly, Issue 1 of 1976 (Part 1 and Part 2) written in response to an article that gave a solution to the problem of "Idiot's Roulette" (identical to the Josephus problem as presented above) without referencing the Josephus problem. Here is the original article.

## The Tools

To solve the Josephus problem, we need to use several conceptual and computational tools. Below we cover some notation we will use and give links to pages on the charlesreid1.com wiki that are useful.

### Permutations

We can think of the outcome of the Josephus problem as a "Josephus permutation" - a permutation that reorders the sushi plates in the circle (numbered by their positoin in the circle) into the order in which they are removed from the circle.

For example, in Knuth's problem statement, he gives an example of $$n = 8, m = 4$$ (eating every 4th plate of sushi, in a train of 8 sushi boats), which results in the plates being removed in the following order:

$$5, 4, 6, 1, 3, 8, 7, 2$$

To write this permutation using mathematical notation, we write two rows. The top row is the ordering of plates in the circle, the "natural" ordering, and the second row is the order of removal of plates, which is the Josephus permutation:

$$\bigl(\begin{smallmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 4 & 6 & 1 & 3 & 8 & 7 & 2 \end{smallmatrix}\bigr)$$

As Knuth would point out, this Josephus permutation can be written in 40,319 other equivalent ways (that's 8! total, minus the 1 way shown above) by reordering the columns (as long as we reorder the columns in the top and bottom rows in the same way).

We can read this permutation from left to right as follows:

• The first sushi plate (labeled 1) will be eaten fifth;
• The second sushi plate (labeled 2) will be eaten fourth;
• etc.

Ordering the permutation as above (circle index on top, removal index on bottom) makes it easy to answer the second question, "Which sushi plate will be eaten last?"

We find 8 in the bottom row (the removal index), and read the corresponding number in the top row (the plate number/circle position), plate 6.

If we wish to answer the first question, "in what order will the plates be removed," we have a bit more work to do. We mentioned that the above permutation is 1 of a total of 40,320 equivalent ways of writing the same permutation. Another way of writing it would be to maintain the pairing between top and bottom but sort the bottom elements:

$$\bigl(\begin{smallmatrix} 4 & 8 & 5 & 2 & 1 & 3 & 7 & 6 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \end{smallmatrix}\bigr)$$

Now we can find the order of the plates by reading the top row right-to-left. The last plate removed is plate 6, so that will be on top of the stack of plates (stacks are first in, last out).

We cover permutations and permutation notation in the context of Rubiks Cubes on the Rubiks Cube/Permutations page of the charlesreid1.com wiki.

### Cycles

While the above permutation notation is useful, the variety of ways of expressing the same permutation is inconvenient. This is where cycles become useful - cycles are a way of implicitly representing both rows of the permutation.

To do this, we "thread" our way through the permutation to create the cycle of which items move to which positions.

Starting with the left-most column of the permutation,

$$\bigl(\begin{smallmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 5 & 4 & 6 & 1 & 3 & 8 & 7 & 2 \end{smallmatrix}\bigr)$$

we know that $$1 \rightarrow 5$$. Now we find the column that has 5 in the top: the fifth column. We write $$5 \rightarrow 3$$. Now we find the column with 3 in the top: the third column. We write $$3 \rightarrow 6$$. Next, we write $$6 \rightarrow 8$$, then $$8 \rightarrow 2$$, then $$2 \rightarrow 4$$. Once we see that the last step of the cycle is $$4 \rightarrow 1$$, which brings us back to the beginning, we write that closed cycle in parentheses. If we have elements left, we repeat the procedure starting with any of the remaining elements. Sometimes there is a single cycle, and sometimes there are multiple cycles. In this case we have two cycles:

$$\left( 1 \, 5 \, 3 \, 6 \, 8 \, 2 \, 4 \right) \left( 7 \right)$$

This indicates that 7 does not change position, i.e., the 7th plate of sushi is eaten 7th.

This notation is very convenient for finding solutions, and it turns out that Knuth gives a general solution procedure for the Josephus problem that involves a rather complicated application of the cycle notation, so a solid understanding of cycle notation is important.

We cover cycle notation in the context of Rubiks Cubes on the Rubiks Cube/Permutations page of the charlesreid1.com wiki (in particular, the sections on Permutation Algebra that cover intercalation products).

Among the many ways of solving the Josephus problem, the easiest method is to just carry out the procedure by hand to find the final Josephus permutation, then use it to answer the original question. This technique is referred to as the simulation technique.

This technique would become infeasible if we were to ask more difficult versions of the Josephus problem, such as posing a scenario where there are 5 million plates of sushi, and we wish to know which position (which plate of sushi) will be eaten 2,999,998th, and we also wish to have the answer instantaneously.

If we're dealing with smaller values of n and m, though, we can simulate a solution to the Josephus problem using a circular linked list.

Briefly, a linked list is a type of list whose elements consist of nodes, small bundles containing a piece of data (the list item's value) and pointers to other nodes (the next and/or previous elements in the list).

See Lists Study Guide for a summary of important information about and properties of lists.

A circular linked list is just what it sounds like: each of the elements points to the next element, and the last element in the list points to the first element in the list. We can use the list by maintaining a pointer to a particular item (the first item in the list).

We can insert items into the list by creating a new node, and inserting it between two nodes (e.g., the front and back of the list) by updating the pointers of the front and back nodes to point to the new list item.

Usefully, we can also remove items from the list with some pointer manipulation. To remove a node, we modify the next/previous pointers of the nodes before/after the doomed node so that they point to each other instead of to the doomed node.

This allows us to explicitly model the sushi boat (a.k.a. the "kill ring") in the Josephus problem.

Circular linked lists are covered on the charlesreid1.com wiki on the Linked Lists/Java/Circular page, and are implemented in Java in the cs/java repo: https://git.charlesreid1.com/cs/java/src/branch/master/lists/linked-lists

A Python implementation used to solve the Josephus problem is available in the cs/josephus repo: repo: https://git.charlesreid1.com/cs/josephus

## TeX for Diagrams

In addition to writing The Art of Computer Programming, which has remained a gold standard algorithm textbook for over 40 years, Knuth also invented the TeX typesetting system, which is also the gold standard for typesetting mathematical equations.

We use the PGF/TikZ package to draw polygons that are useful in illustrating the circles of the Josephus problem and in visualizing various permutations.

A few examples and links to Github Gists with TeX code follow.

### Empty Josephus Circle Diagram

Link to gist with TeX code

Here is the TeX code to generate this diagram:

\documentclass[border=2mm]{standalone}
\usepackage{tikz}
\usepackage{xintexpr}
\usetikzlibrary{shapes.geometric}

\begin{document}
\begin{tikzpicture}[scale=3]

% make a node with variable name pol (with the list of features given) at the location (0,0), and don't label it
\node (pol) [draw=none, thick, black!90!black,rotate=0,minimum size=6cm,regular polygon, regular polygon sides=11] at (0,0) {};

% anchor is "corner 1"
% label is 1/2/3/4/etc
% placement is placement w.r.t. coordinate location
\foreach \anchor/\label/\placement in
{corner 1/$1$/above,
corner 2/$2$/above left,
corner 3/$3$/left,
corner 4/$4$/left,
corner 5/$5$/below left,
corner 6/$6$/below,
corner 7/$7$/below,
corner 8/$8$/below right,
corner 9/$9$/right,
corner 10/${10}$/right,
corner 11/${11}$/above right}
\draw[shift=(pol.\anchor)] plot coordinates{(0,0)} node[font=\scriptsize,\placement] {\label};

% draw a circle connecting all points

% Draw a red dot at the starting point

% optional: black dots at each circle location

\end{tikzpicture}
\end{document}


### Josephus Circle Diagram With Permutation Paths

Next, we can illustrate cycles in the permutation by drawing paths between connected nodes.

The edges are directed (1 -> 4 is not the same as 4 -> 1). We draw both directed and undirected versions.

Link to gist with TeX code

The code to generate these diagrams is below.

Undirected Paths:

\documentclass[border=2mm]{standalone}
\usepackage{tikz}
\usepackage{xintexpr}
\usetikzlibrary{shapes.geometric}

\begin{document}
\begin{tikzpicture}[scale=3]

% make a node with variable name pol (with the list of features given) at the location (0,0), and don't label it
\node (pol) [draw=none, thick, black!90!black,rotate=0,minimum size=6cm,regular polygon, regular polygon sides=11] at (0,0) {};

% anchor is "corner 1"
% label is 1/2/3/4/etc
% placement is placement w.r.t. coordinate location
\foreach \anchor/\label/\placement in
{corner 1/$1$/above,
corner 2/$2$/above left,
corner 3/$3$/left,
corner 4/$4$/left,
corner 5/$5$/below left,
corner 6/$6$/below,
corner 7/$7$/below,
corner 8/$8$/below right,
corner 9/$9$/right,
corner 10/${10}$/right,
corner 11/${11}$/above right}
\draw[shift=(pol.\anchor)] plot coordinates{(0,0)} node[font=\scriptsize,\placement] {\label};

% solution for n = 11, m = 4:
% ( 1 3 7 6 4 ) ( 2 8 ) ( 5 9 11 ) ( 10 )

% internal paths

% cycle (1 3 7 6 4)
\path [-] (pol.corner 1) edge (pol.corner 3);
\path [-] (pol.corner 3) edge (pol.corner 7);
\path [-] (pol.corner 7) edge (pol.corner 6);
\path [-] (pol.corner 6) edge (pol.corner 4);
\path [-] (pol.corner 4) edge (pol.corner 1);

% cycle 2 (2 8)
\path [-] (pol.corner 2) edge (pol.corner 8);
\path [-] (pol.corner 8) edge (pol.corner 2);

% cycle 3 (5 9 11 )
\path [-] (pol.corner 5) edge (pol.corner 9);
\path [-] (pol.corner 9) edge (pol.corner 11);
\path [-] (pol.corner 11) edge (pol.corner 5);

% draw a circle connecting all points

% Draw a red dot at the starting point

% optional: black dots at each circle location

\end{tikzpicture}
\end{document}


Directed Paths:

\documentclass[border=2mm]{standalone}
\usepackage{tikz}
\usepackage{xintexpr}
\usetikzlibrary{shapes.geometric}

\begin{document}
\begin{tikzpicture}[scale=3]

% make a node with variable name pol (with the list of features given) at the location (0,0), and don't label it
\node (pol) [draw=none, thick, black!90!black,rotate=0,minimum size=6cm,regular polygon, regular polygon sides=11] at (0,0) {};

% anchor is "corner 1"
% label is 1/2/3/4/etc
% placement is placement w.r.t. coordinate location
\foreach \anchor/\label/\placement in
{corner 1/$1$/above,
corner 2/$2$/above left,
corner 3/$3$/left,
corner 4/$4$/left,
corner 5/$5$/below left,
corner 6/$6$/below,
corner 7/$7$/below,
corner 8/$8$/below right,
corner 9/$9$/right,
corner 10/${10}$/right,
corner 11/${11}$/above right}
\draw[shift=(pol.\anchor)] plot coordinates{(0,0)} node[font=\scriptsize,\placement] {\label};

% solution for n = 11, m = 4:
% ( 1 3 7 6 4 ) ( 2 8 ) ( 5 9 11 ) ( 10 )

% internal paths

% cycle (1 3 7 6 4)
\path [->, shorten > = 3 pt, blue, shorten < = 4 pt, > = stealth] (pol.corner 1) edge (pol.corner 3);
\path [->, shorten > = 3 pt, blue, shorten < = 4 pt, > = stealth] (pol.corner 3) edge (pol.corner 7);
\path [->, shorten > = 3 pt, blue, shorten < = 4 pt, > = stealth] (pol.corner 7) edge (pol.corner 6);
\path [->, shorten > = 3 pt, blue, shorten < = 4 pt, > = stealth] (pol.corner 6) edge (pol.corner 4);
\path [->, shorten > = 3 pt, blue, shorten < = 4 pt, > = stealth] (pol.corner 4) edge (pol.corner 1);

% cycle 2 (2 8)
\path [->, shorten > = 3 pt, green, shorten < = 4 pt, > = stealth] (pol.corner 2) edge (pol.corner 8);
\path [->, shorten > = 3 pt, green, shorten < = 4 pt, > = stealth] (pol.corner 8) edge (pol.corner 2);

% cycle 3 (5 9 11 )
\path [->, shorten > = 3 pt, red, shorten < = 4 pt, > = stealth] (pol.corner 5) edge (pol.corner 9);
\path [->, shorten > = 3 pt, red, shorten < = 4 pt, > = stealth] (pol.corner 9) edge (pol.corner 11);
\path [->, shorten > = 3 pt, red, shorten < = 4 pt, > = stealth] (pol.corner 11) edge (pol.corner 5);

% draw a circle connecting all points

% draw a red dot at the starting point

% optional: black dots at each circle location

\end{tikzpicture}
\end{document}


## Next Steps: Examples and Solutions

So far in Part 1 we have covered some common forms of the Josephus problem.

In Part 2 we'll cover some examples of different $$n, m$$ values ($$n$$ is circle size, $$m$$ is skip length) and show how the process plays out.

In Part 3 we will show the solution of the special case of $$m = 2$$ (the double-step case).

In Part 4 we will show several ways to solve the general case, and walk through some examples where we apply the solution procedure.

# Approximating Pi (Happy Pi Day)

Posted in Mathematics

## Favorite Pi Approximations

What's your favorite $$\pi$$ approximation?

Some of my favorite approximations of $$\pi$$ come from Ramanujan-Sato series. These are mathematical series that generalize from a remarkable formula for $$\pi$$ given by Srinivasa Ramanujan, an Indian mathematician:

$$\pi^{-1} = \dfrac{\sqrt{8}}{99^2} \sum_{k \geq 0} \dfrac{ (4k)! }{ \left( 4^k k! \right)^4 } \dfrac{ 1103 + 26390k }{ 99^{4k} }$$

This completely novel formula opened up new branches of mathematics and provided a whole new class of $$\pi$$ approximations (the Ramanujan-Sato series) and approximations that are extremely accurate, making them very useful for computer applications. (Each term of the above sequence yields 8 additional decimal points of accuracy of $$\pi$$.) These approximations have also enabled world record calculations of numbers of digits of $$\pi$$.

But those are not the $$\pi$$ approximations that this blog post is about.

This blog post is about another set of $$\pi$$ approximations that I like - these come from another field Ramanujan had mastery over, continued fractions.

Continued fractions provide a whole alternative way of representing all the real numbers - rational and irrational.

## Continued Fractions and Convergents

Back in July of 2017, we wrote a blog post about how to find rational approximations of square roots using continued fractions and convergents, and we implemented a Java program to represent the irrational number $$\sqrt{n}$$ (where $$n$$ is not a perfect square).

In short, continued fractions are a way of expressing numbers, rational and irrational, in terms of recursive fractions, which look something like this:

$$x = b_0 + \cfrac{1}{ b_1 + \cfrac{1}{ b_2 + \cfrac{1}{ b_3 + \cfrac{1}{ b_4 + \cfrac{1}{ b_5 + \dots } } } } }$$

denoted

$$[b_0; b_1, b_2, b_3, \dots]$$

and, when evaluated, results in a series of fractions (rational approximations) called the convergents of the continued fraction.

In the denominator of the above representation, we have a 1 in each position, which makes the continued fraction a simple continued fraction. If the denominators are not 1, the continued fraction is a general continued fraction.

The continued fraction expansion of any rational number (or equivalently, the sequence of convergents) must terminate at some point (even if the number of terms ends up being very large).

The convergents always terminate if a number is rational. Conversely, if you can prove a number's continued fraction representation is a repeated sequence or continues forever, you can prove a number is irrational (a strategy employed in several proofs about properties of $$\pi$$).

## Simple Continued Fractions to Approximate Pi

For $$\pi$$, the convergents of the simple continued fraction (there is a single unique simple continued fraction representation of $$\pi$$) are unpredictable; the first few are:

$$\pi = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2...]$$

Each additional term leads to increasingly precise fractional approximations for $$\pi$$; they are:

$$[3;7] = \dfrac{22}{7} = 3.\overline{142857}$$
$$[3;7,15] = \dfrac{333}{106} = 3.1415094...$$
$$[3;7,15,1] = \dfrac{355}{113} = 3.1415929...$$
$$[3;7,15,1,292] = \dfrac{103993}{33102} = 3.14159265...$$

## General Continued Fractions to Approximate Pi

Every number has exactly one representation as a simple continued fraction, if we restrict the values of $$b_i$$ to be positive.

However, if we generalize the simple continued fraction to allow any integer in the numerator of the continued fractions, we get a general continued fraction:

$$x = b_0 + \cfrac{a_1}{ b_1 + \cfrac{a_2}{ b_2 + \cfrac{a_3}{ b_3 + \cfrac{a_4}{ b_4 + \cfrac{a_5}{ b_5 + \dots } } } } }$$

Unlike simple continued fractions, one number can be expressed in many ways with many different general continued fractions.

We use two general continued fractions as well as the simple continued fraction representation of $$\pi$$ to generate some $$\pi$$ approximations below.

## Odd Squares and Twos

The following general continued fraction

$$\dfrac{4}{\pi} = 1 + \cfrac{1^2}{ 2 + \cfrac{3^2}{ 2 + \cfrac{5^2}{ 2 + \cfrac{7^2}{ 2 + \cfrac{9^2}{ 2 + \dots } } } } }$$

can be turned into approximations for $$\pi$$ by finding the convergents of the above continued fraction, then reversing the numerator and denominator and multilpying the new numerator by 4.

Implementing a recurrence to turn the above into convergents and printing the first 24 terms and their approximate value:

                                      Pi
Convergent                                                          Approx
--------------------------------------------------------------------------------
8 / 3                                                               2.6666666666666665
52 / 15                                                             3.4666666666666668
304 / 105                                                           2.8952380952380952
3156 / 945                                                          3.3396825396825398
30936 / 10395                                                       2.9760461760461761
443748 / 135135                                                     3.2837384837384835
6115680 / 2027025                                                   3.0170718170718169
112074660 / 34459425                                                3.2523659347188758
1991580840 / 654729075                                              3.0418396189294024
44442113940 / 13749310575                                           3.2323158094055926
967171378320 / 316234143225                                         3.0584027659273318
25444221030900 / 7905853580625                                      3.2184027659273320
655370553511800 / 213458046676875                                   3.0702546177791836
19859578238549700 / 6190283353629375                                3.2081856522619421
590885791980523200 / 191898783962510625                             3.0791533941974261
20266826271207308100 / 6332659870762850625                          3.2003655154095472
684008280009204381000 / 221643095476699771875                       3.0860798011238333
26194878742247361184500 / 8200794532637891559375                    3.1941879092319412
988797092817095519958000 / 319830986772877770815625                 3.0916238066678385
41820004752592427401540500 / 13113070457687988603440625             3.1891847822775947
1745807922530722423852479000 / 563862029680583509947946875          3.0961615264636411
80816804632604843113153342500 / 25373791335626257947657609375       3.1850504153525301
3696894652389922594527576660000 / 1192568192774434123539907640625   3.0999440323738066


(16 decimal points are printed for decimal approximations above.)

This exhibits a pattern seen with general continued fractions, which is that they tend to jump above and below the value they approximate, depending on whether there are an even or odd number of terms in the continued fraction being included.

## Odd Squares, Threes, and Sixes

Another related continued fraction, this one for $$\pi$$ and with a more predictable pattern, is given by:

$$\pi = 3 + \cfrac{1^2}{ 6 + \cfrac{3^2}{ 6 + \cfrac{5^2}{ 6 + \cfrac{7^2}{ 6 + \cfrac{9^2}{ 6 + \dots } } } } }$$

Implementing a computer program to evaluate the convergents from the above patterns yields even more $$\pi$$ approximations:

                                Even More Pi
Convergent                                                          Approx
--------------------------------------------------------------------------------
19 / 6                                                              3.1666666666666665
141 / 45                                                            3.1333333333333333
1321 / 420                                                          3.1452380952380952
14835 / 4725                                                        3.1396825396825396
196011 / 62370                                                      3.1427128427128426
2971101 / 945945                                                    3.1408813408813407
50952465 / 16216200                                                 3.1420718170718169
974212515 / 310134825                                               3.1412548236077646
20570537475 / 6547290750                                            3.1418396189294020
475113942765 / 151242416325                                         3.1414067184965018
11922290683065 / 3794809718700                                      3.1417360992606653
322869019821075 / 102776096548125                                   3.1414796890042549
9388645795842075 / 2988412653476250                                 3.1416831892077552
291703390224616125 / 92854250304440625                              3.1415189855952756
9646071455650881825 / 3070380543400170000                           3.1416533941974261
338203386739761387075 / 107655217802968460625                       3.1415419859977827
12533792135642378629875 / 3989575718580595893750                    3.1416353566793886
489501901570061970946125 / 155815096120119939628125                 3.1415563302845726
20095772843114788169975625 / 6396619735457555416312500              3.1416238066678388
865107029346752986828909875 / 275374479611447760672253125           3.1415657346585473
38971636325356476834702484875 / 12404964652972837218854831250       3.1416160719181865
1833412715214285133654869268125 / 583597200719403932796125015625    3.1415721544829651
89918039850132576392201747480625 / 28621636626586418964957783375000 3.1416106990404735


## When to Use Simple Vs. General Continued Fractions

While general continued fractions are more flexible, allowing a given rational or irrational number to be expressed in a wider variety of ways, it is important to point out how much faster the simple continued fractions converge - with 15 terms, we arrive at around 15 accurate decimal places, versus the 1-3 decimal places of accuracy from over 20 terms in the above approximations.

Using the simple continued fraction representation of $$\pi$$,

$$\pi = 3 + \cfrac{1}{ 7 + \cfrac{1}{ 15 + \cfrac{1}{ 1 + \cfrac{1}{ 292 + \cfrac{1}{ 1 + \dots } } } } }$$

and implementing it with a computer, we get the following table of convergents and their approximate values:

                               Pi Simple
Convergent                                                          Approx
--------------------------------------------------------------------------------
22 / 7                                                          3.1428571428571428
333 / 106                                                       3.1415094339622640
355 / 113                                                       3.1415929203539825
103993 / 33102                                                  3.1415926530119025
104348 / 33215                                                  3.1415926539214212
208341 / 66317                                                  3.1415926534674368
312689 / 99532                                                  3.1415926536189365
833719 / 265381                                                 3.1415926535810779
1146408 / 364913                                                3.1415926535914038
4272943 / 1360120                                               3.1415926535893890
5419351 / 1725033                                               3.1415926535898153
80143857 / 25510582                                             3.1415926535897927
165707065 / 52746197                                            3.1415926535897936


## A Note on the Program

We implemented a program to convert terms in a continued fraction representation (simple or general) into convergents (rational approximations).

We covered how to do this with Java in a previous blog post, but this time we re-implemented it in Python and used some basic object-oriented Python techniques and class decorators (class methods, memoized functions, etc.)

You can find the code here: https://git.charlesreid1.com/cs/python/src/branch/master/math/pi_continued_fraction_convergents.py

We will return to the topic of continued fractions and cover general continued fractions (and importantly, the recurrence relation used to convert them into convergents in the above program) in a future blog post.

# Five Letter Words: Part 5: The Try Trie Tree

Posted in Computer Science

In Volume 4 Fascicle 0 of Donald Knuth's Art of Computer Programming, Knuth introduces a tool for exploring concepts in graph theory: the five-letter words. This is a collection of 5,757 five-letter words compiled by Knuth and useful in exploring ways of constructing efficient algorithms.

The word list is large enough that an $$O(N^2)$$ algorithm will take a solid chunk of CPU time, so there's a definite incentive to think carefully about implementation.

Knuth introduces a list of five-letter words, as well as associated exercises utilizing techniques from dynamic programming and graph theory, among other topics.

We have covered this topic before in prior blog posts:

and a recent addendum to Part 1:

We continue our coverage in this blog post with a newer problem, one that is rated by Knuth at 26, on his scale of 0 to 50:

00  Immediate
10  Simple (1 minute)
20  Medium (quarter hour)
30  Moderately hard
40  Term project
50  Research problem


(from Volume 1, Notes on the Exercises.)

Here's the list of words if you want to play along.

Link to the Stanford Graph Base.

Visit Five Letter Words on the charlesreid1.com wiki for details.

## Introduction to the Try Trie Tree Problem

In this blog post, we'll cover Exercise 35 of Volume 4, Fascicle of Donald Knuth's Art of Computer Programming.

The problem is as follows:

Sixteen well-chosen elements of WORDS(1000) lead to the branching pattern (figure), which is a complete binary trie of words that begin with the letter s. But there's no such pattern of words beginning with a, even if we consider the full collection WORDS(5757).

What letters of the alphabet can be used as the starting letter of sixteen words that form a complete binary trie within WORDS(n), given n?

For the benefit of those without the book, here is an attempt to represent the trie that Knuth includes in the exercise:

                     s

h                 t

e       o         a       e

e   l   r   w     l   r   a   e

sheep             stalk
sheet             stall

shelf             stars
shell             start

shore             steal
short             steam

shown             steel
shows             steep


To answer the question, of whether a complete binary trie can be completed for a given letter, given a set of n words, we construct a "try trie tree," which is a tree data structure that greedily builds a trie with as many branches as possible.

The full trie of $$26^4+1 = 456,977$$ nodes would be expensive to assemble in full, for each starting letter. Instead we use the word list to build up a tree of possible candidate branches for the trie. Once we've constructed all possible branches using the faster but less precise method, we verify that each candidate branch we have constructed either meets our criteria (can be included as a branch in a complete binary trie), or is pruned.

## The Try Trie Tree

To solve this problem, we define a TryTrieTree class that holds the nodes and links that make up our tree. We define some methods for it to perform the assembly and verification operations described below, then assemble one try trie tree for each letter of the alphabet to come up with an answer to the exercise.

## Constructing the Try Trie Tree

The construction procedure for the try trie tree proceeds in two steps:

Step 1 is to assemble a tree, from the top down, by searching the entire space of $$456,977$$ nodes and marking particular nodes and paths on this tree as candidates for the final perfect binary trie.

Step 2 is to revisit the candidate branches, proceeding from the bottom up, and determine if the candidate branches do, in fact, have enough sibling nodes and word matches to form a complete branch in a perfect binary trie.

We start Step 1 at the root node (the starting letter) and proceed from the root down, going level by level.

### Checking for Minimum Number of Matching Words

Step 1 proceeds from the top down and marks branches that are candidates to end up in the final perfect binary trie.

At each level of the trie, we count the number of words in the overall word set that have a prefix matching the prefix corresponding to that node.

For example, the trie node b on the path s-a-b would yield four words:

saber
sable
sabre
sabra


If enough words match, that branch of the trie is a possible candidate to end up in the perfect binary trie (but may be trimmed in Step 2).

Example: If we are assembling the complete trie for s given by the author in the exercise, we can verify that there are at least 16 words that begin with the letter s. We would then verify that there are at least 8 words that begin with sa, which there are. Then we would verify that there are at least 4 words that begin with saa, which there are not, so we would move on to verifying that there are at least 4 words that begin with sab, which there are. We would proceed in this fashion until we had assembled a candidate trie branch, s-a-b-r (which contains two words, sabra and sabre). For Step 1, we keep s-a-b-r as a candidate branch. (We will see in Step 2 that this branch will get trimmed.)

At each level of the trie, we apply the procedure: - At level 1, we require a minimum of $$2^{5-1} = 16$$ words. - At level 2, we require a minimum of $$2^{5-2} = 8$$ words. - At level 3, we require a minimum of $$2^{5-3} = 4$$ words. - At level 4, we require a minimum of $$2^{5-4} = 2$$ words.

NOTE: We are not explicitly constructing the trie, so we don't need to assemble the word leaves.

### Assemble Method

See the Try Trie Tree Code section for the code for the public and private assembly methods.

To peform the assembly of all possible branches of the try trie, we use the assemble() method. This is a public method that starts a recursive call to a private method _assemble().

We are given a starting letter (in the example given by the author, the starting letter is "s"). We explore every possible prefix that starts with the root letter, "sa", "sb", "sc", "sd", and so on.

For each of those possible prefixes, we explore every possible third letter, "saa", "sab", and so on, and then once more in a fourth step, "saaa", "saab", ..., for a total of $$26^4 = 456,976$$ iterations (checks for existence of words starting with a given substring).

A substantial number of these checks will do nothing - from the fact that

$$\frac{5757}{456976} \sim 5e3/5e5 \sim 0.01$$

we know that the maximum number of loops that would actually lead to a branch being added will be 1% of that $$26^4$$ total.

We also know that any efforts to speed up this program should focus on the way we are checking the number of words that start with a given substring - since that's where we'll spend most of our time.

The recursive assembly method takes a prefix string input (which maps to a location in the tree), and it explores all 26 possible children of that prefix (location in the trie).

When a leaf node is reached, at the fourth level, it represents the longest prefix in our trie. This is the base case of the recursive assembly function; the recursive function terminates at a fixed depth of 4.

### Verifying Branches and Bubbling Up Counts

However, as we noted, the counts we are using above in Step 1 are just approximations (checking there are a minimum number of words with a given prefix). There is no way to guarantee that a trie branch can be used in the final complete perfect binary trie until all child leaf nodes have been visited.

This is where Step 2 comes in. In Step 2 we perform a pre-order depth-first traversal of the tree, visiting the leaf nodes first and proceeding from the bottom up. The number of words matching the prefix of each leaf node must be 2, to keep the leaf node.

We then proceed up the tree, level by level, and at each level we require that each node have at least 2 "large" children. This is a recursive definition - for a child to be "large", it must itself have 2 "large" children, or (if it is a leaf node) it must have at least 2 words that match the 4-letter prefix.

In our TryTrieTree, a complete binary trie is only possible if each node at each level has two or more children that are "large enough", where "large enough" means that either (a) both child nodes have two or more children that are "large enough", or (b) if we are at a leaf node (representing 4 characters), and there are two or more words that begin with the 4 characters corresponding to this trie node.

Let's go through an example.

Continuing with the example above for s, we assembled the branch s-a-b-r, which contains the minimum two words required. However, s-a-b is not a common enough prefix! The only child of s-a-b with two or more words matching is s-a-b-r, which means we can't form a complete binary trie using this s-a-b branch.

We call this procedure a "bubble up" procedure, since it is bottom-up.

### Bubble Up Method

See the Try Trie Tree Code section for the code for the public and private bubble up methods.

Similar to the assembly method, our bubble up method is also a recursive method, performing a depth-first pre-order traversal. This ensures we reach leaf nodes before beginning our task, and that counts proceed from bottom-up.

## Try Trie Tree Code

Below we go through some of the code for the Try Trie Tree problem.

### Try Trie Trie Class

When dealing with trees, it's always a safe bet that we'll need a Node class, so we start with a utility class for tree nodes:

class Node(object):
def __init__(self, letter, count=0):
self.letter = letter
self.count = count
self.children = []
self.parent = None


The TryTrieTree class has a constructor that starts with an empty root. The tree should also contain a pointer to the original word set, so that we can reference it in later methods where needed.

class TryTrieTree(object):
def __init__(self,words):
self.root = None
self.words = words


In the final class we defined a __str__() method to create a string representation of the TryTrieTree, but we will skip that for now.

Next we have a method to set the root to a given Node:

    def set_root(self,root_letter):
self.root = Node(root_letter)


Additionally, we have two utility methods that help us navigate between locations in the tree and the corresponding string prefixes. These two methods convert between string prefixes (like s-a-b-r) and locations in the trie (like the r trie node at the end of the path s-a-b-r):

• get_prefix_from_node() (utility method): given a Node in the trie, return the string prefix that would lead to that Node.

• get_node_from_prefix() (utility method): given a string prefix, return the Node in the trie that corresponds to the given string prefix. Return None if no such Node exists.

These methods are given below.

First, to convert a particular node location to a string prefix, we use the parent pointer of each node to traverse up the tree and assemble the corresponding prefix string from the path (so that traversing from b to a to the root s would yield sab):

    def get_prefix_from_node(self,node):
"""Given a node in the trie,
return the string prefix that
"""
if node==None:
return ""
elif node==self.root:
return ""
else:
prefix = ""
while node.parent != None:
node = node.parent
prefix = node.letter + prefix
return prefix


and the reverse, to convert a string prefix into a location in the trie:

    def get_node_from_prefix(self,prefix):
"""Given a string prefix,
return the node that represents
the tail end of that sequence
of letters in this trie. Return
None if the path does not exist.
"""
assert self.root!=None

if prefix=='':
return None

assert prefix[0]==self.root.letter

# Base case
if len(prefix)==1:
return self.root

# Recursive case
parent_prefix, suffix = prefix[:len(prefix)-1],prefix[len(prefix)-1]
parent = self.get_node_from_prefix(parent_prefix)
for child in parent.children:
if child.letter == suffix:
return child

# We know this will end because we handle
# the base case of prefix="", and prefix
# is cut down by one letter each iteration.


### Code for Assembling the Tree

We assemble the tree using a private recursive method. Here is how that looks (again, these methods are defined on the TryTrieTree class):

    def assemble(self):
"""Assemble the trie from the set of words
passed to the constructor.
"""
assert self.root!=None

words = self.words

prefix = ''
candidate = self.root.letter
self._assemble(prefix,candidate,words)


In the private recursive method, we assemble the branches of the tree, only checking to make sure each branch has the minimum number of words required.

At the start of each assemble method, we whittle the set of words down to only the words that start with the prefix for the given node. This trick uses a little extra space but the payoff is avoiding searching the entire word set for each node to count the number of words matching a given prefix. If a node's parent is s-a-b and we have already done the work of filtering all words starting with sab, there is no need to repeat that work when finding and filtering all words that start with sabr.

    def _assemble(self,prefix,candidate,words):
"""Recursive private method called by assemble().
"""
prefix_depth = len(prefix)
candidate_depth = prefix_depth+1

ppc = prefix+candidate
words_with_candidate = [w for w in words if w[:candidate_depth]==ppc]


Next lines are the checks to ensure we have the minimum number of words required to form a candidate branch in the trie.

If we do, we will create a new child node for that branch and recurse by calling assemble on it.

Of course, we have to check for the base case, which in this scenario checks when we have reached the fixed trie depth of 4.

        min_branches_req = int(math.pow(2,5-candidate_depth))
max_number_branches = len(words_with_candidate)

# If we exceed the minimum number of
# as a new node on the trie.
if max_number_branches >= min_branches_req:

parent = self.get_node_from_prefix(prefix)

# If we are looking at the root node,
if prefix=='':
# parent will be None.
# In this case don't worry about
# creating new child or introducing
# parent and child, b/c the "new child"
# is the root (already exists).
pass

else:
# Otherwise, create the new child,
# and introduce the parent & child.
new_child = Node(candidate)
new_child.parent = parent
parent.children.append(new_child)

# Base case
if candidate_depth==4:
new_child.count = max_number_branches
return

# Recursive case
for new_candidate in ALPHABET:
new_prefix = prefix + candidate
self._assemble(new_prefix,new_candidate,words_with_candidate)

# otherwise, we don't have enough
# branches to continue downward,
# so stop here and do nothing.
return


### Code for Bubbling Up Large Children Counts

These are a little shorter and simpler than the assembly method above:

    def bubble_up(self):
"""Do a depth-first traversal of the
entire trytrietree, pruning as we go.
This is a pre-order traversal,
meaning we traverse children first,
then the parents, so we always
know the counts of children
(or we are on a leaf node).
"""
self._bubble_up(self.root)

def _bubble_up(self,node):
"""Pre-order depth-first traversal
starting at the leaf nodes and proceeding
upwards.
"""
if len(node.children)==0:
# Base case
# Leaf nodes already have counts
# Do nothing
return

else:
# Recursive case
# Pre-order traversal: visit/bubble up children first
for child in node.children:
self._bubble_up(child)

# Now that we've completed leaf node counts, we can do interior node counts.
# Interior node counts are equal to number of large (>=2) children.
large_children = [child for child in node.children if child.count >= 2]
node.count = len(large_children)


You can see how we converted the definition of "large children" into a rule above - we use the recursive case of the "large children" definition in the recursive case, and we use the base case of the "large children definition" (for leaf nodes) when we are on the base case.

Also note that each leaf node was initialized with the number of words that start with the corresponding 4-letter prefix (that was done in the assembly method), but we could just as easily do it in the base case, as the leaf nodes are the base case.

## Wrap it in a Bow

We can add some extra wrapping around our class, and call each of the methods in order for the various letters of the alphabet.

Below, we process an input argument n (which is the size of the wordlist, 5757, if the user does not specify n). It then creates a TryTrieTree for each letter, and determines if a complete binary trie can be constructed. Finally, it prints a summary of the results.

#!/usr/bin/env python
from get_words import get_words
import sys
import math

"""
tries.py

Donald Knuth, Art of Computer Programming, Volume 4 Fascicle 0
Exercise #35

Problem:
What letters of the alphabet can be used
as the starting letter of sixteen words that
form a complete binary trie within
WORDS(n), given n?
"""

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
FIVE = 5

class Node(object):
...

class TryTrieTree(object):
...

def trie_search(n, verbose=False):

words = get_words()
words = words[:n]

perfect_count = 0
imperfect_count = 0
for letter in ALPHABET:

tree = TryTrieTree(words)
tree.set_root(letter)
tree.assemble()
tree.bubble_up()
#print(tree)

if tree.root.count >= 2:

if verbose:
print("The letter {0:s} has a perfect binary trie in WORDS({1:d}).".format(
letter, n))
perfect_count += 1

else:

if verbose:
print("The letter {0:s} has no perfect binary trie in WORDS({1:d}).".format(
letter, n))
imperfect_count += 1

if verbose:
print("")
print("Perfect count: {:d}".format(perfect_count))
print("Imperfect count: {:d}".format(imperfect_count))

return perfect_count, imperfect_count

def trie_table():
"""Compute and print a table of
number of words n versus number of
perfect tries formed.
"""
print("%8s\t%8s"%("n","perfect tries"))

ns = range(1000,5757,500)
for n in ns:
p,i = trie_search(n)
print("%8d\t%8d"%(n,p))

n = 5757
p,i = trie_search(n)
print("%8d\t%8d"%(n,p))

if __name__=="__main__":
if len(sys.argv)<2:
n = 5757
else:
n = int(sys.argv[1])
if n > 5757:
n = 5757

_,_ = trie_search(n, verbose=True)

#trie_table()


## Output

When we run with n = 1000, we can see that s is the only letter that forms a perfect binary trie for that value of n:

$python tries.py 1000 The letter a has no perfect binary trie in WORDS(1000). The letter b has no perfect binary trie in WORDS(1000). The letter c has no perfect binary trie in WORDS(1000). The letter d has no perfect binary trie in WORDS(1000). The letter e has no perfect binary trie in WORDS(1000). The letter f has no perfect binary trie in WORDS(1000). The letter g has no perfect binary trie in WORDS(1000). The letter h has no perfect binary trie in WORDS(1000). The letter i has no perfect binary trie in WORDS(1000). The letter j has no perfect binary trie in WORDS(1000). The letter k has no perfect binary trie in WORDS(1000). The letter l has no perfect binary trie in WORDS(1000). The letter m has no perfect binary trie in WORDS(1000). The letter n has no perfect binary trie in WORDS(1000). The letter o has no perfect binary trie in WORDS(1000). The letter p has no perfect binary trie in WORDS(1000). The letter q has no perfect binary trie in WORDS(1000). The letter r has no perfect binary trie in WORDS(1000). The letter s has a perfect binary trie in WORDS(1000). The letter t has no perfect binary trie in WORDS(1000). The letter u has no perfect binary trie in WORDS(1000). The letter v has no perfect binary trie in WORDS(1000). The letter w has no perfect binary trie in WORDS(1000). The letter x has no perfect binary trie in WORDS(1000). The letter y has no perfect binary trie in WORDS(1000). The letter z has no perfect binary trie in WORDS(1000). Perfect count: 1 Imperfect count: 25  In fact, 978 is the smallest number of words to find any perfect tries: $ python tries.py 978
The letter a has no perfect binary trie in WORDS(978).
The letter b has no perfect binary trie in WORDS(978).
The letter c has no perfect binary trie in WORDS(978).
The letter d has no perfect binary trie in WORDS(978).
The letter e has no perfect binary trie in WORDS(978).
The letter f has no perfect binary trie in WORDS(978).
The letter g has no perfect binary trie in WORDS(978).
The letter h has no perfect binary trie in WORDS(978).
The letter i has no perfect binary trie in WORDS(978).
The letter j has no perfect binary trie in WORDS(978).
The letter k has no perfect binary trie in WORDS(978).
The letter l has no perfect binary trie in WORDS(978).
The letter m has no perfect binary trie in WORDS(978).
The letter n has no perfect binary trie in WORDS(978).
The letter o has no perfect binary trie in WORDS(978).
The letter p has no perfect binary trie in WORDS(978).
The letter q has no perfect binary trie in WORDS(978).
The letter r has no perfect binary trie in WORDS(978).
The letter s has a perfect binary trie in WORDS(978).
The letter t has no perfect binary trie in WORDS(978).
The letter u has no perfect binary trie in WORDS(978).
The letter v has no perfect binary trie in WORDS(978).
The letter w has no perfect binary trie in WORDS(978).
The letter x has no perfect binary trie in WORDS(978).
The letter y has no perfect binary trie in WORDS(978).
The letter z has no perfect binary trie in WORDS(978).

Perfect count: 1
Imperfect count: 25


Running with the full 5757 words leads to 11 more perfect tries:

\$ python tries.py 5757
The letter a has no perfect binary trie in WORDS(5757).
The letter b has a perfect binary trie in WORDS(5757).
The letter c has a perfect binary trie in WORDS(5757).
The letter d has a perfect binary trie in WORDS(5757).
The letter e has no perfect binary trie in WORDS(5757).
The letter f has a perfect binary trie in WORDS(5757).
The letter g has no perfect binary trie in WORDS(5757).
The letter h has a perfect binary trie in WORDS(5757).
The letter i has no perfect binary trie in WORDS(5757).
The letter j has no perfect binary trie in WORDS(5757).
The letter k has no perfect binary trie in WORDS(5757).
The letter l has a perfect binary trie in WORDS(5757).
The letter m has a perfect binary trie in WORDS(5757).
The letter n has no perfect binary trie in WORDS(5757).
The letter o has no perfect binary trie in WORDS(5757).
The letter p has a perfect binary trie in WORDS(5757).
The letter q has no perfect binary trie in WORDS(5757).
The letter r has a perfect binary trie in WORDS(5757).
The letter s has a perfect binary trie in WORDS(5757).
The letter t has a perfect binary trie in WORDS(5757).
The letter u has no perfect binary trie in WORDS(5757).
The letter v has no perfect binary trie in WORDS(5757).
The letter w has a perfect binary trie in WORDS(5757).
The letter x has no perfect binary trie in WORDS(5757).
The letter y has no perfect binary trie in WORDS(5757).
The letter z has no perfect binary trie in WORDS(5757).

Perfect count: 12
Imperfect count: 14


If we assemble a table of number of five letter words n versus number of perfect tries formed, nearly half show up only after we include 4,500 words.

       n    perfect tries
1000           1
1500           1
2000           1
2500           1
3000           3
3500           3
4000           4
4500           6
5000          11
5500          12
5757          12