charlesreid1.com blog

Approximating Pi (Happy Pi Day)

Posted in Mathematics

permalink

Favorite Pi Approximations

What's your favorite \(\pi\) approximation?

Some of my favorite approximations of \(\pi\) come from Ramanujan-Sato series. These are mathematical series that generalize from a remarkable formula for \(\pi\) given by Srinivasa Ramanujan, an Indian mathematician:

$$ \pi^{-1} = \dfrac{\sqrt{8}}{99^2} \sum_{k \geq 0} \dfrac{ (4k)! }{ \left( 4^k k! \right)^4 } \dfrac{ 1103 + 26390k }{ 99^{4k} } $$

This completely novel formula opened up new branches of mathematics and provided a whole new class of \(\pi\) approximations (the Ramanujan-Sato series) and approximations that are extremely accurate, making them very useful for computer applications. (Each term of the above sequence yields 8 additional decimal points of accuracy of \(\pi\).) These approximations have also enabled world record calculations of numbers of digits of \(\pi\).

But those are not the \(\pi\) approximations that this blog post is about.

This blog post is about another set of \(\pi\) approximations that I like - these come from another field Ramanujan had mastery over, continued fractions.

Continued fractions provide a whole alternative way of representing all the real numbers - rational and irrational.

Continued Fractions and Convergents

Back in July of 2017, we wrote a blog post about how to find rational approximations of square roots using continued fractions and convergents, and we implemented a Java program to represent the irrational number \(\sqrt{n}\) (where \(n\) is not a perfect square).

In short, continued fractions are a way of expressing numbers, rational and irrational, in terms of recursive fractions, which look something like this:

$$ x = b_0 + \cfrac{1}{ b_1 + \cfrac{1}{ b_2 + \cfrac{1}{ b_3 + \cfrac{1}{ b_4 + \cfrac{1}{ b_5 + \dots } } } } } $$

denoted

$$ [b_0; b_1, b_2, b_3, \dots] $$

and, when evaluated, results in a series of fractions (rational approximations) called the convergents of the continued fraction.

In the denominator of the above representation, we have a 1 in each position, which makes the continued fraction a simple continued fraction. If the denominators are not 1, the continued fraction is a general continued fraction.

The continued fraction expansion of any rational number (or equivalently, the sequence of convergents) must terminate at some point (even if the number of terms ends up being very large).

The convergents always terminate if a number is rational. Conversely, if you can prove a number's continued fraction representation is a repeated sequence or continues forever, you can prove a number is irrational (a strategy employed in several proofs about properties of \(\pi\)).

Simple Continued Fractions to Approximate Pi

For \(\pi\), the convergents of the simple continued fraction (there is a single unique simple continued fraction representation of \(\pi\)) are unpredictable; the first few are:

$$ \pi = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2...] $$

Each additional term leads to increasingly precise fractional approximations for \(\pi\); they are:

$$ [3;7] = \dfrac{22}{7} = 3.\overline{142857} $$
$$ [3;7,15] = \dfrac{333}{106} = 3.1415094... $$
$$ [3;7,15,1] = \dfrac{355}{113} = 3.1415929... $$
$$ [3;7,15,1,292] = \dfrac{103993}{33102} = 3.14159265... $$

General Continued Fractions to Approximate Pi

Every number has exactly one representation as a simple continued fraction, if we restrict the values of \(b_i\) to be positive.

However, if we generalize the simple continued fraction to allow any integer in the numerator of the continued fractions, we get a general continued fraction:

$$ x = b_0 + \cfrac{a_1}{ b_1 + \cfrac{a_2}{ b_2 + \cfrac{a_3}{ b_3 + \cfrac{a_4}{ b_4 + \cfrac{a_5}{ b_5 + \dots } } } } } $$

Unlike simple continued fractions, one number can be expressed in many ways with many different general continued fractions.

We use two general continued fractions as well as the simple continued fraction representation of \(\pi\) to generate some \(\pi\) approximations below.

Odd Squares and Twos

The following general continued fraction

$$ \dfrac{4}{\pi} = 1 + \cfrac{1^2}{ 2 + \cfrac{3^2}{ 2 + \cfrac{5^2}{ 2 + \cfrac{7^2}{ 2 + \cfrac{9^2}{ 2 + \dots } } } } } $$

can be turned into approximations for \(\pi\) by finding the convergents of the above continued fraction, then reversing the numerator and denominator and multilpying the new numerator by 4.

Implementing a recurrence to turn the above into convergents and printing the first 24 terms and their approximate value:

                                      Pi
Convergent                                                          Approx 
--------------------------------------------------------------------------------
8 / 3                                                               2.6666666666666665
52 / 15                                                             3.4666666666666668
304 / 105                                                           2.8952380952380952
3156 / 945                                                          3.3396825396825398
30936 / 10395                                                       2.9760461760461761
443748 / 135135                                                     3.2837384837384835
6115680 / 2027025                                                   3.0170718170718169
112074660 / 34459425                                                3.2523659347188758
1991580840 / 654729075                                              3.0418396189294024
44442113940 / 13749310575                                           3.2323158094055926
967171378320 / 316234143225                                         3.0584027659273318
25444221030900 / 7905853580625                                      3.2184027659273320
655370553511800 / 213458046676875                                   3.0702546177791836
19859578238549700 / 6190283353629375                                3.2081856522619421
590885791980523200 / 191898783962510625                             3.0791533941974261
20266826271207308100 / 6332659870762850625                          3.2003655154095472
684008280009204381000 / 221643095476699771875                       3.0860798011238333
26194878742247361184500 / 8200794532637891559375                    3.1941879092319412
988797092817095519958000 / 319830986772877770815625                 3.0916238066678385
41820004752592427401540500 / 13113070457687988603440625             3.1891847822775947
1745807922530722423852479000 / 563862029680583509947946875          3.0961615264636411
80816804632604843113153342500 / 25373791335626257947657609375       3.1850504153525301
3696894652389922594527576660000 / 1192568192774434123539907640625   3.0999440323738066

(16 decimal points are printed for decimal approximations above.)

This exhibits a pattern seen with general continued fractions, which is that they tend to jump above and below the value they approximate, depending on whether there are an even or odd number of terms in the continued fraction being included.

Odd Squares, Threes, and Sixes

Another related continued fraction, this one for \(\pi\) and with a more predictable pattern, is given by:

$$ \pi = 3 + \cfrac{1^2}{ 6 + \cfrac{3^2}{ 6 + \cfrac{5^2}{ 6 + \cfrac{7^2}{ 6 + \cfrac{9^2}{ 6 + \dots } } } } } $$

Implementing a computer program to evaluate the convergents from the above patterns yields even more \(\pi\) approximations:

                                Even More Pi
Convergent                                                          Approx 
--------------------------------------------------------------------------------
19 / 6                                                              3.1666666666666665
141 / 45                                                            3.1333333333333333
1321 / 420                                                          3.1452380952380952
14835 / 4725                                                        3.1396825396825396
196011 / 62370                                                      3.1427128427128426
2971101 / 945945                                                    3.1408813408813407
50952465 / 16216200                                                 3.1420718170718169
974212515 / 310134825                                               3.1412548236077646
20570537475 / 6547290750                                            3.1418396189294020
475113942765 / 151242416325                                         3.1414067184965018
11922290683065 / 3794809718700                                      3.1417360992606653
322869019821075 / 102776096548125                                   3.1414796890042549
9388645795842075 / 2988412653476250                                 3.1416831892077552
291703390224616125 / 92854250304440625                              3.1415189855952756
9646071455650881825 / 3070380543400170000                           3.1416533941974261
338203386739761387075 / 107655217802968460625                       3.1415419859977827
12533792135642378629875 / 3989575718580595893750                    3.1416353566793886
489501901570061970946125 / 155815096120119939628125                 3.1415563302845726
20095772843114788169975625 / 6396619735457555416312500              3.1416238066678388
865107029346752986828909875 / 275374479611447760672253125           3.1415657346585473
38971636325356476834702484875 / 12404964652972837218854831250       3.1416160719181865
1833412715214285133654869268125 / 583597200719403932796125015625    3.1415721544829651
89918039850132576392201747480625 / 28621636626586418964957783375000 3.1416106990404735

When to Use Simple Vs. General Continued Fractions

While general continued fractions are more flexible, allowing a given rational or irrational number to be expressed in a wider variety of ways, it is important to point out how much faster the simple continued fractions converge - with 15 terms, we arrive at around 15 accurate decimal places, versus the 1-3 decimal places of accuracy from over 20 terms in the above approximations.

Using the simple continued fraction representation of \(\pi\),

$$ \pi = 3 + \cfrac{1}{ 7 + \cfrac{1}{ 15 + \cfrac{1}{ 1 + \cfrac{1}{ 292 + \cfrac{1}{ 1 + \dots } } } } } $$

and implementing it with a computer, we get the following table of convergents and their approximate values:

                               Pi Simple
Convergent                                                          Approx 
--------------------------------------------------------------------------------
22 / 7                                                          3.1428571428571428
333 / 106                                                       3.1415094339622640
355 / 113                                                       3.1415929203539825
103993 / 33102                                                  3.1415926530119025
104348 / 33215                                                  3.1415926539214212
208341 / 66317                                                  3.1415926534674368
312689 / 99532                                                  3.1415926536189365
833719 / 265381                                                 3.1415926535810779
1146408 / 364913                                                3.1415926535914038
4272943 / 1360120                                               3.1415926535893890
5419351 / 1725033                                               3.1415926535898153
80143857 / 25510582                                             3.1415926535897927
165707065 / 52746197                                            3.1415926535897936

A Note on the Program

We implemented a program to convert terms in a continued fraction representation (simple or general) into convergents (rational approximations).

We covered how to do this with Java in a previous blog post, but this time we re-implemented it in Python and used some basic object-oriented Python techniques and class decorators (class methods, memoized functions, etc.)

You can find the code here: https://git.charlesreid1.com/cs/python/src/branch/master/math/pi_continued_fraction_convergents.py

We will return to the topic of continued fractions and cover general continued fractions (and importantly, the recurrence relation used to convert them into convergents in the above program) in a future blog post.

Tags:    pi    continued fractions    number theory    mathematics    python    irrational numbers   

Five Letter Words: Part 5: The Try Trie Tree

Posted in Computer Science

permalink

Table of Contents



About the Five-Letter Words

In Volume 4 Fascicle 0 of Donald Knuth's Art of Computer Programming, Knuth introduces a tool for exploring concepts in graph theory: the five-letter words. This is a collection of 5,757 five-letter words compiled by Knuth and useful in exploring ways of constructing efficient algorithms.

The word list is large enough that an \(O(N^2)\) algorithm will take a solid chunk of CPU time, so there's a definite incentive to think carefully about implementation.

Knuth introduces a list of five-letter words, as well as associated exercises utilizing techniques from dynamic programming and graph theory, among other topics.

We have covered this topic before in prior blog posts:

and a recent addendum to Part 1:

We continue our coverage in this blog post with a newer problem, one that is rated by Knuth at 26, on his scale of 0 to 50:

00  Immediate
10  Simple (1 minute)
20  Medium (quarter hour)
30  Moderately hard
40  Term project
50  Research problem

(from Volume 1, Notes on the Exercises.)

Here's the list of words if you want to play along.

Link to the Stanford Graph Base.

Visit Five Letter Words on the charlesreid1.com wiki for details.

Introduction to the Try Trie Tree Problem

In this blog post, we'll cover Exercise 35 of Volume 4, Fascicle of Donald Knuth's Art of Computer Programming.

The problem is as follows:

Sixteen well-chosen elements of WORDS(1000) lead to the branching pattern (figure), which is a complete binary trie of words that begin with the letter s. But there's no such pattern of words beginning with a, even if we consider the full collection WORDS(5757).

What letters of the alphabet can be used as the starting letter of sixteen words that form a complete binary trie within WORDS(n), given n?

For the benefit of those without the book, here is an attempt to represent the trie that Knuth includes in the exercise:

                     s

            h                 t

        e       o         a       e

      e   l   r   w     l   r   a   e

      sheep             stalk
      sheet             stall

          shelf             stars
          shell             start

              shore             steal
              short             steam

                  shown             steel
                  shows             steep

To answer the question, of whether a complete binary trie can be completed for a given letter, given a set of n words, we construct a "try trie tree," which is a tree data structure that greedily builds a trie with as many branches as possible.

The full trie of \(26^4+1 = 456,977\) nodes would be expensive to assemble in full, for each starting letter. Instead we use the word list to build up a tree of possible candidate branches for the trie. Once we've constructed all possible branches using the faster but less precise method, we verify that each candidate branch we have constructed either meets our criteria (can be included as a branch in a complete binary trie), or is pruned.

The Try Trie Tree

To solve this problem, we define a TryTrieTree class that holds the nodes and links that make up our tree. We define some methods for it to perform the assembly and verification operations described below, then assemble one try trie tree for each letter of the alphabet to come up with an answer to the exercise.

Constructing the Try Trie Tree

The construction procedure for the try trie tree proceeds in two steps:

Step 1 is to assemble a tree, from the top down, by searching the entire space of \(456,977\) nodes and marking particular nodes and paths on this tree as candidates for the final perfect binary trie.

Step 2 is to revisit the candidate branches, proceeding from the bottom up, and determine if the candidate branches do, in fact, have enough sibling nodes and word matches to form a complete branch in a perfect binary trie.

We start Step 1 at the root node (the starting letter) and proceed from the root down, going level by level.

Checking for Minimum Number of Matching Words

Step 1 proceeds from the top down and marks branches that are candidates to end up in the final perfect binary trie.

At each level of the trie, we count the number of words in the overall word set that have a prefix matching the prefix corresponding to that node.

For example, the trie node b on the path s-a-b would yield four words:

saber
sable
sabre
sabra

If enough words match, that branch of the trie is a possible candidate to end up in the perfect binary trie (but may be trimmed in Step 2).

Example: If we are assembling the complete trie for s given by the author in the exercise, we can verify that there are at least 16 words that begin with the letter s. We would then verify that there are at least 8 words that begin with sa, which there are. Then we would verify that there are at least 4 words that begin with saa, which there are not, so we would move on to verifying that there are at least 4 words that begin with sab, which there are. We would proceed in this fashion until we had assembled a candidate trie branch, s-a-b-r (which contains two words, sabra and sabre). For Step 1, we keep s-a-b-r as a candidate branch. (We will see in Step 2 that this branch will get trimmed.)

At each level of the trie, we apply the procedure: - At level 1, we require a minimum of \(2^{5-1} = 16\) words. - At level 2, we require a minimum of \(2^{5-2} = 8\) words. - At level 3, we require a minimum of \(2^{5-3} = 4\) words. - At level 4, we require a minimum of \(2^{5-4} = 2\) words.

NOTE: We are not explicitly constructing the trie, so we don't need to assemble the word leaves.

Assemble Method

See the Try Trie Tree Code section for the code for the public and private assembly methods.

To peform the assembly of all possible branches of the try trie, we use the assemble() method. This is a public method that starts a recursive call to a private method _assemble().

We are given a starting letter (in the example given by the author, the starting letter is "s"). We explore every possible prefix that starts with the root letter, "sa", "sb", "sc", "sd", and so on.

For each of those possible prefixes, we explore every possible third letter, "saa", "sab", and so on, and then once more in a fourth step, "saaa", "saab", ..., for a total of \(26^4 = 456,976\) iterations (checks for existence of words starting with a given substring).

A substantial number of these checks will do nothing - from the fact that

\(\frac{5757}{456976} \sim 5e3/5e5 \sim 0.01\)

we know that the maximum number of loops that would actually lead to a branch being added will be 1% of that \(26^4\) total.

We also know that any efforts to speed up this program should focus on the way we are checking the number of words that start with a given substring - since that's where we'll spend most of our time.

The recursive assembly method takes a prefix string input (which maps to a location in the tree), and it explores all 26 possible children of that prefix (location in the trie).

When a leaf node is reached, at the fourth level, it represents the longest prefix in our trie. This is the base case of the recursive assembly function; the recursive function terminates at a fixed depth of 4.

Verifying Branches and Bubbling Up Counts

However, as we noted, the counts we are using above in Step 1 are just approximations (checking there are a minimum number of words with a given prefix). There is no way to guarantee that a trie branch can be used in the final complete perfect binary trie until all child leaf nodes have been visited.

This is where Step 2 comes in. In Step 2 we perform a pre-order depth-first traversal of the tree, visiting the leaf nodes first and proceeding from the bottom up. The number of words matching the prefix of each leaf node must be 2, to keep the leaf node.

We then proceed up the tree, level by level, and at each level we require that each node have at least 2 "large" children. This is a recursive definition - for a child to be "large", it must itself have 2 "large" children, or (if it is a leaf node) it must have at least 2 words that match the 4-letter prefix.

In our TryTrieTree, a complete binary trie is only possible if each node at each level has two or more children that are "large enough", where "large enough" means that either (a) both child nodes have two or more children that are "large enough", or (b) if we are at a leaf node (representing 4 characters), and there are two or more words that begin with the 4 characters corresponding to this trie node.

Let's go through an example.

Continuing with the example above for s, we assembled the branch s-a-b-r, which contains the minimum two words required. However, s-a-b is not a common enough prefix! The only child of s-a-b with two or more words matching is s-a-b-r, which means we can't form a complete binary trie using this s-a-b branch.

We call this procedure a "bubble up" procedure, since it is bottom-up.

Bubble Up Method

See the Try Trie Tree Code section for the code for the public and private bubble up methods.

Similar to the assembly method, our bubble up method is also a recursive method, performing a depth-first pre-order traversal. This ensures we reach leaf nodes before beginning our task, and that counts proceed from bottom-up.

Try Trie Tree Code

Below we go through some of the code for the Try Trie Tree problem.

Try Trie Trie Class

When dealing with trees, it's always a safe bet that we'll need a Node class, so we start with a utility class for tree nodes:

class Node(object):
    def __init__(self, letter, count=0):
        self.letter = letter
        self.count = count
        self.children = []
        self.parent = None

The TryTrieTree class has a constructor that starts with an empty root. The tree should also contain a pointer to the original word set, so that we can reference it in later methods where needed.

class TryTrieTree(object):
    def __init__(self,words):
        self.root = None
        self.words = words

In the final class we defined a __str__() method to create a string representation of the TryTrieTree, but we will skip that for now.

Next we have a method to set the root to a given Node:

    def set_root(self,root_letter):
        self.root = Node(root_letter)

Additionally, we have two utility methods that help us navigate between locations in the tree and the corresponding string prefixes. These two methods convert between string prefixes (like s-a-b-r) and locations in the trie (like the r trie node at the end of the path s-a-b-r):

  • get_prefix_from_node() (utility method): given a Node in the trie, return the string prefix that would lead to that Node.

  • get_node_from_prefix() (utility method): given a string prefix, return the Node in the trie that corresponds to the given string prefix. Return None if no such Node exists.

These methods are given below.

First, to convert a particular node location to a string prefix, we use the parent pointer of each node to traverse up the tree and assemble the corresponding prefix string from the path (so that traversing from b to a to the root s would yield sab):

    def get_prefix_from_node(self,node):
        """Given a node in the trie,
        return the string prefix that
        would lead to that node.
        """
        if node==None:
            return ""
        elif node==self.root:
            return ""
        else:
            prefix = ""
            while node.parent != None:
                node = node.parent
                prefix = node.letter + prefix
            return prefix

and the reverse, to convert a string prefix into a location in the trie:

    def get_node_from_prefix(self,prefix):
        """Given a string prefix,
        return the node that represents
        the tail end of that sequence
        of letters in this trie. Return
        None if the path does not exist.
        """
        assert self.root!=None

        if prefix=='':
            return None

        assert prefix[0]==self.root.letter

        # Base case
        if len(prefix)==1:
            return self.root

        # Recursive case
        parent_prefix, suffix = prefix[:len(prefix)-1],prefix[len(prefix)-1]
        parent = self.get_node_from_prefix(parent_prefix)
        for child in parent.children:
            if child.letter == suffix:
                return child

        # We know this will end because we handle
        # the base case of prefix="", and prefix
        # is cut down by one letter each iteration.

Code for Assembling the Tree

We assemble the tree using a private recursive method. Here is how that looks (again, these methods are defined on the TryTrieTree class):

    def assemble(self):
        """Assemble the trie from the set of words
        passed to the constructor.
        """
        assert self.root!=None

        words = self.words

        # start with an empty prefix
        prefix = ''
        candidate = self.root.letter
        self._assemble(prefix,candidate,words)

In the private recursive method, we assemble the branches of the tree, only checking to make sure each branch has the minimum number of words required.

At the start of each assemble method, we whittle the set of words down to only the words that start with the prefix for the given node. This trick uses a little extra space but the payoff is avoiding searching the entire word set for each node to count the number of words matching a given prefix. If a node's parent is s-a-b and we have already done the work of filtering all words starting with sab, there is no need to repeat that work when finding and filtering all words that start with sabr.

    def _assemble(self,prefix,candidate,words):
        """Recursive private method called by assemble().
        """
        prefix_depth = len(prefix)
        candidate_depth = prefix_depth+1

        ppc = prefix+candidate
        words_with_candidate = [w for w in words if w[:candidate_depth]==ppc]

Next lines are the checks to ensure we have the minimum number of words required to form a candidate branch in the trie.

If we do, we will create a new child node for that branch and recurse by calling assemble on it.

Of course, we have to check for the base case, which in this scenario checks when we have reached the fixed trie depth of 4.

        min_branches_req = int(math.pow(2,5-candidate_depth))
        max_number_branches = len(words_with_candidate)

        # If we exceed the minimum number of 
        # branches required, add candidate
        # as a new node on the trie.
        if max_number_branches >= min_branches_req:

            parent = self.get_node_from_prefix(prefix)

            # If we are looking at the root node,
            if prefix=='':
                # parent will be None.
                # In this case don't worry about
                # creating new child or introducing
                # parent and child, b/c the "new child"
                # is the root (already exists).
                pass

            else:
                # Otherwise, create the new child,
                # and introduce the parent & child.
                new_child = Node(candidate)
                new_child.parent = parent
                parent.children.append(new_child)

            # Base case
            if candidate_depth==4:
                new_child.count = max_number_branches
                return

            # Recursive case
            for new_candidate in ALPHABET:
                new_prefix = prefix + candidate
                self._assemble(new_prefix,new_candidate,words_with_candidate)

        # otherwise, we don't have enough
        # branches to continue downward,
        # so stop here and do nothing.
        return

Code for Bubbling Up Large Children Counts

These are a little shorter and simpler than the assembly method above:

    def bubble_up(self):
        """Do a depth-first traversal of the
        entire trytrietree, pruning as we go.
        This is a pre-order traversal,
        meaning we traverse children first,
        then the parents, so we always 
        know the counts of children
        (or we are on a leaf node).
        """
        self._bubble_up(self.root)


    def _bubble_up(self,node):
        """Pre-order depth-first traversal
        starting at the leaf nodes and proceeding
        upwards.
        """
        if len(node.children)==0:
            # Base case
            # Leaf nodes already have counts          
            # Do nothing
            return

        else:
            # Recursive case
            # Pre-order traversal: visit/bubble up children first
            for child in node.children:
                self._bubble_up(child)

            # Now that we've completed leaf node counts, we can do interior node counts.
            # Interior node counts are equal to number of large (>=2) children.
            large_children = [child for child in node.children if child.count >= 2]
            node.count = len(large_children)

You can see how we converted the definition of "large children" into a rule above - we use the recursive case of the "large children" definition in the recursive case, and we use the base case of the "large children definition" (for leaf nodes) when we are on the base case.

Also note that each leaf node was initialized with the number of words that start with the corresponding 4-letter prefix (that was done in the assembly method), but we could just as easily do it in the base case, as the leaf nodes are the base case.

Wrap it in a Bow

We can add some extra wrapping around our class, and call each of the methods in order for the various letters of the alphabet.

Below, we process an input argument n (which is the size of the wordlist, 5757, if the user does not specify n). It then creates a TryTrieTree for each letter, and determines if a complete binary trie can be constructed. Finally, it prints a summary of the results.

#!/usr/bin/env python
from get_words import get_words
import sys
import math

"""
tries.py

Donald Knuth, Art of Computer Programming, Volume 4 Fascicle 0
Exercise #35

Problem:
What letters of the alphabet can be used
as the starting letter of sixteen words that
form a complete binary trie within
WORDS(n), given n?
"""

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
FIVE = 5


class Node(object):
    ...


class TryTrieTree(object):
    ...

def trie_search(n, verbose=False):

    words = get_words()
    words = words[:n]

    perfect_count = 0
    imperfect_count = 0
    for letter in ALPHABET:

        tree = TryTrieTree(words)
        tree.set_root(letter)
        tree.assemble()
        tree.bubble_up()
        #print(tree)

        if tree.root.count >= 2:

            if verbose:
                print("The letter {0:s} has a perfect binary trie in WORDS({1:d}).".format(
                    letter, n))
            perfect_count += 1

        else:

            if verbose:
                print("The letter {0:s} has no perfect binary trie in WORDS({1:d}).".format(
                    letter, n))
            imperfect_count += 1

    if verbose:
        print("")
        print("Perfect count: {:d}".format(perfect_count))
        print("Imperfect count: {:d}".format(imperfect_count))

    return perfect_count, imperfect_count



def trie_table():
    """Compute and print a table of
    number of words n versus number of
    perfect tries formed.
    """
    print("%8s\t%8s"%("n","perfect tries"))

    ns = range(1000,5757,500)
    for n in ns:
        p,i = trie_search(n)
        print("%8d\t%8d"%(n,p))

    n = 5757
    p,i = trie_search(n)
    print("%8d\t%8d"%(n,p))


if __name__=="__main__":
    if len(sys.argv)<2:
        n = 5757
    else:
        n = int(sys.argv[1])
        if n > 5757:
            n = 5757

    _,_ = trie_search(n, verbose=True)

    #trie_table()

Output

When we run with n = 1000, we can see that s is the only letter that forms a perfect binary trie for that value of n:

$ python tries.py 1000
The letter a has no perfect binary trie in WORDS(1000).
The letter b has no perfect binary trie in WORDS(1000).
The letter c has no perfect binary trie in WORDS(1000).
The letter d has no perfect binary trie in WORDS(1000).
The letter e has no perfect binary trie in WORDS(1000).
The letter f has no perfect binary trie in WORDS(1000).
The letter g has no perfect binary trie in WORDS(1000).
The letter h has no perfect binary trie in WORDS(1000).
The letter i has no perfect binary trie in WORDS(1000).
The letter j has no perfect binary trie in WORDS(1000).
The letter k has no perfect binary trie in WORDS(1000).
The letter l has no perfect binary trie in WORDS(1000).
The letter m has no perfect binary trie in WORDS(1000).
The letter n has no perfect binary trie in WORDS(1000).
The letter o has no perfect binary trie in WORDS(1000).
The letter p has no perfect binary trie in WORDS(1000).
The letter q has no perfect binary trie in WORDS(1000).
The letter r has no perfect binary trie in WORDS(1000).
The letter s has a perfect binary trie in WORDS(1000).
The letter t has no perfect binary trie in WORDS(1000).
The letter u has no perfect binary trie in WORDS(1000).
The letter v has no perfect binary trie in WORDS(1000).
The letter w has no perfect binary trie in WORDS(1000).
The letter x has no perfect binary trie in WORDS(1000).
The letter y has no perfect binary trie in WORDS(1000).
The letter z has no perfect binary trie in WORDS(1000).

Perfect count: 1
Imperfect count: 25

In fact, 978 is the smallest number of words to find any perfect tries:

$ python tries.py 978
The letter a has no perfect binary trie in WORDS(978).
The letter b has no perfect binary trie in WORDS(978).
The letter c has no perfect binary trie in WORDS(978).
The letter d has no perfect binary trie in WORDS(978).
The letter e has no perfect binary trie in WORDS(978).
The letter f has no perfect binary trie in WORDS(978).
The letter g has no perfect binary trie in WORDS(978).
The letter h has no perfect binary trie in WORDS(978).
The letter i has no perfect binary trie in WORDS(978).
The letter j has no perfect binary trie in WORDS(978).
The letter k has no perfect binary trie in WORDS(978).
The letter l has no perfect binary trie in WORDS(978).
The letter m has no perfect binary trie in WORDS(978).
The letter n has no perfect binary trie in WORDS(978).
The letter o has no perfect binary trie in WORDS(978).
The letter p has no perfect binary trie in WORDS(978).
The letter q has no perfect binary trie in WORDS(978).
The letter r has no perfect binary trie in WORDS(978).
The letter s has a perfect binary trie in WORDS(978).
The letter t has no perfect binary trie in WORDS(978).
The letter u has no perfect binary trie in WORDS(978).
The letter v has no perfect binary trie in WORDS(978).
The letter w has no perfect binary trie in WORDS(978).
The letter x has no perfect binary trie in WORDS(978).
The letter y has no perfect binary trie in WORDS(978).
The letter z has no perfect binary trie in WORDS(978).

Perfect count: 1
Imperfect count: 25

Running with the full 5757 words leads to 11 more perfect tries:

$ python tries.py 5757
The letter a has no perfect binary trie in WORDS(5757).
The letter b has a perfect binary trie in WORDS(5757).
The letter c has a perfect binary trie in WORDS(5757).
The letter d has a perfect binary trie in WORDS(5757).
The letter e has no perfect binary trie in WORDS(5757).
The letter f has a perfect binary trie in WORDS(5757).
The letter g has no perfect binary trie in WORDS(5757).
The letter h has a perfect binary trie in WORDS(5757).
The letter i has no perfect binary trie in WORDS(5757).
The letter j has no perfect binary trie in WORDS(5757).
The letter k has no perfect binary trie in WORDS(5757).
The letter l has a perfect binary trie in WORDS(5757).
The letter m has a perfect binary trie in WORDS(5757).
The letter n has no perfect binary trie in WORDS(5757).
The letter o has no perfect binary trie in WORDS(5757).
The letter p has a perfect binary trie in WORDS(5757).
The letter q has no perfect binary trie in WORDS(5757).
The letter r has a perfect binary trie in WORDS(5757).
The letter s has a perfect binary trie in WORDS(5757).
The letter t has a perfect binary trie in WORDS(5757).
The letter u has no perfect binary trie in WORDS(5757).
The letter v has no perfect binary trie in WORDS(5757).
The letter w has a perfect binary trie in WORDS(5757).
The letter x has no perfect binary trie in WORDS(5757).
The letter y has no perfect binary trie in WORDS(5757).
The letter z has no perfect binary trie in WORDS(5757).

Perfect count: 12
Imperfect count: 14

If we assemble a table of number of five letter words n versus number of perfect tries formed, nearly half show up only after we include 4,500 words.

       n    perfect tries
    1000           1
    1500           1
    2000           1
    2500           1
    3000           3
    3500           3
    4000           4
    4500           6
    5000          11
    5500          12
    5757          12

Tags:    python    computer science    graphs    algorithms    art of computer programming    knuth    five letter words    tries    trees   

Five Letter Words: Part 4: Revisiting Diff by One

Posted in Computer Science

permalink

Table of Contents

About the Five-Letter Words

In Volume 4, Facsimile 0 of Donald Knuth's Art of Computer Programming, in which Knuth covers graph theory, he introduces a list of five-letter words as part of a data set useful in exploring graph theory and graph algorithms.

The list of words is part of the Stanford Graph Base, a set of data sets that are useful for studying graph theory and networks.

See Five Letter Words on the charlesreid1.com wiki for details.

Different by 1, Revisited

This post is revisiting an exercise from the above volume, Exercise 28:

Find pairs of SGB word vectors that differ by +/- 1 in each component.

In a prior blog post (Part 1), we had inerpreted the question as finding word vectors whose Euclidean distance differed by 1 total, which is the same as a Hamming distance of 1.

However, on revisiting the (more interesting) question actually being posed by the author, we find a different and more difficult problem.

As an example of what Knuth is asking for:

rover -> spuds

Each letter of the words are within an edit distance of 1, at each position.

There are 38 such pairs:

$ python diff_by_one_fixed.py
abaft babes
absit baths
adder beefs
ambit blahs
anger boffs
anode boned
bider chefs
bidet chefs
biffs cheer
ghost hints
hobos inapt
holds inker
honed inode
hoods inner
hoofs inner
hoots input
hoped inode
ingot johns
needs odder
needs offer
rider sheds
rifer sheds
rinds shoer
robed spade
robot snaps
robot spans
rover spuds
ruffs steer
runts stout
rusts strut
sheer tiffs
sheet tiffs
shout tints
sides theft
sneer toffs
splat tombs
spuds toter
stuns tutor
Found 38 pairs of words that differ by +/-1 in each component.

The approach we used was as follows:

  • Iterate over each word in the wordlist (use the first 1,000 words to keep it shorter for testing)
  • For each word:
  • Generate all variations that are within +/-1 using recursive backtracking (could also use algorithm to generate all 32 binary codes of length 5, where 0 = -1, 1 = +1)
  • For each of the 32 variations,
    • Check if the word is in the wordset (O(1) cost if using a hash table/set)
    • If so, add ordered pair (word1,word2) to a set of solutions (to avoid dupes)

Different by N, Revisited

We went back and modified the code to take a distance parameter d, but storage and compute cost, as well as the sparsity of the graph of shared bigrams and trigrams among these 5,000 words, means the number of pairs increases exponentially.

| Distance  | Number of pairs   | Walltime       |
|-----------|-------------------|----------------|
| 1         | 38                |     0.26 s     |
| 2         | 525               |     5.26 s     |
| 3         | 4982              |    38.87 s     |
| 4         | ?? (10^5 est.)    |  10 min (est.) |

You can find the diff_by_n.py script here: https://git.charlesreid1.com/cs/five-letter-words/

The output:

$ python diff_by_n.py
abaft babes
absit baths
adder beefs
ambit blahs
anger boffs
anode boned
bider chefs
bidet chefs
biffs cheer
ghost hints
..
Found 38 pairs of words that differ by +/-1 in each component.
Time: 0.2673 s

aback babel
aback cabal
abaft babes
abash cacti
abide baked
abide caged
abide caked
abler bands
abler bangs
abode caned
...
Found 525 pairs of words that differ by +/-2 in each component.
Time: 5.2617 s

abaca ceded
abaci babel
abaci cabal
abaci decaf
abaci decal
aback babel
aback cabal
aback decal
abaft babes
abaft bedew
...
Found 4982 pairs of words that differ by +/-3 in each component.
Time: 38.8743 s

September 2019

Mocking AWS in Unit Tests

May 2018

Current Projects