charlesreid1.com research & teaching

Computing Square Roots: Part 1: Using Newton's Method

Posted in Mathematics

permalink

Table of Contents

Newton's Method for Finding Roots of Equations

Suppose we have a function \(f(x)\) and we want to compute values of \(x\) for which \(f(x)=0\). These values of \(x\) are called the roots of \(f(x)\).

We can compute the roots using Newton's Method, which utilizes the derivative of the function to iteratively compute the roots of the function.

Newton's method begins with an initial guess. It evaluates the derivative of the fnction at the initial guess, which gives the slope of the tangent line, and computes the root of the tangent line as the next approximation of the root of the function.

This is based on the point-slope formula,

$$ y - y_0 = m(x - x_0) $$

Now the left side becomes

$$ f(x) - f(x_0) = f'(x_0)(x - x_0) $$

and at the root, we know \(f(x)=0\), so rearranging this equation into an expression for the root \(x\) gives:

$$ x = x_0 - \dfrac{f(x_0)}{f'(x_0)} $$

Now, this is the equation for the next approximation for the root. To turn this into an iterative procedure, we write this as

$$ x_{i+1} = x_{i} - \dfrac{f(x_i)}{f'(x_i)} $$

Newton's Method then allows us to evaluate the above expression as many times as we would like to achive the desired accuracy.

Two caveats with Newton's Method:

  • The function must be relatively well-behaved; Newton's Method does not converge for functions with large, high-order derivatives.
  • The convergence of the method depends on the accuracy of the initial guess. If you can make a good guess, do it!

Newton's Method for Finding Square Roots

Note that we can use the procedure and equation from above to compute the numerical value of a given function to an arbitrary degree of accuracy, so long as we have the derivative (and a program that can keep track of arbitrary digits).

Suppose we want to use Newton's Method to compute \(\sqrt{2}\). Then we can solve for the roots of the following function:

$$ f(x) = x^2 - a $$

for \(a = 2\).

Note that the derivative of this function is computed using the power rule, which is trivial to implement, so we can also use this method to compute general \(n^{th}\) roots of \(a\), by solving for roots of:

$$ f(x) = x^n - a $$

For a monomial with nonzero power, the derivative is always l\(f'(x) = n x^{n-1}\), so in the square root case we have \(f'(x) = 2x\). Now we plug these two functions into the iterative formula for Newton's Method to get:

$$ \begin{array} a x_{i+1} &=& x_{i} - \dfrac{f(x_n)}{f'(x_n)} \\ x_{i+1} &=& x_{i} - \dfrac{x_i^2 - a}{2 x_i } \end{array} $$

Newton's Method for Finding Square Roots: Program

This can be implemented in a programming language to yield an iterative method for computing square roots. We can either specify a tolerance (better), or a number of iterations. Here is a static method for computing a square root using Newton's method. The user specifies the value of \(a\), the number they want to compute the square root of; \(x_0\), the initial guess; and a tolerance, which controls the number of decimal places of accuracy of the final answer.

    /** Compute a square root using Newton's Method, to a specified tolerance.
     *
     * @param a Compute the square root of a.
     * @param x0 Initial guess.
     * @param tol Tolerance (stopping criteria).
     */
    public static double nmsqrttol(double a, double x0, double tol) { 
        double xi, xip1, err;
        xi = xip1 = x0;
        err = x0;
        while(err > tol) { 
            xip1 = xi - (xi*xi-a)/(2*xi);
            err = Math.abs( xip1 - xi );
            xi = xip1;
        }
        return xip1;
    }

(There is no check for infinite loops because our function is a smooth polynomial and Newton's Method will always converge.)

Here is sample output from a program that varies the tolerance and prints the corresponding value of the square root that was computed:

$ javac SquareRoot.java && java SquareRoot
Actual value sqrt(2) = 1.4142135623730951
Testing Newton's Method, Specifying Tolerance:
Tol         sqrt(2)
0.1         1.4166666666666667
0.01        1.4142156862745099
0.001       1.4142135623746899
0.0001      1.4142135623746899
1e-05       1.4142135623746899
1e-06       1.4142135623730951
1e-07       1.4142135623730951
1e-08       1.4142135623730951

Accuracy

Now that we've coded up Newton's Method, we can determine the number of accurate digits. There are a couple of ways to do this, but I went with a string comparison method. I start with a text file containing thousands of digits of the square root of 2, then I compute the square root of 2 using Newton's Method. The longest common substring gives me the number of accurate digits, plus "1.", so if I subtract 2 I get the total number of accurate digits after the decimal place.

javac SquareRoot.java && java SquareRoot
Tolerance = 0.1     Number of accurate digits = 2
Tolerance = 0.01        Number of accurate digits = 5
Tolerance = 0.001       Number of accurate digits = 11
Tolerance = 0.0001      Number of accurate digits = 11
Tolerance = 1e-05       Number of accurate digits = 11
Tolerance = 1e-06       Number of accurate digits = 15
Tolerance = 1e-07       Number of accurate digits = 15
Tolerance = 1e-08       Number of accurate digits = 15
Tolerance = 1e-09       Number of accurate digits = 15
Tolerance = 1e-10       Number of accurate digits = 15
Tolerance = 1e-11       Number of accurate digits = 15
Tolerance = 1e-12       Number of accurate digits = 15
Tolerance = 1e-13       Number of accurate digits = 15
Tolerance = 1e-14       Number of accurate digits = 15

Speed

It is also important to measure performance, in the form of speed. How fast is Newton's Method relative to the built-in square root function in the math library?

Turns out the performance of Newton's Method is much worse than the built-in math library's square root function. The Newton's Method defined above is around 100 times slower. Here are the results of a simple timing test, in which we compute the square root 10 million times, timing how long it takes, and divide by the number of operations to yield the time per operation (or rather, as it is slightly easier to grasp, the time per 1k operations);:

    /** Time Newton's Method.
     *
     * How long does it take to achieve 10 digits of accuracy? */
    public static void testTime() { 

        int Nops;
        double a;
        double initialGuess;
        double tol;
        double time;

        Tim tim;

        Nops = 10000000;
        a = 2;
        initialGuess = 1;
        tol = 1E-8;
        tim = new Tim();
        tim.tic();
        for(int i=0; i<Nops; i++) { 
            nmsqrttol(a, initialGuess, tol);
        }
        tim.toc();
        time = 1000*tim.elapsedms()/Nops;
        System.out.println("Newton's Method Time (ms) per 1k operations: "+time);


        Nops = 10000000;
        a = 2;
        tim = new Tim();
        tim.tic();
        for(int i=0; i<Nops; i++) { 
            Math.sqrt(a);
        }
        tim.toc();
        time = 1000*tim.elapsedms()/Nops;
        System.out.println("Math Library Time (ms) per 1k operations: "+time);

    }

and the results:

javac SquareRoot.java && java SquareRoot
Newton's Method Time (ms) per 1k operations: 0.016
Math Library Time (ms) per 1k operations: 3.0E-4

While the accuracy of Newton's Method for square roots may not be that great, it ain't bad, for 9 lines of code.

Next Steps

In a blog post to follow, we'll compare the speed and accuracy of square root computations using Newton's Method to an alternative approach involving the continued fraction representation of square roots. This particularly interesting technique can also be used to solve the Pell equation, a quadratic Diophantine equation of the form:

$$ x^2 - D y^2 = 1 $$

But more on that in a future post...

References

  1. Press, William et al. "Numerical Recipes in C." Cambridge Unviersity Press (2007).

  2. "SquareRoot.java". Charles Reid. git.charlesreid1.com. <https://charlesreid1.com:3000/cs/java/src/master/numerics/newtons-method>

CSE 143 Final Project: Hilbert Sort: 3. The Code

Posted in Computer Science

permalink

Table of Contents

This is the third in a series of three posts detailing the Hilbert Sort problem, its solution, and its implementation. This post deals with the code to solve the Hilbert Sort problem.

Hilbert Sort: Pseudocode

From our prior post, here is the psudocode for our Hilbert Sort function:

define hilbert_sort( unsorted queue, square dimension ):
    create southwest queue
    create northwest queue
    create northeast queue
    create southeast queue
    for each point:
        if in southwest:
            create new point using X -> Y, Y -> X
            add to southwest queue
        if in northwest:
            create new point using X -> 2X, Y -> 2Y - S
            add to northwest queue
        if in northeast:
            create new point using X -> 2X - S, Y -> 2Y - S
            add to northeast queue
        if in southeast:
            create new point using X -> S - 2Y, Y -> 2S - 2X
            add to southeast queue

        hilbertsort(southwest queue, square dimension)
        hilbertsort(northwest queue, square dimension)
        hilbertsort(northeast queue, square dimension)
        hilbertsort(southeast queue, square dimension)

        create new results queue
        add points from southwest into results queue
        add points from northwest into results queue
        add points from northeast into results queue
        add points from southeast into results queue
        return results queue

Because we are manually sorting, and we want order to be preserved, we should be using a queue to organize points as we sort them. That way, we add them in sorted order, and we are then able to remove them in sorted order.

Hilbert Sort: Code

We begin by covering a utility class used by the Hilbert Sort method to store \((X,Y)\) points. This is a simple example of a composition design pattern. Next, we cover the bulk of the problem solution: the recursive sort method that partiions points into quadrants. Finally, we cover the main method, which demonstrates reading data from an input file and passing it to the sort method.

Hilbert Sort: Utility Classes

To organize \((X,Y)\) point data, we use a simple class using composition. This is defined next to the HilbertSort class.

/**
 * An (x,y) Point class. 
 */
class Point {
    int x, y; // (x,y) point.
    String name; // Each (x,y) point has a name in the file. Used for output.
    /** Constructor. */
    public Point(int x, int y, String name) { 
        this.x = x; this.y = y; this.name = name;
    }
    /** String representation (x,y). */
    public String toString() { 
        return "("+this.x+","+this.y+","+this.name+")";
    }
}

Recursive Sort Function

Following is the recursive sort method, which (like merge sort) consists of a split step, which partitions an \(S \times S\) square into quadrants and distributes points in the square into their corresponding quadrants, and a merge step, which stitches together each quadrant in the correct order.

    /** Recursive implementation of a Hilbert sort. */
    public static Queue<Point> hilbertSort(Queue<Point> inputP, int S) {
        // Recursive method:
        // Apply the Hilbert geometrical quadrant division 
        // to sort points by when they are visited by a Hilbert curve.
        //
        // Base case: 
        // There are 1 or fewer points in each quadrant.
        // Keep splitting into quadrants until we reach the base case. 
        if(inputP.size()<1) {
            return new LinkedList<Point>();
        } else if(inputP.size()==1) {
            return inputP;
        }

        // split by quadrant
        Queue<Point> qSW = new LinkedList<Point>();
        Queue<Point> qNW = new LinkedList<Point>();
        Queue<Point> qNE = new LinkedList<Point>();
        Queue<Point> qSE = new LinkedList<Point>();

        // Sort points by dividing into quadrants
        for(Point p : inputP) { 

            // Prepare for the tricky part.
            //
            // Rotate the quadrant, and points in it,
            // so that everything is now translated to fit
            // how the template of the Hilbert curve is being drawn.
            // (SW->NW->NE-SE)

            boolean inSWquadrant = (2*p.x <= S) && (2*p.y <= S);
            boolean inNWquadrant = (2*p.x <= S) && (2*p.y >= S);

            boolean inNEquadrant = (2*p.x >= S) && (2*p.y >= S);
            boolean inSEquadrant = (2*p.x >= S) && (2*p.y <= S);

            // Each time we sort (x,y) points into quadrants,
            // we also transform each coordinate point 
            // in such a way that it rescales to an S x S square,
            // but does not modify the order of the points. 
            //
            // Note that we can keep everything as integers by
            // continuing to look at an S x S square,
            // and double the x and y values to shift them over/up.
            //
            // Two easy cases:
            if(inNWquadrant) {
                // Northwest quadrant: 
                // - shift y down by S/2
                // - keep x and y in same order
                qNW.add( new Point(2*p.x, 2*p.y-S, p.name) );

            } else if(inNEquadrant) {
                // Northeast quadrant:
                // - shift x and y down by S/2
                // - keep x and y in same order
                qNE.add( new Point(2*p.x - S, 2*p.y - S, p.name) );

            } else if(inSWquadrant) { 
                // Southwest quadrant:
                // - x and y need to swap places 
                // - that's it.
                qSW.add( new Point(2*p.y, 2*p.x, p.name) );

            } else if(inSEquadrant) { 
                // Southeast quadrant:
                // - trickiest quadrant.
                // - We want to preserve S - x, distance from right side
                // - we want to use it as the new y coordinate
                qSE.add( new Point(S - 2*p.y, 2*(S - p.x), p.name) );

            }

        }
        // Sort til you reach the base case.
        qSW = hilbertSort(qSW, S); 
        qNW = hilbertSort(qNW, S); 
        qNE = hilbertSort(qNE, S); 
        qSE = hilbertSort(qSE, S);

        Queue<Point> result = new LinkedList<Point>();
        for(Point q : qSW) result.add(q);
        for(Point q : qNW) result.add(q);
        for(Point q : qNE) result.add(q); 
        for(Point q : qSE) result.add(q);

        return result;
    }

Main Method

The last part of the code is the portion that loads the points and their labels from a file, and populates a Queue of Point objects from it. This queue of points is then sorted and returned in order.

    /** Main driver. */
    public static void main(String[] args) { 

        Scanner stdin = new Scanner(new BufferedReader(new InputStreamReader(System.in)));

        int n = stdin.nextInt();
        int S = stdin.nextInt();

        // n lines of 3 tokens each
        Queue<Point> inputPoints = new LinkedList<Point>();
        for(int i=0; i<n; i++) { 
            int x0 = stdin.nextInt();
            int y0 = stdin.nextInt();
            String label = stdin.next();
            inputPoints.add(new Point(x0,y0,label));
        }
        Queue<Point> sortedPoints = hilbertSort(inputPoints, S);
        for(Point p : sortedPoints) { 
            System.out.println(p.name);
        }
    }

References

  1. "ACM Pacific Region Programming Competition." Association of Computing Machinery. Accessed 19 June 2017. <http://acmicpc-pacnw.org/>

  2. "Hilbert Sort." Git repository, git.charlesreid1.com. Charles Reid. Updated 16 June 2017. <https://charlesreid1.com:3000/cs/finalproject-143/src/master/hilbert/HilbertSort.java>

CSE 143 Final Project: Hilbert Sort: 2. The Solution Algorithm

Posted in Computer Science

permalink

Table of Contents

This is the second in a series of three posts detailing the Hilbert Sort problem, its solution, and its implementation. This post solves the problem.

Hilbert Sort Problem

In the prior post, we covered the Hilbert Sort problem, but we state it once more succinctly here before detailing a solution to the problem.

The Hilbert Sort problem asks the following: given a set of labeled \((x,y)\) points, how can we sort the points according to the order in which they are visited by a space-filling Hilbert curve?

Revisiting the example input and output provided, the input provides the number of points and size of the grid on the first line, followed by each point's coordinates and label.

Input:
    14 25
    5 5 Honolulu
    5 10 PugetSound
    5 20 Victoria
    10 5 Berkeley
    10 10 Portland
    10 15 Seattle
    10 20 Vancouver
    15 5 LasVegas
    15 10 Sacramento
    15 15 Kelowna
    15 20 PrinceGeorge
    20 5 Phoenix
    20 10 SaltLakeCity
    20 20 Calgary

Output:
    Honolulu
    Berkeley
    Portland
    PugetSound
    Victoria
    Vancouver
    Seattle
    Kelowna
    PrinceGeorge
    Calgary
    SaltLakeCity
    Sacramento
    LasVegas
    Phoenix

Space is the Place

To solve the Hilbert Sort problem, we have to avoid the temptation to think about the Hilbert curve and the way that it is constructed. While we spent quite a bit of time talking about the Hilbert curve and how it is constructed, the curve itself is not what we are interested in - we are interested in the order in which the points are visited.

Also remember, the motivation of solving the Hilbert Sort problem is to arrange spatial \((x,y)\) points so closer points are nearer together.

No matter how many iterations of the Hilbert curve we end up drawing, we always apply the same procedure: cut the square into four quadrants, reflect the southwest corner about the bottom left to top right diagonal, and reflect the southeast corner about the bottom right to top left diagonal.

We will always visit points in the southwest quadrant before we visit points in the northwest quadrant; we will always visit points in the northwest corner before we visit points in the northeast corner; etc.

The Reflections

The trickiest part of the Hilbert Sort problem is the reflection that happens to the lower left and lower right quadrants.

Reflected Quadrants

Start with the first step of the Hilbert sort - take a square with points contained in it. Split the square into four quadrants (with the intention of creating four sub-problems). However, to conform to the Hilbert Curve construction process, the lower left and lower right squares must be reflected. The lower left square is reflected about the bottom left to upper right diagonal, while the lower right square is reflected about the bottom right to upper left diagonal.

Convince yourself of this by studying the curve construction procedure as illustrated by Hilbert himself in his 1890 paper (a.k.a., Hilbert Illustrates A Hilbert Curve):

Hilbert Illustrates Construction of Hilbert Curve

We are working toward a recursive method - and recursive methods call themselves repeatedly, apply to subproblems that are trivially similar. However, to translate this into a recursive problem, we have to deal with the rotations within the current recursive step, in such a way that we don't need to know the orientation of the prior square to know the order in which to visit the squares - it is always southwest, northwest, notheast, southwest.

After we split the squares into quadrants, after we toss out any quadrants with no points, we walk through each of the four quadrants in order (southwest, northwest, northeast, southwest). If there is a single point in the quadrant, we add it to the priority queue.

It is here that we take care of the rotation - before we recursively call the Hilbert sort method on the quadrant itself.

Scaling

We have a prescribed order for the four quadrants in the current recursive level, and the current recursive level is working its way through each of those four quadrants. But remember, our algorithm only cares about the order of points. It does not care about the \((x,y)\) location. So we can ireflect \((x,y)\) points by changing their \((x,y)\) coordinate locations. Ultimately we are only changing the program's internal representation of each point, not the original data on disk, so we can think of \((x,y)\) as mutable for our purposes.

This is an important part of our solution: scaling (and reflecting) each quadrant before recursively calling the Hilbert sort method on the points contained in it.

If we are considering a single quadrant of dimensions \(\frac{S}{2} \times \frac{S}{2}\), containing points \((x,y)\), we may be able to pass in the corners of our square, plus the \((x,y)\) points contained in it. However, as our squares get smaller, the distance between points gets smaller as well, so this has an upper limit as to how many points it can sort.

On the other hand, we can avoid passing all that information around and using doubles, by just rescaling everything to the given quadrant. We want each recursive level to completely forget about where in the recursive stack it is, how large its square is relative to the original, etc. All it should be doing is solving the same problem repeatedly - which is what recursion is best at. If we double the sides of the square, we get a shape with original size \(S \times S\). To keep the points shifted correctly we double their \((x,y)\) coordinates to \((2x, 2y)\).

Once this transformation is performed, we are ready to call the Hilbert Sort function recursively - for the northwest and northeast quadrants only. The southwest and southeast quadrants still have a ways to go.

Reflection

In addition to the scale-up transformation, southwest and southeast qaudrant points must be reflected about their diagonals.

Here's an example of what the process looks like in action:

Hilbert Sort Poster Flowchart

Solving the Reflection Problem

The above section described where in the process the reflection of the \((x,y)\) points should happen. The process of applying the reflection differs between the southwest and southeast quadrants.

In the southwest quadrant, points are being reflected about the diagonal line \(y=x\), so the reflection of \((x,y)\) points in the southwest quadrant can be performed by swapping the \(x\) and \(y\) values of all of the points in that quadrant.

In the southeast quadrant, the points are refelected about the diagonal \(y = -x\), but it is not quite \(y = -x\), given that there is an offset of a half-quadrant width on the left.

After an \((x,y)\) point is transformed, it has a height equal to the distance from the point's x coordinate to the start of the qudarant. In equations,

$$ y = S - x $$

Further, after an \((x,y)\) point is transformed, the distance from the top of the bounding box to the former y coordinate is the new x coordinate,

$$ x = \frac{S}{2} - y $$

The relative x coordinates of each point (relative meaning, 0 starts at the beginning of the curent square, rather than the whole square) are the x coordinates minus the half-quadrant width.

Once these reflections are performed, we pass the resulting \((x,y)\) points on to a new Hilbert sort. The new Hilbert sort will be operating on an \(S x S\) square, as before. Importantly, the \((x,y)\) points have been transformed in such a way that the order in which the Hilbert curve visits each point has not been affected.

Hilbert Sort Procedure

The implementation strategy is, obviously, recursive. What we want to do at each level is: Start with a square and points contained in the square. Cut the square under consideration into four quadrants. * Apply a transformation to each square so that it is re-oriented in a manner that matches our original Hilbert curve.

Once each of those squares goes through all of its respective recursive calls, it will return a sorted list of points. Then we will know what to do - we collect each of the sorted points from each of the four quadrants in order, maintain that order, and return those sorted quadrants.

To nail down the details, treat the square under consideration as ranging from \((0,0)\) to \((S,S)\).

Each time we cut a square into quadrants, we re-orient ourselves as to where \((0,0)\) is located and which quadrants will be visited in which order. If we are in the lower left quadrant, \(x\) is below \(\frac{S}{2}\) and \(y\) is below \(\frac{S}{2}\), so we rotate and reflect by swapping x and y:

        X -> Y
        Y -> X

If we are in the upper left quadrant, x is below \(\frac{S}{2}\), y is above \(\frac{S}{2}\), so subtract \(\frac{S}{2}\) from y and we're done.

        X -> X
        Y -> Y-(S/2)

If we are in the upper right quadrant, x is above \(\frac{S}{2}\), y is above \(\frac{S}{2}\), so subtract \(\frac{S}{2}\) from both

        X -> X - S/2
        Y -> Y - S/2

If we are in the lower right quadrant, our x and y values are now relative to the quadrant bounding box. The distance to the top of the bounding box to the y coordinate becomes our new x coordinate, while the distance from the right of the bounding box S to the x coordinate becomes our new y coordinate:

        X -> S/2 - Y
        Y -> S - X

Recursion always requires a base case and a recursive case. Our "base case" is the simple comparison of one or no points in each of our four quadrants. If we get to this base case, we know the order in which the Hilbert Curve will visit each of those points.

If we are not at the base case, if we have a large number of points to sort, we can bin together all the points in a given quadrant, and consider the order in which those points go with an additional level of finer granularity.

Pseudocode

set square dimension S

create unsorted queue
load points into unsorted queue

create sorted queue
sorted queue = hilbert_sort( unsorted queue, square dimension )

Now here is the Hilbert sort function:

define hilbert_sort( unsorted queue, square dimension ):
    create southwest queue
    create northwest queue
    create northeast queue
    create southeast queue
    for each point:
        if in southwest:
            create new point using X -> Y, Y -> X
            add to southwest queue
        if in northwest:
            create new point using X -> 2X, Y -> 2Y - S
            add to northwest queue
        if in northeast:
            create new point using X -> 2X - S, Y -> 2Y - S
            add to northeast queue
        if in southeast:
            create new point using X -> S - 2Y, Y -> 2S - 2X
            add to southeast queue

        hilbertsort(southwest queue, square dimension)
        hilbertsort(northwest queue, square dimension)
        hilbertsort(northeast queue, square dimension)
        hilbertsort(southeast queue, square dimension)

        create new results queue
        add points from southwest into results queue
        add points from northwest into results queue
        add points from northeast into results queue
        add points from southeast into results queue
        return results queue

References

  1. "ACM Pacific Region Programming Competition." Association of Computing Machinery. Accessed 19 June 2017. <http://acmicpc-pacnw.org/>

  2. "Über die stetige Abbildung einer Linie auf ein Flächenstück." D. Hilbert. Mathematische Annalen 38 (1891), 459–460. (pdf)

  3. "Hilbert Curve." Wikipedia: The Free Encyclopedia. Wikimedia Foundation. Edited 29 April 2017. Accessed 23 June 2017. <https://en.wikipedia.org/wiki/Hilbert_curve>