charlesreid1.com research

The Z-Machine: A Simple Turing Machine

Posted in Computer Science

permalink

TOC

Background

Recently I discovered the wonderful blog of John Graham-Cumming. One of hist posts, from 2013, details a question that he had to answer for the Oxford University Department of Computer Science's "interviews" (which, I believe, are a kind of final examination "interview" to graduate, not an interview for admittance to the program). Graham-Cumming describes one of the quetions he was presented with during his interview.

The Z-Machine: Setup

Here is the problem setup:

Suppose you have a computer with a very simple memory layout. The memory consists of a series of numbered locations, each of which can store numbers. These numbers are positive or negative integers. Here is an illustration of an example of this memory layout:

Z Machine Memory Layout

The Z-Machine: Instructions

Z-Machine Instructions

The machine can only perform three instructions: Zero (Z), Increment (I), and Jump (J).

The Z operator zeros out a location in memory. The operation specifies which index should be zeroed out. For example, Z4 will zero out index 4 (which is the 5th item in memory, since indexing starts at 0).

The I operator increments the value at a location in memory by 1. The operation specifies which index should be incremented. For example, I6 will increment index 6 (the 7th item in memory) by 1.

The J operator compares two locations in memory. If the values are different, the jump operator will branch (that is, jump to a different location in the code). The two locations are specified when calling the operator, and an arrow (or operation number) indicates where the operator should branch TO if the values are not the same. If the values are the same, the code continues.

The program stops when it reaches the end of the instruction list.

Simple Example: Loop

Here is an example of a loop program. This program sets memory index 4 to zero, then increments it until it is equal to the value in memory index 20:

001   Z4
002   I4
003   J4,20 --> 002

The instruction J4,20 --> 002 indicates that the values in cell 4 and cell 20 should be compared, and if they are not equal, the machine should jump to instruction 002.

Implementing an Addition Operation on the Z-Machine

Graham-Cumming includes the following programming challenge in his blog post:

Suppose a machine has two numbers in the first two locations in memory. Utilize these three operations to add the two numbers together and put the result into the third location in memory.

Under what circumstances does the program fail?

The Solution Approach (The Maths)

To approach the solution, start with the maths. What we're doing is trying to define a "complex" arithmetical operation (addition) from simpler "unit" operations (increment by one), so it will be helpful to wipe our mental slate clean and start at the very beginning of the problem.

When I teach a math class, whether it be a developmental math class, an algebra class, or a calculus class, I always spend the first "full" lecture by guiding the students through this very procedure. Here's how I set the tone: "Imagine that you are stranded on a desert island, with no calculators, no math books, nothing but your fingers and toes. Now suppose you are tasked with reinventing all of mathematics, entirely from scratch. How would you do it?"

This is a challenging task - and part of the challenge is just knowing where to begin (just how clean should you wipe the mental slate?). The Z-Machine problem formulation resolves that problem by explicitly enumerating valid operations. But let's continue with the desert island analogy for a bit.

If we begin at what is truly the beginning, we can start with a single unit, the number 1. (If we want to fast forward through thousands of years of human history, we can instead start with the number 0 in addition to the number 1.) Having only a single number is boring, because we can't count anything. We need a way to generate more numbers. So, we begin by defining an increment operation. We begin with the unit, 1. We declare that we can combine 1 with any other number. When we combine 1 with another 1, we get a new, larger number - which we arbitrarily call two, and represent using this funny squiggle: 2.

Now that we have defined the increment operation, adding a unit, we can begin to generate new numbers. We start with 1+1, which gives 2. The next number can be found by adding 1 to 2, which gives us a new number that we arbitrarily call three, and represent with a funny squiggle: 3.

We continue in this manner, until we reach 9, and run out of squiggles to write. The next number we will get is a special number, because it is equal to the total number of fingers. When we get to 9, and add one more, we get "two hands", which we arbitrarily call ten. If we want to keep counting beyond ten, we're stuck, because we've run out of fingers. But we can take a shortcut - we can let one toe represent "two hands". So, we hold up one toe, to represent ten. To write ten, we can let the first digit represent how many toes we are holding up, and the second digit represent how many fingers we are holding up. That means we can write our "two hands" quantity as 10 - one toe, no fingers.

We can keep on incrementing by 1, and using this system we can count all the way up to 99, at which point we will need another pair of hands or feet to keep generating new numbers, or we can suppose that after counting to 99, we are able to hold numbers in our head.

But once again, we're generating numbers slowly. We want a way to generate more numbers, faster, so we can count higher. So, we define a new addition operation. Rather than adding 1, we define the general operation of addition recursively. To add two numbers like a and b, we can define this addition in terms of a unit increment:

$$ a + b = a + 1 + 1 + 1 + \dots + 1 $$

We increment the quantity a by 1, b times. This gives us a way to add arbitrary numbers together, so now we can reach much larger numbers by taking the largest number that we can count to, and adding that number to itself.

Extending this approach can lead us from an increment operation (performed b times) to an addition operation (+b).

It can also lead from an addition operation (performed b times) to a multiplication operation (*b).

Extending the idea further, we can apply the multiplication operation (performed b times) and obtain an exponentiation operation (^b).

This recursive definition of new operations can continue as long as we'd like: applying the exponentiation operation b times yields tetration (^b^b^b^b^b...^b).

But let's get back to addition.

Solution 1: Positive Integers Only

Adding two positive integers is the simplest case. Essentially, we just perform two loops: the first loop increments the result and increments a temporary variable 1, and does that until the temporary variable 1 is equal to the first number. The second loop increments the result and increments the result by 1 for a number of times equal to the number at index 1.

001   Z2                    // clear space for the result
002   Z3                    // clear space for temp variable 1
003   I2                    // increment result
004   I3                    // increment temp variable 1
005   J3,0 --> 003
006   Z4                    // clear space for temp variable 2
007   I2                    // increment result
008   I3                    // increment temp variable 2
009   J4,1 --> 007
010   Z3                    // clean up

(Because we only have an increment operation at our disposal, there is no way for us to deal with negative numbers. Dittos for non-integer real numbers.)

This method will fail when either of the two numbers we are adding are zero.

Solution 2: Dealing With Zeros

A second solution that is a bit more challenging is dealing with the case of possible zeros in the first or second position. The algorithm above will increment the result and the temporary variable at least once (similar to a do-while loop structure), which will always cause the comparison operation J2,0 or J3,1 to fail if either cell 0 or cell 1 holds a zero.

Here is code that can deal more gracefully with a zero in either the first or second positions. This utilizes some extra space in memory to keep track of whether index 0 is a zero and whether index 1 is a zero.

// initialize
001     Z3
002     Z4 // temp 0
003     Z5 // temp 1
004     Z6 // is index 0 a zero?
005     Z7 // is index 1 a zero?
006     Z8 // zero
007     Z9 // one
008     I9

// increment by amount in index 0
009     J0,8 --> 014
010     I6
011     J4,6 --> 014
012     I4
013     I3
014     J0,4 --> 009

// increment by amount in index 1
015     J1,8 --> 020
016     I7
017     J7,8 --> 020
018     I5
019     I3
020     J1,5 --> 017

// clean up
021     Z4
022     Z5
023     Z6
024     Z7

The central idea behind this algorithm is, we keep incrementing the target cell while a condition is false, and the condition we are checking is based on a separate, independent counter. That allows us to correctly increment (and stop incrementing) based on the two numbers in index 0 and index 1. (We don't want the final result cell to be involved in our final condition check.)

This pattern can also be expanded to work for adding an arbitrary number of numbers; one simply needs to add an additional temp variable and an additional "is zero" variable for each new number being added to the total, then another block of 6 statements to increment by the amount in the given index. The block of 6 statements checks if the number we are adding is zero, and if it is not, the result is incremented by that many times.

Implementing a Decrement Operator on the Z-Machine

Suppose an operator places a number into cell 0 of the Z-Machine's memory. We require that the Z-Machine subtract 1 from that number, and place it in cell 1.

The pseudocode approach here is to increment two cells in a particular order: cell 2, which contains a sentinel value, is incremented. The program them checks if cell 2 is equal to cell 0. If it is not, the program increments cell 1, and repeats. If cell 2 is equal to cell 0, the program stops before cell 1 is incremented, leaving it one less than the original value in cell 0.

001   Z1            // decrement result
002   Z2            // zero
003   Z3            // one
004   I3
005   J2,3 --> 007  // always false
006   I1
007   I2
008   J2,0 --> 006

This pseudocode uses hard-coded constants (zero and one) to create a jump condition that is always false and therefore always followed. This allows the machine to skip a line of code like instruction 006, and perform the increment operation in a staggered manner, as described above.

Implementing a Less Than Operator on the Z-Machine

Another challenging operation to implement with the Z-Machine is a comparison operator. Suppose that an operator places two numbers into the first two memory positions of the Z-Machine. That is, index 0 contains a number A, and index 1 contains a number B. Supposing these numbers are both natural numbers (either 0 or positive integers), a comparison operator will compare the two numbers, select the smaller of the two numbers, and place it into the third position in memory (index 2).

The pseudocode approach to implement the comparison operator is to create a counter that start at zero, and check if it is equal to A or B (the numbers at index 0 and index 1). If we continue to increment our counter, and check if it is equal to A or B, and stop when it reaches either A or B, we can guarantee that we will stop when the counter reaches the smaller of the two numbers.

In order to increment the memory cell at index 2 to hold the smaller of the two numbers at index 0 and index 1, we can use the following Z-Machine code, which continually checks if the number at index 2 is equal to either the number at index 0 or the number at index 1, increments if false, and stops when true (when it reaches the smaller of the two).

001     Z2 // smaller of the two numbers
002     Z3 // zero
003     Z4 // one
004     I4

005     J0,2 --> 007
006     J3,4 --> 011
007     J1,2 --> 009
008     J3,4 --> 011
009     I2
010     J3,4 --> 005
011     Z4

Note that this code successfully handles the case where either number (or both) is 0 or any positive integer.

Who Cares? (Or, How To Build A Computer)

This whole exercise may appear, at first glance, to be an exercise in trivial pursuit. Why bother reinventing the wheel? Isn't this nothing more than an entertaining puzzle?

To the contrary - the process of assembling a sequence of simple operations into a cascade of more complex operations is precisely how computational devices are assembled from circuit components. For example, a flip flop circuit utilizes a pair of NOR (negation of OR) gates to store bits. The Apollo Guidance Computer was composed entirely of NOR gates.

In fact, the Apollo Guidance Computer is a fantastic example of a computational device constructed from a set of such simple instructions as the ones available in the Z-Machine. A few example operations from the Wikipedia article on the Apollo Guidance Computer:

AD (add)
    Add the contents of memory to register A and store the result in A.

INDEX
    Add the data retrieved at the address specified by the instruction 
    to the next instruction. INDEX can be used to add or subtract 
    an index value to the base address specified by the operand 
    of the instruction that follows INDEX. This method is used 
    to implement arrays and table look-ups.

It is not unusual for a hardware platform to have a small set of basic commands or instructions that can be carried out, and for that set of instructions to be different from hardware platform to hardware platform. Designing a new computational device requires the system designer to adapt to the hardware's capabilities - not the other way around. For that reason, it is important to keep those engineering and puzzle-solving skills sharp. You never know when you'll be designing a new computer device.

Sources

  1. "The Two Problems I Had To Solve In My Oxford Interview." John Graham-Cumming. Published 2 May 2013. Accessed 24 April 2017. <http://blog.jgc.org/2013/05/the-two-problems-i-had-to-solve-in-my.html>

  2. "Flip Flop (electronics)." Wikipedia. The Wikimedia Foundation. Edited 13 April 2017. Accessed 24 April 2017. <https://en.wikipedia.org/wiki/Flip-flop_%28electronics%29>

  3. "Apollo Guidance Computer." Wikipedia. The Wikimedia Foundation. Edited 5 April 2017. Accessed 24 April 2017. <https://en.wikipedia.org/wiki/Apollo_Guidance_Computer>

Tags:    turing machine    computer science    computer engineering    apollo    assembly   

Setting Up a Self-Hosted Github Clone with Gitea

Posted in Git

permalink

TOC

Intro

In this post we'll cover how to set up a web-hosted git server that uses Gitea, a Github clone that is written in Go. While this post is not, strictly speaking, research, having your own hosted git server certainly makes it easier to manage research codes and set up organizations to own different bits of code.

Here's an image of a git repository in the finished product:

Image of finished product

We will start by covering the server configuration for Gitea, and then we'll cover how to use Gitea's git repository.

Running with Gitea

Gitea is a web app for running a self-hosted Github clone. It is implemented entirely in the Go language, so all of the pages that are served up are assembled by the Go binary. The Go binary also has a local instance of git running, and any actions taken through the Gitea web interface are translated, by the Go binary, into actions in the git reopsitory.

Out of the box, Gitea provides all of this functionality, and takes care of all the details, so you don't have to worry about any of this. But if you build it yourself, you can modify the content the binary serves up and the actions it takes.

In this guide, we'll stick to the binary.

DIagram of how the gitea binary runs as an http server on port 3000.

Binary or Source

There are some problems with the source code that may make compilation from source impossible. (See charlesreid1.com wiki notes.) The binary version of Gitea is recommended.

Configuring Gitea Directories

Gitea expects a certain directory organization, specifically, a folder for binaries, certificates, a database, git repositories, and the log.

The recommended arrangment is:

/path/to/www/gitea/
   \
    \---------- bin/
     \--------- certs/
      \-------- data/
       \------- repositories/
        \------ log/

Once the files are organized in this way, navigate to the bin directory and (preferrably using tmux or screen to send to background) execute the command:

./gitea web

This runs a setup page on port 3000.

Note that you may or may not need to set your $GOPATH variable:

export GOPATH="${HOME}/gocode"

Go ahead and follow the instructions on the page to get Gitea set up. The gitea.io page has good documentation.

Opening the Firewall

Assuming your port 3000 was closed on the firewall, and assuming you plan to run the gitea service as a public service listening on an external port, you will want to open your firewall to accept incoming traffic on port 3000:

# allow traffic on port 3000
iptables -A INPUT -p tcp --dport 3000 -j ACCEPT
#  to allow forwarding for 3000 to internal LAN 
iptables -A FORWARD -p tcp -j ACCEPT --dport 3000 -m state --state NEW

Setting Up and Configuring Gitea Server

To configure gitea, you will use a .ini configuration file contained in:

/path/to/www/gitea/custom/conf/app.ini

Here is an example gitea configuration file. It starts with a header, then has sections for different aspects like databases, the git repository, and the server:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Example Gitea Config File
;;
;; https://github.com/go-gitea/gitea/blob/master/conf/app.ini
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

APP_NAME = big ugly git repo (BUGR)
RUN_USER = git
RUN_MODE = prod

[database]
DB_TYPE  = sqlite3
SSL_MODE = disable
PATH     = /www/gitea/data/gitea.db

[repository]
ROOT = /www/gitea/repositories
PREFERRED_LICENSES = MIT License

[server]
PROTOCOL     = https
DOMAIN       = yourdomain.com
CERT_FILE    = /www/gitea/certs/cert.pem
KEY_FILE     = /www/gitea/certs/key.pem
SSH_DOMAIN   = git.yourdomain.com
HTTP_PORT    = 3000
ROOT_URL     = https://yourdomain.com:3000
DISABLE_SSH  = false
SSH_PORT     = 22
OFFLINE_MODE = false

[log]
ROOT_PATH = /www/gitea/log

Ultimately, there are many configuration examples available for Gitea, so we won't go into any more detail on that.

For details on how to configure Gitea with HTTPS, see charlesreid1.com/wiki/Gitea.

Let's move on to how we actually utilize Gitea, and focus especially on how to make gitea work in tandem with github and other git servers, instead of being yet another complication in the toolchain.

How To Use Gitea

As we mentioned before, the gitea binary is actually wrapping a git repo, and interactions with the binary (via the web) are translated into actions in that repo. This is important to understand, since it will prevent us from thinking about gitea as a "centralized" server, and get us thinking about gitea as just another computer with a copy of the repository on it.

Understanding how to use gitea also requires correct thinking about how distributed version control works - the way that each collaborator with a copy of the repo can also act as a decentralized git server. In this way, the gitea server becomes just another git client, with some extra frosting on top in the form of a web interface.

Pushing a Local Repo Into Gitea

If you want to move existing local repositories into gitea, it's easy. Remember that with git, everything is local, so all you need to do is push a copy to gitea, and gitea will have the whole repository locally too, including its entire history and any branches and tags that were on your local machine. If you have remotes, you'll need to get local copies of any branches or tags you want from that remote server (covered in more detail below).

The basic steps: Get everything you want locally (including branches and tags) Fetch from remotes, if you have any. Add new remote (gitea) Push to new remote

The end result will be a gitea repo fully-populated with the project's commit history.

Here are the steps in a bit more detail:

Start by creating a git repository, or cloning an existing one:

mkdir my_git_repo/
cd my_git_repo
git init .

or

git clone http://github.com/mot_pesli/coune_car

Next, create an empty git repo on the gitea server. Sign in using a gitea username and password. There will be a plus sign in the upper right, just like on Github. Click the plus sign, and choose create new repo. Enter the repo name, and a description. Make sure the repo will NOT automatically create a readme or a LICENSE file, as we will be importing all files from another repo anyway.

Once you've got a local copy of the repo with everything you want to push, and you have an empty repository on the gitea server ready for commits, add the gitea repository as a new remote to your local repository.

Example: suppose you have a gitea user named zappa who has created a new empty repo named linol-aquabus on the gitea server. Zappa would then cd to the local copy of the git repo, and add the gitea repository's url as a new remote called gitea:

git remote add gitea https://git.mydomain.com/zappa/linol-aquabus.git

# alternatively,

git remote add gitea https://mydomain.com:3000/zappa/linol-aquabus.git

Now Zappa can push the entire commit history of the current repo to the gitea remote git server:

git push origin gitea

# alternatively,

git push --all origin gitea

If Zappa now visits mydomain.com:3000/zappa/linol-aquabus the entire contents of the repository and its commit history in gitea's web interface will be available for browsing.

Copying Repos from Gitub to Gitea

The process for copying Github repositories to Gitea follows the above procedure closely:

git clone https://github.com/user/repo.git

Check out any branches or tags you may want to push to gitea.

Then add the gitea remote to the repo:

git remote add gitea https://git.mydomain.com/zappa/linol-aquabus.git

Now push the contents of the local repository to the remote gitea repository:

git push origin gitea

Once this command is executed, the gitea remote will have the entire commit history of the repo, which it will render and display through its web interface. Thus, the entire commit history will be immediately available to browse through gitea.

(Note that information about Github issues or Github user profiles is not stored in the git repo, just as Gitea comments and Gitea user profiles are not stored in the git repo.)

Gitea and Github - same repo, side by side.

"Moving" Repos from Gitub to Gitea

First, it should be stated up front that you do not have to choose between gitea and github - you can have a copy of a repo on both, frequently push to one or the other, and occasionally update the other as a kind of "mirror."

If you do want to move a repo from github to gitea, remember that github does not "own" the repo, so what you're actually doing is deleting the copy of your repo that lives on github's server.

To move the repo, all you have to do is copy it from github to somewhere else (whether it be your local computer, or to the gitea server), then delete it from github. You will not lose any information about the git repository history. Github does not keep/store that information - git does.

Where to Push Commits: Github or Gitea?

It may seem confusing to use multiple repositories side-by-side, but this is precisely how the designers of git intended it to be used - for decentralized version control.

That means Github or Gitea are not "centralized" servers that "own" the repository, they are merely another instance of git, running on a server, with a local copy of the repo.

Accordingly, if your project exists as a repo on gitea and as a repo on github, it is like the project existing as a repo on your hard drive and as a repo on your collaborator's hard drive - there is no contradiction. If you make changes to your local copy of the repo, you can commit those changes, then push the commits to someone else. In the same way, you can make changes to your local copy of the repo and commit those changes, then push the commits to either gitea, or github, or both - wherever you want.

Backing Up Gitea

It's important to say a few words about backups.

Because Gitea is configured (with the above config file) to use sqlite, it stores its entire database (the extra "frosting" of stars, wiki pages, issues, user pictures, etc.) in a single file on disk, in the data/ directory of the gitea installation.

The git repository is, similarly, a single file on disk, with the entire database of commits, blogs, commit histories, etc., all contained in the .git directory, which is contained in the repositories directory of the gitea installation.

That means backing up your git server is as easy as copying the gitea folder. No need to back up a MySQL database or do a funny pants dance to export your database to a proprietary format.

This is another central tenent of git's design philosophy - be fast, pack light, and travel easily.

How Gitea Fits Into Git Patterns

There are many resources on git, and some cover the concept of "git patterns," or templates for how to use git.

Because git is designed to be decentralized, gitea provides a "centralized" node for collaboration within a decentralized system. So you could use gitea to collaborate with your local Team A, and push and pull branches and work on various features together. Then, once Team A has put together a branch they've found useful, they can push it to a different remote server that is more widely accessed.

(Note that this is the logic behind why you need to check out all the branches and tags, locally, that you want to push to the gitea remote server - this allows you the flexibility to only share select branches, so that you might have some branches shared in common wth gitea and github, and other branches only existing on gitea.)

Team B and Team C may use github, and not gitea, to access the project code. The gitea server may for example be a private server on a local network, whereas github is better suited for collaboration. By pushing select branches from the Team A git server, gitea, to the github server, Team B and Team C will then be able to access them from github.

You could also have multiple gitea/gitlab/github enterprise servers for multiple organizations running on git, and they could share a code base by sharing git repositories, commit histories, branches, and so on.

Wrapup

Have a look at the Gitea project, the Go language, git, Github for all the nice software that made this possible.

Then have a look at the finished product at git.charlesreid1.com.

Gitea allows you to create organizations, useful for organizing repos related to particular functionality or platforms. For example, rpi is an organization that owns repositories with scripts and software for raspberry pis, docker is an organization that owns repositories with Dockerfiles, and mac holds Mac-related things.

This organization allows better organization of project code - a dotfiles repository that works for a Mac laptop or a Unix server is not necessarily good for a Raspberry Pi or a Docker container, so code is organized accordingly.

Organizations can also be used to organize project-level code. This git repo is in its beginning phases, but has great potential as a great tool that makes git even more functional and provides one more reason to switch to git.

Tags:    git    go    gitea   

Better Timing of Guava Traveling Salesperson Problem Code: Timing Scripts

Posted in Java

permalink

Before We Begin: The Code

Note that all of the code discussed/shown in this post is available from the traveling salesperson problem repository on git.charlesreid1.com. The guava/ directory contains the guava solution to the traveling salesperson problem, along with the timing scripts discussed below, and several example output files.

Introduction

Timing a piece of code can be tricky.

Choosing a random problem of a given size can be problematic, if you happen to randomly chose a particularly easy or difficult case. This can lead to an inaccurate picture of scale-up behavior. Timing results should be statistically averaged - the figure below shows scale-up behavior for the traveling salesperson problem when solving a single random problem, versus a hundred random problems.

Average versus one-time solutions, walltime versus problem size.

This post will cover some conceptual and code tools for measuring the timing and performance of code.

Timing Scripts

Timing a piece of code is a task that sounds simple, on its face. However, the task of profiling code quickly balloons from a single timing of a single bit of code to running hundreds of cases, juggling output from each run, and stitching together script upon script. Soon you find yourself swimming in a pool of output files, looking for your scripts...

In this post we'll cover a workflow for timing a piece of Java code that will help give you a method for thinking about timing. This also balances the need for practicality (i.e., just get it done in as simple a manner as possible) with good design (i.e., flexible inputs and the use of scripts).

Before You Time: Developing Your Algorithm

As you develop your algorithm, you may have variable values hard-coded and random number generators seeding from the same value each time. The intention, as you are developing, is to work out the bugs without the additional difficulty of solving a different problem each time you run the program.

Beginning Your Timing Journey

Once you've implemented your algorithm and verified that it works, the timing portion begins. You can time the code using mechanisms built into the language, which is the most accurate way of timing code. This is called instrumenting your code. To begin with, you might try timing a single problem size, but this information is hard to interpret without more data, so you start to time the code on problems of other sizes.

Here there be dragons.

Statistical Timing, a.k.a., If You Give A Mouse A Cookie

See If You Give A Mouse A Cookie.

Once you time the code solving a single problem, you will want to time the code solving multiple problems.

Once you time the code solving multiple problems, you will want to time the code solving each problem multiple times to get an accurate statistical picture of the solution time on a variety of problems.

Once you get statistics about multiple problem times, you will want to gather statics about variations in problem types and algorithm variations.

It simply does not end. A strategy for managing this deluge of new timing needs prevents spaghetti code.

Hierarchical Timing Strategy

The strategy here is to design code and scripts that are hierarchical. The levels of the hierarchy consist of the different scopes involved in timing: Single problem/program/binary Multiple problem/program/binary Statistical averages using case matrix Design of Computer Experiments: Testing Variations * Strong and weak scaling (if we decide to continue with parallelization of algorithm)

At each stage, we utilize scripts to bundle the task into a single command or script.

Single Problem/Program/Binary: Makefile

At the scale of solving a single problem, we need a tool that will compile and run Java code - even if it only happens once. Makefiles are an excellent tool for stringing together commands with flags (in the case of the traveling salesperson problem, we need to link to the Guava Jars when we compile and run).

By defining some variables at the top of the Makefile, and a few rules, we have a functional and easy way to build code with scalable complexity:

# Set path to guava
HOME=/Users/charles
GUAVA=$(HOME)/codes/guava/jars/guava-21.0.jar

# Set compile target
BIN=TSP
TARGET=$(BIN).java

# Set java class path
# Hard to believe this actually works, but ok:
CP=-cp '.:$(GUAVA)'


build:
    javac $(CP) $(TARGET) 

run:
    # If no size, use default
    java $(CP) $(BIN) 

time:
    # Java times itself, we just have to pass it the size
    java $(CP) $(BIN) $(SIZE) 

clean:
    rm -rf *.class

Link to code on git.charlesreid1.com

This enables us to run

$ make build 
/Users/charles/codes/guava/jars/guava-21.0.jar

and have a new version of the code compile. We can also run a problem with a default size of 5 nodes with a simple make run command:

$ make run
java -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP
------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 2, 3, 4, 1]  Distance: 258
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 2, 3, 1, 4]  Distance: 257
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 2, 4, 1, 3]  Distance: 189
Found solution...?
Elapsed time 0.005529 s

and have the compiled code run. We can also feed arguments to make commands, and have them passed to the command that is executed:

$ make time SIZE=10
# Java times itself, we just have to pass it the size
java -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP 10
------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 9, 7, 2, 8, 1, 4]   Distance: 446
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 9, 7, 2, 8, 4, 1]   Distance: 395
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 9, 7, 4, 1, 8, 2]   Distance: 382
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 9, 1, 4, 7, 2, 8]   Distance: 365
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 9, 1, 4, 7, 8, 2]   Distance: 326
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 2, 8, 7, 4, 1, 9]   Distance: 303
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 3, 2, 8, 4, 7, 9, 1]   Distance: 298
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 4, 7, 3, 2, 8, 9, 1]   Distance: 286
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 6, 5, 8, 2, 3, 7, 4, 1, 9]   Distance: 283
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 9, 1, 4, 7, 3, 6, 5, 8, 2]   Distance: 275
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 3, 2, 8, 5, 6, 4, 7, 9, 1]   Distance: 274
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 2, 8, 7, 4, 1, 9, 5, 6, 3]   Distance: 262
!!!!!YAY!!!!!!  NEW SOLUTION    Route: [0, 2, 8, 4, 7, 3, 6, 5, 9, 1]   Distance: 256
Found solution...?
Elapsed time 0.050795 s

For running single cases and gathering small amounts of (initial) timing data, Makefiles greatly simplify the workflow.

Multiple Program/Problem/Binary: Bash

Bash is a faithful scripting language that makes it easy to do simple mechanical tasks like run commands in for loops, necessary for the next level of complexity in our timing hierarchy.

While the task at hand is simple enough, and we could easily use Makefiles, we'll use a separate Bash script for separate functionality and keep things from getting overly complicated in the Makefile.

The essence of the timing script is combining the make time command, which takes a size parameter, with bash for loops.

Here is a stripped down version of the timing script:

make build 

for N in 4 5 6 7 8
do
    make time SIZE=${N} 
done

That's it... The rest is just printing!

The make time command passes the size argument on the command line. The Java Traveling Salesperson Problem code checks if there is an input argument on the command line, and if there is it uses that as the problem size.

Here's a more embellished script:

export RIGHTNOW="`date +"%Y%m%d_%H%M%S"`"
export OUT="timeout_tsp_java_${RIGHTNOW}.out"
touch ${OUT}
cat /dev/null > ${OUT}

# Compile
make build 

for N in {4..8..1}
do
    echo "**************************************" >> ${OUT}
    echo "Running TSP with $N nodes with Java..." >> ${OUT}

    make time SIZE=${N} >> ${OUT} 2>&1

    make dot
    mv graphviz.png graphviz_tsp_${N}.png

    echo "Done." >> ${OUT}

    echo "" >> ${OUT}

done

echo ""
echo ${OUTFILE}
echo ""

Link to code on git.charlesreid1.com

This creates a time-and-date-stamped output file in which all of the output of this script goes - and which can be parsed for plotting the results of timing studies.

Before it times the solution to the TSP, it dumps out a graphviz dot file containing a schematic of the graph that can be diagrammed:

Example graphviz dot output showing a 14-node TSP.

**************************************
Running TSP with 4 nodes with Java...
Found solution.
Elapsed time 0.004679 s
 Done.

**************************************
Running TSP with 5 nodes with Java...
Found solution.
Elapsed time 0.004543 s
 Done.

**************************************
Running TSP with 6 nodes with Java...
Found solution.
Elapsed time 0.006911 s
 Done.

**************************************
Running TSP with 7 nodes with Java...
Found solution.
Elapsed time 0.008286 s
 Done.

**************************************
Running TSP with 8 nodes with Java...
Found solution.
Elapsed time 0.014499 s
 Done.

In this way we can run a quick test matrix of different problem sizes, and see how the code scales.

Statistical Averaging

Of course, any good computational physicist will tell you that scaling behavior extrapolated from running a single case of a single problem size is folly - your random choice of graph could have been an unusually easy or difficult graph, throwing off the results of the timing.

A proper scaling study really needs to take into account the statistical trends in solution time on a random assortment of problems, so we need a way of scripting solutions to dozens or hundreds of random problems and computing statistically representative measures of code performance.

Our code implements a function to generate random graphs, but for testing and debugging purposes the random number generator was seeded with the same value each time. By making the random number generators truly random, each problem we solve will be a different random graph with the specified number ofn odes.

We can accomplish all of this using bash again: within the for loop over different problem sizes, we will add a for loop for repetitions. Here is the basic framework:

make build 

for N in {4..10..1}
do
    for repetition in {0..100..1}
    do
        make time SIZE=${N} 
    done
done

And the embellished version that prints the resulting timing information to a file:

# Compile
make build 

export RIGHTNOW="`date +"%Y%m%d_%H%M%S"`"

for N in {4..10..1}
do

    export OUT="avgtimeout_tsp_${RIGHTNOW}_${N}.out"
    touch ${OUT}

    echo "**************************************" >> ${OUT}
    echo "Running TSP with $N nodes with Java..." >> ${OUT}

    for repetition in {0..100..1}
    do
        make time SIZE=${N} >> ${OUT} 2>&1
    done
    echo "Done." >> ${OUT}
    echo "" >> ${OUT}

done

echo ""
echo ${OUTFILE}
echo ""

Link to code on git.charlesreid1.com

This results in a file with a large amount of information, but it can be trimmed down to the quantities of interest using a little command line fu. Here are filtered results from an 11-node traveling salesperson problem run approximately 102 times:

$ cat avgtimeout_tsp_20170330_235134_11.out | grep Elapsed | wc -l
     102

$ cat avgtimeout_tsp_20170330_235134_11.out | grep Elapsed
Elapsed time 0.082839 s
Elapsed time 0.130378 s
Elapsed time 0.173737 s
Elapsed time 0.100067 s
Elapsed time 0.166046 s

<clipped>

Elapsed time 0.147801 s
Elapsed time 0.285777 s
Elapsed time 0.078655 s
Elapsed time 0.081246 s
Elapsed time 0.174288 s

Now we have a Makefile that allows us to build and run with a single command, and a bash script that loops over each problem size and runs a set of computations for each problem size. But how to compute an average?

We have an assortment of choices - Python being the most obvious - but in the spirit of old-school Unix tools like make, cat, grep, and wc, let's use awk to compute the average walltime for each case size.

Awk to Compute Average Walltime

Given the following output of elapsed walltime for different problem sizes,

$ cat avgtimeout_tsp_20170330_235134_11.out | grep Elapsed
Elapsed time 0.082839 s
Elapsed time 0.130378 s
Elapsed time 0.173737 s
Elapsed time 0.100067 s
Elapsed time 0.166046 s

<clipped>

Elapsed time 0.147801 s
Elapsed time 0.285777 s
Elapsed time 0.078655 s
Elapsed time 0.081246 s
Elapsed time 0.174288 s

we want to extract two things: the first is the problem size (contained in the filename), and the second is the column of numbers.

We can extract the problem size from the filename using sed, by searching for the pattern _8.out or _10.out. Here is a Bash one-liner that does this for a variable $f containing the filename:

    N=$(echo $f | sed 's/^.*_\([0-9]\{1,\}\).out/\1/')

The next thing we wnat to do is extract the column of timing data. This is a good task for the cut utility. We can pass it two flags, -d to tell it what to use as a field delimiter, and -f to tell it which field (column) to return. To extract the third column,

    cat $f | grep "Elapsed" | cut -d" " -f3 

This command results in a series of floating point numbers. If we can pipe it to a program that can do simple math, like awk, we can compute an average (which is pretty simple math).

Since awk is a text-proecssing program that happens to be able to interpret numbers as numbers, we have to think like a text processing program. To compute an average, we accumulate the sum of each line in the file, using an accumulator variable. When we have gone through each line in the file, we divide this cumulative sum by the number of lines in the file.

    cat $f | grep "Elapsed" | cut -d" " -f3 | awk '{a+=$1} END{print a/NR}'

This one-liner uses two important concepts in awk, the first is the bracketed blocks {} some denoted with BEGIN and END, and the second is the set of built-in variables available in awk.

By surrounding statements by brackets, we denote a block of statements to be run together as the main body of the program. If the bracket is prefixed by BEFORE, this block of statements is run once at the beginning of the program. If the bracket is prefixed by END, this block of statements is run once at the end of the program. To compute an average, the main body of the program is to accumulate a cumulative sum variable. The END block, run once at the end, is to divide this cumulative sum by the total number of lines.

Finally, these values can be assigned to Bash variables using $(cmd) syntax:

    export N=$(echo $f | sed 's/^.*_\([0-9]\{1,\}\).out/\1/')
    export AVG=$(cat $f | grep "Elapsed" | cut -d" " -f3 | awk '{a+=$1} END{print a/NR}') 
    echo "Average time : ${N}-node TSP problem : ${AVG} s"

Here is the final script, which loops over all of the output files generated by the statistical average timing script, which runs 10-100 cases per problem size:

#!/bin/sh

for f in `/bin/ls -1 avgtimeout_tsp*`
do
    touch tmpfile 
    cat /dev/null > tmpfile

    export N=$(echo $f | sed 's/^.*_\([0-9]\{1,\}\).out/\1/')
    export AVG=$(cat $f | grep "Elapsed" | cut -d" " -f3 | awk '{a+=$1} END{print a/NR}') 
    echo "Average time : ${N}-node TSP problem : ${AVG} s"

    rm tmpfile
done

Link to code on git.charlesreid1.com

Example output:

$ ./avg_calcs.sh  | sort
Average time : 10-node TSP problem : 0.0492826 s
Average time : 11-node TSP problem : 0.125827 s
Average time : 12-node TSP problem : 0.272998 s
Average time : 13-node TSP problem : 0.56743 s
Average time : 14-node TSP problem : 1.24297 s
Average time : 14-node TSP problem : 1.66563 s
Average time : 15-node TSP problem : 8.08373 s
Average time : 16-node TSP problem : 16.4353 s
Average time : 17-node TSP problem : 94.7363 s
Average time : 18-node TSP problem : 749.798 s
Average time : 4-node TSP problem : 0.00473154 s
Average time : 4-node TSP problem : 0.00475648 s
Average time : 5-node TSP problem : 0.005247 s
Average time : 5-node TSP problem : 0.005765 s
Average time : 6-node TSP problem : 0.00694926 s
Average time : 7-node TSP problem : 0.0100795 s
Average time : 8-node TSP problem : 0.0167346 s
Average time : 9-node TSP problem : 0.028239 s

Inspecting the output from particular solutions of particular random graphs shows a wide variation in the number of shortest paths found. For example, for random graphs consisting of 11 nodes, here are some sample solutions. Note the difference in solution times:

------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 7, 6, 10, 3]   Distance: 597
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 7, 6, 3, 10]   Distance: 582
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 7, 10, 3, 6]   Distance: 560
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 7, 3, 6, 10]   Distance: 553
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 10, 3, 7, 6]   Distance: 540
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 10, 7, 6, 3]   Distance: 532
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 10, 7, 3, 6]   Distance: 503
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 5, 6, 3, 7, 10]   Distance: 498
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 10, 3, 7, 5, 6]   Distance: 450
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 10, 5, 7, 3, 6]   Distance: 440
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 10, 5, 6, 7, 3]   Distance: 422
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 6, 5, 10, 3, 7]   Distance: 421
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 4, 6, 5, 10, 7, 3]   Distance: 380
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 7, 10, 5, 6, 4, 3]   Distance: 364
NEW SOLUTION    Route: [0, 9, 2, 1, 8, 7, 3, 4, 6, 5, 10]   Distance: 357
NEW SOLUTION    Route: [0, 9, 2, 1, 5, 10, 7, 3, 8, 4, 6]   Distance: 352
NEW SOLUTION    Route: [0, 9, 2, 1, 5, 10, 7, 8, 3, 4, 6]   Distance: 336
NEW SOLUTION    Route: [0, 9, 2, 1, 5, 6, 4, 10, 7, 8, 3]   Distance: 330
NEW SOLUTION    Route: [0, 9, 2, 1, 4, 8, 7, 3, 6, 5, 10]   Distance: 326
NEW SOLUTION    Route: [0, 9, 2, 1, 4, 8, 3, 7, 6, 5, 10]   Distance: 320
NEW SOLUTION    Route: [0, 9, 2, 1, 4, 8, 3, 7, 10, 5, 6]   Distance: 298
NEW SOLUTION    Route: [0, 9, 2, 1, 4, 6, 5, 10, 7, 8, 3]   Distance: 261
NEW SOLUTION    Route: [0, 9, 2, 7, 8, 3, 1, 4, 6, 5, 10]   Distance: 250
NEW SOLUTION    Route: [0, 9, 8, 7, 2, 3, 1, 4, 6, 5, 10]   Distance: 245
Found solution.
Elapsed time 0.057047 s

------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 5, 2, 7, 3]   Distance: 650
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 5, 2, 3, 7]   Distance: 641
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 5, 7, 2, 3]   Distance: 638
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 5, 7, 3, 2]   Distance: 630
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 7, 5, 2, 3]   Distance: 589
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 2, 3, 7, 5]   Distance: 580
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 10, 2, 5, 7, 3]   Distance: 548
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 2, 5, 7, 3, 10]   Distance: 533
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 7, 5, 2, 3, 10]   Distance: 522
NEW SOLUTION    Route: [0, 8, 1, 6, 4, 9, 7, 5, 2, 10, 3]   Distance: 495
NEW SOLUTION    Route: [0, 8, 1, 6, 7, 9, 4, 5, 2, 10, 3]   Distance: 494
NEW SOLUTION    Route: [0, 8, 1, 6, 7, 5, 4, 9, 2, 10, 3]   Distance: 493
NEW SOLUTION    Route: [0, 8, 1, 6, 3, 10, 2, 5, 7, 9, 4]   Distance: 490
NEW SOLUTION    Route: [0, 8, 1, 6, 3, 10, 2, 5, 7, 4, 9]   Distance: 486
NEW SOLUTION    Route: [0, 8, 1, 6, 9, 4, 7, 5, 2, 10, 3]   Distance: 467
NEW SOLUTION    Route: [0, 8, 1, 10, 2, 5, 7, 4, 9, 6, 3]   Distance: 453
NEW SOLUTION    Route: [0, 8, 1, 2, 5, 7, 4, 9, 6, 3, 10]   Distance: 442
NEW SOLUTION    Route: [0, 8, 1, 2, 5, 7, 4, 9, 6, 10, 3]   Distance: 437
NEW SOLUTION    Route: [0, 8, 1, 5, 7, 4, 9, 2, 10, 6, 3]   Distance: 433
NEW SOLUTION    Route: [0, 8, 10, 2, 1, 5, 4, 9, 7, 6, 3]   Distance: 427
NEW SOLUTION    Route: [0, 8, 10, 2, 1, 5, 7, 4, 9, 6, 3]   Distance: 400
NEW SOLUTION    Route: [0, 8, 2, 1, 5, 7, 4, 9, 6, 10, 3]   Distance: 399
NEW SOLUTION    Route: [0, 3, 7, 4, 9, 6, 10, 2, 1, 5, 8]   Distance: 394
NEW SOLUTION    Route: [0, 3, 7, 4, 9, 6, 10, 2, 8, 5, 1]   Distance: 389
NEW SOLUTION    Route: [0, 3, 6, 9, 4, 7, 5, 8, 1, 2, 10]   Distance: 384
NEW SOLUTION    Route: [0, 3, 6, 9, 4, 7, 5, 8, 10, 2, 1]   Distance: 382
NEW SOLUTION    Route: [0, 3, 6, 9, 4, 7, 5, 1, 2, 8, 10]   Distance: 374
NEW SOLUTION    Route: [0, 9, 4, 5, 1, 2, 8, 10, 3, 6, 7]   Distance: 373
NEW SOLUTION    Route: [0, 9, 4, 7, 3, 6, 10, 8, 5, 1, 2]   Distance: 365
NEW SOLUTION    Route: [0, 9, 4, 7, 3, 6, 10, 2, 1, 5, 8]   Distance: 360
NEW SOLUTION    Route: [0, 9, 4, 7, 3, 6, 10, 2, 8, 5, 1]   Distance: 355
NEW SOLUTION    Route: [0, 9, 4, 7, 5, 8, 1, 2, 10, 6, 3]   Distance: 345
NEW SOLUTION    Route: [0, 9, 4, 7, 5, 1, 2, 8, 10, 6, 3]   Distance: 335
Found solution.
Elapsed time 0.159882 s

------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 1, 9, 6]   Distance: 500
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 1, 6, 9]   Distance: 498
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 6, 1, 9]   Distance: 493
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 6, 9, 1]   Distance: 481
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 9, 1, 6]   Distance: 443
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 4, 7, 8, 9, 6, 1]   Distance: 429
NEW SOLUTION    Route: [0, 3, 10, 2, 5, 6, 1, 9, 8, 7, 4]   Distance: 417
NEW SOLUTION    Route: [0, 3, 10, 2, 6, 5, 4, 7, 8, 9, 1]   Distance: 360
NEW SOLUTION    Route: [0, 3, 10, 2, 1, 9, 6, 5, 4, 7, 8]   Distance: 359
NEW SOLUTION    Route: [0, 3, 10, 4, 5, 6, 2, 1, 9, 7, 8]   Distance: 358
NEW SOLUTION    Route: [0, 3, 10, 4, 5, 6, 2, 1, 9, 8, 7]   Distance: 342
NEW SOLUTION    Route: [0, 3, 10, 4, 5, 6, 9, 8, 7, 2, 1]   Distance: 340
NEW SOLUTION    Route: [0, 3, 10, 4, 7, 8, 9, 1, 2, 6, 5]   Distance: 310
NEW SOLUTION    Route: [0, 3, 7, 8, 9, 10, 4, 5, 6, 2, 1]   Distance: 305
NEW SOLUTION    Route: [0, 3, 9, 8, 7, 4, 10, 1, 2, 6, 5]   Distance: 303
NEW SOLUTION    Route: [0, 5, 4, 7, 3, 10, 8, 9, 6, 2, 1]   Distance: 301
NEW SOLUTION    Route: [0, 5, 4, 7, 3, 10, 1, 2, 6, 9, 8]   Distance: 293
NEW SOLUTION    Route: [0, 5, 4, 7, 3, 1, 6, 2, 10, 9, 8]   Distance: 286
NEW SOLUTION    Route: [0, 5, 4, 7, 3, 1, 2, 6, 9, 10, 8]   Distance: 281
NEW SOLUTION    Route: [0, 5, 4, 7, 3, 1, 2, 6, 10, 9, 8]   Distance: 275
NEW SOLUTION    Route: [0, 5, 4, 7, 8, 9, 6, 2, 1, 3, 10]   Distance: 270
NEW SOLUTION    Route: [0, 5, 4, 7, 8, 9, 10, 3, 1, 2, 6]   Distance: 269
NEW SOLUTION    Route: [0, 5, 4, 10, 3, 7, 8, 9, 6, 2, 1]   Distance: 268
NEW SOLUTION    Route: [0, 5, 4, 10, 9, 8, 7, 3, 1, 2, 6]   Distance: 252
NEW SOLUTION    Route: [0, 5, 4, 10, 9, 6, 2, 1, 3, 7, 8]   Distance: 248
NEW SOLUTION    Route: [0, 5, 6, 2, 10, 4, 7, 8, 9, 3, 1]   Distance: 235
NEW SOLUTION    Route: [0, 5, 6, 2, 1, 3, 10, 4, 7, 8, 9]   Distance: 231
NEW SOLUTION    Route: [0, 5, 6, 2, 1, 3, 7, 4, 10, 9, 8]   Distance: 212
NEW SOLUTION    Route: [0, 5, 6, 2, 1, 3, 9, 10, 4, 7, 8]   Distance: 207
Found solution.
Elapsed time 0.175282 s

------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 9, 3, 8, 1]   Distance: 515
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 9, 3, 1, 8]   Distance: 495
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 9, 8, 1, 3]   Distance: 423
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 8, 9, 1, 3]   Distance: 368
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 3, 8, 9, 1]   Distance: 361
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 10, 4, 3, 1, 9, 8]   Distance: 341
NEW SOLUTION    Route: [0, 6, 7, 2, 5, 1, 9, 8, 3, 4, 10]   Distance: 330
NEW SOLUTION    Route: [0, 6, 7, 2, 10, 5, 1, 9, 8, 4, 3]   Distance: 309
NEW SOLUTION    Route: [0, 6, 7, 10, 2, 5, 1, 9, 8, 4, 3]   Distance: 284
NEW SOLUTION    Route: [0, 6, 7, 8, 9, 1, 5, 2, 10, 4, 3]   Distance: 266
NEW SOLUTION    Route: [0, 6, 7, 8, 9, 1, 5, 10, 2, 4, 3]   Distance: 258
NEW SOLUTION    Route: [0, 6, 5, 2, 10, 4, 3, 1, 9, 8, 7]   Distance: 250
NEW SOLUTION    Route: [0, 6, 5, 2, 10, 7, 8, 9, 1, 3, 4]   Distance: 243
NEW SOLUTION    Route: [0, 6, 5, 2, 4, 3, 1, 9, 8, 7, 10]   Distance: 225
NEW SOLUTION    Route: [0, 6, 5, 1, 9, 8, 7, 10, 2, 4, 3]   Distance: 197
Found solution.
Elapsed time 0.051783 s

------------------- TSP Version 2: The Pessimistic Algorithm ----------------------
NEW SOLUTION    Route: [0, 8, 9, 6, 1, 7, 3, 4, 5, 2, 10]   Distance: 607
NEW SOLUTION    Route: [0, 8, 9, 6, 1, 7, 3, 4, 2, 5, 10]   Distance: 573
NEW SOLUTION    Route: [0, 8, 9, 6, 1, 7, 3, 4, 10, 5, 2]   Distance: 562
NEW SOLUTION    Route: [0, 8, 9, 6, 1, 7, 3, 5, 2, 4, 10]   Distance: 513
NEW SOLUTION    Route: [0, 8, 9, 6, 1, 3, 5, 2, 7, 4, 10]   Distance: 496
NEW SOLUTION    Route: [0, 8, 9, 6, 7, 3, 1, 5, 2, 4, 10]   Distance: 493
NEW SOLUTION    Route: [0, 8, 9, 6, 7, 3, 1, 10, 4, 2, 5]   Distance: 489
NEW SOLUTION    Route: [0, 8, 9, 6, 7, 2, 4, 10, 1, 3, 5]   Distance: 487
NEW SOLUTION    Route: [0, 8, 9, 6, 7, 2, 5, 3, 1, 10, 4]   Distance: 480
NEW SOLUTION    Route: [0, 8, 9, 6, 4, 3, 5, 2, 7, 1, 10]   Distance: 454
NEW SOLUTION    Route: [0, 8, 9, 6, 4, 5, 2, 7, 3, 1, 10]   Distance: 453
NEW SOLUTION    Route: [0, 8, 9, 6, 4, 10, 5, 2, 7, 3, 1]   Distance: 452
NEW SOLUTION    Route: [0, 8, 9, 6, 4, 10, 1, 7, 3, 5, 2]   Distance: 443
NEW SOLUTION    Route: [0, 8, 9, 6, 4, 10, 1, 3, 5, 2, 7]   Distance: 411
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 1, 3, 5, 2, 7]   Distance: 407
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 7, 2, 5, 3, 1]   Distance: 402
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 5, 2, 7, 3, 1]   Distance: 390
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 2, 7, 3, 1, 5]   Distance: 388
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 2, 7, 1, 3, 5]   Distance: 387
NEW SOLUTION    Route: [0, 8, 9, 10, 4, 6, 2, 5, 3, 1, 7]   Distance: 384
NEW SOLUTION    Route: [0, 8, 9, 10, 1, 3, 5, 2, 7, 4, 6]   Distance: 376
NEW SOLUTION    Route: [0, 8, 9, 10, 6, 4, 3, 1, 7, 2, 5]   Distance: 371
NEW SOLUTION    Route: [0, 8, 9, 10, 6, 4, 3, 1, 5, 2, 7]   Distance: 367
NEW SOLUTION    Route: [0, 8, 9, 10, 6, 4, 2, 5, 3, 1, 7]   Distance: 366
NEW SOLUTION    Route: [0, 8, 9, 10, 6, 4, 7, 2, 5, 3, 1]   Distance: 365
NEW SOLUTION    Route: [0, 8, 9, 5, 3, 1, 10, 6, 4, 2, 7]   Distance: 358
NEW SOLUTION    Route: [0, 8, 9, 5, 2, 7, 3, 1, 10, 4, 6]   Distance: 352
NEW SOLUTION    Route: [0, 8, 2, 4, 6, 10, 9, 5, 3, 7, 1]   Distance: 344
NEW SOLUTION    Route: [0, 8, 2, 4, 6, 10, 9, 5, 3, 1, 7]   Distance: 328
NEW SOLUTION    Route: [0, 8, 2, 7, 3, 1, 5, 9, 10, 4, 6]   Distance: 321
NEW SOLUTION    Route: [0, 8, 2, 7, 1, 3, 5, 9, 10, 4, 6]   Distance: 320
NEW SOLUTION    Route: [0, 8, 5, 3, 1, 9, 10, 6, 4, 2, 7]   Distance: 319
NEW SOLUTION    Route: [0, 8, 5, 9, 10, 6, 4, 2, 7, 3, 1]   Distance: 314
NEW SOLUTION    Route: [0, 8, 5, 2, 7, 3, 1, 9, 10, 4, 6]   Distance: 313
NEW SOLUTION    Route: [0, 8, 4, 3, 5, 2, 7, 1, 9, 10, 6]   Distance: 299
NEW SOLUTION    Route: [0, 8, 4, 3, 7, 2, 5, 9, 1, 10, 6]   Distance: 293
NEW SOLUTION    Route: [0, 8, 4, 3, 1, 7, 2, 5, 9, 10, 6]   Distance: 278
NEW SOLUTION    Route: [0, 8, 4, 2, 7, 3, 1, 5, 9, 10, 6]   Distance: 277
NEW SOLUTION    Route: [0, 8, 4, 2, 7, 1, 3, 5, 9, 10, 6]   Distance: 276
NEW SOLUTION    Route: [0, 8, 4, 6, 10, 9, 5, 3, 1, 7, 2]   Distance: 263
NEW SOLUTION    Route: [0, 8, 4, 6, 10, 9, 5, 3, 1, 2, 7]   Distance: 262
NEW SOLUTION    Route: [0, 8, 4, 6, 10, 9, 5, 2, 7, 3, 1]   Distance: 249
NEW SOLUTION    Route: [0, 3, 5, 9, 10, 6, 4, 8, 1, 7, 2]   Distance: 245
NEW SOLUTION    Route: [0, 3, 5, 9, 10, 6, 4, 8, 1, 2, 7]   Distance: 244
NEW SOLUTION    Route: [0, 3, 7, 2, 5, 9, 10, 1, 8, 4, 6]   Distance: 242
NEW SOLUTION    Route: [0, 3, 7, 2, 5, 9, 10, 6, 4, 8, 1]   Distance: 231
NEW SOLUTION    Route: [0, 3, 1, 8, 4, 6, 10, 9, 5, 2, 7]   Distance: 215
Found solution.
Elapsed time 0.087109 s

Results

The figure below shows the results of the timing study when using the average of 100 different random problems, compared to the timing study performed using a single problem size.

Average versus one-time solutions, walltime versus problem size.

The results show dramatically different behavior, highlighting the importance of computing the statistical average of solution walltime for many different problems. This was not an issue that arose in discussing timing or profiling of codes to solve the 8 queens problem, because in that case the problem (and resulting decision tree) were determined by the choice of algorithm.

This information is also important ot keep in mind when comparing the timing performance of two algorithms - using many cases to compare two algorithms is preferrable, since it reduces the likelihood of randomly selecting a problem that highights a weakness of one algorithm or a strength of the other.

Summary

In this post we covered some scripting tools that make timing a lot easier to do, and some ways of thinking about and building up scripts to allow for more complex timing studies without an increase in the associated post-processing work involved.

We showed how to use make and a Makefile to create compact, expressive commands to build and run Java programs. We showed how to use Bash scripting to implement loops and run a case matrix of problems of various sizes, measuring timing data for hundreds of problems in total. All of these scripts made heavy use of Unix command line tools, demonstrating how to chain commands and functionality together on the command line to accomplish complex tasks. Finally, we showed how to use the awk programming language to compute the average of a set of numbers, exploring yet another application of this unusually handy language.

Tags:    computer science    command line    guava    graph    TSP    make    awk    performance