# Fixing Bottlenecks in the Guava Traveling Salesperson Problem Code

Posted in Java

## Intro

In a prior blog post we introduced you to the traveling salesperson problem (TSP), which involves finding the shortest path through every city in a group of cities connected by a network of roads. Using Google Guava, we have implemented a solution to the TSP in Java.

Our philosophy toward timing, profiling, and optimization is that it is always best to work from data - and timing is the first place to begin collecting data. As we will show in this blog post, simply timing your function for different problem sizes can reveal scaling behavior that indicates bottlenecks, bugs, or inefficiencies in the algorithm.

In this post, we use simple timing tools and a spreadsheet to plot scaling behavior and identify bottlenecks in the traveling salesperson problem code. Fixing the bottleneck led to a reduction in cost of two orders of magnitude.

Here's a preview:

In this post we'll cover what we did to time the problem, the initial results, and the algorithm improvement that led to the massive performance improvement.

But first, let's look at some of the graphs that are being solved.

## The Graphs We Are Solving

Let's start by having a look at some of the graphs we will be solving, and the representation of the problem.

### Visualizations of Graphs

The first few graphs start out simple: here is a randomly generated 4-node traveling salesman problem (we wil cover the code that generates the graph pictured here in a moment):

Shortest Route: [0, 2, 3, 1] Distance: 112.0

Here is another randomly generated graph with 5 nodes:

Shortest Route: [0, 4, 2, 1, 3] Distance: 130.0

With 6 nodes:

Shortest Route: [0, 2, 4, 1, 5, 3] Distance: 163.0

But problem of this sise are still trivially easy for a processor to handle. Our inefficient, first-pass algorithm started to show signs of eating up CPU cycles at around 9 nodes (albeit less than 1 second). Here is the graph with 9 nodes:

Shortest Route: [0, 6, 1, 7, 3, 2, 5, 8, 4] Distance: 166.0

With 12 nodes:

Shortest Route: [0, 7, 5, 4, 1, 8, 6, 10, 11, 9, 3, 2] Distance: 236.0

At 14 nodes, even the efficient algorithm crosses the 1 second threshold.

Shortest Route: [0, 2, 6, 10, 13, 12, 7, 4, 1, 11, 5, 3, 9, 8] Distance: 277.0

We tested randomly-generated, fully-connected graphs of up to 18 nodes, and the algorithm was able to compute solutions within a few minutes. Here is an 18-node graph:

Shortest Route: [0, 3, 10, 6, 12, 5, 11, 2, 14, 8, 13, 4, 7, 1, 9] Distance: 267.0

## The Guava TSP Solution

In a prior post we covered the implementation of a solution to the TSP using Guava's Network objects. This implementation utilized a recursive depth-first search algorithm to search for the shortest path among all nodes.

To recap, here was our pseudocode for the TSP solution:

explore(neighbors):

if(no more unvisited neighbors):
# This is the base case.
if total distance is less than current minimum:
save path and new minimum

else:
# This is the recursive case.
for neighbor in unvisited neighbors:
visit neighbor
explore(new_neighbors)
unvisit neighbor


And here is what the recursive backtracking explore() method looked like when implemented in Java:

    /** Recursive backtracking method: explore possible solutions starting at this node, having made nchoices */
public void explore(Node node, int nchoices) {

if(nchoices == graphSize) {
//
// BASE CASE
//
if(this.this_distance < this.min_distance || this.min_distance < 0) {
// Solution base case
this.min_distance = this.this_distance;
printSolution();
}

} else {
//
// RECURSIVE CASE
//
for(Node neighbor : neighbors) {
if(neighbor.visited == false) {

int distance_btwn = -10000;

for( Edge edge : graph.edgesConnecting(node, neighbor) ) {
distance_btwn = edge.value;
}

// Make a choice
this.route[nchoices] = neighbor.id;
neighbor.visit();
this.this_distance += distance_btwn;

// Explore the consequences
explore(neighbor,nchoices+1);

// Unmake the choice
this.route[nchoices] = -1;
neighbor.unvisit();
this.this_distance -= distance_btwn;
}
// Move on to the next choice (continue loop)
}
} // End base/recursive case
}


Note: full TSP code available at http://git.charlesreid1.com/charlesreid1/tsp.

### Timing the TSP Solution

To time the Guava solution to the TSP, we utilized Java's system time to measure the amount of time it took to compute solutions, excluding the time spent on graph construction.

Here is the code that performs the timing of the call to the explore method:

    public static void main(String[] args) throws IllegalArgumentException {

...

double conn = 1.00;
TSP t = new TSP(N,conn);

long start = System.nanoTime();
t.solve();
long end = System.nanoTime();
long duration = end - start;

System.out.printf("Elapsed time %03f s\n ", (duration/1E9) );
}


The elapsed time is computed using System.nanoTime().

Writing a script to feed variable size graphs and time the resulting code showed some pretty awful scaling behavior:

This scaling behavior reveals a bottleneck in the algorithm: the algorithm scales the same way the problem size scales. A more efficient algorithm would be capable of ruling out more of the solution space as the graph size grows, allowing the algorithm to scale better at large problem sizes.

This led to some reconsideration of the algorithm.

## Improving the Guava TSP Solution

The original TSP algorithm implemented a subtle flaw - not by implementing a mistake in the calculation, but by ignoring an important piece of information.

### The Flaw

As the recursive depth-first search traverses the graph, the algorithm is checking if all nodes have been traversed. When all nodes have been traversed, it then compares the distance of that journey to the current shortest journey. If the new journey is shorter, it is saved as the new shortest journey, otherwise it is ignored and we move on.

What this ignores is the fact that any path, at any point, can be checked to see if it is longer than the current minimum, and if it is, any possibilities that follow from it can be skipped.

For example, consider the TSP on a graph of six cities, A B C D E F.

Suppose that the algorithm is in the midst of the recursive backtracking solution, and has a current minimum distance and minimum path of the route A-B-E-D-C-F, which is 24 miles.

Now suppose that the algorithm is searching for solutions that begin with the choice A-E-C, and the distance A-E-C is 28 miles.

The naive algorithm ignores this information, and continues choosing from among the 3 remaining cities, computing the total length for $$3! = 6$$ additional routes, and finding that all six of them do not work.

The smart algorithm checks each time it chooses a new node whether the length of the current route exceeds the current minimum route distance (if one has been found/set). If not, the algorithm keeps going, but if so, it skips choosing neighbors and returns directly to the parent caller.

### Fixing the Flaw

Fixing the flaw is surpsingly easy: we just add an if statement.

Illustrating first with the pseudocode:

explore(neighbors):

if(no more unvisited neighbors):
# This is the base case.
if total distance is less than current minimum:
save path and new minimum

else:
# This is the recursive case.
if current distance is greater than current minimum:
skip
else:
for neighbor in unvisited neighbors:
visit neighbor
explore(new_neighbors)
unvisit neighbor


In our Java implementation, the algorithm simply prints out solutions as it goes, then returns to the calling function whether a solution was found or not. Thus, we can "skip" a set of solutions by just returning to the calling function, using a return statement.

        if(nchoices == graphSize) {
//
// BASE CASE
//
if(this.this_distance < this.min_distance || this.min_distance < 0) {
// Solution base case:
this.min_distance = this.this_distance;
printSolution();
}

} else {
//
// RECURSIVE CASE
//

/*
* The following lines result in a huge computational cost savings.
*/
if(this.min_distance>0 && this.this_distance>this.min_distance) {
// Just give up already. It's meaningless. There's no point.
return;
}

// Everything else stays exactly the same
for(Node neighbor : neighbors) {
if(neighbor.visited == false) {

int distance_btwn = -10000;

for( Edge edge : graph.edgesConnecting(node, neighbor) ) {
distance_btwn = edge.value;
}

// Make a choice
this.route[nchoices] = neighbor.id;
neighbor.visit();
this.this_distance += distance_btwn;

// Explore the consequences
explore(neighbor,nchoices+1);

// Unmake the choice
this.route[nchoices] = -1;
neighbor.unvisit();
this.this_distance -= distance_btwn;
}
// Move on to the next choice (continue loop)
}
} // End base/recursive case
}


### The Pessimist Algorithm

This algorithm is dubbed The Pessimist Algorithm. Let's see how it works. Here is that new if statement:

if(this.min_distance>0 && this.this_distance>this.min_distance) {
// Just give up already. It's meaningless. There's no point.
return;
}


This if statement tests two conditions - first, we check if a first minimum distance has actually been found, and second, we check if the distance of the current path is greater than the minimum distance. If it is, we give up continuing our search down this path, and just return back to the calling function.

This introduces a small computational cost - we now have an if statement to check every time the explore() method is called - but it results in such significant cost savings that it does not matter.

## Timing Results

Shown below is a graph of the walltime for various problem sizes, showing both the original algorithm and the pessimist algorithm and their scaling behavior.

The pessimist algorithm led to a drastic improvement in scale-up - the results are striking.

And here are the results in a table form:

-----------------------------------------------------------------------------------------
| Number of Nodes N | Initial Algorithm Walltime [s] | Pessimist Algorithm Walltime [s] |
|-------------------|--------------------------------|----------------------------------|
| 4                 | 0.005                          | 0.006                            |
| 5                 | 0.006                          | 0.006                            |
| 6                 | 0.009                          | 0.008                            |
| 7                 | 0.017                          | 0.011                            |
| 8                 | 0.029                          | 0.020                            |
| 9                 | 0.083                          | 0.023                            |
| 10                | 0.305                          | 0.053                            |
| 11                | 1.443                          | 0.118                            |
| 12                | 15.808                         | 0.149                            |
| 13                | 180.078                        | 0.524                            |
| 14                |                                | 1.276                            |
| 15                |                                | 3.905                            |
| 16                |                                | 216.827                          |
| 17                |                                | 106.992                          |
| 18                |                                | 337.930                          |
-----------------------------------------------------------------------------------------


For a problem with 13 nodes, the initial algorithm took 3 minutes; the pessimist algorithm didn't even break the one second mark!

## Future Work

Now that we've got the algorithm running faster and more efficiently, we can tackle larger problems and explore the impact of problem topology on solutions, and we can rest assured we have an efficient algorithm that can scale to larger and more interesting problems.

There are further improvements we could make to the algorithm to improve it, though. By examining the solutions that are found, we can see that the solutions usually, but not always, connects from each neighbor to its next-closest neighbor. If, when iterating over neighbors, we start by searching the nearest neighbors first, we can potentially get to the minimum solution faster, which would allow us to more quickly rule out larger portions of the solution space that are infeasible.

This would induce an additional overhead cost of sorting, since the Guava library returns the edges that connect to a node as an unordered Set. These edges would have to be added to a container and sorted to implement the nearest-neighbor search.

However, we saw with the pessimist solution that a small increase in complexity can rule out large enough portions of the solution space to make it worthwhile, so it may be that the cost of sorting each edge pays off in the computational savings that result.

# Python vs. Perl: N Queens Problem

Posted in Computer Science

## Background

Revisiting the N queens problem, this time implementing the solution in Python.

Verb-oriented solution, functional, and based on Perl solution

More fair comparison - both are interpreted languages, not compiled languages

Compare Python and Perl, ease of implementation, speed, flexibility

## N Queens Problem

As a recap from the last post about the N queens problem, we're solving a chess puzzle that asks: how many different configurations are there for placing $$N$$ queens on an $$N \times N$$ chessboard such that no queen attacks any other queen?

This is a popular problem in computer science because solutions often implement recursive backtracking.

See the Wikipedia article for details.

## N Queens Solution

Here is the pseudocode of the N queens solution being implemented here:

explore(column):
if last column:
# base case
else:
# recursive case
for each row:
if this is a safe row:
place queen on this row
explore(column+1)
remove queen from this row


This solution implements recursive backtracking to explore choices of where to place each queen. It keeps solutions simple, and can be implemented using only primitive built-in data types. Solutions are stringified version of these arrays, consisting of 8 digits, so they are likewise very simple.

### Perl Solution

As a reminder, the Perl solution was originally from Rosetta Code. Here's another link to the Perl solution on Github.

Github gist: nqueens.pl

### Python Solution

The solution requires the use of one array of data that is fixed in size, which for a given column stores a list of rows already occupied by queens, and one array of data that is variable in size, which stores where each queen has been placed.

The Python solution makes use of lists by using a fixed-size list for the occupied rows and a variable size list for storing queen locations. It utilizes buit-in methods for the list data type to append and pop, or add and remove items from the end of the list.

Github gist: nqueens.py

----------------------------------------------------------
| NQueens | Nsolutions | Java      | Perl     | Python   |
|---------|------------|-----------|----------|----------|
| 8       | 92         | 0.003628  | 0.016    | 0.018    |
| 9       | 352        | 0.006709  | 0.067    | 0.077    |
| 10      | 724        | 0.017473  | 0.259    | 0.359    |
| 11      | 2680       | 0.061291  | 1.542    | 1.684    |
| 12      | 14200      | 0.240463  | 8.431    | 8.618    |
| 13      | 73712      | 1.113491  | 48.542   | 50.401   |
| 14      | 365596     | 6.557336  | 303.278  | 322.576  |
| 15      | 2279184    | 42.619426 | 2057.052 | 1979.343 |
----------------------------------------------------------


The results of this test show that Python and Perl are fairly closely matched.

## Perl Profiling

Results of profiling the Perl code with Devel::NYTProf were detailed in a prior post.

Here are those results once again, for the 11 queens problem:

# Profile data generated by Devel::NYTProf::Reader
# Version: v6.04
# Format: time,calls,time/call,code
0.000238,2,0.000119,use Time::HiRes qw(time);
0.000039,2,0.000019,use strict;
0.000491,2,0.000246,use warnings;
0.000021,1,0.000021,my $start = time; 0.010338,2680,0.000004,push @solutions, "@queens\n"; 0.009993,2680,0.000004,return; 0.186298,164246,0.000001,$#attacked = 2 * $board_size; 0.150338,164246,0.000001,for( 0 ..$#queens) {
0.675523,1.26035e+06,0.000001,$attacked[$ix2 ] = 1;
1.242624,164246,0.000008,for my $row (0 ..$board_size-1) {
0.267469,166925,0.000002,explore($depth+1); 0.125272,166925,0.000001,$occupied[$row] = 0; 0.000002,1,0.000002,explore(0); 0.000011,1,0.000011,my$duration = time - $start; 0.000075,1,0.000075,print "Found ", scalar(@solutions), " solutions\n"; 0.000050,1,0.000050,printf "Execution time: %0.3f s \n",$duration;


## Python Profiling

The Python N queens solution was profiled with two tools: cProfile and line_profiler.

The built-in profiling tool cProfile gives a summary of how much time was spent in each method, but nothing lower level than that. It is similar to Java profiling tools. (cProfile documentation)

The line_profiler tool is designed to profile Python code line-by-line, which gives a much more useful breakdown of where the code spent all of its time. It is also helpful because this can be compared one-to-one with the results from Perl, and we can get an equal basis for comparing the two languages.

## Python Profiling Results

### cProfile Results

Here are the results from running the N queens problem for N = 11 through cProfile:

**************************************
Profiling 11 queens problem with Python...
*******************************
cProfile:
*******************************
Found 2680 solutions
996197 function calls (829272 primitive calls) in 2.237 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
166926/1    1.845    0.000    2.237    2.237 /Volumes/noospace/Users/charles/codes/hello-world/python/nqueens/nqueens.py:12(explore)
328492    0.275    0.000    0.275    0.000 {range}
166925    0.062    0.000    0.062    0.000 {method 'pop' of 'list' objects}
169605    0.029    0.000    0.029    0.000 {method 'append' of 'list' objects}
164247    0.026    0.000    0.026    0.000 {len}
1    0.000    0.000    2.237    2.237 /Volumes/noospace/Users/charles/codes/hello-world/python/nqueens/nqueens.py:4(<module>)
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


### line_profiler Results

The line_profiler tool gives a more detailed picture of the code and where it spends its time, breaking down profiling information line-by-line. This tool can be installed with pip: pip install line_profiler. The (INSERT LINK)(nqueens repository on github) has a file that demonstrates how to use this tool. See Python/Profiling

Here are the results from the line_profiler tool run on the same (11 queens) problem:

**************************************
Profiling 11 queens problem with Python...
Found 2680 solutions
Wrote profile results to nqueens.py.lprof
Timer unit: 1e-06 s

Total time: 14.2258 s
File: nqueens.py
Function: explore at line 11

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
11                                           @profile
12                                           def explore(depth):
13                                               # base case
14    166926       187573      1.1      1.3      if(depth==board_size):
15                                                   # stringify/serialize the solution
16      2680        31497     11.8      0.2          solutions.append("%s"%(queens))
17      2680         3117      1.2      0.0          return
18
19                                               else:
20    164246       384688      2.3      2.7          attacked = 2*board_size*[0,]
21   1424595      1690693      1.2     11.9          for i in range(0,len(queens)):
22   1260349      1471304      1.2     10.3              ix1 = queens[i] + depth - i
23   1260349      1405141      1.1      9.9              attacked[ix1] = 1
24
25   1260349      1494095      1.2     10.5              ix2 = queens[i] - depth + i
26   1260349      1392563      1.1      9.8              attacked[ix2] = 1
27
28   1970952      2229922      1.1     15.7          for row in range(0,board_size):
29   1806706      2031139      1.1     14.3              if(occupied[row] or attacked[row]):
30    379432       374466      1.0      2.6                  continue
31
32                                                       # make a choice
33    166925       241114      1.4      1.7              queens.append(row)
34    166925       186833      1.1      1.3              occupied[row] = 1
35
36                                                       # explore the consequences
37    166925       610396      3.7      4.3              explore(depth+1)
38
39                                                       # unmake the choice
40    166925       288667      1.7      2.0              queens.pop()
41    166925       202555      1.2      1.4              occupied[row] = 0


## Python vs Perl: Walltime vs. Number of Solutions Tested

As with the prior post, I verified that both codes were testing the same number of solutions. Here is that table of the number of solutions for each value of N, together with the number of solutions tested:

-----------------------------------------------------------------------------
| NQueens | Nsolutions | NsolutionsTested | Java      | Perl     | Python   |
|---------|------------|------------------|-----------|----------|----------|
| 8       | 92         | 1965             | 0.003628  | 0.016    | 0.018    |
| 9       | 352        | 8042             | 0.006709  | 0.067    | 0.077    |
| 10      | 724        | 34815            | 0.017473  | 0.259    | 0.359    |
| 11      | 2680       | 164246           | 0.061291  | 1.542    | 1.684    |
| 12      | 14200      | 841989           | 0.240463  | 8.431    | 8.618    |
| 13      | 73712      | 4601178          | 1.113491  | 48.542   | 50.401   |
| 14      | 365596     | 26992957         | 6.557336  | 303.278  | 322.576  |
| 15      | 2279184    | 168849888        | 42.619426 | 2057.052 | 1979.343 |
-----------------------------------------------------------------------------


## The Winner: Perl for Small Problems, Python for Big Ones

It was not a big surprise to see that Perl and Python were nearly identical in their performance, and it testament to the fact that interpreted scripting languages like Perl and Python operate at one speed, and compiled code in C++ or Java operates at a completely different speed that is an order of magnitude faster (see the comparison of Perl and Java in a prior blog post).

Perl and Python were close enough in performance that, although Perl came out ahead on smaller problems and Python came out ahead on the biggest, a different CPU or platform, micro-optimizations, and various butterfly effects could easily turn the tables.

## Sources

1. "N-Queens Problem". Rosetta Code, GNU Free Documentation License. Edited 6 March 2017. Accessed 21 March 2017. <https://web.archive.org/web/20170320081421/http://rosettacode.org/wiki/N-queens_problem>

2. "nqueens.py". Charles Reid. Github Gist, Github Inc. Edited 25 March 2017. Accessed 25 March 2017. <https://gist.github.com/charlesreid1/1a2ecb3a83284290d4a9daf747d0d7e4>

3. "nqueens.pl". Charles Reid. Github Gist, Github Inc. Edited 20 March 2017. Accessed 23 March 2017. <https://gist.github.com/charlesreid1/4ce97a5f896ff1c89855a5d038d51535>

4. "The Python Profilers". Python 2.7.13 Documentation, Python Software Foundation. Updated 20 February 2017. Accessed 23 March 2017. <https://docs.python.org/2/library/profile.html>

5. "line_profiler". Python Package Index, Python Software Foundation. Updated 20 October 2016. Accessed 23 March 2017. <https://pypi.python.org/pypi/line_profiler/>

6. "Github - rkern/line_profiler". rkern, Github Repository, Github Inc. Updated 20 October 2016. Accessed 23 March 2017. <https://github.com/rkern/line_profiler>

7. "Python/Profiling". Charlesreid1.com wiki, Charles Reid. Edited 23 March 2017. Accessed 23 March 2017. <https://web.archive.org/web/20170326031708/http://charlesreid1.com/wiki/Python/Profiling>

# Solving the Traveling Salesperson Problem with Java and Guava

Posted in Java

## Background: Traveling Salesperson Problem (TSP)

The traveling salesperson problem, or TSP, is a classic programming problem and an important one in computer science, and applications in operations research and optimization.

The idea is that you have a set of $$N$$ cities, connected by various roads, each with their own distances. That is, we have a set of $$E$$ roads, each with their own distance $$d_j, j=1 \dots E$$.

The question is, what is the shortest path that a salesperson can take to visit all $$N$$ cities, traveling the shortest possible total distance and visiting each city once and only once?

Like the N queens problem, the traveling salesperson problem is a good candidate for recursive backtracking. Also like the N queens problem, there are certain shortcuts we can take to trim down the possibilities we explore.

Computer science pages on Wikipedia are generally pretty high in quality, and the traveling salesman problem page is no exception. It give a very thorough overview of the important aspects of the problem.

## Graphs

Graphs are mathematical objects first utilized by Leonhard Euler to solve the Seven Bridges of Köningsberg problem. The concept is simple: you have a bunch of dots connected with lines.

The dots are called nodes, and the lines are called edges.

Graphs can be directed, meaning the edges are like arrows with particular directions, or undirected, meaning the edges simply represent a connection between the two nodes.

Here is an example of a graph with five nodes, with each edge labeled with its distance:

We will skip over a vast amount of detail about graph theory that is both fascinating and useful, but M. E. J. Newman's paper "The structure and function of complex networks" (self-published and written as a course review) is an extremely detailed and academically rigorous overview of just about every important detail of the field.

### Number of Edges

The maximum number of roads or edges $$E$$ depends on the number of nodes as $$E = \dfrac{N(N-1)}{2}$$, which is derived from the formula for 2 choose N (because edges connect 2 nodes). For k choose N, the formula is given by:

$$C_{N,k} = \dfrac{N!}{k! (N-k)!}$$

and for 2 choose N, it is given by:

$$C_{N,2} = \dfrac{N(N-1)}{2}$$

This is the maximum number of undirected edges in a graph. If the graph is directed, then order matters (the edgge A->B is no longer the same as B->A), so we have to use the formula for k pick N,

$$P_{N,k} = \dfrac{N!}{(N-k)!}$$

which results in

$$P_{N,2} = N (N-1)$$

possible edges.

### Number of Solutions

Naturally, the question of the total solution space arises. Assuming the graph of cities is perfectly connected (representing an upper limit on problem complexity), how does the number of solutions change as the number of nodes increases?

We can start with a trivial graph, and count the number of possible paths through the entire graph, starting with a specific node. This is equivalent to counting permutations of a string that start with a specific character.

ABCDE
ABCED
ABDCE
ABDEC
ABECD
ABEDC


For a string of length $$N$$, the string has $$(N-1)!$$ possible permutations that start with a specific character. Therefore, as the number of nodes $$N$$ increases, the number of possible solutions increases as $$(N-1)!$$, making the complexity class of the problem $$O(N!)$$.

## Solution: Recursive Backtracking

Here is pseudocode for a recursive backtracking method:

explore(neighbors):

if(no more unvisited neighbors):
# This is the base case.
if total distance is less than current minimum:
save path and new minimum

else:
# This is the recursive case.
for neighbor in unvisited neighbors:
visit neighbor
explore(new_neighbors)
unvisit neighbor


Care is needed to prevent infinite loops in which the traveling salesperson goes back and forth between two cities. As we traverse the graph, we can mark each node as visited, to ensure we don't revisit nodes and go in circles.

Nodes can be implemented as a Node object in Java, with each Node having a few characteristics:

• String label
• Container of Node pointers pointing to neighbors
• Boolean flag: have we visited this node already?

Likewise, the graph edges can be represented using integers or doubles.

## Solving the TSP with Java and Guava

Google Guava is a library of high-performance data containers in Java. The library provides some useful graph objects that we can use to easily solve the TSP on a graph.

### Basics of Guava

Install and use Guava by visiting the Guava project on Github, find the page for their latest release (here is version 21.0), and getting the latest .jar file.

To compile with the jar file, you can either utilize an IDE like Eclipse or IntelliJ, or you can compile from the command line, specifying the class path using the -cp flag:

$javac -cp '.:/path/to/guava/jars/guava-21.0.jar' TSP.java$ java -cp '.:/path/to/guava/jars/guava-21.0.jar' TSP


### Guava Graphs

Graph objects in Guava are implemented using a set of objects: Graphs, ValueGraphs, and Networks.

Graph objects treat edges as very simple and assumes they contain no information and simply link nodes.

ValueGraphs associate a single non-unique value with each edge on the graph. This can also be used to solve the TSP.

Network objects treat nodes and edges both as objects, and has the ability to deal with more complex edges: model multi-edges, repeated edges, directed edges, etc.

We will use a Network object and design our own graph Node and Edge objects.

Link to Guava wiki on how to build graph instances

Guava API Documentation: Network

Guava API Documentation: Graph

Guava API Documentation: ValueGraph

### Guava Mutable vs Immutable

Guava makes a distinction between mutable graphs, which can be modified, and immutable graphs, which cannot. Immutability provides some safety and assurances to programmers, and can make things faster.

When we construct the network, we need a mutable graph to modify (add nodes and edges). But once the network is constructed, it is finished: we don't need to modify the network while we're solving the problem.

Therefore, we construct a mutable network, assemble the graph for the given problem, and copy it into an immutable graph. We then use the immutable graph to access the graph while solving.

### Importing Guava

Starting with import statements, we'll use a couple of objects from the Java API, and from Google's Guava library:

import java.util.Set;
import java.util.Map;
import java.util.TreeMap;
import java.util.Arrays;



import java.util.*;


### TSP Class

Let's lay out the TSP class definition. This class is simple, and wraps a few pieces of data: the current route, the current distance, and the minimum distance. Note that we could also save the solution in a container, instead of printing it, by defining a static class to hold solutions, but we'll keep it simple.

/** This class solves the traveling salesman problem on a graph. */
class TSP {
// The actual graph of cities
ImmutableNetwork<Node,Edge> graph;
int graphSize;

// Storage variables used when searching for a solution
String[] route;         // store the route
double this_distance;   // store the total distance
double min_distance;    // store the shortest path found so far

/** Defaut constructor generates the graph and initializes storage variables */
public TSP() {
// TODO
}

/** This method actually constructs the graph. */
public ImmutableNetwork<Node,Edge> buildGraph() {
// TODO
}

/** Public solve method will call the recursive backtracking method to search for solutions on the graph */
public void solve() {
// TODO
}

/** Recursive backtracking method: explore possible solutions starting at this node, having made nchoices */
public void explore(Node node, int nchoices) {
// TODO
}

/** Print out solution */
public void printSolution() {
// TODO
}

/** Print out failed path */
public void printFailure() {
// TODO
}


### Node Class (Cities)

Now we can define the Node class to represent cities on the graph.

class Node {
public String label;
public boolean visited; // Helps us to keep track of where we've been on the graph
public Node(String name){
this.label = name;
this.visited = false;
}
public void visit(){
this.visited = true;
}
public void unvisit() {
this.visited = false;
}
}


Like a lined list node, we want to keep graph nodes simple. Note that Nodes don't need to store information about their neighbors. That's what we'll use Google Guava for!

Edge classes are even simpler, wrapping a single integer:

class Edge {
public int value;
public String left, right; // For convenience in construction process. Not necessary.
public Edge(String left, String right, int value) {
this.left = left;
this.right = right;
this.value = value;
}
}


Note that left and right are used for convenience only during the graph construction process. Like the nodes, the edges don't need to know who their neighbors are, since that's what the Google Guava graph object will take care of.

## TSP Constructor and Building the Graph

### Constructor

The TSP class constructor should do a few things:

• Construct a graph, with a given set of cities and distances.
• Initialize arrays and cumulative variables that will be used by the backtracking method.

The actual graph construction process is put into another function called buildGraph(), so really the constructor just calls a function and then does #2.

    /** Defaut constructor generates the graph and initializes storage variables */
public TSP() {
// Build the graph
this.graph = buildGraph();
this.graphSize = this.graph.nodes().size();

// Initialize route variable, shared across recursive method instances
this.route = new String[this.graphSize];

// Initialize distance variable, shared across recursive method instances
this.this_distance = 0.0;
this.min_distance = -1.0; // negative min means uninitialized
}


### Build Graph Method

Now we actually use Guava's Immutable Network object, which takes two templated types, T1 and T2, which correspond to the node types and the edge types.

We use a NetworkBuilder object to build the Network (an example of the factory template). This returns a MutableNetwork of Node and Edge objects, which we can then connect up using some built-in methods.

Here are some built-in methods available for a MutableNetwork:

    addEdge(node1, node2, edge)
removeEdge(edge)
removeNode(node)


Now here is the construction of the graph, using the Google Guava library. There are two loops here: one for cities, and one for edges. In the loop over each city,we create a new node and add it to the graph. To be able to easily retrieve the Nodes we have created, we also store references to the nodes in a map called all_nodes.

When we construct edges, we use the map of all nodes all_nodes to get references to the Node objects that correspond to a label. That way, if an edge connects "A" with "B" at a distance of 24, we can turn "A" and "B" into references to the Node objects A and B.

    /** This method actually constructs the graph. */
public ImmutableNetwork<Node,Edge> buildGraph() {

// MutableNetwork is an interface requiring a type for nodes and a type for edges

// Construct Nodes for cities,
// and add them to a map
String[] cities = {"A","B","C","D","E"};
Map<String,Node> all_nodes = new TreeMap<String,Node>();
for(int i = 0; i<cities.length; i++) {
Node node = new Node(cities[i]);
all_nodes.put(cities[i], node);

}

// and add them to a map
String[] distances = {"A:B:24","A:C:5","A:D:20","A:E:18","B:C:10","B:D:20","C:D:4","C:E:28","D:E:3"};
Map<String,Edge> all_edges = new TreeMap<String,Edge>();
for(int j = 0; j<distances.length; j++) {
// Parse out (city1):(city2):(distance)
String[] splitresult = distances[j].split(":");
String left = splitresult[0];
String right = splitresult[1];
String label = left + ":" + right;
int value = Integer.parseInt(splitresult[2]);

Edge edge = new Edge(left, right, value);
all_edges.put(label, edge);

}

// Freeze the network

}


## Solving and Exploring with Recursive Backtracking

### Solve Method

The structure of some recursive backtracking problems is to create a public and a private interface, with the public interface taking no parameters or a single parameter that the user will know, and the private method taking a parameter specific to the implementation. That's the pattern we use here.

The solve method sets up the problem by picking a starting node (in this case, an arbitrary starting node). It then gets a reference to that node on the graph, and calls the recursive explore() method, which begins the recursive backtracking method.

    /** Public solve method will call the recursive backtracking method to search for solutions on the graph */
public void solve() {
/** To solve the traveling salesman problem:
* Set up the graph, choose a starting node, then call the recursive backtracking method and pass it the starting node.
*/

// We need to pass a starting node to recursive backtracking method
Node startNode = null;

// Grab a node, any node...
for( Node n : graph.nodes() ) {
startNode = n;
break;
}

// Visit the first node
startNode.visit();

// Add first node to the route
this.route[0] = startNode.label;

// Pass the number of choices made
int nchoices = 1;

// Recursive backtracking
explore(startNode, nchoices);
}


### Explore (Backtrack) Method

And now, on to the recursive backtracking method.

The method takes as a parameter which node we are currently on and the number of cities we have visited. As multiple explore methods choose different paths, they pass references to different node objects in the graph, and they pass different values of nchoices.

The methods, when they do not encounter a solution, will choose a next node and call the explore method on it. Each instance of the explore method marks nodes as visited or unvisited on the same shared graph object. This allows instances of the function to share information about their choices with other instances of the function.

All recursive methods must consist of a base case and a recursive case:

Base case: - We've visited as many cities as are on the graph. - Check if this is a new solution (distance less than the current minimum).

Recursive case: - Make a choice (mark node as visited, add city to route). - Explore the consequences (recursive call). - Unmake the choice (mark node as unvisited, remove city from route). - Move on to next choice.

    /** Recursive backtracking method: explore possible solutions starting at this node, having made nchoices */
public void explore(Node node, int nchoices) {
/**
* Solution strategy: recursive backtracking.
*/

if(nchoices == graphSize) {
//
// BASE CASE
//
if(this.this_distance < this.min_distance || this.min_distance < 0) {
// if this_distance < min_distance, this is our new minimum distance
// if min_distance < 0, this is our first minimium distance
this.min_distance = this.this_distance;
printSolution();
} else {
printFailure();
}

} else {
//
// RECURSIVE CASE
//
for(Node neighbor : neighbors) {
if(neighbor.visited == false) {

int distance_btwn;

for( Edge edge : graph.edgesConnecting(node, neighbor) ) {
distance_btwn = edge.value;
}

// Make a choice
this.route[nchoices] = neighbor.label;
neighbor.visit();
this.this_distance += distance_btwn;

// Explore the consequences
explore(neighbor,nchoices+1);

// Unmake the choice
this.route[nchoices] = null;
neighbor.unvisit();
this.this_distance -= distance_btwn;
}
// Move on to the next choice (continue loop)
}
} // End base/recursive case
}


## Main and Utility Methods

Last but not least, add the method that actually calls the TSP object's solve method, and define what to do when we encounter a new solution. This program just prints out new solutions as they are found, but you could also add them to a map (map routes to distances), or quietly keep track of the shortest path and not print it until the end.

class TSP {
public static void main(String[] args) {
TSP t = new TSP();
t.solve();
}

...


Additionally, we may want to perform a certain action when we find a new minimum distance. Note that this method may be called multiple times during the solution procedure.

    /** Print out solution */
public void printSolution() {
System.out.print("@@@@@@@@@@\tNEW SOLUTION\t");
System.out.println("Route: "+Arrays.toString(this.route)
+"\tDistance: "+this.min_distance);
}

/** Do nothing with failed path */
public void printFailure() {
// Nope
}


## Program Output

### Initial Graph Structure and Solution

In the construction of the graph, we defined our graph as:

String[] distances = {"A:B:24","A:C:5","A:D:20","A:E:18","B:C:10","B:D:20","C:D:4","C:E:28","D:E:3"};


This is the graph that we're solving the TSP problem on. Here are the results when the program is compiled and run:

$javac -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP.java$ java -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP

@@@@@@@@@@  NEW SOLUTION    Route: [A, B, C, D, E]  Distance: 41.0
@@@@@@@@@@  NEW SOLUTION    Route: [A, C, B, D, E]  Distance: 38.0
@@@@@@@@@@  NEW SOLUTION    Route: [A, E, D, C, B]  Distance: 35.0


The answers given were satisfactory and correct, so we moved on to a more advanced graph construction process that utilized a static class to generate random, fully-connected graphs. This also implemented additional functionality to export to Dot format. This static RandomGraph class will be covered in later post.

Here is the resulting output of the random graph generator for a 6-node TSP problem, with the sequence of shortest routes found by the algorithm:

plain java -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP 6 ------------------- TSP ---------------------- !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 3, 2, 4, 5] Distance: 291.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 3, 5, 4, 2] Distance: 249.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 2, 4, 3, 5] Distance: 246.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 5, 3, 4, 2] Distance: 203.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 3, 5, 1, 4, 2] Distance: 178.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 2, 4, 1, 5, 3] Distance: 163.0 Done.

And for 12 nodes, a problem twice that size, here is the graph and corresponding output:

plain

Running TSP with 12 nodes with Java...

# Java times itself, we just have to pass it the size

java -cp '.:/Users/charles/codes/guava/jars/guava-21.0.jar' TSP 12 ------------------- TSP Version 2: The Pessimistic Algorithm ---------------------- !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 5, 3, 6, 8, 11] Distance: 585.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 5, 3, 8, 11, 6] Distance: 558.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 5, 3, 8, 6, 11] Distance: 522.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 6, 5, 3, 8, 11] Distance: 499.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 11, 8, 6, 5, 3] Distance: 460.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 10, 11, 6, 8, 3, 5] Distance: 459.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 11, 8, 10, 6, 5, 3] Distance: 449.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 11, 10, 6, 8, 3, 5] Distance: 419.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 11, 10, 8, 6, 5, 3] Distance: 408.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 9, 3, 5, 6, 8, 10, 11] Distance: 402.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 2, 3, 9, 11, 10, 8, 6, 5] Distance: 385.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 11, 10, 8, 6, 2, 9, 3, 5] Distance: 377.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 5, 3, 9, 11, 10, 8, 6, 2] Distance: 374.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 4, 5, 6, 8, 10, 11, 9, 3, 2] Distance: 371.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 9, 3, 5, 4, 11, 10, 8, 6, 2] Distance: 363.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 9, 3, 2, 6, 8, 10, 11, 4, 5] Distance: 352.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 3, 5, 4, 11, 9, 2, 6, 8, 10] Distance: 350.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 3, 2, 9, 11, 4, 5, 6, 8, 10] Distance: 347.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 3, 9, 2, 4, 11, 10, 8, 6, 5] Distance: 345.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 3, 9, 2, 6, 8, 10, 11, 4, 5] Distance: 323.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 5, 3, 9, 2, 6, 8, 10, 11, 4] Distance: 322.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 5, 4, 2, 3, 9, 11, 10, 8, 6] Distance: 316.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 5, 4, 2, 6, 8, 10, 11, 9, 3] Distance: 313.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 5, 4, 11, 9, 3, 2, 6, 8, 10] Distance: 306.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 7, 5, 4, 11, 10, 8, 6, 2, 9, 3] Distance: 302.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 10, 11, 4, 5, 7, 3, 9, 2, 6, 8] Distance: 289.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 10, 8, 6, 2, 9, 3, 7, 5, 4, 11] Distance: 286.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 10, 8, 6, 2, 3, 9, 11, 4, 5, 7] Distance: 283.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 8, 6, 10, 11, 4, 2, 9, 3, 7, 5] Distance: 274.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 8, 6, 10, 11, 4, 5, 7, 3, 9, 2] Distance: 260.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 4, 2, 6, 8, 10, 11, 9, 3, 7, 5] Distance: 259.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 4, 11, 10, 8, 6, 5, 7, 3, 9, 2] Distance: 256.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 4, 11, 10, 8, 6, 2, 9, 3, 7, 5] Distance: 248.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 1, 4, 5, 7, 3, 9, 11, 10, 8, 6, 2] Distance: 245.0 !!!!!YAY!!!!!! NEW SOLUTION Route: [0, 7, 5, 4, 1, 8, 6, 10, 11, 9, 3, 2] Distance: 236.0 Done. 

Note that these random graphs are fully connected - every node connects to every other node. Among the many interesting aspects of the problem, one of them is the impact of connectivity on the solution time.

Another aspect of the problem is the topology of the network, and exploring how a larger number of unconnected nodes, or groups of nodes clustered together but isolated from one another, affect the final solution time and the final route.

But before get to that, we have some other things to work out.

## Next Steps: Timing and Profiling

This post described a working implementation of a recursive backtracking solution to the traveling salesperson problem on a graph. This is a naive solution, however, and in the next few posts about the traveling salesperson problem we'll focus on using timing and profiling tools for Java to profile this Guava program, identify bottlenecks, and speed it up. In fact, there is one small tweak we can make to the algorithm covered above that will improve the performance by orders of magnitude. But more on that in a future post.

## Sources

1. "Traveling Salesman Problem". Wikipedia, the Free Encyclopedia. The Wikimedia Foundation. Edited 12 March 2017. Accessed 23 March 2017. <https://en.wikipedia.org/wiki/Travelling_salesman_problem>

2. "Seven Bridges of Königsberg". Wikipedia, the Free Encyclopedia. The Wikimedia Foundation. Edited 11 March 2017. Accessed 23 March 2017. <https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg>

3. "The structure and function of complex networks". M. E. J. Newman. <http://www-personal.umich.edu/~mejn/courses/2004/cscs535/review.pdf>

4. "Google Guava - Github". Alphabet, Inc. Released under the Apache 2.0 License. Accessed 20 March 2017. <https://github.com/google/guava>

5. "Guava". Charles Reid. Edited 23 March 2017. Accessed 23 March 2017.