# D3 Calendar Visualizations

Posted in Javascript

# Starting example

Let's begin with a D3 example. Mike Bostock provided a Calendar View block illustrating how to draw a very interesting visualization of large amounts of data over time:

You might recognize this type of graph from Github, whose activity graph shows the same visualization.

The data shown in this example consists of several years of stock market data. It is a simple but very large data set, with each data point consisting of one date and one number (the percentage gain or loss).

The example also shows how to perform a simple calculation from multiple fields of the data to plot a derived quantity. In this case, each row of the data includes an open, high, low, and close value, and the quantity being plotted is the percent change from open to close:

$$\mbox{\% Change} = \dfrac{\mbox{Close} - \mbox{Open} }{\mbox{Open}}$$

# What needs to be changed

To change this calendar visualization to visualize our own data, we need to change two things:

• The data set being visualized
• The color map being used

We can leave the rest alone, or make small modifications as needed. Fortunately, these changes are straightforward to make for the calendar visualization.

# Formatting the data

To modify the calendar graph for our own data, we'll output data as a time series: one column of date/time stamps, and another column of data to plot.

Let's take a look at the original data:

Date,Open,High,Low,Close,Volume,Adj Close
2010-10-01,10789.72,10907.41,10759.14,10829.68,4298910000,10829.68
2010-09-30,10835.96,10960.99,10732.27,10788.05,4284160000,10788.05
2010-09-29,10857.98,10901.96,10759.75,10835.28,3990280000,10835.28
2010-09-28,10809.85,10905.44,10714.03,10858.14,4025840000,10858.14
2010-09-27,10860.03,10902.52,10776.44,10812.04,3587860000,10812.04


In the code, we can see where this data is actually being parsed:

    var data = d3.nest()
        .key(function(d) { return d.Date; })
        .rollup(function(d) { return (d[0].Close - d[0].Open) / d[0].Open; })
        .object(csv);
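
To make the rollup concrete, here is a rough Python equivalent of the nest/rollup step. This is only a sketch: it keys each row by its Date field and stores the percent change, the function name `percent_change_by_date` is hypothetical, and the two sample rows are copied from the data shown above.

```python
import csv
import io

# Two rows copied from the sample data above
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2010-10-01,10789.72,10907.41,10759.14,10829.68,4298910000,10829.68
2010-09-30,10835.96,10960.99,10732.27,10788.05,4284160000,10788.05
"""

def percent_change_by_date(csv_text):
    """Map each date string to (Close - Open) / Open."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        open_price = float(row["Open"])
        close_price = float(row["Close"])
        result[row["Date"]] = (close_price - open_price) / open_price
    return result

data = percent_change_by_date(SAMPLE_CSV)
for date, change in data.items():
    print(date, "%+.4f" % change)
```

Like the D3 version, the result is a plain mapping from date string to value, which is all the calendar needs to look up a color for each tile.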


So, to modify this to suit our own custom data set, we can output our data as:

date,series1,series2
2010-10-01,1,150
2010-10-02,2,250
2010-10-03,3,350


and change the data parsing code to:

    var data = d3.nest()
        .key(function(d) { return d.date; })
        .rollup(function(d) {
            // Change this depending on what you want to plot
            return d[0].series1;
        })
        .object(csv);


Next, we discuss a few interesting applications of this visualization technique and how to generate the data sets.

## MediaWiki Edits

One of the applications of interest was scraping a MediaWiki wiki (charlesreid1.com/wiki to be precise) to determine the number of edits made to the wiki on a given date.

Fortunately, MediaWiki provides a rich API for interacting with wikis programmatically, and one of the best packages for doing it is pywikibot.

The way we compiled the data set for visualization was to scrape page histories for every page on the wiki, creating one observation for each edit on each page, and agglomerate the edits for each day into a final count.

The schema used was:

• _id - sha1 of text
• title - title of article
• timestamp - timestamp of edit
• count - number of characters in edit

The pseudocode used to extract the page edits was:

    get pages generator
    for page in pages:
        get page revisions generator
        for revision in page revisions:
            drop old doc from database
            insert new doc into database
            update record


Finally, a bit of pywikibot code:

    # Get the site
    site = get_site()

    # Get the iterator returning pages to process
    page_generator = get_page_generator(site, N)

    # Run the algorithm:
    for page in page_generator:

        page_title = page.title()

        print("Now parsing page: %s"%(page_title))

        rev_generator = page.revisions(content=count_chars)

        for rev in rev_generator:

            # Assemble the NoSQL document
            doc = {}
            doc['_id'] = rev.sha1
            doc['title'] = page_title
            doc['timestamp'] = rev.timestamp
            doc['count'] = len(rev.text)

            # Insert the new NoSQL document
            page_history_collection.insert_one(doc)


The helper functions that obtain the site and the page generator are:

    import pywikibot

    def get_site():
        """Get the Site object representing charlesreid1.com
        """
        return pywikibot.Site()

    def get_page_generator(s,max_items=0):
        """Get the generator that returns the Page objects
        that we're interested in, from Site s.
        """
        page_generator = s.allpages()
        if(max_items>0):
            page_generator.set_maximum_items(max_items)
        return page_generator
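
The final step described earlier, agglomerating the edits for each day into a count, is not shown in the snippets. A minimal sketch of one way to do it follows, assuming each stored document carries a `timestamp` that is either a `datetime` or an ISO-8601 string. The function names and the `timestamp,edits` column layout are assumptions, chosen to match the column names read by the D3 loading code later in this post.

```python
import csv
import io
from collections import Counter
from datetime import datetime

def day_of(timestamp):
    """Return the YYYY-MM-DD day for a datetime or ISO-8601 string."""
    if isinstance(timestamp, datetime):
        return timestamp.strftime("%Y-%m-%d")
    return str(timestamp)[:10]

def count_edits_per_day(docs):
    """Count one edit per document, grouped by calendar day."""
    return Counter(day_of(doc["timestamp"]) for doc in docs)

def write_counts(counts, handle):
    """Write the counts as CSV rows sorted by date."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["timestamp", "edits"])
    for day in sorted(counts):
        writer.writerow([day, counts[day]])

# Hypothetical documents shaped like the schema above
docs = [
    {"title": "Page A", "timestamp": datetime(2017, 3, 1, 9, 30), "count": 120},
    {"title": "Page B", "timestamp": "2017-03-01T17:45:00Z", "count": 80},
    {"title": "Page A", "timestamp": "2017-03-02T08:00:00Z", "count": 42},
]
counts = count_edits_per_day(docs)
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

In practice the documents would come from the database collection rather than a hard-coded list, and the output would be written to a file instead of an in-memory buffer.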


## Git Commits

Another application of these types of calendars comes directly from Github's visualization of the number of commits made by a user on each day.

To extract this information, you will need a folder full of git repositories, which will allow you to use git log to extract commit information from the log of each repository and assemble it all into a time series for a calendar.

While there is a Python package for interfacing with the git API, git itself is extremely powerful and is capable of doing this just fine. We use Python's subprocess library to make a call to git log, and parse the results into a data structure for exporting to CSV.
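
As a sketch of that approach (the exact command used for this post is not shown), the function below asks git for one line per commit in the form `<hash> <date> <time> <timezone> <subject>`, which is the token layout the parsing code in this section expects. The `--pretty=format:` and `--date=iso` options are standard `git log` options; the function names and the sample log line are hypothetical.

```python
import subprocess

def git_log_lines(repo_dir):
    """Run git log in repo_dir and return one line per commit."""
    cmd = ["git", "-C", repo_dir, "log",
           "--pretty=format:%h %ad %s", "--date=iso"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

def parse_log_line(line):
    """Split '<hash> <date> <time> <tz> <subject>' into a dict."""
    tokens = line.split(" ")
    return {
        "commit_id": tokens[0],
        "date": tokens[1],
        "time": tokens[2],
        "msg": " ".join(tokens[4:]),  # tokens[3] is the timezone offset
    }

# Hypothetical log line, used so the example runs without a repository
sample = "a1b2c3d 2014-03-26 10:15:02 -0700 Fix typo in README"
commit = parse_log_line(sample)
print(commit["date"], commit["msg"])
```

Calling `git_log_lines()` on each repository and writing the lines to a per-repository log file would produce input of the kind that the walk-through code reads back in.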

Here is the code that was used to walk through each directory and extract information from the saved output of a git log command:

import subprocess
from glob import glob
import os, re
import pandas as pd
import datetime

[clipped]

    rows = []
    orgs = glob(root_dir+"/repositories/*")
    for org in orgs:
        base_org = os.path.basename(org)
        repos = glob(org+"/*")
        for repo in repos:

            # Print out the org and repo name
            base_repo = re.sub(r'\.git$','', os.path.basename(repo))
            log_file = base_org + "." + base_repo + ".log"
            print("%s : %s"%(base_org,base_repo))

            # Get each commit (one log line per commit)
            with open(status_dir + "/" + log_file, 'r', encoding="ISO-8859-1") as f:
                lines = f.readlines()

            for line in lines:
                tokens = line.split(" ")
                msg = tokens[4:]

                # DataFrame.append was removed in pandas 2.0,
                # so collect plain dicts and build the frame once
                rows.append(
                    dict(
                        commit_id = tokens[0],
                        date = tokens[1],
                        time = tokens[2],
                        commits = 1,
                        msg = " ".join(msg)
                    )
                )

    df = pd.DataFrame(rows)
    ag = df.groupby(['date']).agg({'commits':'sum'})
    ag['commits'] = ag['commits'].apply(int)
    ag.to_csv('commit_counts.csv')


The last bit of code groups each commit by date, applying the sum function to the number of commits (1 for each commit), to yield the total number of commits for each date:

date,commits
2014-01-17,2
2014-03-26,11
2014-03-28,3
2014-04-01,4
2014-04-02,10
2014-04-03,4
2014-04-04,3


# Creating the color map

The best part of the process is picking a color map for the calendar. The ColorBrewer site has some good color palettes inspired by cartographic color needs. Python also provides some useful libraries and functionality for generating colormaps.

There are a number of options:

Here, we'll cover an alternative approach: defining a colormap that linearly interpolates between colors at particular locations on the interval 0 to 1.
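
Before bringing in matplotlib, it may help to see the idea in plain Python. The sketch below blends linearly between anchor colors placed at positions on the interval 0 to 1; the helper names and the black/red/white anchor colors are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to a tuple of floats in [0, 1]."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))

def rgb_to_hex(rgb):
    """Convert floats in [0, 1] to '#rrggbb' (rounded to nearest)."""
    return "#" + "".join("%02x" % round(c * 255) for c in rgb)

def interpolate(colors, positions, x):
    """Piecewise-linear blend of colors anchored at positions."""
    if x <= positions[0]:
        return colors[0]
    segments = zip(zip(positions, colors), zip(positions[1:], colors[1:]))
    for (p0, c0), (p1, c1) in segments:
        if x <= p1:
            t = (x - p0) / (p1 - p0)  # fraction of the way through this segment
            return tuple(a + t * (b - a) for a, b in zip(c0, c1))
    return colors[-1]

# Hypothetical anchors: black at 0, red at 0.5, white at 1
anchors = [hex_to_rgb(h) for h in ("#000000", "#ff0000", "#ffffff")]
positions = [0, 0.5, 1]

for x in (0.0, 0.5, 0.75, 1.0):
    print(x, rgb_to_hex(interpolate(anchors, positions, x)))
```

Placing two anchors close together (as with positions 0.5 and 0.6 in the purple-to-orange map in this post) produces a rapid color change over a narrow range of values.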

We will also use the webcolors module in Python to convert between colors in various formats, and a function make_cmap() available from Chris Slocum:

Link to make_cmap.py

make_cmap.py:

    import sys

    def make_cmap(colors, position=None, bit=False):
        '''
        make_cmap takes a list of tuples which contain RGB values. The RGB
        values may either be in 8-bit [0 to 255] (in which bit must be set to
        True when called) or arithmetic [0 to 1] (default). make_cmap returns
        a cmap with equally spaced colors.
        Arrange your tuples so that the first color is the lowest value for the
        colorbar and the last is the highest.
        position contains values from 0 to 1 to dictate the location of each color.
        '''
        import matplotlib as mpl
        import matplotlib.colors  # make sure the colors submodule is loaded
        import numpy as np
        bit_rgb = np.linspace(0,1,256)
        if position is None:
            position = np.linspace(0,1,len(colors))
        else:
            if len(position) != len(colors):
                sys.exit("position length must be the same as colors")
            elif position[0] != 0 or position[-1] != 1:
                sys.exit("position must start with 0 and end with 1")
        if bit:
            for i in range(len(colors)):
                colors[i] = (bit_rgb[colors[i][0]],
                             bit_rgb[colors[i][1]],
                             bit_rgb[colors[i][2]])
        cdict = {'red':[], 'green':[], 'blue':[]}
        for pos, color in zip(position, colors):
            cdict['red'].append((pos, color[0], color[0]))
            cdict['green'].append((pos, color[1], color[1]))
            cdict['blue'].append((pos, color[2], color[2]))

        cmap = mpl.colors.LinearSegmentedColormap('my_colormap',cdict,256)
        return cmap


Now, an example of how we can call this function: this code creates a colormap ranging from purple to orange.

    import webcolors

    def purple_to_orange():
        """Dark purple to bright orange."""

        start_hex = "#4d2b4b"
        mid1_hex  = "#8c6bb1"
        mid2_hex  = "#fdae6b"
        end_hex   = "#f16913"

        start_color = [j/255 for j in webcolors.hex_to_rgb(start_hex)]
        mid1_color  = [j/255 for j in webcolors.hex_to_rgb(mid1_hex)]
        mid2_color  = [j/255 for j in webcolors.hex_to_rgb(mid2_hex)]
        end_color   = [j/255 for j in webcolors.hex_to_rgb(end_hex)]

        colors = [start_color, mid1_color, mid2_color, end_color]
        position = [0, 0.5, 0.6, 1]
        cm = make_cmap(colors, position=position)
        return cm


Now, the following code evaluates the colormap at N + 1 = 10 evenly spaced points to create 10 separate hex colors:

    # Now just call cm(0.0) thru cm(1.0)
    cm = purple_to_orange()
    N = 9
    hex_colorz = []
    for i in range(N+1):
        x = i/N
        rgbd_color = cm(x)[0:3]
        rgb_color = [int(c*255) for c in rgbd_color]
        hex_color = webcolors.rgb_to_hex(rgb_color)
        hex_colorz.append(hex_color)

    print(hex_colorz)


# How the calendar code works

First, the HTML page that hosts the calendar, with a div for D3 to draw into:

<!DOCTYPE html>
<body style="background: #272b30;">
<div id="calendar"></div>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script>
/* D3 code goes here */
</script>
</body>


Next, the Javascript code that creates the calendar visualization. We'll walk through each part.

    var width = 700,
        height = 90,
        cellSize = 12;

    // big integers
    var formatStuff = d3.format(",");

    /*
    TEH COLORRRZZZZ
    */
    var realBackgroundColor = "#272b30";
    var tileBackgroundColor = realBackgroundColor;//"#3a3a3a";
    var tileStrokeColor     = "#3a3a3a";
    var monthStrokeColor    = "#4a4a4a";

    var color = d3.scaleQuantize()
        .domain([0, 60])
        .range(["#4d2b4b","#5a3961","#684777","#77558f","#8463a5","#cc9189","#fba25c","#f78e43","#f47b2b","#f16913"]);
    // purple orange
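
The quantize scale splits the domain [0, 60] into ten equal-width bins, one per color. The Python sketch below mimics that behavior as we understand it (equal-width bins, with out-of-range values clamped to the first or last color); it is an illustration with a hypothetical function name, not D3's implementation.

```python
import math

def quantize(value, domain, colors):
    """Map value to one of len(colors) equal-width bins over domain."""
    lo, hi = domain
    n = len(colors)
    index = math.floor((value - lo) / (hi - lo) * n)
    index = max(0, min(n - 1, index))  # clamp values outside the domain
    return colors[index]

# Same palette and domain as the D3 code above
palette = ["#4d2b4b", "#5a3961", "#684777", "#77558f", "#8463a5",
           "#cc9189", "#fba25c", "#f78e43", "#f47b2b", "#f16913"]
domain = [0, 60]

for edits in (0, 7, 30, 60, 100):
    print(edits, quantize(edits, domain, palette))
```

With this domain each bin is 6 units wide, so any day with 54 or more edits gets the brightest orange. Changing the upper end of the domain is the easiest way to tune the calendar for a data set with a different range.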


The canvas goes on the div tag with id calendar:

    /*
    Make the canvas
    */
    var svg = d3.select("div#calendar")
        .selectAll("svg")
        .data(d3.range(2010, 2019).reverse())
        .enter().append("svg")
        .attr("width", width)
        .attr("height", height)
        .append("g")
        .attr("transform", "translate(" + ((width - cellSize * 53) / 2) + "," + (height - cellSize * 7 - 1) + ")");

    /*
    Write the years
    */
    svg.append("text")
        .attr("transform", "translate(-6," + cellSize * 3.5 + ")rotate(-90)")
        .attr("font-family", "sans-serif")
        .attr("font-size", 10)
        .attr("fill", "#bbb")
        .attr("text-anchor", "middle")
        .text(function(d) { return d; });


The next two portions are the meat of the calendar visualization, drawing the tiles and outlines:

    /*
    Draw the tiles representing days of the year
    (also draw tile outlines)
    */
    var rect = svg.append("g")
        .attr("fill",   tileBackgroundColor)
        .attr("stroke", tileStrokeColor)
        .selectAll("rect")
        .data(function(d) { return d3.timeDays(new Date(d, 0, 1), new Date(d + 1, 0, 1)); })
        .enter().append("rect")
        .attr("width", cellSize)
        .attr("height", cellSize)
        .attr("x", function(d) { return d3.timeWeek.count(d3.timeYear(d), d) * cellSize; })
        .attr("y", function(d) { return d.getDay() * cellSize; })
        .datum(d3.timeFormat("%Y-%m-%d"));

    /*
    Draw outlines of groups representing months
    */
    svg.append("g")
        .attr("fill", "none")
        .attr("stroke", monthStrokeColor)
        .selectAll("path")
        .data(function(d) { return d3.timeMonths(new Date(d, 0, 1), new Date(d + 1, 0, 1)); })
        .enter().append("path")
        .attr("d", pathMonth);
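
The x and y attributes above place each day in a grid: the column is the number of Sunday-based weeks since January 1, and the row is the day of the week (Sunday = 0). Here is a small Python sketch of the same arithmetic, assuming that reading of `d3.timeWeek.count` and `getDay`; the function name `tile_position` is hypothetical.

```python
from datetime import date

CELL_SIZE = 12  # same value as cellSize in the D3 code

def tile_position(day):
    """Return (week_column, weekday_row) with Sunday as row 0."""
    jan1 = date(day.year, 1, 1)
    jan1_row = (jan1.weekday() + 1) % 7   # Python Monday=0 -> Sunday=0
    offset = (day - jan1).days            # days since January 1
    column = (offset + jan1_row) // 7     # Sunday-based weeks since Jan 1
    row = (day.weekday() + 1) % 7
    return column, row

column, row = tile_position(date(2010, 10, 1))
# Pixel coordinates of the tile's top-left corner
print(column * CELL_SIZE, row * CELL_SIZE)
```

Since a year spans at most 54 Sunday-based week columns, the fixed width of 53 cells used in the translate call is a close fit for the grid.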


Now, the code that loads the data, filters it, performs any calculations, and draws colored rectangles on top of the baseline square grid:

    /*
    Load the data and color the tiles
    */
    d3.csv("page_edits.csv", function(error, csv) {
        if (error) throw error;

        /*
        This is where you decide what values to plot
        */
        var data = d3.nest()
            .key(function(d) { return d.timestamp ; })
            .rollup(function(d) {
                return d[0].edits;
            })
            .object(csv);

        rect.filter(function(d) { return d in data; })
            .attr("fill", function(d) { return color(data[d]); })
            .append("title")
            .text(function(d) { return d + ": " + formatStuff(data[d]); });
    });


Finally, the most mysterious bit of magic in this code is the code that draws the squares around the months. This has to use the coordinates of the beginning and end of the months to draw a complicated square path.

It's magic, it works, we're happy.

    function pathMonth(t0) {
        var t1 = new Date(t0.getFullYear(), t0.getMonth() + 1, 0),
            d0 = t0.getDay(), w0 = d3.timeWeek.count(d3.timeYear(t0), t0),
            d1 = t1.getDay(), w1 = d3.timeWeek.count(d3.timeYear(t1), t1);
        return "M" + (w0 + 1) * cellSize + "," + d0 * cellSize
            + "H" + w0 * cellSize + "V" + 7 * cellSize
            + "H" + w1 * cellSize + "V" + (d1 + 1) * cellSize
            + "H" + (w1 + 1) * cellSize + "V" + 0
            + "H" + (w0 + 1) * cellSize + "Z";
    }


# Final result

The finished product, visualizing edits to charlesreid1.com/wiki/ and commits to git.charlesreid1.com, can be seen at the following links:

# Project Euler Problem 172

Posted in Mathematics

# Overview: Problem 172

How many 18-digit numbers $$n$$ (without leading zeros) are there such that no digit occurs more than three times in $$n$$?

Link to Project Euler Problem 172

# Background

Project Euler Problem 172 is your classic Project Euler problem: short, simple, and overwhelmingly complicated.

To nail this one, it's important to start simple - very simple. What I'll do is walk through the process of breaking this problem down to find and generalize the patterns needed to count permutations of digits.

First, in combinatorics problems it is important to think about what is changing, and how to count possible outcomes one piece at a time. Then the overall pieces can be combined to get the total count. In this case, we can think about a case for each digit: the case of 3 occurrences, the case of 2 occurrences, the case of 1 occurrence, and the case of 0 occurrences. Depending on the case, we limit our choices for later digits.

Let's start with a similar, but much simpler, problem: how do we construct a binary number with N digits and no more than m 0s and no more than m 1s?

In fact, let's make it even easier: how do we construct a 10 digit binary number with no more than 5 0's and no more than 5 1's?

The answer is, there is only ONE way to choose no more than 5 0's and no more than 5 1's to form a 10 digit number, and that's by having exactly 5 0's and 5 1's. Now that we know exactly how many of each digit we have, we can count the number of permutations of the number 0000011111.

## Multiset Permutations

Note that multiset permutations are also discussed on the following wiki pages and blog posts:

If we are selecting from a group of $$N_1$$ things of type A, $$N_2$$ things of type B, and $$N_3$$ things of type C to form a total of $$N$$ things, this type of combinatorics problem is called a multiset permutation, and the total number of ways of arranging this set of 3 things is given by:

$$\binom{N}{N_1, N_2, N_3} = \dfrac{N!}{N_1! N_2! N_3!}$$

In fact, this generalizes, for $$k$$ classes of things we have a $$k$$-set permutation:

$$\binom{N}{N_1, \dots, N_k} = \dfrac{N!}{N_1! \dots N_k!}$$
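
As a quick sanity check, the formula is easy to evaluate with exact integer arithmetic. The sketch below uses a hypothetical helper name, `multiset_permutations`, that takes the list of class counts $$N_1, \dots, N_k$$.

```python
from math import factorial

def multiset_permutations(counts):
    """N! / (N_1! * ... * N_k!) where N = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        # Each division is exact, so integer division loses nothing
        result //= factorial(c)
    return result

print(multiset_permutations([5, 5]))     # five 0s and five 1s
print(multiset_permutations([2, 1]))     # e.g. the letters a, a, b
print(multiset_permutations([1, 1, 1]))  # three distinct things
```

When every count is 1 this reduces to the ordinary $$N!$$ permutations, and with two classes it reduces to the binomial coefficient.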

## A Simple Problem (And Solution)

Back to the problem at hand: to count the number of ways of placing 5 0s and 5 1s to form a 10 digit number.

Once we place 5 digits into any of the 10 available slots, that fixes the locations of the remaining 5 digits. However, we still have to include two 5! values, to account for all possible duplicates if we exchanged all 5 of the 1s with one another, or all 5 of the 0s with one another. We use the expression:

$$\binom{10}{5} = \dfrac{10!}{5! 5!} = \dfrac{10 \times 9 \times 8 \times 7 \times 6}{5 \times 4 \times 3 \times 2 \times 1} = 252$$

## A Slightly More Complicated Problem

To solve a slightly more complicated problem: suppose we have to assemble a 10-digit binary number from no more than 6 0s and no more than 6 1s?

Now we have 3 possible cases of numbers of 0s:

4 0s: 0000111111 - and its permutations

5 0s: 0000011111 - and its permutations

6 0s: 0000001111 - and its permutations


For each of these cases, we can think of it as the "bucket" of 0s containing 4 0s (5 and 6 0s, respectively) and the "bucket" of 1s containing 6 1s (5 and 4 1s, respectively). We still have a number of permutations that we can form using this given number of 0s and 1s, given by a multiset permutation expression.

For each case, we have a multiset permutation expression that tells us how many permutations we can form from the given number of 0s and 1s:

$$\binom{ N }{ N_0, N_1 }$$

So we have three possible outcomes, and the total number of arrangements is the sum of these three cases:

$$N_{perms} = \binom{ 10 }{ 4, 6 } + \binom{ 10 }{ 5, 5 } + \binom{ 10 }{ 6, 4 } = 210 + 252 + 210 = 672$$
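
Because the search space is tiny (only $$2^{10} = 1024$$ binary strings), we can check the three-case sum by brute force. The sketch below is an illustrative check with a hypothetical function name; note that at this stage leading zeros are still allowed.

```python
from itertools import product
from math import comb

def count_binary(length, max_repeat):
    """Brute-force count of binary strings with capped digit counts."""
    total = 0
    for digits in product("01", repeat=length):
        zeros = digits.count("0")
        ones = length - zeros
        if zeros <= max_repeat and ones <= max_repeat:
            total += 1
    return total

# Sum over the three cases: 4, 5, or 6 zeros
by_formula = comb(10, 4) + comb(10, 5) + comb(10, 6)
by_brute_force = count_binary(10, 6)
print(by_formula, by_brute_force)
```

Both approaches agree, which gives some confidence in the case-by-case counting before generalizing it.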

# Algorithm

We can generalize the process. Suppose we are forming a number of length N from a number of digits/classes $$k$$ labeled from $$0 \dots k-1$$, and each digit/class can only appear a maximum of $$m$$ times.

The number of combinations that can be formed for a given $$N, k, m$$ is given by the multiset permutation expression above. So the total number of permutations that can be formed is a sum of these multiset permutation expressions, over each possible combination of digits/classes into a number of length $$N$$.

In computer science terms, we can think of this as a nested for loop or dynamic program; in mathematical terms, we can think of a sequence of summations whose limits depend on the variables in the other summations.

$$\sum_{N_1} \sum_{N_2} \dots \sum_{N_{k-1}} \binom{N}{N_0, N_1, N_2, \dots, N_{k-1}}$$

where the limits of the summations are given by:

$$N_1 = \max \left(N - (k-1) m, 0 \right) \dots m$$
$$N_2 = \max \left( N - N_1 - (k-2) m, 0 \right) \dots m$$

etc...

$$N_{k-1} = \max \left( N - N_1 - N_2 - \dots - N_{k-2} - m, 0 \right) \dots m$$

The lower limit ensures that the remaining digits/classes, each used at most $$m$$ times, can still fill the remaining positions. These choices all fix the number of zeros $$N_0$$:

$$N_0 = N - N_1 - N_2 - N_3 - \dots - N_{k-1}$$

Notice that we ignore $$N_0$$ in the list of summations, because fixing the number of the other k-1 digits/classes (1s, 2s, 3s, ..., (k-1)s) will fix the number of 0s. Alternatively, we could count 0s and include a summation over $$N_0$$, and eliminate the last summation over $$N_{k-1}$$.

However, the multiset permutation expression includes ALL of the N's, from $$N_0$$ to $$N_{k-1}$$, since the choice of each variable leads to additional permutations.

Also note that any algorithm implementing this procedure can save time by checking if, for the preceding combinations of $$N$$, we have already reached the maximum possible digits that can be selected. (Alternatively, we could write the upper limit of the summations as expressions depending on the prior values of $$N_i$$, but we'll keep it simple.)

## Ignoring Numbers Starting with Zero

We have one last hurdle remaining, and that is how to ignore numbers that start with 0.

If we think about the problem as selecting the number of times each digit is repeated, then assembling that selection into all possible permutations, fixing the first digit as 0 is equivalent to removing one from the total length of the number that must be assembled, and removing one from the possible 0s that will go in the final number. Thus, if we are assembling an N digit number from $$N_0$$ 0s, $$N_1$$ 1s, $$N_2$$ 2s, $$N_3$$ 3s, on up to $$N_9$$ 9s, then the total number of permutations is given by:

$$\binom{ N }{N_0, N_1, \dots, N_9}$$

If we fix the first digit as 0, the remaining number of permutations is given by:

$$\binom{N-1}{ N_0-1, N_1, \dots, N_9 }$$

Therefore, the number of permutations, excluding those beginning with 0, is written:

$$\binom{ N }{N_0, N_1, \dots, N_9} - \binom{N-1}{ N_0-1, N_1, \dots, N_9 }$$

Also, it is important to note that if N_0 = 0 to begin with, there are no possible ways of assembling numbers that begin with 0 because there are no 0s in the number, so the second term becomes 0:

$$\binom{ N }{0, N_1, \dots, N_9} - 0$$
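
The subtraction is straightforward to express in code. Here is a minimal sketch, assuming the counts are passed as a list whose first entry is $$N_0$$; both function names are hypothetical.

```python
from math import factorial

def multinomial(counts):
    """N! / (N_0! * N_1! * ...) where N = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def count_without_leading_zero(counts):
    """Arrangements of the digits that do not start with 0.

    counts[0] is the number of 0s, counts[1] the number of 1s, etc.
    """
    total = multinomial(counts)
    if counts[0] == 0:
        return total  # no 0s, so nothing can start with 0
    # Fix one 0 in the first position and arrange the rest
    reduced = [counts[0] - 1] + list(counts[1:])
    return total - multinomial(reduced)

print(count_without_leading_zero([5, 5]))     # five 0s, five 1s
print(count_without_leading_zero([0, 3, 3]))  # no 0s at all
```

The explicit check for `counts[0] == 0` mirrors the special case above and avoids taking the factorial of a negative number.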

# Code

## Test Cases

### Test Case 1

Assemble two digits $$\{0,1\}$$ into a 10-digit number, if each digit $$\{0,1\}$$ can occur up to 5 times.

In this case, we know that 0 and 1 must occur exactly 5 times each. Now we are asking how we can assemble two sets of 5 things into 10 slots. This is a multiset permutation problem:

$$\binom{10}{5,5} = \dfrac{10!}{5! \cdot 5!} = \dfrac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 252$$

But wait! We also want to exclude numbers starting with 0, so we actually have:

$$\binom{10}{5, 5} - \binom{9}{4, 5} = 126$$

which is half of 252 - exactly what we would expect.

### Test Case 2

Assemble three digits $$\{0, 1, 2\}$$ into a 6-digit number, if each digit $$\{0, 1, 2\}$$ can occur up to 3 times. No number should start with 0.

In the prior case, we had one outcome of number of 0s and 1s, but in this case, we have a larger number of outcomes that we might see.

Evaluating the expressions for the limits of $$N_i$$, we get:

$$\sum_{N_0 = 0}^{3} \sum_{N_1 = \max(0, 3 - N_0) }^{3} \binom{6}{N_0, N_1, (N-N_0-N_1)}$$

where $$N_2 = N - N_0 - N_1$$. Written out, this becomes the total number of possible 6-digit numbers,

$$a = \binom{6}{0,3,3} + \binom{6}{1,2,3} + \binom{6}{1,3,2} + \binom{6}{2,1,3} + \binom{6}{2,2,2} + \\ \binom{6}{2,3,1} + \binom{6}{3,0,3} + \binom{6}{3,1,2} + \binom{6}{3,2,1} + \binom{6}{3,3,0}$$

minus the number of 6-digit numbers starting with 0:

$$b = 0 + \binom{5}{0,2,3} + \binom{5}{0,3,2} + \binom{5}{1,1,3} + \binom{5}{1,2,2} + \\ \binom{5}{1,3,1} + \binom{5}{2,0,3} + \binom{5}{2,1,2} + \binom{5}{2,2,1} + \binom{5}{2,3,0}$$

Let $$a$$ be the first expression and $$b$$ be the second expression; then the total is:

In [40]: np.sum(a)
Out[40]: 510.0

In [41]: np.sum(b)
Out[41]: 170.0

In [42]: np.sum(a) - np.sum(b)
Out[42]: 340.0

$$a - b = 340$$
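
This test case is small enough ($$3^6 = 729$$ candidates) to verify by direct enumeration. The sketch below uses a hypothetical `brute_force` helper that simply tries every string of digits and keeps the ones that satisfy both rules.

```python
from itertools import product

def brute_force(digits, length, max_repeat):
    """Count digit strings with no leading 0 and capped repeats."""
    count = 0
    for candidate in product(digits, repeat=length):
        if candidate[0] == "0":
            continue  # skip numbers with a leading zero
        if all(candidate.count(d) <= max_repeat for d in digits):
            count += 1
    return count

print(brute_force("012", 6, 3))   # Test Case 2
print(brute_force("01", 10, 5))   # Test Case 1
```

Enumeration like this is hopeless for the real problem ($$10^{18}$$ candidates), but it is a useful way to validate the counting formulas on small inputs.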

## Recursion

The essence of this problem is a nested for loop - but because we have 9 digit counts to choose ($$N_1$$ through $$N_9$$), a 9-level nested for loop would be a big headache and would not generalize well.

Instead, we can write a recursive method that is called once for each of the $$k-1$$ (9) digit counts being selected to compose the final $$N$$- (18-) digit number.

The recursive method looks something like this:

    global variable solution_count
    global variable m
    global variable N

    def recursive_method( n_tuple, n) {
        if(n==9) {
            compute multiset permutation combinations
            increment global solutions total
            need N, N0, N1, N2, etc.
        } else {
            assemble choices for N_i
            for(choice in choices) {
                set N_i to choice
                call recursive_method()
                unset N_i
            }
        }
    }


## Pseudocode

Computing the number of possible integers n that meet the specified criteria thus boils down to a long sequence of nested summations (nested loops).

The problem is posed for $$N = 18, k = 10, m = 3$$. For this case, the final expression for the total number of permutations is:

$$\sum_{N_1} \sum_{N_2} \sum_{N_3} \sum_{N_4} \sum_{N_5} \sum_{N_6} \sum_{N_7} \sum_{N_8} \sum_{N_9} \left[ \binom{N}{N_0, N_1, N_2, \dots, N_9} - \binom{N-1}{N_0-1, N_1, N_2, \dots, N_9} \right]$$

where the limits of summation are given by:

$$N_1 = \max \left( N - (10-1) m, 0 \right) \dots m$$
$$N_2 = \max \left( N - N_1 - (10-2) m, 0 \right) \dots m$$
$$N_3 = \max \left( N - N_1 - N_2 - (10-3) m, 0 \right) \dots m$$
$$N_4 = \max \left( N - N_1 - N_2 - N_3 - (10-4) m, 0 \right) \dots m$$

etc...

$$N_9 = \max \left( N - N_1 - N_2 - \dots - N_7 - N_8 - (10-9) m, 0 \right) \dots m$$

and from these, $$N_0$$ is determined by:

$$N_0 = N - N_1 - N_2 - \dots - N_8 - N_9$$

## Python Code

Link to Problem 172 Python Code at git.charlesreid1.com

To implement the solution to Problem 172 in Python, we used recursion, as mentioned above. The only tricky part of implementing this recursive method was the usual challenge with recursive methods: keeping track of the total number of solutions found via a global variable.

To do this in Python, we declare a variable outside the scope of a given function, and we use that variable as a global variable by declaring it with the global keyword.

import numpy as np

# Real problem:
k = 10
m = 3
N = 18

solution_count = 0
factorials = {}


Now we have a main() driver method to call the recursive method:

    def main():
        global solution_count
        n_tuple = [None,]*k
        recursive_method(n_tuple,1)
        print("Total number of permutations:")
        print("%d"%(solution_count))


We have the recursive backtracking method that constructs all combinations of $$k$$ digits into $$N$$-digit numbers:

    def recursive_method( n_tuple, ni ):
        """
        Use recursive backtracking to form all possible
        combinations of k digits into N-digit numbers
        such that the number of digits is m or less.

        (n_tuple is actually a list.)

        ni = current class step 1..(k-1)
        n_tuple = list of number of digits for each class 0 through k-1
        """
        global solution_count, k, m, N
        if(ni==k):

            # N_1 through N_(k-1) have been set,
            # now it is time to set N_0:
            # N_0 = N - N_1 - N_2 - N_3 - .. - N_{k-1}
            sum_N = np.sum([n_tuple[j] for j in range(1,k)])
            n_tuple[0] = max(0, N-sum_N)

            # Compute multiset permutation
            solution_count += multiset(N,n_tuple) - multiset_0(N,n_tuple)

            return

        else:

            # Problem: we are not stopping
            # when the sum of digits chosen
            # is greater than N

            # Assemble the minimum and maximum limits for N_i:
            # (Everything up to ni-1 should be defined, no TypeErrors due to None)
            sum_N = np.sum([n_tuple[j] for j in range(1,ni)])
            ktm = (k - ni)*m
            expr = N - sum_N - ktm
            minn = int(max( 0, expr ))

            # Note: previously this was just maxx=m.
            # This required a check around each call to
            # recursive_method to see if the sum of n_tuple
            # was already maxed out. Now we just do it here.
            maxx = min(m, N-sum_N)

            for N_i in range(minn,maxx+1):

                # Set
                n_tuple[ni] = N_i

                # Explore
                recursive_method(n_tuple, ni+1)

                # Unset
                n_tuple[ni] = None

            return


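We have a multiset() method that evaluates the multiset permutation count formula, and a multiset_0() method that counts the permutations that begin with 0:

$$\binom{N}{N_1, \dots, N_k} = \dfrac{N!}{N_1! \dots N_k!}$$

    def multiset(N, n_tuple):
        """
        Number of multiset permutations
        """
        # Integer division keeps the count exact
        # (np.product was removed in NumPy 2.0; np.prod is the same function)
        r = factorial(N)//int(np.prod([factorial(j) for j in n_tuple]))
        return r

    def multiset_0(N, n_tuple):
        """
        Number of multiset permutations that begin with 0
        (one 0 is fixed in the first position)
        """
        if(n_tuple[0]>0):
            denom = [factorial(j-1) if(i==0) else factorial(j)
                     for i,j in enumerate(n_tuple)]
            r = factorial(N-1)//int(np.prod(denom))
            return r
        else:
            return 0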


And finally, we have a factorial() method:

    def factorial(n):
        """
        Factorial utility
        """
        if(n<0):
            raise Exception("Error: negative factorials not possible")
        if(n==1 or n==0):
            return 1
        else:
            return n*factorial(n-1)


At the bottom of the file, we ensure that the driver is run when the file is run directly through Python:

    if __name__=="__main__":
        main()


Setting the correct parameters should result in the following result:

$$P = 227,485,267,000,992,000$$
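
As an independent cross-check, the same count can be reached without recursion. The sketch below is a different technique from the one used above: it adds one digit at a time with a small dynamic program, where `ways[n]` is the number of arrangements of `n` positions using the digits handled so far, and then removes leading zeros with a symmetry argument (each of the $$k$$ digits is equally likely to come first, so a fraction $$(k-1)/k$$ of all arrangements do not start with 0). The function name is hypothetical.

```python
from math import comb

def count_numbers(length, num_digits, max_repeat):
    """Count length-digit numbers (no leading 0) where no digit
    appears more than max_repeat times."""
    # ways[n] = arrangements of n positions using the digits handled so far
    ways = [0] * (length + 1)
    ways[0] = 1
    for _ in range(num_digits):
        new_ways = [0] * (length + 1)
        for used, count in enumerate(ways):
            if count == 0:
                continue
            for c in range(max_repeat + 1):
                if used + c > length:
                    break
                # Choose which of the used + c positions hold the new digit
                new_ways[used + c] += count * comb(used + c, c)
        ways = new_ways
    total = ways[length]  # includes numbers with a leading 0
    # By symmetry, exactly 1/num_digits of them start with 0
    return total * (num_digits - 1) // num_digits

print(count_numbers(10, 2, 5))   # Test Case 1
print(count_numbers(6, 3, 3))    # Test Case 2
print(count_numbers(18, 10, 3))  # the real problem
```

Everything here is exact integer arithmetic, so there is no risk of floating-point rounding in the final 18-digit result.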

# 4x4 Rubik's Cube: Part 4: Sequence Order

Posted in Rubiks Cube

This is Part 4 of a 4-part blog post on the mathematics of the 4x4 Rubik's Cube, its relation to algorithms, and some curious properties of Rubik's Cubes.

See Part 1 of this blog post here: Part 1: Representations

See Part 2 of this blog post here: Part 2: Permutations

See Part 3 of this blog post here: Part 3: Factoring Permutations

You are currently reading Part 4 of this blog post: Part 4: Sequence Order

# Introduction

## Order of a Sequence

As a reminder of our overarching goal: starting with a 4x4 Rubik's Revenge cube, an arbitrary sequence of moves will scramble the faces of the cube; but if that move sequence is repeatedly applied, eventually the cube will return to its solved state.

The simplest example is rotating a single face: after applying the rotation move four times to any face of a solved cube, the cube will return back to the solved state.

This is also true of more complicated move sequences, such as U R U' R', which returns the cube back to its original state after 6 applications, or the move sequence U R, which must be applied 105 times before the cube returns back to its original solved state.

Our goal is to predict this number: given a move sequence, how many times must that move sequence be applied to a solved cube to return the cube back to its solved state?

This number is called the order of a sequence.

## What We Have Covered So Far

In prior posts, we have covered a number of key topics that this post will synthesize.

We started Part 1 by discussing ways of representing the Rubik's Revenge cube, and we settled on a 96-tuple representation indicating which faces had moved to what locations.

That led us to Part 2, in which we discussed the two-row notation for the 96-tuple representing the cube, and demonstrated the utility of this representation by showing how moves and move sequences would lead to permutations that could be written as 96-tuples using the two-row notation.

In Part 3, we covered some key theoretical results following Donald Knuth's Art of Computer Programming which allowed us to develop a permutation algebra to describe the effects moves have on the cube. We concluded the previous post with an algorithm for factoring permutations into their intercalation products, and hinted that these permutation factors were central to computing the order of a move sequence.

# Factoring Rubik's Cube Permutations

## Factoring Permutations: A Review

In Part 3 of this series of blog posts, we looked at an example multiset permutation of characters. Here it is written using the two-row notation:

$$\pi = \bigl(\begin{smallmatrix} a & a & b & b & b & b & b & c & c & c & d & d & d & d & d \\ d & b & c & b & c & a & c & d & a & d & d & b & b & b & d \end{smallmatrix}\bigr)$$

We covered a technique for factoring this permutation into independent cycles of faces,

$$\pi = \alpha \top \beta \top \dots \top \gamma$$

and shared Python code to perform this operation. The resulting factored permutation was:

$$\pi = \bigl( \begin{smallmatrix} a & d & d & b & c & d & b & b & c \\ d & d & b & c & d & b & b & c & a \end{smallmatrix} \bigr) \top \bigl( \begin{smallmatrix} a & b \\ b & a \end{smallmatrix} \bigr) \top \bigl( \begin{smallmatrix} b & c & d \\ c & d & b \end{smallmatrix} \bigr) \top \bigl( \begin{smallmatrix} d \\ d \end{smallmatrix} \bigr)$$

## Factoring Rubik's Cube Permutations

To factor a Rubik's Cube permutation, we apply Algorithm A from the prior post to the two-row 96-tuple representation of the Rubik's Cube after it has had the move sequence applied once.

(Note that we only need to apply the sequence to the cube once, even if the order of that sequence is in the tens of thousands.)

Let's look at a few move sequences for some examples:

### Computing the Order of Sequence R

We begin with the solved state, and apply the move R to the cube. The result is the two-line representation:

(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96)
(01 02 03 36 05 06 07 40 09 10 11 44 13 14 15 48 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 84 37 38 39 88 41 42 43 92 45 46 47 96 61 57 53 49 62 58 54 50 63 59 55 51 64 60 56 52 16 66 67 68 12 70 71 72 08 74 75 76 04 78 79 80 81 82 83 77 85 86 87 73 89 90 91 69 93 94 95 65)


Now, we can carry out the Algorithm A procedure on this two-row representation. When we do that, we will find that there are a large number of one-element independent factors; these are the faces that do not move during the move sequence R.

Here is a list of factors that are found by Algorithm A:

Factor sizes: {1, 4}
Factors:
[36, 84, 77, 4]
[40, 88, 73, 8]
[44, 92, 69, 12]
[48, 96, 65, 16]
[61, 64, 52, 49]
[57, 63, 56, 50]
[53, 62, 60, 51]
[58, 59, 55, 54]
Independent Faces: [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 43, 45, 46, 47, 66, 67, 68, 70, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 83, 85, 86, 87, 89, 90, 91, 93, 94, 95]
Least common multiple: 4


The largest set of faces that are exchanged is 4, and the smallest is 1. No other groups of faces being exchanged have any other sizes. This means that if we apply the sequence 4 times, each of those groups of faces being interchanged will have returned to their original state.

This tells us what we already knew: the move R rotates groups of faces in cycles of length 4, so the order of this permutation is 4. If we apply R to a solved 4x4 Rubik's Revenge cube 4 times, the cube returns to the solved state.
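As a sanity check on this result, the order can also be found by brute force: apply the permutation over and over until every face is back where it started. The sketch below is not part of sequence_order.py; the function name `brute_force_order` is hypothetical, and the bottom row is copied from the two-row representation of R shown above.

```python
def brute_force_order(perm_bot):
    """Count how many applications of a permutation return to the identity.

    perm_bot is the bottom row of the two-row representation; the top row
    is assumed to be 1..n in order.
    """
    n = len(perm_bot)
    identity = list(range(1, n + 1))
    state = list(identity)
    count = 0
    while True:
        # Apply the permutation once: position i receives the face
        # currently sitting at position perm_bot[i].
        state = [state[p - 1] for p in perm_bot]
        count += 1
        if state == identity:
            return count


# Bottom row of the two-row representation of the move R
R_BOTTOM = [int(x) for x in """
01 02 03 36 05 06 07 40 09 10 11 44 13 14 15 48
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 84 37 38 39 88 41 42 43 92 45 46 47 96
61 57 53 49 62 58 54 50 63 59 55 51 64 60 56 52
16 66 67 68 12 70 71 72 08 74 75 76 04 78 79 80
81 82 83 77 85 86 87 73 89 90 91 69 93 94 95 65
""".split()]

order_R = brute_force_order(R_BOTTOM)
print(order_R)  # prints 4
```

Brute force is fine for a short sequence like R, but it becomes wasteful for sequences whose order is in the tens of thousands, which is why factoring into cycles is the approach used in this post.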

To formalize this: if we have cycles of arbitrary lengths, we must apply the sequence a number of times equal to the least common multiple of the cycle lengths. (For example, if the factoring above had also contained a cycle of length 3, the order would have been 12, because the sequence must be applied 12 times before the 4-cycles "sync up" with the 3-cycle.)
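The least-common-multiple step can be sketched in a few lines of Python. This is an illustrative helper rather than the code from sequence_order.py; the names `lcm` and `order_from_cycle_lengths` are hypothetical.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def order_from_cycle_lengths(lengths):
    """Order of a permutation, given the lengths of its independent cycles."""
    return reduce(lcm, lengths, 1)


print(order_from_cycle_lengths([1, 4]))     # 4: the factor sizes found for R
print(order_from_cycle_lengths([1, 3, 4]))  # 12: the example with an extra 3-cycle
```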

Let's look at a slightly more complicated move sequence to illustrate this point.

### Computing the Order of Sequence U R U' R'

As before, we begin by applying the move sequence once to a solved cube to generate the two-row n-tuple representation:

(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96)
(01 02 03 77 05 06 07 73 09 10 11 69 16 12 08 20 17 18 19 36 21 22 23 24 25 26 27 28 29 30 31 32 49 50 51 33 37 38 39 40 41 42 43 44 45 46 47 48 13 56 60 64 53 54 55 34 57 58 59 35 61 62 63 04 96 66 67 68 14 70 71 72 15 74 75 76 65 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 52)


Next, we factor this permutation using Algorithm A:

Factor sizes: {1, 3, 6}
Factors:
[77, 65, 96, 52, 64, 4]
[73, 15, 8]
[69, 14, 12]
[16, 20, 36, 33, 49, 13]
[50, 56, 34]
[51, 60, 35]
Independent Faces: [1, 2, 3, 5, 6, 7, 9, 10, 11, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 53, 54, 55, 57, 58, 59, 61, 62, 63, 66, 67, 68, 70, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
Least common multiple: 6


This time, we get cycles of different lengths. We have four cycles of length 3 and two cycles of length 6, plus many cycles of length 1 (the unpermuted faces).

The LCM of 3 and 6 is 6, so the overall order of the move sequence U R U' R' is 6.

### Computing the Order of Sequence U R

The next sequence we'll look at is the move sequence U R.

This particular permutation represents an interesting corner case: in Part 1 of this post, when we came up with our tuple representation for the cube, we treated each face as being non-interchangeable, by giving each face a unique number. This means that, for example, we cannot swap two arbitrary red faces, since they are attached to other faces via a double edge or a corner piece.

This assumption does not hold for faces in the center of the cube. Because center faces are not attached to any other faces (mechanically speaking), the four distinct integers representing four colored faces can actually be interchanged.

This plays out with the sequence U R as follows:

(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96)
(13 09 05 01 14 10 06 02 15 11 07 03 48 44 40 36 33 34 35 84 21 22 23 24 25 26 27 28 29 30 31 32 61 57 53 49 37 38 39 88 41 42 43 92 45 46 47 96 16 66 67 68 62 58 54 50 63 59 55 51 64 60 56 52 17 18 19 20 12 70 71 72 08 74 75 76 04 78 79 80 81 82 83 77 85 86 87 73 89 90 91 69 93 94 95 65)


We can factor this tuple as follows:

Factor sizes: {1, 3, 4, 7, 15}
Factors:
[13, 48, 96, 65, 17, 33, 61, 64, 52, 68, 20, 84, 77, 4, 1]
[9, 15, 40, 88, 73, 8, 2]
[5, 14, 44, 92, 69, 12, 3]
[10, 11, 7, 6]
[36, 49, 16]
[34, 57, 63, 56, 50, 66, 18]
[35, 53, 62, 60, 51, 67, 19]
[58, 59, 55, 54]
Independent Faces: [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 37, 38, 39, 41, 42, 43, 45, 46, 47, 70, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 83, 85, 86, 87, 89, 90, 91, 93, 94, 95]
Least common multiple: 420


However, the adventurous cuber will find, when actually carrying out this move sequence, that the order is in fact 105, and not 420.

The predicted order is 4 times larger than the observed order because, after 105 applications of the move sequence, the cube has not actually returned to its original state in our tuple representation: the only faces still out of place are center faces, which are in fact interchangeable.

Note these two groups of 4 faces that are permuted:

[10, 11, 7, 6]
[58, 59, 55, 54]


These are the four center squares of the U face and the four center squares of the R face. If we exclude these groups (treating the faces within each group as perfectly interchangeable), the set of factor sizes no longer contains 4:

Factor sizes: {1, 3, 7, 15}


Including the 4-cycles, the least common multiple is:

LCM(1,3,4,7,15) = 420

but excluding them, we get:

LCM(1,3,7,15) = 105

Systematically, we can search for any groups that contain only center faces, and treat each such group of length n as n groups of length 1 (so that it does not contribute to the order of the move sequence).
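One way to sketch that search is shown below. It assumes, based on the numbering used in this post, that each face occupies a block of 16 consecutive integers laid out as a 4x4 grid, so that the center squares of each face sit at offsets 6, 7, 10, and 11 within its block (this matches the group [10, 11, 7, 6] on the U face). The helper names are hypothetical, and the factors are copied from the U R output above.

```python
from functools import reduce
from math import gcd

# Assumed layout: face k (k = 0..5) uses the integers 16*k + 1 .. 16*k + 16,
# and its four center squares are at offsets 6, 7, 10, 11 within that block.
CENTER_FACES = {16 * k + offset for k in range(6) for offset in (6, 7, 10, 11)}


def is_center_only(cycle):
    """True if every face in the cycle is a center square."""
    return all(face in CENTER_FACES for face in cycle)


def order_of(cycles, ignore_center_cycles=False):
    """LCM of the cycle lengths, optionally skipping center-only cycles."""
    lengths = [len(c) for c in cycles
               if not (ignore_center_cycles and is_center_only(c))]
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)


# Factors reported for the move sequence U R (one-element cycles omitted,
# since they do not change the LCM)
UR_FACTORS = [
    [13, 48, 96, 65, 17, 33, 61, 64, 52, 68, 20, 84, 77, 4, 1],
    [9, 15, 40, 88, 73, 8, 2],
    [5, 14, 44, 92, 69, 12, 3],
    [10, 11, 7, 6],
    [36, 49, 16],
    [34, 57, 63, 56, 50, 66, 18],
    [35, 53, 62, 60, 51, 67, 19],
    [58, 59, 55, 54],
]

print(order_of(UR_FACTORS))                             # 420
print(order_of(UR_FACTORS, ignore_center_cycles=True))  # 105
```

Under the assumed layout, the filter also drops the group [58, 59, 55, 54], which consists of the center squares of the face turned by R.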

This provides an interesting contrast between the 4x4 Rubik's Revenge cube, in which any center faces may be interchanged with any other center faces, and the 3x3 Rubik's Cube, in which the center faces always remain fixed in relation to one another.

### Computing the Order of Sequence Uw Rw

We mentioned in Part 1 that the move notation Uw or Dw indicates a quarter clockwise turn of two layers of a face, not one. We can write the permutation that results from the move sequence Uw Rw as:

(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96)
(13 09 05 01 14 10 06 02 47 43 39 35 48 44 40 36 33 34 83 84 37 38 87 88 25 26 27 28 29 30 31 32 61 57 53 49 62 58 54 50 41 42 91 92 45 46 95 96 16 15 67 68 12 11 71 72 63 59 55 51 64 60 56 52 17 18 19 20 21 22 23 24 08 07 75 76 04 03 79 80 81 82 78 77 85 86 74 73 89 90 70 69 93 94 66 65)


Factoring this permutation, we get:

Factor sizes: {1, 3, 15}
Factors:
[13, 48, 96, 65, 17, 33, 61, 64, 52, 68, 20, 84, 77, 4, 1]
[9, 47, 95, 66, 18, 34, 57, 63, 56, 72, 24, 88, 73, 8, 2]
[5, 14, 44, 92, 69, 21, 37, 62, 60, 51, 67, 19, 83, 78, 3]
[10, 43, 91, 70, 22, 38, 58, 59, 55, 71, 23, 87, 74, 7, 6]
[39, 54, 11]
[35, 53, 12]
[40, 50, 15]
[36, 49, 16]
Independent Faces: [25, 26, 27, 28, 29, 30, 31, 32, 41, 42, 45, 46, 75, 76, 79, 80, 81, 82, 85, 86, 89, 90, 93, 94]
Least common multiple: 15


Several groups of 3 faces and of 15 faces, respectively, are permuted, giving an LCM of 15. Thus, the order of move sequence Uw Rw is 15.

We'll look at the factoring of one last sequence: U Rw.

### Computing the Order of Sequence U Rw

Here is the permutation resulting from the sequence U Rw:

(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96)
(13 09 05 01 14 10 06 02 47 43 39 35 48 44 40 36 33 34 83 84 21 22 23 24 25 26 27 28 29 30 31 32 61 57 53 49 37 38 87 88 41 42 91 92 45 46 95 96 16 15 67 68 62 58 54 50 63 59 55 51 64 60 56 52 17 18 19 20 12 11 71 72 08 07 75 76 04 03 79 80 81 82 78 77 85 86 74 73 89 90 70 69 93 94 66 65)


Factoring this permutation, we get:

Factor sizes: {1, 3, 4, 10, 15, 16}
Factors:
[13, 48, 96, 65, 17, 33, 61, 64, 52, 68, 20, 84, 77, 4, 1]
[9, 47, 95, 66, 18, 34, 57, 63, 56, 50, 15, 40, 88, 73, 8, 2]
[5, 14, 44, 92, 69, 12, 35, 53, 62, 60, 51, 67, 19, 83, 78, 3]
[10, 43, 91, 70, 11, 39, 87, 74, 7, 6]
[36, 49, 16]
[58, 59, 55, 54]
Independent Faces: [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 37, 38, 41, 42, 45, 46, 71, 72, 75, 76, 79, 80, 81, 82, 85, 86, 89, 90, 93, 94]
Least common multiple: 240


The order of the move sequence U Rw is 240.

# Code

The code that forms the permutation tuple for a given move sequence and performs the factoring of that tuple is in sequence_order.py.

The sequence_order.py file uses the dwalton76/rubiks-cube-NxNxN-solver library from Github to apply the move sequence once to a cube and determine the resulting permutation tuple. It then factors this tuple into independent cycles and finds the LCM of their lengths.

The code in sequence_order.py is grouped into functions, with the key function being factor_permutation(perm_top, perm_bot), which takes the top and bottom rows of the two-row representation of a move sequence's permutation.

The function then performs the factoring procedure covered in Part 3.

Here is the body of the function:

    def factor_permutation(perm_top, perm_bot):
        """
        Factor a permutation, given in two-row form, into independent cycles.

        Each cycle is returned as a list that ends with its starting element;
        faces that do not move appear as one-element cycles.
        """
        n = len(perm_top)

        # used_vector[i] is set to 1 once position i has been assigned to a cycle
        used_vector = [0] * n

        i = 0
        start = perm_top[0]
        factors = []

        # Keep going while there are positions not yet assigned to a cycle
        while 0 in used_vector:

            factor = []

            # Follow the cycle from the top row to the bottom row
            # until we arrive back at the starting element
            while True:
                used_vector[i] = 1
                follower = perm_bot[i]
                if follower == start:
                    break
                factor.append(follower)
                i = perm_top.index(follower)

            factor.append(start)
            factors.append(factor)

            # Begin the next cycle at the first unused position, if any remain
            try:
                i = used_vector.index(0)
                start = perm_top[i]
            except ValueError:
                break

        return factors


This function is called by the code that applies a move sequence to the Rubik's Cube to obtain the two-row permutation corresponding to the move sequence of interest.
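For comparison, here is a compact alternative sketch of the same factoring idea. It is not the code from sequence_order.py: it builds a dictionary from each top-row entry to the entry beneath it, so following a cycle does not need repeated `list.index` searches. The function name `cycles_of` is hypothetical, and the bottom row is the U R U' R' permutation from earlier in this post.

```python
def cycles_of(perm_top, perm_bot):
    """Return the independent cycles of a two-row permutation."""
    below = dict(zip(perm_top, perm_bot))  # top-row entry -> entry beneath it
    seen = set()
    cycles = []
    for start in perm_top:
        if start in seen:
            continue
        # Follow start -> below[start] -> ... until we come back to start
        cycle = [start]
        seen.add(start)
        nxt = below[start]
        while nxt != start:
            cycle.append(nxt)
            seen.add(nxt)
            nxt = below[nxt]
        cycles.append(cycle)
    return cycles


# Bottom row of the two-row representation of U R U' R'
BOTTOM = [int(x) for x in """
01 02 03 77 05 06 07 73 09 10 11 69 16 12 08 20
17 18 19 36 21 22 23 24 25 26 27 28 29 30 31 32
49 50 51 33 37 38 39 40 41 42 43 44 45 46 47 48
13 56 60 64 53 54 55 34 57 58 59 35 61 62 63 04
96 66 67 68 14 70 71 72 15 74 75 76 65 78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 52
""".split()]
TOP = list(range(1, len(BOTTOM) + 1))

cycles = cycles_of(TOP, BOTTOM)
sizes = sorted({len(c) for c in cycles})
print(sizes)  # [1, 3, 6]
```

Each cycle here begins with its starting element rather than ending with it, so the same cycles appear rotated by one position relative to the listings above.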

# Project Conclusions

In addition to being interesting, this project led to some deep insights into the workings of the Rubik's Cube and ways to think about move sequences.

More than that, the Rubik's Cube is a toy that provides real insight into combinatorics and group theory. The concept of order, and the process of thinking through different representations of the cube and their consequences for the implementation of the final algorithm, provide good practice for problems in other, related domains.

This project began with a simple question. While playing with the Rubik's Cube, we discovered this property of cycles (it is actually difficult to miss, even when learning the beginner method: many of the move sequences involved have small orders, so it is easy to see them repeat). The question we set out to answer was: given an arbitrary sequence, can we determine the order of that sequence?

The key to answering this question ultimately lies in the representation of the permutations; the right representation makes finding the order possible, but it took some trial and error with different representations before we discovered the right approach.

To anyone who has played with the Rubik's Cube before, it seems natural that there would be some way to represent moves applied to the cube in some kind of algebraic terms. The intercalation product was the key concept for developing a permutation algebra. Knuth's Algorithm A was the key concept for factoring permutations into their respective independent cycles.

Once an algorithm to factor permutations was developed, the rest was a straightforward calculation of the LCM of the lengths of each factor.

The project was computationally challenging; recursion was required to implement Algorithm A, the Rubik's Cube solver had to be modified, and there were many bugs along the way.

The procedure we used here can be applied to other problems. Our procedure was:

• Find a proper, convenient representation for the system state
• Break down the variations of the system into simple cases or steps
• Move away from the specific system, and keep the approach mathematically general. This is by far the most important step!
• Study the literature and solutions to problems, to become familiar with different ways of representing a problem. Different problems lend themselves well to different representations, so the more familiar you are with different representations, the more problems you'll be able to tackle.
• The only way to get familiar with different problem-solving approaches is through practice. It helps to start with easier problems, both because you can score some quick points and feel more confident, and also because combinatorics and group theory problems often tend to appear simple, but deceptively so. The devil is in the details.

# References

1. "Rubik's Cube". Charlesreid1.com wiki, Charles Reid. Edited 25 January 2017. Accessed 25 January 2017. <https://charlesreid1.com/wiki/Rubiks_Cube>

2. "Rubik's Revenge". Charlesreid1.com wiki, Charles Reid. Edited 25 January 2017. Accessed 25 January 2017. <https://charlesreid1.com/wiki/Rubiks_Revenge>

3. "Rubik's Cube/Tuple". Charlesreid1.com wiki, Charles Reid. Edited 25 January 2017. Accessed 25 January 2017. <https://charlesreid1.com/wiki/Rubiks_Cube/Tuple>

4. "Rubik's Cube/Permutations". Charlesreid1.com wiki, Charles Reid. Edited 25 January 2017. Accessed 25 January 2017. <https://charlesreid1.com/wiki/Rubiks_Cube/Permutations>

5. "Github - dwalton76/rubiks-cube-NxNxN-solver". dwalton76, Github Repository, Github Inc. Accessed 11 January 2017. <https://github.com/dwalton76/rubiks-cube-NxNxN-solver>

6. "Rubik's Cube NxNxN Solver". Git repository, git.charlesreid1.com. Charles Reid. Updated 25 January 2017. <https://charlesreid1.com:3000/charlesreid1/rubiks-cube-nnn-solver>

7. "Rubiks Cube Cycles". Git repository, git.charlesreid1.com. Charles Reid. Updated 25 January 2017. <https://charlesreid1.com:3000/charlesreid1/rubiks-cube-cycles>

# Appendix

That concludes our discussion of computing the order of move sequences on a Rubik's Cube. There are many move sequences, and many orders, ranging from 1 or 2 up to nearly 100,000. We plan to assemble a web site to help readers explore some move sequences and their orders - so check back soon...