The sunburst chart on this page displays a distribution of data along two
independent dimensions, X and Y.
To generate this data, 1,500 random samples were drawn from the joint PDF of
two independent random variables: X, normally distributed (mean 3, standard
deviation 1), and Y, gamma distributed (shape parameter 3).
I used Python to do this:
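import numpy as np
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 1500 samples per dimension
x = np.random.normal(3, 1, 1500)  # X: normal, mean 3, standard deviation 1
y = stats.gamma(3).rvs(1500)      # Y: gamma, shape parameter 3

# Generate discrete joint distribution on a 10 x 10 grid.
# histogram2d puts its first argument along the rows of H, so passing (y, x)
# yields H[row = y-bin, column = x-bin], the layout pcolormesh expects.
H, yedges, xedges = np.histogram2d(y, x, bins=[10, 10])
X, Y = np.meshgrid(xedges, yedges)

# Set the figure size before creating the figure so that it takes effect
mpl.rc("figure", figsize=(6, 6))
fig, ax = plt.subplots()
ax.pcolormesh(X, Y, H)
ax.set_xlim([0, 6])
ax.set_ylim([0, 10])
ax.set_xlabel('X')
ax.set_ylabel('Y')

# Smoothed (kernel density) view of the same samples.
# fill=True needs seaborn >= 0.11; older releases spell it shade=True.
fig, ax = plt.subplots()
sns.kdeplot(x=x, y=y, fill=True, ax=ax)
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_xlabel('X')
ax.set_ylabel('Y')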
which results in the joint distribution shown in the plot below:
Next, we map the bins of each dimension, X and Y, to a set of variables.
In this case, we generated 10 bins for X and 10 bins for Y.
This is easily done with some code calling the NumPy histogram2d function.
This results in a binned, 10x10 grid:
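As a minimal sketch of the binning step on its own, the snippet below checks the shape of the grid and confirms that every sample is counted. It assumes the same distributions and sample size as the code above. The fixed seed and the use of NumPy's Generator API are my own additions for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an assumption, for reproducibility
x = rng.normal(3, 1, 1500)       # X: normal, mean 3, standard deviation 1
y = rng.gamma(3, 1, 1500)        # Y: gamma, shape 3, scale 1

# Rows of H follow the first argument (x), columns follow the second (y).
H, xedges, yedges = np.histogram2d(x, y, bins=[10, 10])

print(H.shape)       # (10, 10): one cell per (x-bin, y-bin) pair
print(len(xedges))   # 11: eleven edges delimit ten bins
print(int(H.sum()))  # 1500: every sample lands in exactly one cell
```

Because no explicit range is passed, the bin edges span the minimum to the maximum of each sample, so no points fall outside the grid.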
This data is displayed in the sunburst chart, with the X dimension represented in the
inner ring and the Y dimension represented in the outer ring (the Y bins are repeated within each X arc).
Because the sunburst chart groups things categorically, we are converting
the quantitative X and Y scales to groups according to bins. We arbitrarily
label the bins, but maintain their order (which is important).
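One way to attach ordered labels to bins is sketched below. The edges, the label names (x0, x1, ...) and the helper bin_label are hypothetical, chosen only for illustration. The point is that each label is derived from the bin index, so sorting by index preserves the original order of the scale.

```python
import numpy as np

edges = np.linspace(0.0, 10.0, 11)                 # hypothetical edges for 10 equal bins
labels = [f"x{i}" for i in range(len(edges) - 1)]  # arbitrary names, ordered by bin index

def bin_label(value):
    """Return the label of the bin containing value (hypothetical helper)."""
    # np.digitize returns 1 for the first bin, so shift to a 0-based index.
    i = int(np.digitize(value, edges)) - 1
    # Clamp so that the right-most edge falls in the last bin, matching how
    # np.histogram2d treats its final bin. Out-of-range values are also
    # clamped here, which is a simplification.
    return labels[min(max(i, 0), len(labels) - 1)]

print(bin_label(0.5))   # x0
print(bin_label(9.9))   # x9
print(bin_label(10.0))  # x9
```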
The final data is an array of records, each of which looks like this:
{
'x' :
'y' :
'value' :
}
The value is provided by the matrix H of counts per bin, returned by np.histogram2d.
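A minimal sketch of flattening H into such records follows. The label scheme (x0 to x9, y0 to y9) and the fixed seed are assumptions, since the actual labels used for the chart are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an assumption, for reproducibility
x = rng.normal(3, 1, 1500)
y = rng.gamma(3, 1, 1500)

# H[i, j] is the count of samples in x-bin i and y-bin j.
H, xedges, yedges = np.histogram2d(x, y, bins=[10, 10])

# One record per grid cell; labels are hypothetical but keep the bin order.
records = [
    {"x": f"x{i}", "y": f"y{j}", "value": int(H[i, j])}
    for i in range(H.shape[0])
    for j in range(H.shape[1])
]

print(len(records))  # 100: one record per cell of the 10 x 10 grid
```

Iterating over x first and y second keeps all outer-ring (y) entries for a given inner-ring (x) arc next to each other, which suits a hierarchical layout.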