dang-sunburst


The sunburst chart on this page displays a distribution of data along two independent dimensions, X and Y.

To generate this data, I drew random samples from the joint PDF of two independent random variables: X, which is normally distributed, and Y, which follows a gamma distribution.
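Because X and Y are independent, their joint PDF is simply the product of the two marginal PDFs. With the parameters used in the script that follows (X normal with mean 3 and standard deviation 1; Y gamma with shape 3 and SciPy's default scale of 1), it works out to:

$$
f_{X,Y}(x, y) = f_X(x)\,f_Y(y)
= \frac{1}{\sqrt{2\pi}}\,e^{-(x-3)^2/2}\cdot\frac{y^{2}\,e^{-y}}{\Gamma(3)},
\qquad y > 0,
$$

where $\Gamma(3) = 2$. Sampling X and Y separately from their marginals is therefore equivalent to sampling from this joint PDF.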

I used Python to do this:
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 1500 independent samples from each marginal distribution
x = np.random.normal(3, 1, 1500)   # X ~ Normal(mean=3, sd=1)
y = stats.gamma(3).rvs(1500)       # Y ~ Gamma(shape=3, scale=1)

# Bin the samples into a 10x10 grid. np.histogram2d(x, y) returns H with
# H[i, j] = number of samples with x in X-bin i and y in Y-bin j.
H, xedges, yedges = np.histogram2d(x, y, bins=[10, 10])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# Heatmap of the binned counts. pcolormesh expects rows of C to run along
# the vertical (Y) axis, so pass H transposed.
ax1.pcolormesh(xedges, yedges, H.T)
ax1.set_xlim(0, 6)
ax1.set_ylim(0, 10)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')

# Smooth kernel density estimate of the same samples
sns.kdeplot(x=x, y=y, fill=True, ax=ax2)
ax2.set_xlim(0, 6)
ax2.set_ylim(0, 10)
ax2.set_xlabel('X')
ax2.set_ylabel('Y')

plt.show()
```

This produces the joint distribution shown below:

Next, we bin each dimension so that every X value and every Y value falls into one of a fixed set of ranges. Here we use 10 bins for X and 10 bins for Y; a single call to NumPy's histogram2d function counts the samples in each (X bin, Y bin) pair. This results in a binned, 10x10 grid:
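The index convention of np.histogram2d is easy to get backwards, so the sketch below checks it on a handful of hand-picked points (made up for illustration, not the page's data). Values of the first argument are binned along axis 0 of H and values of the second argument along axis 1, so H[i, j] counts the points in X-bin i and Y-bin j.

```python
import numpy as np

# A few hand-picked points (illustrative only, not the page's data)
x = np.array([0.5, 0.5, 2.5, 9.5])
y = np.array([1.5, 1.5, 1.5, 8.5])

# 10 equal-width bins on [0, 10] for each dimension, so every edge is an integer
H, xedges, yedges = np.histogram2d(x, y, bins=[10, 10], range=[[0, 10], [0, 10]])

print(H.shape)   # (10, 10)
print(H[0, 1])   # 2.0: two points with x in [0, 1) and y in [1, 2)
print(H[2, 1])   # 1.0: the point (2.5, 1.5)
print(H[9, 8])   # 1.0: the point (9.5, 8.5)
print(H.sum())   # 4.0: every point lands in exactly one bin
```

Passing the arrays in the opposite order transposes H, which matters later because the rows of H become the inner ring of the sunburst.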

This data is displayed in the sunburst chart: each X bin is an arc of the inner ring, and in the outer ring each of those arcs is subdivided into the Y bins for that X bin.
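To make the nesting concrete, here is a minimal sketch of the ring geometry, assuming arc angles are sized by count as in a typical sunburst: every outer arc gets an angle proportional to its count, and each inner arc spans exactly the outer arcs nested inside it. The function name sunburst_angles and the radian output are illustrative assumptions, not part of the chart's code; H is laid out as np.histogram2d returns it (rows are X bins, columns are Y bins).

```python
import numpy as np

def sunburst_angles(H):
    """Return (inner, outer) arc angles in radians for a count matrix H.

    H[i, j] is the count for X-bin i and Y-bin j (np.histogram2d layout).
    inner[i]    : angle of the inner-ring arc for X-bin i
    outer[i, j] : angle of the outer-ring arc for Y-bin j inside X-bin i
    """
    H = np.asarray(H, dtype=float)
    total = H.sum()
    # Every sample gets the same share of the full circle
    outer = 2 * np.pi * H / total
    # An inner arc is the sum of the outer arcs nested inside it
    inner = outer.sum(axis=1)
    return inner, outer

H = np.array([[2, 0, 1],
              [1, 4, 0]])
inner, outer = sunburst_angles(H)
print(inner / np.pi)   # [0.75 1.25]
```

Bins with a count of zero get zero-width arcs, so an empty Y bin simply does not appear inside its parent arc.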

Because a sunburst chart groups data categorically, we convert the quantitative X and Y scales into categories, one per bin. The bin labels themselves are arbitrary, but they must preserve the order of the bins so that neighboring arcs still represent neighboring ranges of values.
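A minimal sketch of one such labeling, assuming each label is a prefix plus the bin index: zero-padding the index keeps the labels sortable as plain strings, so the bin order survives in case anything downstream sorts the categories alphabetically. The helper name bin_labels and the "x"/"y" prefixes are hypothetical.

```python
def bin_labels(prefix, n_bins):
    """Hypothetical helper: one ordered label per bin, e.g. 'y00', 'y01', ...

    The index is zero-padded so that string order matches bin order.
    """
    width = len(str(n_bins - 1))  # digits needed for the largest index
    return [f"{prefix}{i:0{width}d}" for i in range(n_bins)]

x_labels = bin_labels("x", 10)
y_labels = bin_labels("y", 12)
print(x_labels[:3])   # ['x0', 'x1', 'x2']
print(y_labels[:3])   # ['y00', 'y01', 'y02']
```

Without the padding, a 12-bin scale would sort as 'y1', 'y10', 'y11', 'y2', ..., scrambling the order of the arcs.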

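The final data is an array of records, one per (X bin, Y bin) pair, each shaped like this:

```
{
    'x' : <X bin label>,
    'y' : <Y bin label>,
    'value' : <count>
}
```

Each value is the corresponding entry of H, the matrix of per-bin counts returned by np.histogram2d.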
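A sketch of how that flat array could be built from H, assuming bins are labeled with a prefix plus their index (a hypothetical labeling scheme) and that the chart loads the array as JSON. The helper name to_records is also hypothetical. Counts are converted to plain int because NumPy integer types are not JSON-serializable.

```python
import json
import numpy as np

def to_records(H, x_labels, y_labels):
    """Flatten the count matrix H into one {'x', 'y', 'value'} record per bin.

    H[i, j] is the count for X-bin i and Y-bin j, as returned by np.histogram2d.
    Records are emitted X bin by X bin, so both label orders are preserved.
    """
    return [
        {"x": x_labels[i], "y": y_labels[j], "value": int(H[i, j])}
        for i in range(H.shape[0])
        for j in range(H.shape[1])
    ]

# Small illustrative matrix; in practice H comes from np.histogram2d
H = np.array([[2.0, 0.0],
              [1.0, 4.0]])
records = to_records(H, ["x0", "x1"], ["y0", "y1"])
print(json.dumps(records[0]))   # {"x": "x0", "y": "y0", "value": 2}
```

For the 10x10 grid this yields 100 records; dropping those with a value of 0 is an optional choice, since they would render as zero-width arcs anyway.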