Boxplots – Data Visualisation

Box and whisker plots, or boxplots, are a hugely useful data visualisation tool for clearly comparing performance results across algorithm configurations (or any experiment data with multiple dimensions). However, using Python’s Matplotlib library to implement them in a form suitable for comparison by groups used to be tough. To make them attractive and clear you had to stitch together documentation and example after example covering grids, line colours, axis labels and some very hacky legend use, each taken from across the Matplotlib site and beyond. So I wrote a couple of scripts that simplify grouped boxplots and can be reused directly.

Here’s the Grouped Boxplot (on left) and Ungrouped Boxplot (on right):

The code to create the plots is below.

Notes:

  • The red line shows data centrality using the median, a statistic that is independent of outliers. (Matplotlib 2.0 and later draw this line in orange by default.)
  • The upper and lower borders of the box show data spread using the relatively independent interquartile range (IQR): the box runs from the lower quartile (Q1, the 25th percentile) to the upper quartile (Q3, the 75th percentile).
  • The whiskers (typically) extend to the furthest data point that lies within 1.5 times the IQR of the box edge: above Q3 for the upper whisker and below Q1 for the lower one. The factor is adjustable through the whis option, which defaults to 1.5.
  • The red + (plus symbol) markers show the outliers, the data points that fall beyond the whiskers.
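The quantities in the notes above can be computed by hand, which helps when checking what a plot is showing. This is a minimal sketch using NumPy percentiles and the common 1.5 × IQR whisker rule; the function name box_stats and the sample values are made up for illustration, and Matplotlib's own computation may differ in detail.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the numbers a box-and-whisker plot is drawn from (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: the furthest a whisker is allowed to reach from the box edges.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        'median': median,
        'q1': q1,
        'q3': q3,
        'iqr': iqr,
        # Whiskers stop at the last data point inside the fences.
        'whisker_low': inside.min(),
        'whisker_high': inside.max(),
        'outliers': sorted(outliers.tolist()),
    }

sample = [5, 5, 4, 3, 3, 5, 12]   # made-up values; 12 is a deliberate outlier
stats = box_stats(sample)
for name in ('q1', 'median', 'q3', 'iqr', 'whisker_low', 'whisker_high', 'outliers'):
    print('%-12s %s' % (name, stats[name]))
```

Note that the whiskers end at actual data points, not at the fences themselves, so the two whiskers are usually of different lengths.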

Independent vs Dependent Summary Statistics

Box plots are independent, in the sense that the plotted box, built from the median and interquartile range, does not factor wild outliers into the summary (statisticians usually call such statistics robust). The outliers are not ignored, though: they are drawn as individual points. This is as opposed to dependent plots that show the mean and standard deviation, where outliers are incorporated into the plotted summary itself. Which you choose depends on what you want to show.
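To see the difference in numbers, here is a small sketch that summarises the same sample with and without one wild value. The helper name summarise and the outlier value of 50 are assumptions chosen purely for illustration.

```python
import numpy as np

def summarise(values):
    """Compute both kinds of summary for one sample (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        'median': median,        # independent (robust) centre
        'iqr': q3 - q1,          # independent (robust) spread
        'mean': values.mean(),   # dependent centre
        'std': values.std(),     # dependent spread
    }

clean = [5, 5, 4, 3, 3, 5]
with_outlier = clean + [50]      # one made-up wild value

before = summarise(clean)
after = summarise(with_outlier)
for name in ('median', 'iqr', 'mean', 'std'):
    print('%-6s %6.2f -> %6.2f' % (name, before[name], after[name]))
```

The median and IQR barely move when the wild value is added, while the mean and standard deviation shift substantially. That is the behaviour to weigh up when choosing between the two kinds of plot.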

Code

You will need Matplotlib. The listings below were written for Python 2.7.x and also run on Python 3; the Anaconda distribution has everything you need to make boxplots using this code.

You can download the code files from GitHub.

Grouped Boxplots: (Python 2.7/3 Code)

import numpy as np 
import matplotlib.pyplot as plt

# --- Your data, e.g. results per algorithm:
data1 = [5,5,4,3,3,5]
data2 = [6,6,4,6,8,5]
data3 = [7,8,4,5,8,2]
data4 = [6,9,3,6,8,4]

# --- Combining your data:
data_group1 = [data1, data2]
data_group2 = [data3, data4]

# --- Labels for your data:
labels_list = ['a','b']
xlocations = range(len(data_group1))
width = 0.3
symbol = 'r+'
ymin = 0
ymax = 10

ax = plt.gca()
ax.set_ylim(ymin, ymax)
ax.grid(True, linestyle='dotted')
ax.set_axisbelow(True)
plt.xlabel('X axis label')
plt.ylabel('Y axis label')
plt.title('title')

# --- Offset the positions per group:
positions_group1 = [x-(width+0.01) for x in xlocations]
positions_group2 = xlocations

# --- Draw group 1, shifted to the left of each x location:
plt.boxplot(data_group1,
            sym=symbol,
            positions=positions_group1,
            widths=width,
            # notch=False,
            # vert=True,
            # whis=1.5,
            # bootstrap=None,
            # usermedians=None,
            # conf_intervals=None,
            # patch_artist=False,
            )

# --- Draw group 2, on the x locations themselves:
plt.boxplot(data_group2,
            sym=symbol,
            positions=positions_group2,
            widths=width,
            # notch=False,
            # vert=True,
            # whis=1.5,
            # bootstrap=None,
            # usermedians=None,
            # conf_intervals=None,
            # patch_artist=False,
            )

# --- Set the ticks and labels last, because boxplot() manages the x ticks itself:
ax.set_xticks(list(xlocations))
ax.set_xticklabels(labels_list, rotation=0)

plt.savefig('boxplot_grouped.png')
plt.savefig('boxplot_grouped.pdf') # when publishing, use high quality PDFs
#plt.show() # uncomment to show the plot.
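The grouped listing above draws both groups in the same style, so a reader cannot tell them apart without a key. One possible way to add colours and a legend is sketched below, assuming patch_artist=True so that the boxes can be filled. The helper grouped_boxplot, its parameters, the group names, the colours and the output filename are all hypothetical and not part of the original scripts.

```python
import matplotlib
matplotlib.use('Agg')   # render to files only; drop this line for interactive use
import matplotlib.pyplot as plt

def grouped_boxplot(ax, groups, group_names, colours, labels, width=0.3, gap=0.01):
    """Draw one set of boxes per group, clustered around each x location."""
    n_groups = len(groups)
    xlocations = list(range(len(labels)))
    handles = []
    for i, (data, colour) in enumerate(zip(groups, colours)):
        # Shift each group so the cluster is centred on the x location.
        offset = (i - (n_groups - 1) / 2.0) * (width + gap)
        positions = [x + offset for x in xlocations]
        bp = ax.boxplot(data, positions=positions, widths=width,
                        sym='r+', patch_artist=True)
        for box in bp['boxes']:
            box.set_facecolor(colour)
        handles.append(bp['boxes'][0])   # one legend entry per group
    # Set ticks and labels after plotting, since boxplot() manages them itself.
    ax.set_xticks(xlocations)
    ax.set_xticklabels(labels)
    ax.legend(handles, group_names, loc='upper left')
    return handles

data1 = [5, 5, 4, 3, 3, 5]
data2 = [6, 6, 4, 6, 8, 5]
data3 = [7, 8, 4, 5, 8, 2]
data4 = [6, 9, 3, 6, 8, 4]

fig, ax = plt.subplots()
ax.set_ylim(0, 10)
handles = grouped_boxplot(ax,
                          groups=[[data1, data2], [data3, data4]],
                          group_names=['group 1', 'group 2'],
                          colours=['lightblue', 'lightgreen'],
                          labels=['a', 'b'])
fig.savefig('boxplot_grouped_legend.png')
```

Centring each cluster on its x location, rather than offsetting only the first group, keeps the tick label underneath the middle of the pair. Whether that is preferable is a matter of taste.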

Ungrouped Boxplots: (Python 2.7/3 Code)

import numpy as np
import matplotlib.pyplot as plt

# --- Your data, e.g. results per algorithm:
data1 = [5,5,4,3,3,5]
data2 = [6,6,4,6,8,5]
data3 = [7,8,4,5,8,2]
data4 = [6,9,3,6,8,4]

# --- Combining your data:
data        = [data1, data3, data2, data4]

# --- Labels for your data:
labels_list = ['a','b','c','d']
width       = 0.3
symbol      = 'r+'
ymin        = 0
ymax        = 10

ax = plt.gca()
ax.set_ylim(ymin, ymax)
ax.grid(True)
ax.set_axisbelow(True)
plt.xlabel('X axis label')
plt.ylabel('Y axis label')
plt.boxplot(data, sym=symbol, widths=width)

# --- Label the boxes after plotting, because boxplot() sets its own tick labels:
ax.set_xticklabels(labels_list, rotation=0)

# --- Save to file:
plt.savefig('boxplot.png')
plt.savefig('boxplot.pdf')    # when publishing, use high quality PDFs
#plt.show()                   # uncomment to show the plot.

hth.



If you’re interested in reading more via occasional content and project updates, feel free to keep in touch via email or contact me on social media @pmdscully.

