Relationship to HoloViews
Many users of iqplot may also use HoloViews, an excellent high-level plotting package that can render plots using Bokeh. Users familiar with HoloViews understand the concepts of key dimensions (kdims) and value dimensions (vdims), which are central to building HoloViews elements. Here, I will demonstrate how the q and cats arguments of the iqplot functions relate to kdims and vdims in HoloViews for data sets with one quantitative variable and n categorical variables.
I will again use the automobile data set. For all plots, I will take the origin of the car to be the sole categorical variable and the miles per gallon (mpg) to be the quantitative variable.
First, we do the necessary imports and load the data set.
[1]:
import numpy as np
import holoviews as hv
import colorcet
import iqplot
import bokeh.sampledata.autompg
import bokeh.io
import bokeh.layouts
bokeh.io.output_notebook()
hv.extension('bokeh')
df = bokeh.sampledata.autompg.autompg_clean
df['cyl'] = df['cyl'].astype(str)
To abstract away the styling of the plots and focus only on defining HoloViews plotting elements, we define the styling here and also write a function for displaying the plots.
[2]:
def no_ygrid_hook(plot, element):
    """Hook for disabling y-grid lines."""
    plot.handles["plot"].ygrid.grid_line_color = None
boxwhisker_opts = dict(
    box_color="origin",
    box_line_color="white",
    box_line_width=1.5,
    box_width=0.4,
    cmap=colorcet.b_glasbey_category10,
    frame_height=150,
    frame_width=400,
    hooks=[no_ygrid_hook],
    invert_axes=True,
    invert_yaxis=True,
    outlier_line_alpha=0,
    show_grid=True,
    title="HoloViews",
)

boxstrip_opts = dict(
    box_fill_alpha=0,
    box_line_color="gray",
    box_line_width=1,
    box_width=0.4,
    frame_height=150,
    frame_width=400,
    hooks=[no_ygrid_hook],
    invert_axes=True,
    invert_yaxis=True,
    outlier_fill_alpha=0,
    outlier_line_alpha=0,
    show_grid=True,
    title="HoloViews",
    whisker_line_color="gray",
)

strip_opts = dict(
    cmap=colorcet.b_glasbey_category10,
    color="origin",
    frame_height=150,
    frame_width=400,
    hooks=[no_ygrid_hook],
    invert_axes=True,
    invert_yaxis=True,
    jitter=0.4,
    show_grid=True,
    show_legend=False,
    size=4,
    title="HoloViews",
)

ecdf_opts = dict(
    color=hv.Cycle(colorcet.b_glasbey_category10),
    frame_height=150,
    frame_width=400,
    legend_position="right",
    show_grid=True,
    size=4,
    title="HoloViews",
)

hist_opts = dict(
    color=hv.Cycle(colorcet.b_glasbey_category10),
    fill_alpha=0.5,
    frame_height=150,
    frame_width=400,
    line_alpha=0,
    show_grid=True,
    title="HoloViews",
)

bc_kwargs = dict(
    align="end", frame_height=150, frame_width=400, title="iqplot",
)
def show_plots(p, phv):
    """Show the iqplot figure p above the rendered HoloViews element phv."""
    bokeh.io.show(
        bokeh.layouts.gridplot(
            [p, bokeh.layouts.Spacer(height=30), hv.render(phv)], ncols=1
        )
    )
Plots with a categorical axis
Box-and-whisker and strip plots in iqplot feature a categorical axis, with the other axis being quantitative. In this case the q variable corresponds to a HoloViews value dimension and the cats variable(s) correspond to HoloViews key dimensions.
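The correspondence can be written down as a small lookup. The helper categorical_axis_dims is hypothetical and is not part of iqplot or HoloViews. It is a minimal sketch that assumes cats is either a single column name or a list of column names, both of which iqplot accepts.

```python
def categorical_axis_dims(q, cats):
    """Map iqplot's q/cats arguments to HoloViews kdims/vdims for plots
    with a categorical axis (box, strip, strip-box).

    Hypothetical helper for illustration only.
    """
    # Normalize a single categorical column name to a list of names.
    kdims = [cats] if isinstance(cats, str) else list(cats)
    # The quantitative column is the (sole) value dimension.
    return {"kdims": kdims, "vdims": [q]}


print(categorical_axis_dims("mpg", "origin"))
# {'kdims': ['origin'], 'vdims': ['mpg']}
```

The returned dictionary could be unpacked into an element, for example hv.BoxWhisker(data=df, **categorical_axis_dims("mpg", "origin")). That call is equivalent to passing kdims="origin" and vdims="mpg" directly.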
Box plots
Building a box-and-whisker plot with HoloViews is very similar to building one with iqplot. We just have to remember that for box plots, HoloViews treats the categorical variable as a key dimension and the quantitative variable as a value dimension.
[3]:
p = iqplot.box(data=df, q="mpg", cats="origin", **bc_kwargs)
phv = hv.BoxWhisker(data=df, kdims="origin", vdims="mpg").opts(**boxwhisker_opts)
show_plots(p, phv)
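To make the key/value split concrete: HoloViews draws one box per unique value of the key dimension, and each box summarizes the value-dimension data in that group. The sketch below computes those summary statistics for a single group using only the standard library. It assumes Tukey's convention, in which the whiskers reach the most extreme points within 1.5 IQR of the box. The exact quantile and whisker conventions in iqplot and HoloViews may differ, so treat this as an illustration rather than a reimplementation of either package. The helper box_stats is hypothetical and needs at least two values per group.

```python
import statistics


def box_stats(values):
    """Summary statistics behind one box-and-whisker glyph (hypothetical helper).

    Assumes Tukey's convention: whiskers extend to the most extreme data
    points within 1.5 * IQR of the box; anything beyond is an outlier.
    """
    data = sorted(values)
    # 'inclusive' interpolates at position (n - 1) * p, like NumPy's default.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < lo_fence or x > hi_fence],
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```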
Strip plots
A strip plot is made in HoloViews using a Scatter element where the categorical variable is a key dimension.
[4]:
p = iqplot.strip(data=df, q="mpg", cats="origin", spread="jitter", **bc_kwargs)
phv = hv.Scatter(data=df, kdims="origin", vdims="mpg").opts(**strip_opts)
show_plots(p, phv)
Note that as of July 2022, hv.Scatter does not support multiple categorical variables, though this feature is likely coming soon. iqplot does have this capability for strip plots.
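Some intuition for what jitter does: each point is drawn at its category's slot on the categorical axis, plus a small random offset. This keeps points with similar values from overplotting. The sketch below assumes uniform offsets whose total spread equals width, set here to 0.4 to echo jitter=0.4 in strip_opts. That is an assumption about how HoloViews and iqplot interpret their jitter settings, not a description of their internals. The helper jittered_positions is hypothetical.

```python
import random


def jittered_positions(categories, order, width=0.4, seed=None):
    """Categorical-axis positions for a jittered strip plot (hypothetical helper).

    Each point sits at its category's integer slot (its index in `order`)
    plus a uniform random offset in [-width/2, width/2].
    """
    rng = random.Random(seed)  # seeded generator for reproducible jitter
    slot = {cat: i for i, cat in enumerate(order)}
    return [slot[c] + rng.uniform(-width / 2, width / 2) for c in categories]


positions = jittered_positions(
    ["US", "Europe", "US", "Asia"], order=["US", "Europe", "Asia"], seed=1
)
print(positions)
```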
Strip-box plots
Strip-box plots are easily implemented in HoloViews because its elements are composable: we simply overlay a Scatter element on a BoxWhisker element with the * operator.
[5]:
p = iqplot.stripbox(data=df, q="mpg", cats="origin", spread="jitter", **bc_kwargs)
ps_hv = hv.Scatter(data=df, kdims="origin", vdims="mpg").opts(**strip_opts)
pb_hv = hv.BoxWhisker(data=df, kdims="origin", vdims="mpg").opts(**boxstrip_opts)
show_plots(p, pb_hv * ps_hv)
Plots without a categorical axis
When making ECDFs and histograms, both axes on the plot are quantitative and the categorical variable(s) are expressed using color. For these plots, we need to do groupby operations in HoloViews and make overlays, resulting in a significant difference between how we build the plots with iqplot versus HoloViews.
ECDFs
To make a plot of an ECDF, we need to create another column in the data frame containing the value of the ECDF for each value of the quantitative variable. This is accomplished using Pandas.
[6]:
df["ECDF"] = df.groupby("origin")["mpg"].transform(
    lambda data: data.rank(method="first") / len(data)
)
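The Pandas one-liner above ranks each mpg value within its origin group and divides by the group size. Below is a dependency-free sketch of the same computation. The name ecdf_by_group is hypothetical. Sorting on (value, position) is how we emulate rank(method="first"), which breaks ties by order of appearance.

```python
def ecdf_by_group(values, groups):
    """Pure-Python analogue of the groupby/transform/rank ECDF column above.

    Within each group, rank the values 1..n (ties broken by order of
    appearance, as rank(method="first") does) and divide by the group size n.
    """
    # Collect the row positions belonging to each group.
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)

    ecdf = [0.0] * len(values)
    for idx in members.values():
        n = len(idx)
        # Sorting on (value, original position) breaks ties by appearance.
        ordered = sorted(idx, key=lambda j: (values[j], j))
        for rank, i in enumerate(ordered, start=1):
            ecdf[i] = rank / n
    return ecdf


print(ecdf_by_group([3, 1, 2, 1], ["a", "a", "a", "a"]))  # [1.0, 0.25, 0.75, 0.5]
```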
To build the plot of the ECDF with the data represented as dots, we use a Scatter element. The quantitative variable (in this case mpg) is a key dimension, in contrast to box-and-whisker and strip plots, where it was a value dimension. We have to specify two or more value dimensions: one for the value of the ECDF and the others for the categorical variables (in this case the origin of the car). To encode the categorical variable(s) with color, we need to perform a groupby operation on the categorical variable(s) and then make an overlay.
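Compared with box and strip plots, the roles of the dimensions flip. As a sketch, a hypothetical helper ecdf_dims spells out the mapping for a Scatter-based ECDF. The default column name "ECDF" matches the column created above.

```python
def ecdf_dims(q, cats, ecdf_col="ECDF"):
    """Map iqplot's q/cats to HoloViews dims for an ECDF drawn as a Scatter.

    Hypothetical helper: the quantitative column is the key dimension, and
    the ECDF values plus the categorical column(s) are value dimensions.
    """
    cats = [cats] if isinstance(cats, str) else list(cats)
    return {"kdims": [q], "vdims": [ecdf_col, *cats]}


print(ecdf_dims("mpg", "origin"))
# {'kdims': ['mpg'], 'vdims': ['ECDF', 'origin']}
```

These are exactly the kdims and vdims passed to hv.Scatter in the cell below.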
[7]:
p = iqplot.ecdf(data=df, q="mpg", cats="origin", **bc_kwargs)
phv = (
    hv.Scatter(data=df, kdims=["mpg"], vdims=["ECDF", "origin"])
    .groupby("origin")
    .opts(**ecdf_opts)
    .overlay()
)
show_plots(p, phv)
Note that the ordering of the categories in the HoloViews plot is different from that of the iqplot plot. This is because we cannot control the order in which HoloViews does its groupby operation. We could instead build the ECDFs one by one using a groupby operation in Pandas and then construct the overlay.
[8]:
p = iqplot.ecdf(data=df, q="mpg", cats="origin", **bc_kwargs)
ecdfs = [
    hv.Scatter(data=group, kdims="mpg", vdims="ECDF", label=label).opts(**ecdf_opts)
    for label, group in df.groupby("origin", sort=False)
]
phv = hv.Overlay(ecdfs)
show_plots(p, phv)
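With sort=False, Pandas yields the groups in order of first appearance in the data frame rather than sorted by label. The sketch below reproduces that grouping order in plain Python; group_in_order_of_appearance is a hypothetical helper. It relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 on.

```python
def group_in_order_of_appearance(labels, values):
    """Group values by label, keeping groups in first-appearance order.

    Hypothetical helper mirroring the group order of groupby(..., sort=False);
    within each group, values keep their original order.
    """
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    return list(groups.items())


print(group_in_order_of_appearance(["US", "Europe", "US", "Asia"], [18, 26, 15, 24]))
# [('US', [18, 15]), ('Europe', [26]), ('Asia', [24])]
```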
Note that in HoloViews, there is no simple way to make a “staircase” ECDF, since that requires creating a new data frame with twice the number of data points in order to connect the lines in the staircase plot. There is also no direct way of making the bootstrap confidence interval.
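The doubling of the data is easy to see in a sketch. For sorted values x_1 <= ... <= x_n, each point contributes a vertical rise from (i - 1)/n to i/n, and consecutive points are joined by a horizontal run, so the staircase has 2n vertices. The helper staircase_ecdf is hypothetical. We assume the resulting lists could be drawn by a line element that connects points in the order given, such as hv.Curve((x, y), kdims="mpg", vdims="ECDF").

```python
def staircase_ecdf(values):
    """Vertices of a staircase ECDF as two lists (x, y) of length 2n.

    Hypothetical helper: for sorted data x_1 <= ... <= x_n, the step at x_i
    rises from (i - 1)/n to i/n, so every data point contributes two vertices.
    """
    xs = sorted(values)
    n = len(xs)
    x_stair, y_stair = [], []
    for i, x in enumerate(xs, start=1):
        x_stair += [x, x]                 # vertical segment at x_i ...
        y_stair += [(i - 1) / n, i / n]   # ... from (i - 1)/n up to i/n
    return x_stair, y_stair


x, y = staircase_ecdf([2, 1, 3])
print(x)  # [1, 1, 2, 2, 3, 3]
```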
Histograms
Histograms in HoloViews require specification of the counts and edges for each bin, as would be produced using np.histogram(). To make a histogram in HoloViews, then, we need to explicitly compute the bins. As we did for the ECDF example above, we do this by iterating over a Pandas GroupBy object. The kdims and vdims specifications of the Histogram element then simply serve as axis labels, with the quantitative dimension as the key dimension.
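For reference, the (counts, edges) pair that hv.Histogram receives has one more edge than it has counts. The sketch below shows that shape. It is a hypothetical pure-Python stand-in for np.histogram with an integer number of equal-width bins. It follows NumPy's convention that every bin is half-open except the last, and it pads a zero-width range by 0.5 on each side as NumPy does. Because of floating-point rounding, it may assign values at bin edges differently from NumPy.

```python
def histogram(values, bins):
    """Equal-width histogram returning (counts, edges), like np.histogram.

    Hypothetical helper. There are bins + 1 edges; each bin is half-open
    [left, right) except the last, which also includes its right edge.
    """
    lo, hi = min(values), max(values)
    if lo == hi:  # avoid zero-width bins by padding the range, as NumPy does
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last (closed) bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, edges


print(histogram([1, 2, 2, 3, 4], 3))  # ([1, 2, 2], [1.0, 2.0, 3.0, 4.0])
```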
[9]:
p = iqplot.histogram(
    data=df,
    q="mpg",
    cats="origin",
    arrangement="overlay",
    style="step_filled",
    **bc_kwargs
)
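def hist(data):
    """Compute counts and bin edges for a histogram using the Freedman-Diaconis rule."""
    # Freedman-Diaconis bin width: h = 2 * IQR / n^(1/3)
    h = 2 * (np.percentile(data, 75) - np.percentile(data, 25)) / np.cbrt(len(data))
    # Number of bins needed to span the range of the data with width h
    bins = int(np.ceil((data.max() - data.min()) / h))
    return np.histogram(data, bins=bins)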
hists = [
    hv.Histogram(hist(group["mpg"]), kdims="mpg", vdims="count", label=label).opts(
        **hist_opts
    )
    for label, group in df.groupby("origin", sort=False)
]
phv = hv.Overlay(hists)
show_plots(p, phv)
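The hist() function above implements the Freedman-Diaconis rule with NumPy. As a sketch, here is the bin-count part without NumPy; freedman_diaconis_bins is a hypothetical name. Quartiles come from statistics.quantiles with method="inclusive", which interpolates at position (n - 1)p like np.percentile's default. One design choice is worth noting. hist() would fail when the interquartile range is zero (for instance, when most values are identical), because the bin width is then zero. This sketch raises an explicit error in that case instead.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Number of histogram bins from the Freedman-Diaconis rule (hypothetical helper).

    Bin width h = 2 * IQR / n**(1/3); the bin count is the data range
    divided by h, rounded up.
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    h = 2 * (q3 - q1) / n ** (1 / 3)
    if h == 0:
        raise ValueError("IQR is zero; the Freedman-Diaconis width is undefined")
    return math.ceil((max(values) - min(values)) / h)


print(freedman_diaconis_bins(list(range(1, 10))))  # 3
```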