How To Make Stripplot with Jitter in Altair Python?

Stripplot Altair with jitter

In this post we will learn how to make strip plots using Altair in Python. A simple strip plot draws the data as points along one axis, and it may not be that useful because many points overlap. One way to make a simple stripplot more meaningful is to add random jitter. Making a stripplot with jitter in Altair is slightly tricky. Luckily, I came across sample code for a jittered stripplot on the Altair website. This post is a step-by-step walk-through of that example: it starts with a simple stripplot and progressively iterates toward a better stripplot with jitter.

Let us first import Altair and Pandas to make stripplot with Altair.

import altair as alt
import pandas as pd

We will use gapminder data to make stripplots of lifeExp across all continents.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head()

   country      year  pop         continent  lifeExp  gdpPercap
0  Afghanistan  1952  8425333.0   Asia       28.801   779.445314
1  Afghanistan  1957  9240934.0   Asia       30.332   820.853030
2  Afghanistan  1962  10267083.0  Asia       31.997   853.100710

Simple Stripplot with Altair

Let us start with a simple stripplot in Altair, without any jitter. We can make a stripplot using mark_circle(), with continent on the x-axis and lifeExp on the y-axis, as shown below.

alt.Chart(gapminder).mark_circle().encode(
    x='continent:O',
    y='lifeExp:Q',
    color="continent"
).properties(width=500)

We get a simple stripplot like this.

Basic Stripplot Altair in Python

One can immediately see the problem of overplotting: most of the points overlap each other. We will begin with this simple stripplot and work towards making it better. First, let us remove the legend in the stripplot above, as it repeats the values on the x-axis and is therefore redundant.

How to Remove Legend in Altair?

We can remove the legend in Altair by passing the legend=None argument to alt.Color().

alt.Chart(gapminder).mark_circle().encode(
    x='continent:O',
    y='lifeExp:Q',
    color=alt.Color('continent:N', legend=None),
).properties(width=600)

We get the same stripplot, but now without the redundant color legend.

Stripplot Altair No Legend

Stripplot with Jitter in Altair: First Try

We know that we need to add jitter along the x-axis of the stripplot. As a first try, let us encode a new field named jitter on the x-axis with alt.X(), and specify a few axis options within alt.X() as well. Altair does not add the random noise for us, so we also have to specify how to compute it. Here, we use the transform_calculate() function, with an expression for the jitter as its argument.

stripplot = alt.Chart(gapminder, width=60).mark_circle(size=8).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(values=[0], ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('lifeExp:Q'),
    color=alt.Color('continent:N', legend=None),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
)
stripplot

Let us check what the stripplot looks like now. It seems we have successfully added jitter to the data points. However, the continents are not separated in the jittered stripplot: all the points share a single jittered x-axis.

Stripplot with jitter: First try
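To see what the jitter expression computes, here is a minimal sketch of the same Box-Muller formula in plain Python. It is only an illustration and not part of the Altair example: the helper name box_muller and the seed are assumptions made here, and the first draw uses 1 - random() so that log() never receives 0.

```python
import math
import random
import statistics


def box_muller(rng):
    """Turn two uniform draws into one draw from a standard normal."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()        # in [0, 1)
    # Same formula as the expression passed to transform_calculate()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for this sketch
samples = [box_muller(rng) for _ in range(10_000)]

# For a standard normal the mean should be near 0 and the
# standard deviation near 1.
sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
print(round(sample_mean, 2), round(sample_sd, 2))
```

Because the jitter is centered on 0, the single tick we placed at 0 with values=[0] marks the middle of the cloud of points.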

Stripplot with Jitter in Altair: Second Try with alt.Column()

Altair lets us specify which variable should be arranged in separate columns with alt.Column(). Here, each continent should get its own column. With alt.Column(), we also specify the header labels and their locations with the header argument.

stripplot = alt.Chart(gapminder, width=60).mark_circle(size=8).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(values=[0], ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('lifeExp:Q'),
    color=alt.Color('continent:N', legend=None),
    column=alt.Column(
        'continent:N',
        header=alt.Header(
            labelAngle=0,
            titleOrient='top',
            labelOrient='bottom',
            labelAlign='center',
            labelPadding=10,
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
)
stripplot

Now we get a much better looking stripplot with jitter. And each continent’s distribution is in a separate column with a heading.

Stripplot with jitter Altair: Second Try
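One thing to keep in mind is that the jitter here is drawn by the random() call inside the chart expression. As far as I know it is not seeded, so the points can land in slightly different places each time the chart is rendered. If we want the same picture every time, one option is to compute the jitter in Python with a fixed seed. The sketch below uses only the standard library; the function name gaussian_jitter, its parameters, and the seed value are assumptions for illustration.

```python
import random


def gaussian_jitter(n, seed=42, sd=1.0):
    """Return n reproducible draws from a normal distribution with mean 0."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [rng.gauss(0.0, sd) for _ in range(n)]


first = gaussian_jitter(5)
second = gaussian_jitter(5)
print(first == second)  # the same seed gives the same jitter

# Hypothetical usage with the data frame from this post:
#   gapminder["jitter"] = gaussian_jitter(len(gapminder))
# The chart can then encode 'jitter:Q' directly, and the
# transform_calculate() step is no longer needed.
```

The trade-off is that the jitter column becomes part of the data, so it has to be recomputed whenever the data frame changes.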

Stripplot with Jitter in Altair: Remove Space Between Columns

Although the stripplot with jitter looks great, we can notice that the spacing between the columns makes the plot look more like a facet plot. With Altair, we can adjust the spacing with the configure_facet(spacing=0) option. In the code below we also widen each column from 60 to 120.

stripplot = alt.Chart(gapminder, width=120).mark_circle(size=8).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(values=[0], ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('lifeExp:Q'),
    color=alt.Color('continent:N', legend=None),
    column=alt.Column(
        'continent:N',
        header=alt.Header(
            labelAngle=0,
            titleOrient='top',
            labelOrient='bottom',
            labelAlign='center',
            labelPadding=10,
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).configure_facet(
    spacing=0
)
stripplot

There is no spacing between the columns of the stripplot with jitter now. However, we have vertical lines separating each column.

Stripplot Altair with jitter: Third try

Stripplot with Jitter in Altair: Remove Vertical Lines Between Columns

Let us remove the vertical lines separating the columns using configure_view(stroke=None).

stripplot = alt.Chart(gapminder, width=120).mark_circle(size=8).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(values=[0], ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('lifeExp:Q'),
    color=alt.Color('continent:N', legend=None),
    column=alt.Column(
        'continent:N',
        header=alt.Header(
            labelAngle=0,
            titleOrient='top',
            labelOrient='bottom',
            labelAlign='center',
            labelPadding=10,
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)
stripplot

We have now removed the vertical lines, and our stripplot with jitter looks perfect except for the font sizes of the axis ticks and labels.

Stripplot Altair with jitter: Fourth try

Stripplot with Jitter in Altair: Increase the font size

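We can increase the font sizes of the axis ticks and title with configure_axis() in Altair. In the code below we also set labelFontSize for the column headers in alt.Header(), fix the y-axis range with alt.Scale(domain=(20,90)), make the points a little bigger, and set the height and width with properties().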

stripplot = alt.Chart(gapminder).mark_circle(size=14).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(values=[0], ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y(
        'lifeExp:Q',
        scale=alt.Scale(domain=(20, 90)),
    ),
    color=alt.Color('continent:N', legend=None),
    column=alt.Column(
        'continent:N',
        header=alt.Header(
            labelFontSize=16,
            labelAngle=0,
            titleOrient='top',
            labelOrient='bottom',
            labelAlign='center',
            labelPadding=25,
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
).configure_axis(
    labelFontSize=16,
    titleFontSize=16
).properties(height=400, width=100)
stripplot

And finally we have a nice stripplot with jitter, as we originally intended. Hurray!

Stripplot Altair with jitter