How to Make Overlapping Histograms in Python with Altair?

Overlapping Histogram From Wide Data

In this post, we will learn how to make multiple overlapping histograms in Python using Altair. Altair can make overlapping (layered) histograms from data in either wide form or long (tidy) form.

We will first make overlapping histograms from data in tidy form and then from wide-form data.

import pandas as pd
import altair as alt
import numpy as np

Let us first generate data in wide form using NumPy’s random module and then convert it to tidy/long form.

np.random.seed(42)
# Generate random Data
df = pd.DataFrame({
    'Africa': np.random.normal(40, 10, 1000),
    'Asia': np.random.normal(55, 10, 1000),
    'Americas': np.random.normal(80, 10, 1000)
})

The pandas DataFrame we created is in wide form, with each continent’s simulated lifeExp (life expectancy) values in a separate column.

df.head()

Africa	Asia	Americas
0	33.286265	56.708735	78.586568
1	32.862005	55.122554	68.174800
2	54.254059	50.688449	78.802943
3	58.644137	54.974733	81.479956
4	30.711175	59.908416	85.188345

Let us use Pandas’ melt() function to convert the data in wide form to long tidy form.

df_long = df.melt(var_name="continent", value_name="lifeExp")
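To see exactly what melt() does, here is a minimal sketch on a tiny, made-up wide frame (the numbers are illustrative only, not from the data above). Each original column name becomes a value in the new "continent" column, and each cell becomes a value in "lifeExp", stacked column by column.

```python
import pandas as pd

# A tiny wide frame with made-up numbers, only to illustrate the reshape
wide = pd.DataFrame({
    'Africa': [41.0, 38.5],
    'Asia': [56.2, 54.9],
})

# Column names go into "continent", cell values go into "lifeExp"
long = wide.melt(var_name="continent", value_name="lifeExp")
print(long)
#   continent  lifeExp
# 0    Africa     41.0
# 1    Africa     38.5
# 2      Asia     56.2
# 3      Asia     54.9
```

Applied to df, the same call gives df_long with 3 × 1000 = 3000 rows and two columns.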

Now we have data in both wide and tidy form. Let us go ahead and see how to make multiple histograms on the same plot from both types of data.

Multiple Overlapping Histograms in Altair Using Tidy/Long data

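To make a histogram with Altair, we use the mark_area() function here. We set the transparency level with the opacity argument so that overlapping histograms stay visible. The key argument that turns the area mark into a histogram is interpolate='step'. Without it, the bin counts are joined by straight lines and the result looks like an ordinary Altair area chart.

Then we encode the binned variable on the x-axis (at most 100 bins, via alt.Bin(maxbins=100)) and the count on the y-axis. Setting stack=None in alt.Y() stops Altair from stacking the areas on top of each other, so each histogram starts from zero and they overlap. To get one histogram per group, we map the categorical variable (continent) to alt.Color().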

alt.Chart(df_long).mark_area(
    opacity=0.5,
    interpolate='step'
).encode(
    alt.X('lifeExp:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('continent:N')
).properties(
    title='Overlapping Histograms from Tidy/Long Data'
)
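As a rough picture of what count() with stack=None amounts to, the pandas sketch below counts each continent's values into the same shared bins and keeps one independent column of counts per continent. The width-1 bins are an assumption for illustration only; Altair chooses its own "nice" bin edges from maxbins=100, so the exact counts will differ.

```python
import numpy as np
import pandas as pd

# Recreate the simulated data used above
np.random.seed(42)
df = pd.DataFrame({
    'Africa': np.random.normal(40, 10, 1000),
    'Asia': np.random.normal(55, 10, 1000),
    'Americas': np.random.normal(80, 10, 1000)
})
df_long = df.melt(var_name="continent", value_name="lifeExp")

# Shared bin edges of width 1 (assumed; Altair picks its own edges)
lo = np.floor(df_long["lifeExp"].min())
hi = np.ceil(df_long["lifeExp"].max()) + 1
edges = np.arange(lo, hi + 1, 1.0)

# Count values per (continent, bin), then give each continent its own column
counts = (
    df_long
    .assign(bin=pd.cut(df_long["lifeExp"], bins=edges, right=False))
    .groupby(["continent", "bin"], observed=False)
    .size()
    .unstack("continent")
)

# Unstacked: each continent's counts add up to its own 1000 values
print(counts.sum())
```

Plotting each column of counts from zero, rather than on top of the previous column, is what makes the histograms overlap instead of stack.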

Now we have made multiple overlapping histograms from tidy/long data. Without the alt.Color() encoding, we would get a single histogram of all the data.

How To Make Multiple Overlapping Histograms With Altair From Tidy Data?

Multiple Overlapping Histograms in Altair Using Wide data

Often you might start with data that is in wide form. Altair’s transform_fold() function can convert wide-form data to tidy long form, so we don’t need Pandas’ melt() and can transform the data within Altair instead.

Let us use Altair’s transform_fold() to reshape wide data to tidy/long data and make overlapping histograms in Python.

In transform_fold(), we first specify the names of the columns that need to be reshaped, and then, through the as_ argument, the names of the two new variables in the tidy data, as shown below.

alt.Chart(df).transform_fold(
    ['Africa', 'Asia', 'Americas'],
    as_=['Continent', 'LifeExp']
).mark_area(
    opacity=0.5,
    interpolate='step'
).encode(
    alt.X('LifeExp:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('Continent:N')
).properties(
    title='Overlapping Histograms from Wide Data'
)

We get the same overlapping histograms, but this time from wide data using Altair’s transform_fold() function.

How To Make Multiple Overlapping Histograms With Altair From Wide Data?