Hierarchically-clustered Heatmap in Python with Seaborn Clustermap

Hierarchical clustered heatmap with Seaborn Clustermap python
Hierarchical clustered heatmap with Seaborn Clustermap python

In this post, we will learn how to make hierarchically clustered heatmap in Python. We will use Saeborn’s Clustermap function to make a heat map with hierarchical clusters. Seaborn’s Clustermap is very versatile function, but we will showcase the use of the function with just one example.

Let us load Pandas, Seaborn and matplotlib.pyplot to make the clustered heatmap.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

We will use the breast cancer data from Scikit Learn’s datasets. Let us load the data sets from sklearn.datasets. It contains quantitative data with features regarding breast cancer.

from sklearn.datasets import load_breast_cancer
bc_data = load_breast_cancer()

Let us store the data as Pandas datadrame. We will also add the target feature of the data corresponding to whether sample is benign or cancer.

data = pd.DataFrame(bc_data.data, columns=bc_data.feature_names)
data['target']=bc_data.target
data.iloc[0:3,0:3]
	mean radius	mean texture	mean perimeter
0	17.99	10.38	122.8
1	20.57	17.77	132.9
2	19.69	21.25	130.0

We now have the data ready to make heatmap with Seaborn’s clustermap. Let us try to make hierarchically clustered heatmap.

sns.clustermap(data)
plt.savefig('hierarchical_clustered_heatmap_with_Seaborn_clustermap_python_1st_try.png',dpi=150)

The clustered heatmap we got looks really bad. Let us dissect what went wrong and improve.

Hierarchical Clustered Heatmap with Seaborn Clustermap python: 1st Try

By default, Seaborn’s clustermap uses distance metric to make heatmap. Let us change the metric to correlation by using metric=”correlation.

sns.clustermap(data, metric="correlation")
plt.savefig('hierarchical_clustered_corr_heatmap_with_Seaborn_clustermap_python_2nd_try.png',dpi=150)

We seem to have not made any improvement with the metric choice.

Seaborn Clustermap: 2nd Try

If we take a look at that data, our features are different scale. Let us standardise the column features using “standard_scale=1” and make the heatmap.

sns.clustermap(data,
               metric="correlation",
               standard_scale=1)
plt.savefig('hierarchical_clustered_scaled_heatmap_with_Seaborn_clustermap_python_3rd_try.png',dpi=150)

Out third try at making clustered heatmap after transforming each feature on the same scale has helped greatly and our clustered heatmap looks much better.

Seaborn Clustermap: 3rd Try

In the breast cancer data, we also have group identity, if the same is benign or cancer. Let us add that to our heatmap. The basic idea is to assign color each group and add a column with that color.

Let us first create a color dictionary mapping group to a color. And then use Pandas ability to create a new variable with the dictionary using map function. This creates a new color variable using the dictionary and the disease group variable.

color_dict=dict(zip(np.unique(bc_data.target),np.array(['g','skyblue'])))
target_df = pd.DataFrame({"target":bc_data.target})
row_colors = target_df.target.map(color_dict)

Now we can feed that color variable to the argument row_colors in Seaborn’s clustermap function.

sns.clustermap(data,
               metric="correlation",
               standard_scale=1,
               row_colors=row_colors)
plt.savefig('hierarchical_clustered_heatmap_with_Seaborn_clustermap_python.png',dpi=150)

This gives us clustered heatmap with column for target, i.e. the group level information for each sample, We can see that some of the members of the group nicely clusters together, while the others don’t with our chosen clustering metric.

Hierarchical clustered heatmap with Seaborn Clustermap python

Let us remove the tick labels on the y-axis using yticklabels=False.

sns.clustermap(data, 
               metric="correlation",
               standard_scale=1,
               row_colors=row_colors,
               yticklabels=False)
plt.savefig('hierarchical_clustered_heatmap_2_with_Seaborn_clustermap_python.png',dpi=150)
Hierarchical clustered heatmap with Seaborn Clustermap python
Exit mobile version