
Data Viz with Python and R

Learn to Make Plots in Python and R


How to Visualize Missing Values in a dataframe as heatmap

datavizpyr · October 18, 2022 ·

In this post, we will learn how to visualize a dataframe with missing values (NAs) as a heatmap. A quick look at where values are missing is a useful first step in analyzing data. We will mainly use a tidyverse approach: first we create a toy dataframe with missing values, then we use ggplot2’s geom_tile() function to make the heatmap, and finally we set a specific color for the NAs using the scale_fill_continuous() function.

Let us get started by loading tidyverse and the Palmer penguins dataset.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We will consider only the numeric columns and select just a few rows for illustration.

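# Keep the numeric measurement columns, drop incomplete rows,
# and take the first six rows as a toy dataset
penguins <- penguins %>%
  select(where(is.numeric)) %>%
  select(-year) %>%
  drop_na() %>%
  head()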

Our toy data looks like this, six rows with no missing values.

penguins

# A tibble: 6 × 4
  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
           <dbl>         <dbl>             <int>       <int>
1           39.1          18.7               181        3750
2           39.5          17.4               186        3800
3           40.3          18                 195        3250
4           36.7          19.3               193        3450
5           39.3          20.6               190        3650
6           38.9          17.8               181        3625

Adding NAs randomly to a dataframe using tidyverse

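Let us add NAs to the toy dataframe at random. We use the across() function in combination with mutate() to introduce NAs in each column. For every element we draw TRUE or FALSE with probabilities 0.8 and 0.2; ifelse() keeps the value where the draw is TRUE and replaces it with NA otherwise, so each element has a 20% chance of becoming missing.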

set.seed(42)
df <- penguins %>%
  mutate(across(where(is.numeric),
                ~ifelse(sample(c(TRUE, FALSE),
                               size = n(),
                               replace = TRUE, 
                               prob = c(0.8, 0.2)),
                        ., NA)))

Our data with missing values look like this.

df 

# A tibble: 6 × 4
  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
           <dbl>         <dbl>             <int>       <int>
1           NA            18.7                NA        3750
2           NA            17.4               186        3800
3           40.3          18                 195          NA
4           NA            19.3                NA        3450
5           39.3          20.6                NA          NA
6           38.9          17.8               181          NA
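Before plotting, it can help to count the missing values directly. The sketch below is one way to do that; it assumes the toy dataframe is typed in by hand from the printed output above (so it runs without palmerpenguins), and the names na_counts and na_fraction are only illustrative.

```r
library(tidyverse)

# Toy data with NAs, copied from the printed output above
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18.0, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Number of missing values in each column
na_counts <- df %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# Fraction of all cells that are missing
na_fraction <- mean(is.na(df))
na_fraction
```

With only 24 cells, the observed fraction of NAs can differ noticeably from the 20% used when sampling.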

Visualizing Dataframe with NAs as heatmap

To make a heatmap, we will use ggplot2’s geom_tile() function. First we need to reshape the data into tidy long form using the pivot_longer() function. We also add a unique row number to each row, which we will use as the y-axis of the heatmap.

df_tidy <- df %>%
  scale() %>%
  as.data.frame() %>%
  mutate(row_id=row_number()) %>%
  pivot_longer(-row_id, names_to="feature", values_to="values") 

Our tidy data with NAs look like this.

df_tidy

# A tibble: 24 × 3
   row_id feature            values
    <int> <chr>               <dbl>
 1      1 bill_length_mm    NA     
 2      1 bill_depth_mm      0.0566
 3      1 flipper_length_mm NA     
 4      1 body_mass_g        0.440 
 5      2 bill_length_mm    NA     
 6      2 bill_depth_mm     -1.05  
 7      2 flipper_length_mm -0.188 
 8      2 body_mass_g        0.704 
 9      3 bill_length_mm     1.11  
10      3 bill_depth_mm     -0.538 
# … with 14 more rows
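The scale() step above standardizes each column, and it leaves out the NAs when it computes the column means and standard deviations. As a rough check, the sketch below compares scale() with a manual calculation that uses na.rm = TRUE. The dataframe is again typed in from the printed output, and the names scaled and manual are only illustrative.

```r
library(tidyverse)

# Toy data with NAs, copied from the printed output above
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18.0, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Column-wise standardization with base R's scale()
scaled <- df %>%
  scale() %>%
  as.data.frame()

# The same standardization written out by hand, ignoring NAs
manual <- df %>%
  mutate(across(everything(),
                ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))

# The two versions should agree column by column
all.equal(scaled$body_mass_g, manual$body_mass_g)
all.equal(scaled$bill_depth_mm, manual$bill_depth_mm)
```

A missing element stays NA after scaling, so the pattern of missing values in the heatmap is unchanged.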

Now we can go ahead and make a heatmap using the geom_tile() function. Here the x-axis shows the penguin features and the y-axis shows the individual penguins, one row per penguin. We fill each tile by the value of the feature. Note that we have scaled the values of each column, as the original columns were on very different scales.

df_tidy %>%
  ggplot(aes(x=feature, y=row_id, fill=values))+
  geom_tile()

The NA values in our data show up as grey tiles, because the default fill scale uses grey as its na.value.

Visualizing dataframe with NAs as heatmap

If needed, we can give the NA tiles a color of our choice with the na.value argument of scale_fill_continuous().

df_tidy %>%
  ggplot(aes(x=feature, y=row_id, fill=values))+
  geom_tile()+
  scale_fill_continuous(na.value = 'red')

Customizing heatmap for visualizing Data with missing data
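If only the pattern of missing values matters, and not the values themselves, a variation on the heatmap above is to fill the tiles by a TRUE/FALSE missingness flag. This is not part of the original approach, just a minimal sketch: the is_missing column, the df_missing and p_missing names, and the colors are all illustrative choices, and the dataframe is typed in from the printed output.

```r
library(tidyverse)

# Toy data with NAs, copied from the printed output above
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18.0, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Long form with a logical flag marking the missing cells
df_missing <- df %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "feature", values_to = "values") %>%
  mutate(is_missing = is.na(values))

# Two-color heatmap: one color for missing, one for observed
p_missing <- df_missing %>%
  ggplot(aes(x = feature, y = row_id, fill = is_missing)) +
  geom_tile(color = "white") +
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "grey80"),
                    name = "Missing")
p_missing
```

Scaling is not needed here, since the fill depends only on whether a value is present.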

Also note that the column order on the heatmap differs from the original column order. After reshaping from wide to long, the feature names are stored as a character column, and ggplot2 places character values on a discrete axis in alphabetical order. The heatmap’s columns therefore appear in the same order as shown below.

df %>% 
  select(sort(names(.)))

# A tibble: 6 × 4
  bill_depth_mm bill_length_mm body_mass_g flipper_length_mm
          <dbl>          <dbl>       <int>             <int>
1          18.7           NA          3750                NA
2          17.4           NA          3800               186
3          18             40.3          NA               195
4          19.3           NA          3450                NA
5          20.6           39.3          NA                NA
6          17.8           38.9          NA               181
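To keep the original column order on the heatmap instead, one option is to convert the feature column to a factor whose levels follow the column names of the dataframe. The sketch below assumes the same toy data, typed in from the printed output; df_ordered and p_ordered are illustrative names.

```r
library(tidyverse)

# Toy data with NAs, copied from the printed output above
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18.0, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Same reshaping as before, then fix the order of the features
df_ordered <- df %>%
  scale() %>%
  as.data.frame() %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "feature", values_to = "values") %>%
  mutate(feature = factor(feature, levels = names(df)))

# The x-axis now follows the factor levels, not the alphabet
p_ordered <- df_ordered %>%
  ggplot(aes(x = feature, y = row_id, fill = values)) +
  geom_tile() +
  scale_fill_continuous(na.value = "red")
p_ordered
```

ggplot2 orders a discrete axis by factor levels when the variable is a factor, so setting the levels explicitly controls the column order.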
