Data Viz with Python and R

Learn to Make Plots in Python and R

How to Visualize Missing Values in a dataframe as heatmap

datavizpyr · October 18, 2022 ·

Last updated on August 22, 2025

In this post, we will learn how to visualize a dataframe with missing values, represented as NAs, as a heatmap. A quick look at where values are missing is useful before analyzing the data. We will mainly use a tidyverse approach: first we create a toy dataframe with missing values, then we use ggplot2’s geom_tile() function to make the heatmap, and finally we assign a specific color to the NAs with the scale_fill_continuous() function.

Let us get started by loading tidyverse and the Palmer Penguins dataset from the palmerpenguins package.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We will keep only the numeric columns, drop the year column and any rows with missing values, and select just the first six rows for illustration.

penguins <- penguins  %>%
  select(where(is.numeric)) %>%
  select(-year) %>%
  drop_na() %>%
  head()

Our toy data looks like this: six rows with no missing values.

penguins

# A tibble: 6 × 4
  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
           <dbl>         <dbl>             <int>       <int>
1           39.1          18.7               181        3750
2           39.5          17.4               186        3800
3           40.3          18                 195        3250
4           36.7          19.3               193        3450
5           39.3          20.6               190        3650
6           38.9          17.8               181        3625

Adding NAs randomly to a dataframe using tidyverse

Let us add NAs to the toy dataframe randomly. We use the across() function in combination with mutate() to introduce NAs in each column. Each element is independently kept with probability 0.8 and replaced by NA with probability 0.2, so on average 20% of the values go missing.

set.seed(42)
df <- penguins%>%
  mutate(across(where(is.numeric),
                ~ifelse(sample(c(TRUE, FALSE),
                               size = n(),
                               replace = TRUE, 
                               prob = c(0.8, 0.2)),
                        ., NA)))

Our data with missing values looks like this.

df 

# A tibble: 6 × 4
  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
           <dbl>         <dbl>             <int>       <int>
1           NA            18.7                NA        3750
2           NA            17.4               186        3800
3           40.3          18                 195          NA
4           NA            19.3                NA        3450
5           39.3          20.6                NA          NA
6           38.9          17.8               181          NA
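
Before plotting, it can help to count how many values are actually missing in each column. The following is a minimal sketch, assuming the dataframe printed above is typed in by hand so that the block runs on its own; the names na_summary, n_missing and frac_missing are made up for this illustration.

```r
library(tidyverse)

# Toy data copied by hand from the output above (assumption: same values)
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Count the NAs in every column, then reshape to one row per feature
na_summary <- df %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(),
               names_to = "feature",
               values_to = "n_missing") %>%
  mutate(frac_missing = n_missing / nrow(df))

na_summary
```

With only six rows, the observed share of missing values can be quite far from the nominal 20%. In this toy data, 9 of the 24 cells are missing.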

Visualizing Dataframe with NAs as heatmap

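To make a heatmap, we will use ggplot2’s geom_tile() function. First we need to reshape the data into tidy long form using the pivot_longer() function. Before reshaping, we standardize each column with scale(), which returns a matrix, so we convert the result back to a data frame. We also add a unique row number to help us make the heatmap.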

df_tidy <- df %>%
  scale() %>%
  as.data.frame() %>%
  mutate(row_id=row_number()) %>%
  pivot_longer(-row_id, names_to="feature", values_to="values") 

Our tidy data with NAs looks like this.

df_tidy

# A tibble: 24 × 3
   row_id feature            values
    <int> <chr>               <dbl>
 1      1 bill_length_mm    NA     
 2      1 bill_depth_mm      0.0566
 3      1 flipper_length_mm NA     
 4      1 body_mass_g        0.440 
 5      2 bill_length_mm    NA     
 6      2 bill_depth_mm     -1.05  
 7      2 flipper_length_mm -0.188 
 8      2 body_mass_g        0.704 
 9      3 bill_length_mm     1.11  
10      3 bill_depth_mm     -0.538 
# … with 14 more rows
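
To see what the scale() step does to columns that contain NAs, the standardization can also be written out by hand. This is a rough sketch rather than the post's method: it assumes the toy dataframe above is re-entered by hand, and df_scaled is a hypothetical name. Each column is centered on its mean and divided by its standard deviation, with the NAs left out of both calculations.

```r
library(tidyverse)

# Toy data copied by hand from the output above (assumption: same values)
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Standardize every column, ignoring NAs; the result stays a tibble
df_scaled <- df %>%
  mutate(across(everything(),
                ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))

# Compare with base R's scale(), ignoring the extra attributes it attaches
all.equal(as.matrix(df_scaled), scale(df), check.attributes = FALSE)
```

The NAs stay NA after scaling, which is why they are still available to be colored separately in the heatmap.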

Now we can go ahead and make a heatmap using the geom_tile() function. Here the x-axis shows the penguin features and the y-axis shows each penguin, one per row. We fill each tile by the value of the feature. Note that we scaled the values of each column because they were on very different scales.

df_tidy %>%
  ggplot(aes(x=feature, y=row_id, fill=values))+
  geom_tile()

The tiles with NA values are drawn in grey, which is the default na.value of ggplot2’s continuous fill scale.

Visualizing dataframe with NAs as heatmap

If needed, we can give the NA tiles a color of our choice using the na.value argument of scale_fill_continuous().

df_tidy %>%
  ggplot(aes(x=feature, y=row_id, fill=values))+
  geom_tile()+
  scale_fill_continuous(na.value = 'red')

Customizing heatmap for visualizing Data with missing data
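
If only the location of the missing values matters and not the values themselves, one possible variation is to fill the tiles by a TRUE/FALSE missingness flag. The sketch below is an assumption-laden illustration, not part of the original recipe: the dataframe is re-entered by hand, and the names df_missing, is_missing and p, as well as the two colors, are arbitrary choices.

```r
library(tidyverse)

# Toy data copied by hand from the output above (assumption: same values)
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Long form with a logical flag marking the missing cells
df_missing <- df %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "feature", values_to = "values") %>%
  mutate(is_missing = is.na(values))

# Two-color heatmap: red for missing, light grey for observed
p <- df_missing %>%
  ggplot(aes(x = feature, y = row_id, fill = is_missing)) +
  geom_tile(color = "white") +
  scale_fill_manual(values = c(`TRUE` = "red", `FALSE` = "grey80"),
                    name = "Missing")
p
```

No scaling is needed in this version, because the fill depends only on whether a value is present.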

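Also note that the column order on the heatmap differs from the original column order. After reshaping from wide to long, the feature names are stored in a character column, and ggplot2 places the values of a character variable in alphabetical order on a discrete axis. The heatmap’s column order is therefore the same as the order shown below.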

df %>% 
  select(sort(names(.)))

# A tibble: 6 × 4
  bill_depth_mm bill_length_mm body_mass_g flipper_length_mm
          <dbl>          <dbl>       <int>             <int>
1          18.7           NA          3750                NA
2          17.4           NA          3800               186
3          18             40.3          NA               195
4          19.3           NA          3450                NA
5          20.6           39.3          NA                NA
6          17.8           38.9          NA               181
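
If the original column order is preferred on the x-axis, one common approach is to turn the feature column into a factor whose levels follow the order of the columns in df. The following is a minimal sketch, assuming the toy dataframe above is re-entered by hand; df_ordered and p are hypothetical names.

```r
library(tidyverse)

# Toy data copied by hand from the output above (assumption: same values)
df <- tibble(
  bill_length_mm    = c(NA, NA, 40.3, NA, 39.3, 38.9),
  bill_depth_mm     = c(18.7, 17.4, 18, 19.3, 20.6, 17.8),
  flipper_length_mm = c(NA, 186L, 195L, NA, NA, 181L),
  body_mass_g       = c(3750L, 3800L, NA, 3450L, NA, NA)
)

# Same reshaping as before, plus a factor that fixes the column order
df_ordered <- df %>%
  scale() %>%
  as.data.frame() %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "feature", values_to = "values") %>%
  mutate(feature = factor(feature, levels = names(df)))

# The x-axis now follows the factor levels instead of alphabetical order
p <- df_ordered %>%
  ggplot(aes(x = feature, y = row_id, fill = values)) +
  geom_tile() +
  scale_fill_continuous(na.value = "red")
p
```

Calling levels(df_ordered$feature) is a quick way to check the order that the axis will use.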

Filed Under: ggplot2, R Tagged With: visualizing data with NAs as heatmap