In this post, we will learn how to turn off the “missing values” warning message from ggplot2 when making a scatterplot with data containing missing values. geom_point() in ggplot2 gives a warning when it drops missing values from the dataset it is plotting. Here is an example of the warning when geom_point() drops 2 data points while plotting.
Removed 2 rows containing missing values (`geom_point()`)
We will see two examples of how to turn off the warning message. Let us get started by loading tidyverse and the Palmer Penguins dataset for making plots.
library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))
The Palmer Penguins dataset has missing values. Let us make a scatter plot as shown below.
penguins %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
ggsave("remove_missing_values_dropped_warning_ggplot.png")
We get the following warning:
## Warning: Removed 2 rows containing missing values (`geom_point()`).
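To see where the dropped rows come from, it can help to count the missing values before plotting. The snippet below is a minimal sketch, assuming the dplyr and palmerpenguins packages are installed; the variable names `na_per_column` and `n_unplottable` are made up for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Number of missing values in each column of the dataset
na_per_column <- penguins %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_per_column

# Rows that geom_point() cannot draw: NA in either plotted variable
n_unplottable <- penguins %>%
  filter(is.na(body_mass_g) | is.na(flipper_length_mm)) %>%
  nrow()
n_unplottable
```

The second count should match the number of rows reported in the warning.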
Drop NAs from data to avoid the warning message
One way to get around the warning message “Removed 2 rows containing missing values” is to drop the rows containing missing values before plotting, using the drop_na() function from tidyr.
By default, drop_na() removes a row if it has an NA value in any column, including columns that the plot does not use. Since no row with a missing value reaches geom_point(), we will not see the warning message.
penguins %>%
  drop_na() %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
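Calling drop_na() with no arguments checks every column, so it can discard rows that are perfectly plottable but have an NA somewhere else. A possible variation, sketched here as a suggestion and not as part of the original recipe, is to pass drop_na() only the columns used in the plot. The name `plot_data` is made up for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Keep every row that has both plotted variables,
# even if other columns contain NA
plot_data <- penguins %>%
  drop_na(body_mass_g, flipper_length_mm)

plot_data %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
```

Comparing nrow(plot_data) with nrow(drop_na(penguins)) shows how many extra rows the default call would have removed.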
Use na.rm in geom_point() to avoid the warning message
We can turn off the warning message that “rows containing missing values have been dropped” by specifying na.rm=TRUE as an argument to the geom_point() function.
penguins %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point(na.rm = TRUE) +
  scale_color_brewer(palette = "Dark2")
When using na.rm=TRUE within geom_point(), ggplot2 silently takes care of the rows with missing values in the plotted variables, instead of us dropping the rows with missing values from the whole dataframe.
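One way to check that na.rm=TRUE really silences the warning is to render the plot off-screen and count the warnings raised. This is only a rough sketch: `count_warnings()` is a hypothetical helper written for this post, not a ggplot2 function, and it assumes the warning is raised while ggplotGrob() builds and draws the layers.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper (not part of ggplot2): render a plot off-screen
# and count the warnings raised along the way.
count_warnings <- function(p) {
  n <- 0
  withCallingHandlers(
    invisible(ggplotGrob(p)),
    warning = function(w) {
      n <<- n + 1                      # record the warning
      invokeRestart("muffleWarning")   # and keep it off the console
    }
  )
  n
}

base_plot <- ggplot(
  penguins,
  aes(x = body_mass_g, y = flipper_length_mm, color = species)
)

# Untouched data, default geom_point(): the warning is expected here
warnings_default <- count_warnings(base_plot + geom_point())

# Same data with na.rm = TRUE: no missing-values warning is expected
warnings_na_rm <- count_warnings(base_plot + geom_point(na.rm = TRUE))
```

Note that the data passed to ggplot() is the full penguins dataframe in both cases; only the geom_point() argument differs.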