In this post, we will learn how to turn off the “missing values” warning message from ggplot2 when making a scatterplot with data that contains missing values. geom_point() in ggplot2 gives a warning when it drops rows with missing values from the dataset it is plotting. Here is an example of the warning when geom_point() drops 2 data points while plotting.
    Removed 2 rows containing missing values (`geom_point()`)
We will see two examples of how to turn off the warning message. Let us get started by loading tidyverse and the Palmer penguins dataset (from the palmerpenguins package) for making plots.
    library(tidyverse)
    library(palmerpenguins)
    theme_set(theme_bw(16))
The Palmer penguins dataset has missing values, including in the two variables plotted below. Suppose we make a scatter plot as follows:
    penguins %>%
      ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
      geom_point() +
      scale_color_brewer(palette = "Dark2")
    ggsave("remove_missing_values_dropped_warning_ggplot.png")
We get the following warning:
    ## Warning: Removed 2 rows containing missing values (`geom_point()`).
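Before silencing the warning, it can help to look at the rows that trigger it. The snippet below is a minimal sketch, assuming the warning comes only from missing values in the two plotted variables. The name `missing_xy` is just an illustrative choice.

```r
library(tidyverse)
library(palmerpenguins)

# Rows where either plotted variable is NA; these are the rows
# that geom_point() cannot draw.
missing_xy <- penguins %>%
  filter(is.na(body_mass_g) | is.na(flipper_length_mm))

# Print the offending rows for inspection
missing_xy
```

If the number of rows printed matches the count in the warning, we know exactly which observations are left out of the plot.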
Drop NAs from data to avoid the warning message
One way to get around the warning message “Removed 2 rows containing missing values” is to drop rows containing missing values before plotting, using the drop_na() function in tidyr.
By default, drop_na() removes a row if it has an NA in any column, so no missing values reach geom_point() and the warning does not appear. Note that this also removes rows that are missing values only in columns we are not plotting.
    penguins %>%
      drop_na() %>%
      ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
      geom_point() +
      scale_color_brewer(palette = "Dark2")
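If dropping every incomplete row feels too aggressive, drop_na() also accepts column names, so we can restrict the check to the variables used in the plot. The following is a small sketch of that variation. The name `penguins_xy` is only for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Drop rows only when one of the plotted variables is NA;
# rows with NA in other columns (for example sex) are kept.
penguins_xy <- penguins %>%
  drop_na(body_mass_g, flipper_length_mm)

penguins_xy %>%
  ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
```

This keeps more of the data intact than a bare drop_na(), which matters if the same dataframe is reused for other plots or summaries.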
Use na.rm in geom_point() to avoid the warning message
We can also turn off the warning message that “rows containing missing values have been dropped” by specifying na.rm = TRUE as an argument to the geom_point() function.
    penguins %>%
      ggplot(aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
      geom_point(na.rm = TRUE) +
      scale_color_brewer(palette = "Dark2")
When using na.rm = TRUE within geom_point(), ggplot2 silently drops the rows with missing values in the plotted variables, instead of us dropping rows with missing values from the whole dataframe.