Sinaplot vs Violin plot: Why Sinaplot is better than Violinplot

Sinaplot and Violinplot

In this post, we will learn how to make a Sinaplot in R and show why it is a better way to visualize numerical data from multiple categories. In an earlier post, we discussed the benefits of a Violinplot over a boxplot. The main reason is that a boxplot relies on only five summary statistics of the data, so it is a poor choice when the data is multi-modal. In such a situation, adding the density of the data to the boxplot, i.e. making a Violin plot, is a better option.

In this post, we will use the simple illustration from the paper by Hintze and Nelson introducing Violinplot to show why Sinaplot is better than Violinplot alone, and why Violinplot+Sinaplot is better than Sinaplot alone.

Let us load tidyverse and ggforce. ggforce is an R “package aimed at providing missing functionality to ggplot2 through the extension system introduced with ggplot2”.

library(tidyverse)
library(ggforce)  # provides geom_sina()
theme_set(theme_bw(16))

We will create a data set from three known distributions. The first is a bimodal distribution constructed from two normals with different means. The second is a uniform distribution and the third is a normal distribution.

bimodal <- c(rnorm(100,4),rnorm(100,8))
uniform <- c(runif(200,min=4,max=8))
normal <- c(rnorm(200,6,sd=3))

Boxplot does not reveal the pattern in the data

Let us create a dataframe with three groups of data.

df <- data.frame(bimodal=bimodal,
                 uniform=uniform,
                 normal=normal)
head(df)

##    bimodal  uniform    normal
## 1 4.664331 4.946468  6.574739
## 2 4.153938 4.917738  6.591718
## 3 3.285484 7.184626  5.219606
## 4 3.886365 6.963298 13.033837
## 5 4.168054 4.613306  8.303880
## 6 3.490347 6.343641  9.975178

Next, we convert the data from wide form to long, tidy form using pivot_longer() from the tidyr package.

df_tidy <- df %>% 
  pivot_longer(cols=bimodal:normal,values_to = "obs", names_to = "grp")
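
To see what this wide-to-long reshaping does, here is a minimal sketch on a toy data frame using base R's stack() instead of pivot_longer(). The toy values and the name toy_long are made up for illustration, and stack() orders the rows by column rather than by original row, so it is only a rough equivalent.

```r
# Toy wide data frame (values are illustrative, not from the post's data)
toy <- data.frame(bimodal = c(4.7, 4.2),
                  uniform = c(4.9, 7.2),
                  normal  = c(6.6, 5.2))

# stack() returns two columns: `values` (the numbers) and `ind` (the source column)
toy_long <- stack(toy)

# Rename to match the column names used in the post
names(toy_long) <- c("obs", "grp")
print(toy_long)
```

Each of the three columns contributes one row per observation, so the long form has 2 x 3 = 6 rows here, and 200 x 3 = 600 rows for the data in this post.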

Let us make a boxplot, which uses five summary statistics from the data. We can see that all three distributions look much the same: the boxplot masks the pattern in the data.

df_tidy %>%
  ggplot(aes(x=grp,y=obs, fill=grp))+
  geom_boxplot()+
  theme(legend.position = "none")
ggsave("boxplot_bimodal_normal_uniform_ggplot2.png")

Sinaplot > Violinplot > Boxplot
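
To make the "five summary statistics" point concrete, here is a small sketch that computes Tukey's five-number summary for each group with base R's fivenum(). The seed and the name five_stats are assumptions added here for reproducibility and illustration; the post itself does not set a seed. Note that geom_boxplot() by default draws whiskers only out to 1.5 times the IQR, so its whisker ends need not match the minimum and maximum shown here.

```r
# Simulate the three groups as in the post (seed is an added assumption)
set.seed(42)
bimodal <- c(rnorm(100, 4), rnorm(100, 8))
uniform <- runif(200, min = 4, max = 8)
normal  <- rnorm(200, 6, sd = 3)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
five_stats <- sapply(
  list(bimodal = bimodal, uniform = uniform, normal = normal),
  fivenum
)
rownames(five_stats) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

# One column per group; nothing here tells us how many modes a group has
print(round(five_stats, 2))
```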

Violinplot is better than Boxplot

One solution is to use a Violin plot, which adds a density layer to the boxplot and so captures the pattern in the data. In our example, the Violinplot nicely captures the bimodal nature of the first group and the normal and uniform distributions of the other two groups.

df_tidy %>% 
  ggplot(aes(x=grp,y=obs, fill=grp))+
  geom_violin()+
  theme(legend.position="none")
ggsave("violinplot_bimodal_normal_uniform_ggplot2.png")

Violin plot with geom_violin
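
The outline of a violin is a kernel density estimate of the data. As a rough sketch of why it reveals bimodality, we can compute such an estimate with base R's density() and look for its local maxima. The seed and the default bandwidth are assumptions here, and the names dens, is_peak and peaks are illustrative only.

```r
# Simulate the bimodal group as in the post (seed is an added assumption)
set.seed(42)
bimodal <- c(rnorm(100, 4), rnorm(100, 8))

# Kernel density estimate with the default bandwidth
dens <- density(bimodal)

# A local maximum is where the slope changes from rising to falling
is_peak <- diff(sign(diff(dens$y))) < 0
peaks <- dens$x[which(is_peak) + 1]

# Locations of the density peaks along the data axis
print(peaks)
```

For data like this we would expect peaks near the two component means of 4 and 8, although a small bandwidth can produce extra spurious bumps.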

Violinplot with jittered data points is better than Violinplot alone

A Violinplot still relies on a smoothed summary of the data (a density estimate) and does not show the actual data behind the plot. It is always good to show the data. Therefore, a better alternative is to add jittered data points to the Violin plot.

df_tidy %>% 
  ggplot(aes(x=grp,y=obs, fill=grp))+
  geom_violin()+
  geom_jitter(width=0.1,alpha=0.5)+
  theme(legend.position="none")
ggsave("violinplot_with_jittered_data_points_ggplot2.png")

Violin plot with jittered points

Sinaplot is better than Violinplot

A Violin plot with jittered data points shows both the data and the density pattern. However, the jittered data points can still overlap with each other, so we cannot fully see the data. A solution is the Sinaplot, which combines the goodness of jittered data points (or stripplot) and the violin plot.

By letting the normalized density of points restrict the jitter along the x-axis the plot displays the same contour as a violin plot, but resemble a simple strip chart for small number of data points. In this way the plot conveys information of both the number of data points, the density distribution, outliers and spread in a very simple, comprehensible and condensed format.
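
The idea of density-restricted jitter can be sketched in a few lines of base R. This is a simplified illustration of the principle and not ggforce's actual implementation; max_width, x_center and the other names are hypothetical, and the seed is an added assumption.

```r
# One group of observations (bimodal, as in the post; seed is an assumption)
set.seed(42)
obs <- c(rnorm(100, 4), rnorm(100, 8))

# Estimate the density and interpolate it at each observed value
dens <- density(obs)
dens_at_obs <- approx(dens$x, dens$y, xout = obs)$y

# Normalize so that the densest region gets the full jitter width
rel_width <- dens_at_obs / max(dens_at_obs)

max_width <- 0.4  # hypothetical maximum half-width of the jitter
x_center  <- 1    # x position of the group on the categorical axis

# Horizontal jitter is uniform, but scaled by the local relative density
x_jittered <- x_center + runif(length(obs), -1, 1) * max_width * rel_width

# Points spread wide where the data are dense and stay near the center where sparse
plot(x_jittered, obs, xlim = c(0, 2), pch = 16, col = rgb(0, 0, 1, 0.5),
     xlab = "group", ylab = "obs")
```

Because the jitter shrinks toward zero in sparse regions, the cloud of points traces the same outline as a violin while every observation stays visible.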

We can make sinaplot with ggforce’s geom_sina() function.

df_tidy %>% 
  ggplot(aes(x=grp,y=obs, color=grp))+
  geom_sina()+
  theme(legend.position="none")
ggsave("sinaplot_with_geom_sina_ggforce.png")

Sinaplot with geom_sina() in ggforce

Sinaplot with Violinplot is better than Sinaplot alone

Although a sinaplot nicely shows the data and their density pattern, adding a violinplot to it makes it look much better.

df_tidy %>% 
  ggplot(aes(x=grp,y=obs, fill=grp))+
  geom_violin()+
  geom_sina(alpha=0.5)+
  theme(legend.position="none")
ggsave("sinaplot_with_violinplot_ggforce.png")

Sinaplot and Violinplot

This post was long overdue; thanks to the tweet by @strnr that rekindled the interest.
