Boxplot for Continuous and Discrete Data

In the last tutorial, I showed you how to visualize two continuous variables (X, Y) in R. In this tutorial, I will show you how to visualize two variables where X is discrete and Y is continuous.

We usually plot a discrete X against a continuous Y to compare a measurement between two or more groups and to see whether the groups differ. The same kind of plot can also be used to compare groups over a period of time and track any changes.

For data visualization we will use the ggplot2 package in R. You can learn more about ggplot2 in my previous tutorial, the One Variable tutorial.

We will be using the same data set as in the previous tutorial (Cirrhosis data from Mayo Clinic) to demonstrate how to visualize two variables (Discrete X, Continuous Y).

First we need to install and load ggplot2 into R (if you installed it before, just load it).

    # install the ggplot2 package (only needed once)
    install.packages("ggplot2")
    # load the ggplot2 package
    library(ggplot2)

Now we need to download and load the data into R. Copy the commands below into your R script and run them; you should then be able to see the whole table.

    # load the data in RStudio
    # Specify URL
    url <- "https://vincentarelbundock.github.io/Rdatasets/csv/survival/pbc.csv"
    # Specify destination where file should be saved
    destfile <- "A:/R/Series/data.csv"
    # Apply download.file function in R
    download.file(url, destfile)
    # Read csv file and open it in the data viewer
    data <- read.csv("A:/R/Series/data.csv")
    View(data)

Data Visualization: Two Variables (Discrete X, Continuous Y)

First we need to initialize a ggplot object by using the ggplot() function. This function only specifies where the data will come from and which variables will be used.

Before we start the analysis, I want to clean up the table by deleting every row that contains an NA (not available) value, using the command below.

            data <- na.omit(data)          
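To make clear what na.omit() does, here is a minimal sketch on a small made-up table (the values are invented for illustration; only the column names mirror the tutorial's data). It also shows one possible alternative, complete.cases(), which keeps more rows when only the plotted columns need to be complete. Whether that alternative is preferable depends on your analysis.

```r
# Toy table with made-up values; column names mirror the tutorial's data
toy <- data.frame(
  stage = c(1, 2, 3, 4, 2),
  ast   = c(60.5, NA, 120.9, 98.3, 75.0),
  chol  = c(200, 180, NA, 250, 300)
)

# na.omit() drops every row that has an NA in ANY column
all_complete <- na.omit(toy)
nrow(all_complete)   # 3 rows remain

# Alternative: only require complete values in the columns we plot
plot_complete <- toy[complete.cases(toy[, c("stage", "ast")]), ]
nrow(plot_complete)  # 4 rows remain
```

With na.omit(), a row is removed even if the missing value is in a column you never plot, so the plots in this tutorial are based on the fully complete rows only.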

Now we need to choose two variables: one should be discrete and one should be continuous. You can check Data Visualization Tutorial 1 for more information about the difference between continuous and discrete variables.

Below, I specify that the variables will come from data (the name we gave our table above). I use stage (stage of the disease, from 1 to 4) as our discrete variable X and ast (aspartate aminotransferase, U/mL) as our continuous variable Y. Before that, we need to convert stage to a factor so that we can use it as a discrete variable.

    # converting "stage" from numeric to factor (categorical)
    data$stage <- as.factor(data$stage)
    # initializing a ggplot object
    a <- ggplot(data, aes(stage, ast))
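If you want to check that the conversion worked, a quick sketch like the one below can help. The stage codes here are made up for illustration; with the real table you would inspect data$stage in the same way.

```r
# Made-up stage codes, stored as numbers like in the CSV file
stage_num <- c(1, 4, 2, 3, 4, 2)

is.numeric(stage_num)   # TRUE: ggplot2 would treat this as continuous

# After conversion the values become levels of a categorical variable
stage_fac <- as.factor(stage_num)
is.factor(stage_fac)    # TRUE: now a discrete variable
levels(stage_fac)       # "1" "2" "3" "4"
table(stage_fac)        # number of observations per stage
```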

We only need to create one ggplot object. After that we can build different types of plots from it. In this tutorial, I will show you how to do:

  1. Boxplot.
  2. Dotplot.
  3. Barplot.

So, let's start

1. Boxplot

A boxplot summarizes data through their quartiles: the box spans the first to the third quartile, and the line inside the box marks the median. We can create boxplots by using the geom_boxplot() function.
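To see the numbers behind the boxes, the quartiles can also be computed directly. The sketch below uses made-up AST values for two stages (not taken from the real data set) and base R's quantile() function; the edges and middle line of each box should correspond to these three values.

```r
# Made-up AST values for two stages (not taken from the real data set)
toy <- data.frame(
  stage = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  ast   = c(50, 60, 70, 80, 90, 100, 110, 120, 130, 200)
)

# First quartile, median and third quartile of ast within each stage
quartiles <- tapply(toy$ast, toy$stage, quantile, probs = c(0.25, 0.5, 0.75))
quartiles[["1"]]  # 25% = 60, 50% = 70, 75% = 80
quartiles[["2"]]  # 25% = 110, 50% = 120, 75% = 130
```

Note that the value 200 in stage 2 does not change the quartiles much, which is why a boxplot is a robust summary when a group contains a few extreme values.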

We already chose where the data will come from and which variables we will be using, and assigned them to a (shown in the code above). Now we create the boxplot by simply typing

    # plot a boxplot
    a + geom_boxplot()

This draws one box for each stage.

It does not look very nice yet, so let's add some colors to make it look better. First, let's give each stage a different color.

    # color by stages
    a + geom_boxplot(aes(color = stage))

We can also fill the boxes with color if we want, remove the gray background, and add proper labels to the x and y axes.
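The code for this step is not shown above, so here is one possible sketch. It assumes the same functions used later in this tutorial, theme_minimal() and labs(), and it runs on a small made-up table (a_toy) so that it works on its own; with the real data you would start from a instead.

```r
library(ggplot2)

# Made-up values so the sketch runs on its own; in the tutorial you would
# use the object `a` built from the pbc data instead of `a_toy`
toy <- data.frame(
  stage = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  ast   = c(55, 70, 90, 80, 110, 150, 95, 130, 210)
)
a_toy <- ggplot(toy, aes(stage, ast))

# Fill each box by stage, drop the gray background, and label the axes
p <- a_toy +
  geom_boxplot(aes(fill = stage)) +
  theme_minimal() +
  labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")
p
```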

Now the plot looks better. We can add more information to the plot, for example by grouping according to sex (male and female).

    # add another variable to the plot
    a + geom_boxplot(aes(fill = sex))

2. Dotplot

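This is a very simple way to visualize the distribution of the data: every observation is drawn as a dot. We will use the geom_jitter() function with the same ggplot object, a, that we created above. The position argument controls how far the dots are spread; position_jitter(0.1) sets the horizontal jitter width to 0.1, which keeps the dots close to the middle of each group (stage).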

    # create a dotplot
    a + geom_jitter(position = position_jitter(0.1))
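Because jitter is random, the dots move a little every time the plot is redrawn. If you need the same picture each time, one option is to name the arguments of position_jitter() and set a seed, as in the sketch below. It uses a small made-up table (a_toy) rather than the real data, and height = 0 is my own choice here so that the dots are only moved sideways and their y values stay exact.

```r
library(ggplot2)

# Made-up values so the sketch runs on its own
toy <- data.frame(
  stage = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  ast   = c(55, 70, 90, 80, 110, 150, 95, 130, 210)
)
a_toy <- ggplot(toy, aes(stage, ast))

# width:  maximum horizontal shift in each direction
# height: 0 keeps the y values unchanged
# seed:   makes the random shifts reproducible
p <- a_toy +
  geom_jitter(position = position_jitter(width = 0.1, height = 0, seed = 1))
p
```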

Let's color the dots according to sex (male and female), remove the gray background, and add proper labels to the x and y axes.

    # color the dots according to sex
    a + geom_jitter(position = position_jitter(0.1), aes(color = sex)) +
      theme_minimal() +
      labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")

We can also change both the color and the shape of the dots, here coloring by stage and choosing the shape by sex, as shown below.

    # change color and shape according to stages and sex
    a + geom_jitter(position = position_jitter(0.1),
                    aes(color = stage, shape = sex)) +
      theme_minimal() +
      labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")

We can combine the boxplot and the dotplot in one graph, as shown below.

    # combine dotplot with boxplot
    a + geom_boxplot() +
      geom_jitter(aes(color = stage, shape = stage),
                  position = position_jitter(0.15)) +
      theme_minimal() +
      labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")
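One detail to be aware of when combining the two layers: geom_boxplot() draws its own outlier points, and geom_jitter() then draws those same observations again, so extreme values can appear twice. A possible way around this, sketched below on a small made-up table (a_toy), is to hide the boxplot's outlier points with outlier.shape = NA and let the jittered dots show every observation.

```r
library(ggplot2)

# Made-up values so the sketch runs on its own
toy <- data.frame(
  stage = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  ast   = c(50, 60, 70, 80, 90, 100, 110, 120, 130, 400)
)
a_toy <- ggplot(toy, aes(stage, ast))

# outlier.shape = NA hides the boxplot's own outlier points,
# so each observation is drawn only once (by geom_jitter)
p <- a_toy +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(color = stage),
              position = position_jitter(width = 0.15, height = 0)) +
  theme_minimal()
p
```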

Finally, let's see how we can make a bar plot.

3. Barplot

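We can choose a simple way to visualize our data by using the geom_col() function to plot a barplot. We will use the position = "dodge" argument so that the bars are not stacked on top of each other, which would show the sum of each group. Note that geom_col() draws one bar per row of the table, so with "dodge" the bars of a group are drawn over each other and the visible height is the largest value in that group, not the mean. To plot the mean, the data has to be summarized first, for example with stat_summary().

    # create a barplot with unstacked bars for each group
    a + geom_col(position = "dodge")

This gives one visible bar for each stage.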
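If the goal is a bar whose height is the mean of each group, one possible approach is to let ggplot2 summarize the data with stat_summary(). The sketch below is my own variation, not the code used in this tutorial; it uses made-up values and also computes the means with tapply() so that you can compare them with the bar heights.

```r
library(ggplot2)

# Made-up values so the sketch runs on its own
toy <- data.frame(
  stage = factor(c(1, 1, 1, 2, 2, 2)),
  ast   = c(60, 90, 120, 100, 150, 200)
)

# Mean of ast within each stage, computed directly
group_means <- tapply(toy$ast, toy$stage, mean)
group_means  # stage 1: 90, stage 2: 150

# stat_summary() applies mean() per group and draws the result as columns
p <- ggplot(toy, aes(stage, ast)) +
  stat_summary(fun = mean, geom = "col")
p
```

With geom_col(position = "dodge") on the same toy table, the visible bars would reach 120 and 200 (the largest values), which is why summarizing first matters when the mean is what you want to show.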

Now, let's enhance the graph by adding colors and removing the background.

    # add colors
    a + geom_col(aes(fill = stage), position = "dodge") +
      theme_minimal() +
      labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")

We can also color each group according to sex (male and female), which places the bars for the two sexes side by side within each stage.

    # grouping according to sex
    a + geom_col(aes(fill = sex), position = "dodge") +
      theme_minimal() +
      labs(x = "Stages", y = "Aspartate aminotransferase (U/mL)")

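Finally, to save the plot that is currently displayed in RStudio, you can type the command below. By default ggsave() saves the last plot that was displayed, and because only a file name is given, R writes the file to your working directory. The width and height are in inches by default.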

            ggsave("plot.png", width = 5, height = 5)          
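If you prefer not to rely on whichever plot happened to be displayed last, you can store the plot in a variable and pass it to ggsave() explicitly. The sketch below uses a made-up table and writes to a temporary folder (my choice, so that the example does not clutter your working directory); the file name boxplot.png is also just an example.

```r
library(ggplot2)

# Made-up values so the sketch runs on its own
toy <- data.frame(
  stage = factor(c(1, 1, 1, 2, 2, 2)),
  ast   = c(60, 90, 120, 100, 150, 200)
)
p <- ggplot(toy, aes(stage, ast)) + geom_boxplot()

# Save a specific plot object; units and resolution are stated explicitly
out_file <- file.path(tempdir(), "boxplot.png")
ggsave(out_file, plot = p, width = 5, height = 5, units = "in", dpi = 300)

file.exists(out_file)  # check that the file was written
```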

The next tutorial will be about visualizing error in R plots.

Source: https://bioinfo4all.wordpress.com/2020/12/02/tutorial-3-two-variables-discrete-x-continuous-y-boxplot-dotplot-bar/