Volcano Plot with R
By Saulo Gil
October 1, 2023
Volcano Plot with R
What’s that for??
A volcano plot is a type of scatter-plot that is used to identify changes in large data sets. It is very used by bioinformatic researches when analyzing data from “omics” technologies (e.g., genomics, proteomics, metabolomics, metagenomics, phenomics and transcriptomics). It plots significance versus fold-change on the y and x axes, respectively. It enables quick visual identification of genes with large fold changes that are also statistically significant.
Briefly, in a volcano plot, the most upregulated genes are towards the right, the most downregulated genes are towards the left, and the most statistically significant genes are towards the top.
The R, especially the dplyr and ggplot2 packages (both from Tidyverse), are very useful for creating beautiful and useful volcano plots.
Herein, I replicated the codes from a wonderful tutorial to create a volcano plot from Erika Duan which I strongly recommend see it for details click here to acess!!!
SO, LET’S TO DO IT
Packages required
library(tidyverse)
library(janitor)# Cleaning column names
library(scales)# Transform axis scales
library(ggrepel)# Optimise plot label separation
Import a test dataset
Firstly, we need to access the dataset!
The dataset used to create volcano plot is from study of Fu et at. (2025) intitle EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival. This study explored the role and regulation of Mcl-1 in mammopoiesis and the RNA-seq dataset can be accessed and downloaded on this site.
But…I don’t understand anything about the topic!!!!

No problem, neither do I but we don’t need to understand it to create a beautiful Volcano plot!
# Load dataset -----------------------------------------------------------------
samples <- read_delim("raw_data/limma-voom_luminalpregnant-luminallactate",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE)
# Clean column names -----------------------------------------------------------
# Convert columns names to snake case by default using clean_names()
samples <- clean_names(samples)
# Manually edit column names using rename()
samples <- samples |>
rename(entrez_id = entrezid, # from entrezid to entrez_id
gene_name = genename) # from genename to gene_name
Let’s see the dataset!
# Visualise the dataset as a table ---------------------------------------------
# round numeric values
samples |>
mutate(
log_fc = round(log_fc, digits = 2),
ave_expr = round(ave_expr, digits = 2),
t = round(t, digits = 2),
p_value = round(p_value, digits = 2),
adj_p_val = round(adj_p_val, digits = 2)
) |>
DT::datatable()
The data contains the following columns:
- Entrez ID - stores the unique gene ID;
- Gene symbol - stores the gene symbol associated with an unique Entrez ID;
- Gene name - stores the gene name associated with an unique Entrez ID;
- log2(Fold change) - stores the log2-transformed change in gene expression level between two types of tissue samples;
- Adjusted p-value - stores the p-value adjusted with a false discovery rate (FDR) correction for multiple testing.
Basic Volcano Plot
As before mentioned, the volcano plot display the significance data (i.e., adjusted p-value) versus the fold-change [i.e., log2(Fold change)] on the y and x axes, respectively.
Note: The transformation -log10(adj_p_val) allows points on the plot to project upwards as the fold change increases or decreases in magnitude.
Our first Volcano plot
# volcano plot -------------------------------------------------
vol_plot <-
samples |>
ggplot(aes(x = log_fc,
y = -log10(adj_p_val))) + # Transformation -log10(adj_p_val)
geom_point() +
theme_bw()
vol_plot # Visualise ggplot output

It appears very simple and uninformative! But there is good news, it can be improved a lot!
Add horizontal and vertical lines
It is very common to determine cutoffs for adjusted p-value and log_fc.
Let’s to insert these cutoffs:
- adj_p_value <= 0.05;
- log_fc <= -1 (i.e., downregulated);
- log_fc >= 1 (i.e., upregulated);
# Plot extra quadrants ---------------------------------------------------------
vol_plot +
geom_hline(yintercept = -log10(0.05), # horizontal line
linetype = "dashed") +
geom_vline(xintercept = c(log2(0.5), log2(2)), # vertical line
linetype = "dashed")

Now, we can observe that there are a lot of genes upregulated and downregulated according to the predetermined cutoff.
But, it still is possible to improve more!!!
Modify the x-axis and y-axis
Symmetrical axes are fundamental in this type of plot!
So, let’s set it!!!
# Identify the best range for xlim() -------------------------------------------
samples |>
select(log_fc) |>
min() |>
floor()
## [1] -10
samples |>
select(log_fc) |>
max() |>
ceiling()
## [1] 8
c(-10, 8) |>
abs() |>
max()
## [1] 10
# Modify xlim() ----------------------------------------------------------------
# Manually specify x-axis limits
vol_plot +
geom_hline(yintercept = -log10(0.05),
linetype = "dashed") +
geom_vline(xintercept = c(log2(0.5), log2(2)),
linetype = "dashed") +
xlim(-10, 10)

Furthermore, it is possible to change the limits of the x-axis with more ticks.
# Modify scale_x_continuous() --------------------------------------------------
vol_plot +
geom_hline(yintercept = -log10(0.05),
linetype = "dashed") +
geom_vline(xintercept = c(log2(0.5), log2(2)),
linetype = "dashed") +
scale_x_continuous(breaks = c(seq(-10, 10, 1)), # Modify x-axis tick intervals
limits = c(-10, 10)) # Modify xlim() range

NICE!!!
Now, we have a good volcano plot! Let’s to customize others details to make it more informative!
Colour, size and transparency
To add color, size, and transparency are useful strategies to visualize different gene groups. But, first, we need to categorize genes into different groups and store these categories as a new column of data.
- Genes with log_fc >= 1 & adj_p_val <= 0.05 as up;
- Genes with log_fc <= -1 & adj_p_val <= 0.05 as down;
- All other genes labelled as ns i.e. non-significant;
# Create new categorical column ------------------------------------------------
samples <-
samples |>
mutate(gene_type = case_when(log_fc >= 1 & adj_p_val <= 0.05 ~ "upregulated",
log_fc <= -1 & adj_p_val <= 0.05 ~ "downregulated",
TRUE ~ "ns"))
# Count gene_type categories ---------------------------------------------------
samples |>
count(gene_type) |>
DT::datatable(caption = "Table 2. Number of downregulated, upregulated and non-significant manner regulated genes")
Now, let’s to customize adding color, size, and transparency!
# Add colour, size and alpha (transparency) to volcano plot --------------------
cols <- c("upregulated" = "#ffad73",
"downregulated" = "#26b3ff",
"ns" = "grey")
sizes <- c("upregulated" = 2,
"downregulated" = 2,
"ns" = 1)
alphas <- c("upregulated" = 1,
"downregulated" = 1,
"ns" = 0.5)
samples |>
ggplot(aes(x = log_fc,
y = -log10(adj_p_val),
fill = gene_type,
size = gene_type,
alpha = gene_type)) +
geom_point(shape = 21, # Specify shape and colour as fixed local parameters
colour = "black") +
geom_hline(yintercept = -log10(0.05),
linetype = "dashed") +
geom_vline(xintercept = c(log2(0.5), log2(2)),
linetype = "dashed") +
scale_fill_manual(values = cols) + # Modify point colour
scale_size_manual(values = sizes) + # Modify point size
scale_alpha_manual(values = alphas) + # Modify point transparency
scale_x_continuous(breaks = c(seq(-10, 10, 1)),
limits = c(-10, 10))+
theme_bw()

Label points of interest - The most useful tool!
# Define another subset of interest from the original data ---------------------
sig_il_genes <-
samples |>
filter(symbol %in% c("Il15",
"Il34",
"Il24"))
up_il_genes <-
samples |>
filter(symbol == "Il24")
down_il_genes <-
samples |>
filter(symbol %in% c("Il15",
"Il34"))
# Modify legend labels by re-ordering gene_type levels -------------------------
samples <-
samples |>
mutate(gene_type = fct_relevel(gene_type, "upregulated", "downregulated"))
ggplot(data = samples,
aes(x = log_fc,
y = -log10(adj_p_val))) +
geom_point(aes(colour = gene_type),
alpha = 0.2,
shape = 16,
size = 1) +
geom_point(data = up_il_genes,
shape = 21,
size = 2,
fill = "firebrick",
colour = "black") +
geom_point(data = down_il_genes,
shape = 21,
size = 2,
fill = "steelblue",
colour = "black") +
geom_hline(yintercept = -log10(0.05),
linetype = "dashed") +
geom_vline(xintercept = c(log2(0.5), log2(2)),
linetype = "dashed") +
geom_label_repel(data = sig_il_genes, # Add labels last so they appear as the top layer
aes(label = symbol),
force = 2,
nudge_y = 1) +
scale_colour_manual(values = cols) +
scale_x_continuous(breaks = c(seq(-10, 10, 2)),
limits = c(-10, 10)) +
theme_bw() + # Select theme with a white background
theme(panel.border = element_rect(colour = "black", fill = NA, size= 0.5),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank())

AMAZING VOLCANO PLOT!
So, that’s it!
Thanks again to Erika Duan for the amazing tutorial!

- Posted on:
- October 1, 2023
- Length:
- 271 minute read, 57706 words
- See Also: