Applied Bioinformatics 2025
Week 3 Session 2
Differential Protein Abundance and Volcano Plots

Natalie Turner, PhD

Postdoctoral Fellow – Yates Lab

Department of Molecular Medicine

naturner@scripps.edu

MSstats vignette/manual - page 11

MSstatsGroupComparison

Tests for significant changes in protein abundance between conditions
Can be used for both labeled and label-free workflows
The experimental design (single or repeated measures) is determined automatically from the annotations and group levels

groupComparisonPlots: Volcano Plot

Visualization of differential protein abundance

X axis = log₂ fold change

Y axis = -log₁₀ adjusted p value

Dotted lines indicate thresholds for significance

Colored dots depict more abundant (red), less abundant (blue), or not significantly different (grey) proteins
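To make the axes and colors concrete, here is a minimal base-R sketch of how a results table maps onto a volcano plot. It is not MSstats's own plotting code: it assumes a data frame with `log2FC` and `adj.pvalue` columns (as in MSstats comparison results), uses made-up values, and applies the thresholds used in this session (adjusted p < 0.05, |log₂FC| > 0.58).

```r
# Toy comparison results (made-up values, for illustration only)
res <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  log2FC     = c(1.2, -0.9, 0.3, 2.0),
  adj.pvalue = c(0.001, 0.02, 0.01, 0.20),
  stringsAsFactors = FALSE
)

# Volcano coordinates: x = log2 fold change, y = -log10(adjusted p value)
res$neglog10p <- -log10(res$adj.pvalue)

# Classify each protein using both thresholds
sig_p  <- 0.05
fc_cut <- 0.58  # approximately log2(1.5)
res$status <- ifelse(res$adj.pvalue < sig_p & res$log2FC > fc_cut, "up",
              ifelse(res$adj.pvalue < sig_p & res$log2FC < -fc_cut, "down", "ns"))

# Red = more abundant, blue = less abundant, grey = not significant
cols <- c(up = "red", down = "blue", ns = "grey")
pdf(file.path(tempdir(), "volcano_sketch.pdf"))
plot(res$log2FC, res$neglog10p, col = cols[res$status], pch = 19,
     xlab = "log2 fold change", ylab = "-log10 adjusted p value")
abline(h = -log10(sig_p), v = c(-fc_cut, fc_cut), lty = 3)  # dotted threshold lines
invisible(dev.off())
```

Note that P3 has a small p value but a small fold change, so it stays grey: a protein must pass both thresholds to be colored.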

Go to your R Notebook

Run the code up until # Matrix creation

Creating a comparison matrix

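First, determine the group order (normally alphabetical, the default ordering of factor levels in R)

Create the comparison matrix:

→ Each row is one comparison: the two conditions being compared are coded as ‘-1’ and ‘1’; if there are more than 2 groups, all other conditions are coded as ‘0’. Assuming the levels print as CM, HM, the row c(-1, 1) estimates HM - CM, so a positive log₂FC means higher abundance in HM.

IMPORTANT: Name the rows appropriately; in this case:

row.names(comparison) <- c("HM-CM")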

# Matrix creation

levels(ProcessedData$ProteinLevelData$GROUP) # gets group levels/prints to console

comparison <- matrix(c(-1,1),nrow=1) #creates comparison matrix
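The same coding extends to more than two groups. The sketch below assumes three hypothetical group levels "A", "B" and "C" (substitute whatever levels() prints for your data). Each row is one comparison named after what it estimates, and the columns are labeled with the group levels so it is easy to check which group each coefficient refers to. The two-group matrix from class is included with row and column names set in one call, assuming the levels are "CM" and "HM".

```r
# Two-group version, assuming the levels print as "CM" "HM"
comparison <- matrix(c(-1, 1), nrow = 1, dimnames = list("HM-CM", c("CM", "HM")))

# Hypothetical group levels; in practice use levels(ProcessedData$ProteinLevelData$GROUP)
grp <- c("A", "B", "C")

# One row per comparison: -1 = reference group, 1 = group of interest, 0 = not compared
comparison3 <- rbind(
  "B-A" = c(-1, 1, 0),
  "C-A" = c(-1, 0, 1),
  "C-B" = c(0, -1, 1)
)
colnames(comparison3) <- grp

# Each row of a valid contrast matrix sums to zero
stopifnot(all(rowSums(comparison3) == 0))
comparison3
```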

MSstatsGroupComparison

Takes the output of MSstatsPrepareForGroupComparison(ProcessedData) and the comparison matrix as input
The resulting data is then used as input for Volcano Plot generation
Alternatively, use the groupComparison function for simplified analysis (see vignette for details)

Go to your R Notebook

Finish running the code and generate a Volcano Plot
Change some of the Volcano plot parameters and recreate the plot to see the difference
Save the ProteinResults$ComparisonResult data as a .csv file so you can manually inspect the proteins that are significantly different between groups.

Significance thresholds: adj p value < 0.05 and |log₂FC| > 0.58 (a fold change of more than about 1.5 in either direction)
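A minimal sketch of applying these thresholds and saving the results, using a made-up stand-in for the comparison results (the `log2FC` and `adj.pvalue` column names follow MSstats output; the file names and `out_dir` are placeholders). If we read the MSstats documentation correctly, groupComparisonPlots() takes its fold-change cutoff (`FCcutoff`) on the fold-change scale, i.e. 1.5 rather than 0.58; check the manual for your installed version.

```r
# Toy stand-in for ProteinResults$ComparisonResult (made-up values)
results <- data.frame(
  Protein    = c("P1", "P2", "P3"),
  Label      = "HM-CM",
  log2FC     = c(1.0, -0.7, 0.2),
  adj.pvalue = c(0.01, 0.03, 0.001),
  stringsAsFactors = FALSE
)

# |log2FC| > 0.58 corresponds to a fold change above ~1.5 in either direction
log2(1.5)  # about 0.585

# Keep proteins passing both thresholds (rows with NA adj.pvalue are dropped by subset)
sig <- subset(results, adj.pvalue < 0.05 & abs(log2FC) > 0.58)

# Save all results and the significant subset for manual inspection
out_dir <- tempdir()  # placeholder: replace with your own folder
write.csv(results, file.path(out_dir, "ComparisonResults_all.csv"), row.names = FALSE)
write.csv(sig, file.path(out_dir, "ComparisonResults_significant.csv"), row.names = FALSE)
```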

Capstone Task Walkthrough

Milk-derived Extracellular Vesicles (EVs)

Milk EVs are released within the mammary gland into milk
Nano-sized (small EVs ~50 - 200 nm)
Involved in cell-cell signalling and communication
Relay important messages and molecules between mother and infant
Important for immunity, growth and development

Filtering, pre-processing, MSstats

MSstats and data formatting/pre-processing:

Perform differential protein abundance analysis and generate a volcano plot.
Replicate the volcano plot settings as per the published paper.
Remove trypsin and iRT standards from the dataframe before processing.
Attempt to filter the results to include only homologous peptide sequences, and plot these results as your final analysis.
Whether or not you are successful, provide a brief summary of the different ways you tried to filter the results. Explain what you think the impact on the final results would be if all peptides (homologous and non-homologous) were used for quantification.
Create and save the volcano plot.
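One way to approach the two filtering steps is sketched below on made-up MSstats-format rows (`ProteinName`, `PeptideSequence`, `Intensity`). The name pattern for trypsin and iRT standards and the vector of homologous sequences are hypothetical: check how the standards are named in your own FASTA/DIA-NN output, and build the homologous list from your own sequence comparison.

```r
# Toy MSstats-format data (made-up rows and names)
df <- data.frame(
  ProteinName     = c("PROT_A", "iRT_standard", "TRYP_PIG", "PROT_B", "PROT_B"),
  PeptideSequence = c("PEPTIDEA", "IRTPEPK", "TRYPPEPK", "PEPTIDEB", "PEPTIDED"),
  Intensity       = c(1e6, 5e5, 2e5, 8e5, 7e5),
  stringsAsFactors = FALSE
)

# Step 1: drop standards before processing (pattern is an assumption)
drop_pattern <- "iRT|TRYP"
df_clean <- df[!grepl(drop_pattern, df$ProteinName), ]

# Step 2: keep only peptides in a hypothetical list of homologous sequences
homologous <- c("PEPTIDEA", "PEPTIDED")
df_homol <- df_clean[df_clean$PeptideSequence %in% homologous, ]
df_homol
```

Filtering at the peptide level, before summarization, means protein abundances are then estimated only from the shared sequences.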

mixOmics and MSstats: Plot creation

PCA plot – include all 3 groups

Difference from the in-class exercise: the normalization strategy (use normalizeMedianValues instead of normalizeQuantiles)
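For intuition about median normalization before PCA, here is a base-R sketch on a made-up matrix with three hypothetical groups. The median_scale() helper is written to mirror what we understand limma::normalizeMedianValues to do (divide each sample by its median, rescaled so the medians' geometric mean is kept); use the limma function itself in the capstone. Applying it to raw intensities and then log2-transforming is an assumption about the order of steps, and prcomp() stands in for mixOmics so the sketch runs without extra packages.

```r
set.seed(1)
# Toy intensities: 50 proteins (rows) x 9 samples (columns), 3 hypothetical groups
groups <- rep(c("G1", "G2", "G3"), each = 3)
x <- matrix(2^rnorm(50 * 9, mean = 20, sd = 1), nrow = 50,
            dimnames = list(paste0("prot", 1:50), paste0(groups, "_", rep(1:3, 3))))
x[, 4:6] <- x[, 4:6] * 2  # simulate a loading difference in group G2

# Median scaling: after this, every column has the same median
median_scale <- function(m) {
  cmed <- log(apply(m, 2, median, na.rm = TRUE))
  cmed <- exp(cmed - mean(cmed))
  sweep(m, 2, cmed, "/")
}
x_norm <- median_scale(x)

# PCA on log2 values, samples as rows (prcomp needs complete cases)
pca <- prcomp(t(log2(x_norm[complete.cases(x_norm), ])), center = TRUE)
summary(pca)$importance[2, 1:2]  # proportion of variance for PC1 and PC2

pdf(file.path(tempdir(), "pca_sketch.pdf"))
plot(pca$x[, 1:2], col = as.integer(factor(groups)), pch = 19)
invisible(dev.off())
```

Without the scaling step, the simulated loading difference in G2 would separate that group on PC1 for purely technical reasons.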

DIANNtoMSstatsFormat settings

When using the DIANNtoMSstatsFormat function for the capstone task, use the following settings:

global_qvalue_cutoff = 0.01,
qvalue_cutoff = 0.01,
pg_qvalue_cutoff = 0.01,
useUniquePeptide = TRUE,
removeFewMeasurements = TRUE,
removeOxidationMpeptides = FALSE,
removeProtein_with1Feature = TRUE,
MBR = TRUE

For any other options in this function (mainly related to log information), you can use the default settings or whatever you prefer, as they will not alter the outcome of the data processing.
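A convenient way to apply these settings consistently is to collect them in one named list, as sketched below. The call itself is left commented out because `raw` (your DIA-NN report) and `annotation` are placeholders for your own objects, and it assumes DIANNtoMSstatsFormat() is loaded from the MSstats package.

```r
# Capstone settings for DIANNtoMSstatsFormat, collected in one list
diann_settings <- list(
  global_qvalue_cutoff       = 0.01,
  qvalue_cutoff              = 0.01,
  pg_qvalue_cutoff           = 0.01,
  useUniquePeptide           = TRUE,
  removeFewMeasurements      = TRUE,
  removeOxidationMpeptides   = FALSE,
  removeProtein_with1Feature = TRUE,
  MBR                        = TRUE
)

# Hypothetical usage; `raw` and `annotation` are placeholders:
# library(MSstats)
# msstats_input <- do.call(DIANNtoMSstatsFormat,
#                          c(list(input = raw, annotation = annotation), diann_settings))
```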

Volcano plot with updated MSstats

Homologous peptide sequences only
Volcano plot will not look identical to the publication

Use ‘all’ features (featureSubset = "all")
equalizeMedians normalization (normalization = "equalizeMedians")
Tukey’s Median Polish summary method (summaryMethod = "TMP")
min_feature_count = 2

Only plot proteins detected in both groups (i.e., remove proteins flagged as ‘oneConditionMissing’ or with pvalue = NA)

(MSstats v4.12.1 and MSstatsConvert v1.14.0)
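The summarization settings and the "both groups" filter can be sketched as follows. The dataProcess() call is commented out (`msstats_input` is a placeholder), and its argument names are as we understand them from the MSstats v4 documentation, so check ?dataProcess for your installed version. The results table is a made-up stand-in with `pvalue` and `issue` columns, as in MSstats comparison results.

```r
# Summarization settings from the slide (hypothetical usage):
# library(MSstats)
# ProcessedData <- dataProcess(msstats_input,
#                              featureSubset     = "all",
#                              normalization     = "equalizeMedians",
#                              summaryMethod     = "TMP",  # Tukey's median polish
#                              min_feature_count = 2)

# Toy stand-in for the comparison results (made-up values)
results <- data.frame(
  Protein = c("P1", "P2", "P3", "P4"),
  log2FC  = c(1.1, Inf, -0.8, NA),
  pvalue  = c(0.001, 0, 0.004, NA),
  issue   = c(NA, "oneConditionMissing", NA, NA),
  stringsAsFactors = FALSE
)

# Keep proteins quantified in both groups: drop 'oneConditionMissing' and NA p values
# (%in% returns FALSE for NA issues, so unflagged rows are kept)
both_groups <- results[!(results$issue %in% "oneConditionMissing") & !is.na(results$pvalue), ]
both_groups$Protein
```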

Data filtering and extraction

HM and CM only
Save ComparisonResults as a .csv file
Create a separate results file containing the names and quantities of all proteins for each sample and filter to include proteins identified in at least 2 replicates per group
Create a Venn diagram of these proteins, with one entry per protein, and save it to file (this will differ from the Venn diagram in the paper)
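A minimal sketch of the "at least 2 replicates per group" filter and the counts behind a two-set Venn diagram, on made-up long-format data. The column names (`Protein`, `GROUP`, `SUBJECT`, `LogIntensities`) are assumed to match MSstats ProteinLevelData, and reading "identified" as "has a non-missing value" is our interpretation. Drawing and saving the diagram itself is left to the Venn package of your choice.

```r
# Toy long-format protein-level data (made-up values)
pl <- data.frame(
  Protein = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  GROUP   = c("HM", "HM", "CM", "CM", "HM", "HM", "CM", "CM", "CM", "CM"),
  SUBJECT = c(1, 2, 1, 2, 1, 3, 1, 1, 2, 3),
  LogIntensities = c(20, 21, 19, 20, 18, 18.5, 17, 22, 22.5, 21),
  stringsAsFactors = FALSE
)

# Count distinct replicates with a value, per protein and group
obs <- pl[!is.na(pl$LogIntensities), ]
n_rep <- aggregate(SUBJECT ~ Protein + GROUP, data = obs,
                   FUN = function(s) length(unique(s)))
names(n_rep)[3] <- "n_replicates"

# Keep proteins seen in at least 2 replicates within a group (one entry per protein)
kept <- n_rep[n_rep$n_replicates >= 2, ]
hm <- unique(kept$Protein[kept$GROUP == "HM"])
cm <- unique(kept$Protein[kept$GROUP == "CM"])

# Counts for the three regions of a two-set Venn diagram
venn_counts <- c(HM_only = length(setdiff(hm, cm)),
                 shared  = length(intersect(hm, cm)),
                 CM_only = length(setdiff(cm, hm)))
venn_counts
```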

EnrichR: qualitative analysis

Perform an enrichment analysis of the differentially abundant proteins (up- and down-regulated) with EnrichR using the Molecular Function 2018 library (GO_Molecular_Function_2018).

You must use gene names (not protein accessions) for this to work!

Display the top 20 enriched terms on the plots.
Make sure the entire name of the term is visible on the plot.
Save the plots!
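Before calling Enrichr, the significant proteins need to be split into up- and down-regulated gene lists. The sketch below does this on a made-up table with a hypothetical `Gene` column (map your accessions to gene names first). The enrichR calls are commented out because they need internet access; they assume plotEnrich()'s `showTerms` and `numChar` arguments set the number of terms and how many characters of each term name are shown, so a large `numChar` keeps full names visible.

```r
# Toy significant results with a hypothetical gene-name column (made-up values)
sig <- data.frame(
  Protein = c("P1", "P2", "P3", "P4"),
  Gene    = c("GENE1", "GENE2", "", "GENE1"),
  log2FC  = c(1.3, -0.9, 2.0, 0.8),
  stringsAsFactors = FALSE
)

# Enrichr expects gene symbols: drop missing/empty names and duplicates
clean_genes <- function(g) unique(g[!is.na(g) & nzchar(g)])
up_genes   <- clean_genes(sig$Gene[sig$log2FC > 0])
down_genes <- clean_genes(sig$Gene[sig$log2FC < 0])

# Hypothetical usage with the enrichR package:
# library(enrichR)
# up_enr <- enrichr(up_genes, databases = "GO_Molecular_Function_2018")
# p <- plotEnrich(up_enr[["GO_Molecular_Function_2018"]], showTerms = 20, numChar = 100)
```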

Summary

Mass spectrometry-based proteomics is a powerful tool for understanding biological systems

The skills you have learned in this module are required for typical proteomics data analysis workflows

I hope you’ve enjoyed the module; please feel free to reach out if you have any questions.