1 of 17

Applied Bioinformatics 2025
Week 3 Session 2
Differential Protein Abundance and Volcano Plots

Natalie Turner, PhD

Postdoctoral Fellow – Yates Lab

Department of Molecular Medicine

naturner@scripps.edu

2 of 17

MSstats vignette/manual - page 11

3 of 17

MSstatsGroupComparison

  • Tests for significant changes in protein abundance between different conditions
  • Can be used for label and label-free workflows
  • The experimental design (single or repeated measures) is determined automatically from the annotations and group levels

4 of 17

groupComparisonPlots: Volcano Plot

Visualization of differential protein abundance

X axis = log2 fold change

Y axis = -log10 p value

Dotted lines indicate thresholds for significance

Colored dots depict more abundant (red), less abundant (blue), or not significantly different (grey) proteins
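To make the colour coding concrete, here is a minimal base-R sketch of how each protein could be assigned to a volcano-plot category and drawn. It is not the groupComparisonPlots implementation. It assumes a results table with `log2FC` and `adj.pvalue` columns (as in an MSstats comparison result) and plots the adjusted p-value on the Y axis. The function names and default thresholds are illustrative only.

```r
# Sketch only: assign volcano-plot colours, then draw the plot with base graphics.
# Assumes columns log2FC and adj.pvalue; thresholds are example values.
classify_volcano <- function(log2FC, adj_p, fc_cut = log2(1.5), p_cut = 0.05) {
  cls <- rep("grey", length(log2FC))                       # not significantly different
  sig <- !is.na(adj_p) & !is.na(log2FC) & adj_p < p_cut    # passes the p-value cutoff
  cls[sig & log2FC >  fc_cut] <- "red"                     # more abundant
  cls[sig & log2FC < -fc_cut] <- "blue"                    # less abundant
  cls
}

plot_volcano <- function(res, fc_cut = log2(1.5), p_cut = 0.05) {
  cols <- classify_volcano(res$log2FC, res$adj.pvalue, fc_cut, p_cut)
  plot(res$log2FC, -log10(res$adj.pvalue), col = cols, pch = 19,
       xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
  abline(v = c(-fc_cut, fc_cut), h = -log10(p_cut), lty = 3)  # dotted threshold lines
  invisible(cols)
}
```

A protein is coloured only if it passes both the fold-change and the p-value threshold. Missing values always fall into the grey group.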

5 of 17

Go to your R Notebook

  • Run the code up until # Matrix creation

6 of 17

Creating a comparison matrix

  • First, determine the group order (group levels are normally sorted alphanumerically)

  • Create comparison matrix:

→ In each comparison, the reference group is coded as ‘-1’ and the group of interest as ‘1’, following the group order; if there are more than 2 groups, all other conditions are coded as ‘0’

  • IMPORTANT: Name the rows appropriately; in this case:

row.names(comparison) <- c("HM-CM")

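# Matrix creation

levels(ProcessedData$ProteinLevelData$GROUP) # prints the group levels, in order, to the console

comparison <- matrix(c(-1, 1), nrow = 1) # creates the comparison matrix: -1 for the first level, 1 for the second
row.names(comparison) <- c("HM-CM") # names the comparison (second level minus first level)
colnames(comparison) <- levels(ProcessedData$ProteinLevelData$GROUP) # labels each column with its group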
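The same coding rule extends to more than two groups. The hypothetical helper below (not part of MSstats) builds a one-row comparison matrix from the group levels and the two groups being compared. It gives +1 to the group of interest, -1 to the reference group and 0 to every other group, and it names the row and columns.

```r
# Hypothetical helper (not an MSstats function): build a one-row comparison matrix.
make_contrast <- function(group_levels, numerator, denominator) {
  stopifnot(numerator %in% group_levels, denominator %in% group_levels,
            numerator != denominator)
  contrast <- matrix(0, nrow = 1, ncol = length(group_levels),
                     dimnames = list(paste0(numerator, "-", denominator), group_levels))
  contrast[1, numerator] <- 1     # group of interest gets +1
  contrast[1, denominator] <- -1  # reference group gets -1
  contrast                        # every other group stays 0
}

cm <- make_contrast(c("CM", "HM"), "HM", "CM")         # same as the matrix above
cm3 <- make_contrast(c("CM", "HM", "G3"), "HM", "CM")  # "G3" is a made-up third group
```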

7 of 17

MSstatsGroupComparison

  • Takes the output of MSstatsPrepareForGroupComparison(ProcessedData) and the comparison matrix as input
  • The resulting data is then used as input for Volcano Plot generation
  • Alternatively, use the groupComparison function for simplified analysis (see vignette for details)
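As a rough intuition for what a group comparison produces per protein, here is a deliberately simplified stand-in. MSstats fits a linear (mixed-effects) model for each protein. This sketch swaps in a plain two-sample t-test on log2 abundances, followed by Benjamini-Hochberg adjustment, so its p-values will not match MSstats. The input layout (proteins in rows, runs in columns, a `group` vector) and the output column names, chosen to loosely mirror MSstats results, are assumptions.

```r
# Simplified stand-in, NOT the MSstats model: per-protein t-test + BH adjustment.
# abund: matrix of log2 abundances (proteins in rows, runs in columns, rownames = proteins)
# group: character vector giving the group of each column
compare_groups <- function(abund, group, numerator, denominator) {
  stats <- t(apply(abund, 1, function(x) {
    a <- x[group == numerator & !is.na(x)]
    b <- x[group == denominator & !is.na(x)]
    if (length(a) < 2 || length(b) < 2) return(c(log2FC = NA_real_, pvalue = NA_real_))
    p <- tryCatch(t.test(a, b, var.equal = TRUE)$p.value,
                  error = function(e) NA_real_)       # e.g. constant data
    c(log2FC = mean(a) - mean(b), pvalue = p)          # difference of log2 means
  }))
  out <- data.frame(Protein = rownames(abund), stats)
  out$adj.pvalue <- p.adjust(out$pvalue, method = "BH")  # Benjamini-Hochberg
  out
}
```

Because the data are already on the log2 scale, the difference of group means is the log2 fold change.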

8 of 17

Go to your R Notebook

  • Finish running the code and generate a Volcano Plot
  • Change some of the Volcano plot parameters and recreate the plot to see the difference
  • Save the ProteinResults$ComparisonResult data as a .csv file so you can manually inspect the proteins that differ significantly between groups.

Significance thresholds: adj p value < 0.05; |Log2FC| > 0.58 (approximately a 1.5-fold change, since log2(1.5) ≈ 0.585)
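A minimal sketch of applying these thresholds before saving. The toy data frame stands in for the comparison results (its values are made up), and the output file name is hypothetical. Writing the full, unfiltered table with `write.csv` works the same way.

```r
# Sketch: keep proteins with adj p value < 0.05 and |log2FC| > 0.58, then save to CSV.
significant_proteins <- function(res, p_cut = 0.05, fc_cut = 0.58) {
  keep <- !is.na(res$adj.pvalue) & !is.na(res$log2FC) &
    res$adj.pvalue < p_cut & abs(res$log2FC) > fc_cut
  res[keep, , drop = FALSE]
}

# Toy stand-in for the comparison results (made-up values)
res <- data.frame(Protein = c("P1", "P2", "P3"),
                  log2FC = c(1.2, -0.9, 0.3),
                  adj.pvalue = c(0.01, 0.03, 0.001))
sig <- significant_proteins(res)
out_file <- file.path(tempdir(), "significant_proteins.csv")  # hypothetical file name
write.csv(sig, out_file, row.names = FALSE)
```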

9 of 17

Capstone Task Walkthrough

10 of 17

Milk-derived Extracellular Vesicles (EVs)

  • Milk EVs are released within the mammary gland into milk
  • Nano-sized (small EVs, ~50-200 nm)
  • Involved in cell-cell signalling and communication
  • Relay important messages and molecules between mother and infant
  • Important for immunity, growth and development

11 of 17

Filtering, pre-processing, MSstats

  1. MSstats and data formatting/pre-processing:
    • Perform differential protein abundance analysis and generate a volcano plot.
    • Replicate the volcano plot settings as per the published paper.
    • Remove trypsin and iRT standards from the dataframe before processing.
    • Attempt to filter the results to include only homologous peptide sequences, and plot these results as your final analysis.
    • Whether or not you are successful, briefly summarize the approaches you tried for filtering the results. Explain how you expect the final results to change if all peptides (homologous and non-homologous) are used for quantification.
    • Create and save the volcano plot.
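One possible way to remove trypsin and iRT standards is to drop every row whose protein name matches a known pattern. In the sketch below, the column name `Protein.Names` and both patterns are assumptions for illustration. Check the identifiers actually used for trypsin and the iRT peptides in your FASTA file and report before relying on them.

```r
# Sketch: drop rows whose protein name matches any of the given (hypothetical) patterns.
remove_standards <- function(report, column = "Protein.Names",
                             patterns = c("TRYP_PIG", "iRT")) {
  hit <- Reduce(`|`, lapply(patterns, function(p)
    grepl(p, report[[column]], fixed = TRUE)))   # TRUE if any pattern matches
  report[!hit, , drop = FALSE]
}

# Toy report with made-up rows
report <- data.frame(Protein.Names = c("CASA1_BOVIN", "TRYP_PIG", "Biognosys_iRT", "LALBA_BOVIN"),
                     Precursor.Id = 1:4)
cleaned <- remove_standards(report)
```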

12 of 17

mixOmics and MSstats: Plot creation

  • PCA plot – include all 3 groups

  • Different to the in-class exercise: Normalization strategy (normalizeMedianValues instead of normalizeQuantiles)
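To see what median normalization does before PCA, here is a base-R sketch. `median_scale()` follows the same idea as limma's normalizeMedianValues, which rescales each sample so that all sample medians are equal, but it is not limma's code. `prcomp` stands in for the mixOmics PCA used in class. The input is assumed to be unlogged, strictly positive intensities with no missing values (proteins in rows, samples in columns).

```r
# Sketch: median scaling (same idea as limma::normalizeMedianValues), then PCA.
median_scale <- function(x) {
  meds <- apply(x, 2, median, na.rm = TRUE)  # median of each sample
  target <- exp(mean(log(meds)))             # common target: geometric mean of medians
  sweep(x, 2, meds / target, "/")            # divide each column by its scaling factor
}

run_pca <- function(x) {
  # samples become rows for prcomp; log2-transform first (values must be > 0)
  prcomp(t(log2(x)), center = TRUE, scale. = FALSE)
}
```

To include all 3 groups, colour the points of `pca$x[, 1:2]` by each sample's group.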

13 of 17

DIANNtoMSstatsFormat settings

When using the DIANNtoMSstatsFormat function for the capstone task, use the following settings:

global_qvalue_cutoff = 0.01,
qvalue_cutoff = 0.01,
pg_qvalue_cutoff = 0.01,
useUniquePeptide = TRUE,
removeFewMeasurements = TRUE,
removeOxidationMpeptides = FALSE,
removeProtein_with1Feature = TRUE,
MBR = TRUE

For any other options in this function (mainly related to log information), you can use the default settings or whatever you prefer, as they will not alter the outcome of the data processing.

14 of 17

Volcano plot with updated MSstats

  • Homologous peptide sequences only
  • Volcano plot will not look identical to the publication
    • Use ‘all’ features
    • equalizeMedians normalization
    • Tukey’s Median Polish summary method
    • min_feature_count = 2
  • Only plot proteins detected in both groups (i.e., remove if ‘oneConditionMissing’ or pvalue = NA)

(MSstats v4.12.1 and MSstatsConvert v1.14.0)
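The "detected in both groups" filter can be expressed as a simple row filter. This sketch assumes the results table has an `issue` column (where one-sided detections are flagged as "oneConditionMissing") and a `pvalue` column. The toy data are made up.

```r
# Sketch: keep only proteins measured in both groups before plotting.
keep_both_groups <- function(res) {
  missing_one <- !is.na(res$issue) & res$issue == "oneConditionMissing"
  res[!missing_one & !is.na(res$pvalue), , drop = FALSE]
}

# Toy results (made-up values)
res <- data.frame(Protein = c("A", "B", "C"),
                  issue = c(NA, "oneConditionMissing", NA),
                  pvalue = c(0.01, 0, NA))
plot_ready <- keep_both_groups(res)
```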

15 of 17

Data filtering and extraction

  • HM and CM only
  • Save ComparisonResults as a .csv file
  • Create a separate results file containing the names and quantities of all proteins for each sample and filter to include proteins identified in at least 2 replicates per group
  • Create a Venn diagram of these proteins, with one entry per protein, and save it to file (this will differ from the Venn diagram in the paper)
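One reading of the replicate filter is that a protein counts for a group if it is quantified in at least 2 replicates of that group. The sketch below uses that interpretation. It assumes long-format protein-level data with columns `Protein`, `GROUP`, `SUBJECT` and `LogIntensities`, and it builds one set of unique protein names per group. The overlap counts are the numbers a two-set Venn diagram displays, and the sets can be passed to any Venn plotting package.

```r
# Sketch: per-group protein sets (>= min_reps replicates) and two-set Venn counts.
proteins_per_group <- function(df, min_reps = 2) {
  obs <- df[!is.na(df$LogIntensities), c("Protein", "GROUP", "SUBJECT")]
  obs <- unique(obs)                                    # one row per protein x replicate
  counts <- aggregate(SUBJECT ~ Protein + GROUP, data = obs, FUN = length)
  kept <- counts[counts$SUBJECT >= min_reps, ]          # replicate filter
  split(as.character(kept$Protein), as.character(kept$GROUP))
}

venn_counts <- function(a, b) {
  c(only_a = length(setdiff(a, b)), both = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Toy data (made-up values)
df <- data.frame(Protein = c("P1", "P1", "P1", "P2", "P2", "P3", "P3"),
                 GROUP = c("CM", "CM", "HM", "CM", "CM", "HM", "HM"),
                 SUBJECT = c(1, 2, 1, 1, 2, 1, 2),
                 LogIntensities = c(20, 21, 19, 18, NA, 22, 23))
sets <- proteins_per_group(df)
vc <- venn_counts(sets$CM, sets$HM)
```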

16 of 17

EnrichR: qualitative analysis

  • Perform an enrichment analysis of the differentially abundant proteins (up- and down-regulated) with EnrichR using the Molecular Function 2018 library.
    • Must use gene names for this to work!
  • Display the top 20 enriched terms on the plots.
  • Make sure the entire name of the term is visible on the plot.
  • Save the plots!
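Before running EnrichR, split the significant proteins into up- and down-regulated gene lists. The sketch below assumes the results table has a `Gene` column of gene names, which you would need to add, and reuses the thresholds from the volcano plot. The enrichR calls in the comments are a hedged pointer only. Check the enrichR package documentation for the exact library name and plotting arguments, such as those controlling the number of terms shown and term-name truncation.

```r
# Sketch: up- and down-regulated gene lists for enrichment analysis.
genes_by_direction <- function(res, p_cut = 0.05, fc_cut = 0.58) {
  sig <- !is.na(res$adj.pvalue) & !is.na(res$log2FC) & res$adj.pvalue < p_cut
  pick <- function(idx) {
    g <- res$Gene[idx]
    unique(g[!is.na(g)])                 # drop missing gene names, one entry per gene
  }
  list(up = pick(sig & res$log2FC > fc_cut),
       down = pick(sig & res$log2FC < -fc_cut))
}

# Possible next step (assumed enrichR usage; verify against its documentation):
# up_enr <- enrichR::enrichr(gene_lists$up, "GO_Molecular_Function_2018")
# enrichR::plotEnrich(up_enr[[1]], showTerms = 20, numChar = 100)

# Toy results (made-up values)
res <- data.frame(Gene = c("A", "B", "C", "D", NA),
                  log2FC = c(1, -1, 0.1, 2, 3),
                  adj.pvalue = c(0.01, 0.01, 0.01, 0.2, 0.01))
gene_lists <- genes_by_direction(res)
```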

17 of 17

Summary

Mass spectrometry-based proteomics is a powerful tool for understanding biological systems

The skills you have learned in this module are required for typical proteomics data analysis workflows

I hope you’ve enjoyed the module. Please feel free to reach out if you have any questions.