4-1_Maxquant and MSstats on a clinical cohort FAQ

Frequently Asked Questions

Module: Proteomics

Training session: MaxQuant and MSstats on a clinical cohort

Presenter: Melanie Föll

-----

Example history: https://usegalaxy.eu/u/melanie-foell/h/maxquant-and-msstats-label-free-proteomics-training 

----

Q: My jobs are not running / I cannot see the history overview menu

A: Please make sure you are logged in. The top menu bar should show a section labeled “User”. If it shows “Login/Register” instead, you are not logged in.

Q: If we want to import a non-standard tool from the toolshed into Galaxy EU, whom do we contact?

A: Generally, you need to contact the usegalaxy.eu team and name the tool, ideally with some example use cases. Further details on tools: https://galaxyproject.org/tools/

Q: How many proteins can be identified and quantified in shotgun proteomics?

A: This depends on the sample, the technique(s) used, and the mass spectrometer. Most labs routinely obtain ~4000 proteins, but with more effort >10,000 proteins can be analyzed in a single run.

Q: What is the advantage of breaking down protein to peptides before mass spec?

A: Mass spectrometry works better for peptides: LC separation and ionization work better on peptides than on proteins. Proteins also generate overly complex, overlapping mass spectra because of their isotopes, and their mass may be shifted by posttranslational modifications or point mutations.

Q: Does MaxQuant in Galaxy support TMT, iTRAQ, etc.?

A: Yes: iTRAQ 4-plex and 8-plex; TMT 2-, 6-, 8-, 10-, and 11-plex; iodoTMT6plex.

Q: What is the largest number of RAW files you have run in MaxQuant?

A: 97 raw files (reanalysis of a published COVID-19 study):

https://covid19.galaxyproject.org/proteomics/PXD018117/

The MaxQuant run on all files took 23 h 30 min to finish.

Q: Can MaxQuant output PSMs and PEPs?

A: There are many output options; the evidence and msms outputs, for example, contain PSM- or feature-level information.
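
As an illustration of working with feature-level output, the following is a minimal Python sketch that reads a tab-separated table and keeps rows below a PEP threshold. The excerpt is made up, the column names (`Sequence`, `Raw file`, `PEP`, `Intensity`) are assumptions that should be checked against the header of your own evidence file, and the function `read_features` and the cutoff of 0.01 are hypothetical choices for this example.

```python
import csv
import io

# Made-up excerpt in the shape of a tab-separated evidence output.
# The column names are assumptions; check the header of your own file.
EVIDENCE_TSV = (
    "Sequence\tRaw file\tPEP\tIntensity\n"
    "AEFVEVTK\tsample1\t0.001\t150000\n"
    "LVNELTEFAK\tsample1\t0.2\t32000\n"
    "AEFVEVTK\tsample2\t0.0005\t170000\n"
)


def read_features(handle, max_pep=0.01):
    """Yield rows whose posterior error probability (PEP) is at most `max_pep`.

    The default cutoff is an arbitrary value chosen for this sketch.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["PEP"]) <= max_pep:
            yield row


confident = list(read_features(io.StringIO(EVIDENCE_TSV)))
for row in confident:
    print(row["Sequence"], row["Raw file"], row["PEP"])
```

For a file downloaded from a Galaxy history, the same function could be called with an opened file handle instead of the in-memory example.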

Q: Do you need to merge the databases? Because you can select multiple fasta files in MaxQuant.

A: For MaxQuant, you do not need to merge the databases. MaxQuant also offers an option to add common contaminants to the provided FASTA.

Q: Normally MaxQuant has a default contaminant FASTA that we don’t have to input ourselves. Do we need this for MQ in Galaxy?

A: MaxQuant in Galaxy has an option to add contaminants automatically, so you do not need to add them to the FASTA file yourself.

Q: If you use a mqpar file, can you include modifications that are not in the Galaxy version? For instance, propionamide (Cys alkylation by acrylamide).

A: No, you are limited to the modifications that are installed in MaxQuant. The mqpar file only provides more parameters / options than the GUI in Galaxy.
Note: the mqpar file must come from the same MaxQuant version as the one used in Galaxy!

Q: Related question: can we configure MQ with a new modification that does not exist in MQ?

A: This cannot be done by the Galaxy user, and it is also a non-trivial procedure for Galaxy tool developers because of how modifications are set up in the MaxQuant software. You can leave your modification wishes in this document or in a GitHub issue and we can try to install them (https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/maxquant).

Q: For the “quantitation method” what is the default if I just leave it as “None”? Label free?

A: It will report raw (non-normalized) intensity values, i.e. values that have not been normalized the way the LFQ intensities are, for example.

Q: Is the MaxQuant output generated in Galaxy compatible with MaxQuant on our own computer, for functions like inspecting MS/MS spectra?

A: Generally yes, provided the versions match, but the visualization function in a local MaxQuant probably does not work with search results from Galaxy.

Q: Related question: but is the output enough to generate MS/MS spectra in another tool? Some journals always ask for the spectra…

A: Unfortunately, only the MaxQuant outputs from the txt folder are currently available in Galaxy.

Q: A more automated way of doing the visualization is to use LFQ Analyst: https://bioinformatics.erc.monash.edu/apps/LFQ-Analyst/. It takes the ProteinGroups file from MQ.

A: Thanks!

Q: Instead of making a collection, can we run the MaxQuant analysis with single sample inputs?

A: Yes, MaxQuant should give the same result if you select multiple files instead of a collection. However, a collection keeps the history cleaner and easier to navigate.

Q: When can you use (or cannot use) Match between runs in MaxQuant?

A: There is no golden rule here. For quantitative comparison of different sample groups, MBR can be valuable: it increases the number of proteins identified and quantified across all samples, so that more proteins occur in most of the samples and can be compared.

Q: We normally do statistics following MaxQuant analysis using Perseus. Is this available in Galaxy?

A: Perseus is not available in Galaxy (it is freeware but not open source).

Q: Can MSstats also be applied after DIA?

A: Yes, MSstats can perform statistical analysis of DDA, DIA, and SRM data.

Q: In the statistical analysis using MSstats, could you explain once more what ‘compare groups = yes’ means? And is the comparison matrix used to define the contrast between the 2 groups?

A: MSstats consists of three parts:

  1. Reading the input files and converting them into an MSstats-compatible format, with some processing of the data at the same time
  2. Data processing, such as protein inference (summarization), log2 transformation, normalization, and missing value imputation
  3. Group comparison: ‘compare groups = yes’ means that this third step is performed, i.e. statistical modelling to find differentially abundant proteins between groups. The groups should be specified as “condition” in the annotation file, and the group comparison matrix file specifies which groups to compare against each other. In the example this is quite simple because there are only 2 groups; with 3 or more groups the comparison matrix can become more complex.
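
To illustrate how a comparison matrix grows with the number of groups, here is a minimal Python sketch. It is not part of MSstats or Galaxy. It assumes the common contrast convention of +1 for the first group of a comparison, -1 for the second, and 0 for groups that are not involved. The function name `build_comparison_matrix`, the `names` column header, and the group names are hypothetical; the exact file layout expected by the Galaxy tool should be checked in its help text.

```python
from itertools import combinations


def build_comparison_matrix(conditions):
    """Return (header, rows) for all pairwise group comparisons.

    Assumed convention: +1 marks the first group of a comparison,
    -1 the second, 0 a group that is not part of the comparison.
    """
    groups = sorted(set(conditions))  # one column per condition
    header = ["names"] + groups
    rows = []
    for first, second in combinations(groups, 2):
        weights = [1 if g == first else -1 if g == second else 0 for g in groups]
        rows.append([f"{first}-{second}"] + weights)
    return header, rows


# Two groups (hypothetical names): a single comparison row
header2, rows2 = build_comparison_matrix(["healthy", "tumor", "tumor", "healthy"])

# Three groups: three pairwise comparison rows
header3, rows3 = build_comparison_matrix(["A", "B", "C"])

# Print the three-group matrix as tab-separated text
for row in [header3] + rows3:
    print("\t".join(str(v) for v in row))
```

With two conditions the matrix has a single row; with three conditions it already has three rows, and each row sums to zero.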

Q: You mention to median normalise the LFQ values (‘before statistical analysis it is recommended to median normalize the LFQ intensities for each sample’). I am not sure what this exactly means, or does it refer to the log2 transformation? Besides log2 transformation, is there any other normalisation (or centering or scaling) that you can advise before doing statistical analyses?

A: Median normalization typically means subtracting the median of all intensities within one sample from each intensity in that sample (e.g. intensity of Protein A - median of all intensities from Sample 1) to account for measurement variation. A log2 transformation is required before normalization, since many statistical tests require the data to be normally distributed. (Non-log intensities can take very high values but have a lower bound, the limit of quantification, which leads to a right-skewed distribution; after log transformation the intensity distribution is closer to a Gaussian distribution.) Besides median (or median-polish) normalization, there are other methods, e.g. quantile normalization.
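
The two steps described in this answer can be sketched in a few lines of Python. This is a minimal illustration and not the code used by MSstats or any Galaxy tool: the function name `log2_median_normalize` is hypothetical, the intensities are made up, and skipping zero or missing intensities is an assumption of this sketch.

```python
import math
import statistics


def log2_median_normalize(sample):
    """Log2-transform one sample's intensities, then subtract the sample median.

    `sample` maps protein name -> raw intensity. Zero or missing
    intensities are skipped here (an assumption of this sketch).
    """
    logged = {p: math.log2(v) for p, v in sample.items() if v is not None and v > 0}
    median = statistics.median(logged.values())
    return {p: v - median for p, v in logged.items()}


# Made-up intensities for two samples, for illustration only.
# Sample 2 was measured with twice the overall signal of sample 1.
sample_1 = {"Protein A": 1024.0, "Protein B": 4096.0, "Protein C": 16384.0}
sample_2 = {"Protein A": 2048.0, "Protein B": 8192.0, "Protein C": 32768.0, "Protein D": 0.0}

norm_1 = log2_median_normalize(sample_1)
norm_2 = log2_median_normalize(sample_2)
print(norm_1)
print(norm_2)
```

After normalization each sample has a median of 0 on the log2 scale, so the overall twofold difference between the two samples is removed while the differences between proteins within a sample are kept.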

Additional online resources to learn about proteomic data analysis: