Scaffold 3.0
Quick Start Guide
Version 12.00000
www.proteomics.ucdavis.edu
What is Scaffold
What this Guide is for
What this Guide is not
First Steps
First thing you see
Here the filters are set to show you proteins identified with a probability greater than 95% and by at least one peptide with a probability greater than 95%. You can change these.
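The effect of these filters can be sketched roughly in Python. This is only an illustration, not Scaffold's own code: the `Protein` record, its field names, and the `passes_filters` helper are hypothetical, and the thresholds simply mirror the 95% settings described above.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    # Hypothetical record; the field names are illustrative, not Scaffold's data model.
    name: str
    probability: float  # protein identification probability, 0..1
    peptide_probabilities: list = field(default_factory=list)


def passes_filters(protein, min_protein_prob=0.95, min_peptides=1, min_peptide_prob=0.95):
    """Return True if the protein meets both the protein and the peptide thresholds."""
    good_peptides = [p for p in protein.peptide_probabilities if p > min_peptide_prob]
    return protein.probability > min_protein_prob and len(good_peptides) >= min_peptides


proteins = [
    Protein("A", 0.99, [0.97, 0.60]),  # passes both thresholds
    Protein("B", 0.99, [0.90]),        # no peptide above 95%
    Protein("C", 0.80, [0.99, 0.99]),  # protein probability too low
]
shown = [p.name for p in proteins if passes_filters(p)]
```

Raising `min_peptides` to 2 in this sketch would correspond to requiring two peptides per protein, a common stricter setting.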
This is very important. These stars show that there is ambiguity in linking the peptides we identified back to a unique protein (remember, we digest your protein into peptides). See Slide 10 for more details.
This is also very important. This is your False Discovery Rate (FDR), based on the number of decoy (reverse) sequences found in your search. These are highlighted in pink in the list. See Slide 22 for more details.
This drop-down box can be changed to display the number of unique peptides, spectra, etc. Give it a try!
Clicking on this brings you to this page
These are the peptides we identified for this protein with their individual probabilities and scores.
This is how the peptides we identified match to the protein sequence, i.e. the sequence coverage. Green = a post-translational modification (PTM). We search for a few PTMs by default, but not all.
This is the sequence coverage for this protein across samples (if you have more than one).
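Sequence coverage is the percentage of the protein's residues that fall inside at least one identified peptide. A minimal sketch, assuming exact string matching of unmodified peptides (the function name and example sequences are made up for illustration and are not taken from Scaffold):

```python
def sequence_coverage(protein_sequence, peptides):
    """Percentage of residues covered by at least one peptide (exact matches only)."""
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty strings
        start = protein_sequence.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein_sequence)


# Two 6-residue peptides on a 16-residue protein cover 12 of 16 residues.
coverage = sequence_coverage("MKTAYIAKQRQISFVK", ["TAYIAK", "QISFVK"])
```

Overlapping peptides are counted once per residue, so coverage never exceeds 100%.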
If you click on a peptide you can see the spectra used for the assignment.
You should see lots of matching Y and B (blue and red) ions. Be careful, and ask us, if you see an ID assigned to a spectrum that has lots of black (unassigned) lines.
Here is an example of a spectrum that was assigned with high confidence but is incorrect.
What happened here? Good question. Sometimes even good software can give incorrect results (probably because the peptide was very short...).
Protein ambiguity (when you see that little red star on the first page)
Here we see that peptides we identified map to more than one protein. Which one was in the sample, or are both in the sample? Sometimes it's very hard to say. This data implies that both proteins are in the sample.
This peptide (and several others) maps to more than one protein.
Scaffold tries to group proteins into clusters that cannot be distinguished from one another. Here you see two clusters (red and green). “No Group” means these are subsumed proteins, which are listed as a “similar” protein on the Samples or Proteins page.
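The idea behind grouping can be illustrated with a deliberately simplified sketch. This is not Scaffold's actual algorithm: it assumes that proteins with identical peptide evidence form one indistinguishable group, and that a protein whose peptides are a strict subset of another protein's peptides is subsumed. The function name and the example data are hypothetical.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Split proteins into indistinguishable groups and subsumed proteins."""
    # Proteins that share exactly the same peptide set cannot be told apart.
    by_peptides = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptides[frozenset(peptides)].append(protein)

    groups, subsumed = [], []
    for peptides, members in by_peptides.items():
        # A strict subset of another protein's evidence adds no new information.
        if any(peptides < other for other in by_peptides):
            subsumed.extend(members)
        else:
            groups.append(sorted(members))
    return sorted(groups), sorted(subsumed)


evidence = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepA", "pepB", "pepC"},  # same evidence as P1
    "P3": {"pepA"},                  # fully explained by P1/P2
    "P4": {"pepA", "pepD"},          # has its own unique peptide
}
groups, subsumed = group_proteins(evidence)
```

Here P1 and P2 end up in one group, P4 stands alone because of its unique peptide, and P3 is reported as subsumed.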
This is a very complex issue and can take lots of time and patience to sort out. Here is a good paper to read on the subject:
Interpretation of Shotgun Proteomic Data
Things to keep in Mind
General Rule of thumb from Mike Myers concerning complex proteomic data
"The greater the coverage the more significant the match and the easier it will be to verify and perhaps make biological sense out of (due to the fact that we are dealing with a really complex system, it may be impossible to make "biological sense" out of even the most significant matches.) That said, if you already have the reagents in hand (antibodies, qPCR primers, etc) than it may be worthwhile chasing down a weak hit. If you have to generate new reagents, than maybe you should think twice (or even thrice)."
Frequently asked Questions
If I give my tandem mass spectrometry data to someone else (or another facility) will they ID the same proteins?
I gave you an identical sample 1 (week, month, year) ago and the results are different. How can this be right?
Actually, this is normal. Many variables must be controlled to compare proteomics data of this type correctly. It can be done, but you have to design the experiment correctly from the beginning.
Here are some nice papers on proteomics variability
The following slide details all the variables that should be controlled at each step of the experiment if you want to compare proteomics results across samples or experiments (courtesy of Proteome Software).
Protein ID Variability
Stages of the experiment: Samples, Sample prep, Protein separation, Digestion, Peptide LC separation, Ionization, Mass spectrometer, Fragmentation, Search engine, Protein Database
Sources of variability:
Hormones
Disulphide bonds
Enzymes in sample
Splice Variant
Protein not in DB
Polymorphism
Error in sequence
Wrong precursor charge
Ions considered
Modifications
Peak Detection
Noise Reduction artifacts
Fragment ion
de-isotoping errors
Precursor ion based on C13 not C12
Dynamic exclusion
Variable elution times
Peptide solubility
Auto switch to MS/MS
Saturation
Random fluctuations in ion intensity
Electronic noise
Poor fragmentation
Internal fragment ions
Dominant neutral loss
Side chain mediated fragmentation
Mass accuracy
Too few peaks
Charge more than +3
Ionization efficiency
Injection fluctuations
Detection range
Pump pressure fluctuations
Abundant protein depletion
Co-elution of peptides
Column overloading
Missed cleavages
Incomplete denaturing
Protein 3D structure
Trypsin self digestion
Non-tryptic peptides
Chymotrypsin
Gel spots incorrectly aligned
Mass or PI outside gel range
Protein solubility
Staining efficiencies
Carrier proteins
Chemical contaminants
Inaccurate peak finding
Degradation by freeze/thaw cycles
Incomplete purification
Stress of harvesting
Oxidation
Chemical contaminants
Cell cycle
Immune status
Inflammation
Biological variation
Cell environment
Poor centroiding
Peptide too short
Charge +1 of non-peptides
Single dominant peak
Chemical contaminants
I see a fair amount of keratin contamination. Is this from me or you?
Why does my 1D gel band contain 30 proteins? I expected only 1.
This is quite common; there can be a large number of proteins in any 1D gel band.
Do you offer classes to help with this?
UC Davis Proteomics Core Class Schedule
False Discovery Rates (FDR)
These are easy to understand but also easy to misinterpret.
Essentially what we do is take the protein database we search and reverse all the sequences and then add them to the forward sequences. Database = (F sequences + R sequences).
Hits to forward sequences are treated as presumed true positives; hits to reverse sequences are known false positives.
Then, when we ID a reverse (decoy) sequence, we can estimate the percentage of the proteins on your protein list that may be false discoveries.
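Building the combined database can be sketched as follows. This is a minimal illustration, assuming protein sequences are held in a plain dictionary; the `REV_` prefix and the `add_reversed_decoys` name are assumptions for this example, not the naming used by any particular search engine.

```python
def add_reversed_decoys(forward, decoy_prefix="REV_"):
    """Return a database holding every forward sequence plus its reversed decoy."""
    database = dict(forward)  # keep all forward (target) entries
    for accession, sequence in forward.items():
        # The decoy is the same sequence read backwards, stored under a tagged name.
        database[decoy_prefix + accession] = sequence[::-1]
    return database


forward = {"P1": "MKTAYIAK", "P2": "MSQISFVK"}
database = add_reversed_decoys(forward)
```

The resulting database is twice the size of the original, and any hit whose name starts with the decoy prefix is known to be wrong.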
Scaffold calculates its False Discovery Rate (FDR) as FDR = Decoy/Target, as discussed in JPR 2008 p45-6.
Matrix Science has a good tutorial that goes into this in more detail (FDR Tutorial).
Here is another very good review tutorial.
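The Decoy/Target ratio can be worked through in a few lines. This is a sketch under stated assumptions, not Scaffold's implementation: decoy hits are assumed to be recognizable by a hypothetical `REV_` prefix on the accession, and the protein list below is invented for the example.

```python
def protein_fdr(accessions, decoy_prefix="REV_"):
    """Estimate the protein FDR in percent as 100 * decoy hits / target hits."""
    decoys = sum(1 for a in accessions if a.startswith(decoy_prefix))
    targets = len(accessions) - decoys
    if targets == 0:
        raise ValueError("no target proteins on the list")
    return 100.0 * decoys / targets


# Invented example: 40 target proteins and 2 decoy proteins pass the filters.
protein_list = [f"P{i}" for i in range(40)] + ["REV_P7", "REV_P19"]
fdr = protein_fdr(protein_list)  # 2 / 40 = 5%
```

The result describes the list as a whole; it does not say which of the 40 target proteins might be wrong.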
So if you see a protein FDR of 5%, this means that about 5% of the proteins on the list may be false. It does not tell you which ones are false (this can be confusing). You can use it to verify the integrity of your list as a whole, not to verify the probability that any one protein on the list is identified correctly. This information is located in the lower left-hand corner of Scaffold, and the decoy hits are highlighted in pink.
This example shows a decoy protein that Scaffold says has a 98% probability of being correct, yet it is a decoy (scary, huh). Think of it as a double check on Scaffold's modeling statistics.
Further Reading
You should really read these
Additional Protocols and Tutorials from UC Davis Proteomics
Final words
Please do not hesitate to contact us if you have any questions concerning the data distributed by this facility.

UC Davis Proteomics web site

Contact info

Brett S. Phinney
530-754-5298