1 of 25

Scaffold 3.0

Quick Start Guide

Version 12.00000

www.proteomics.ucdavis.edu

2 of 25

What is Scaffold

  • Scaffold is a program designed to help display the data and results associated with protein/peptide identifications generated from tandem mass spectra

  • Data in Scaffold format is easy to distribute and archive, and a free viewer is available online

  • Scaffold's graphical interface makes it easier to explore the results, especially when comparing multiple runs or controls vs. experiments

3 of 25

What this Guide is for

  • To get you off and running analyzing your data using Scaffold 3.0 quickly

  • You probably are not going to read the manual or user's guide included in the free Scaffold viewer, although you really should...

4 of 25

What this Guide is not

  • An explanation of the statistics behind Scaffold
  • The only thing you should need to read to fully understand your data

5 of 25

First Steps

  • Install Scaffold (if you need detailed instructions, please click here: Install guide)

  • Open your Scaffold file (*.sf3) obtained from the facility by double-clicking the file

  • You should be presented with an option to use the free viewer (select that)
    • The free viewer is limited only in that you cannot add more data to your file. You can purchase Scaffold if you would like to do that.

6 of 25

First thing you see

Here the filters are set to show proteins identified with a probability greater than 95% and by at least one peptide with a probability greater than 95%. You can change these.

This is very important. These stars show that there is ambiguity in linking the peptides we identified back to a unique protein (remember, we digest your protein into peptides). See Slide 10 for more details.

This is also very important. This is your False Discovery Rate (FDR), based on the number of decoy (reverse) sequences found in your search. These are highlighted in pink in the list. See Slide 22 for more details.

This drop-down box can be changed to display the number of unique peptides, spectra, etc. Give it a try!

7 of 25

Clicking on this brings you to this page

These are the peptides we identified for this protein with their individual probabilities and scores.

This is how the peptides we identified match to the protein sequence, i.e. the sequence coverage. Green = a post-translational modification (PTM). We search for a few PTMs by default, but not all.

This is the sequence coverage for this protein across samples (if you have more than one).

8 of 25

If you click on a peptide you can see the spectrum used for the assignment.

You should see lots of matching y and b ions (blue and red). Be careful, or ask us, if you see an ID assigned to a spectrum that has lots of black (unassigned) lines.

9 of 25

Here is an example of a spectrum that was assigned a high confidence but is incorrect

What happened here? Good question. Sometimes even good software can give incorrect results (probably because the peptide was very short...)

10 of 25

Protein ambiguity (when you see that little red star on the first page)

Here we see that peptides we identified map to more than one protein. Was only one of them in the sample, or were both? Sometimes it's very hard to say. This data implies that both proteins are in the sample.

This peptide (and several others) maps to more than one protein.

Scaffold tries to group proteins into clusters that cannot be distinguished from one another. Here you see two clusters (red and green). "No Group" means the protein is a subsumed protein; it is listed as a "similar" protein on the Samples or Proteins page.

This is a very complex issue and can take lots of time and patience to sort out. Here is a good paper to read on the subject: Interpretation of Shotgun Proteomic Data
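To make the idea of indistinguishable and subsumed proteins more concrete, here is a minimal Python sketch. It is not Scaffold's actual algorithm, just a simplified illustration under two assumptions: proteins with exactly the same set of identified peptides cannot be told apart, and a protein is "subsumed" when its peptides are a strict subset of another protein's peptides. The function name, protein names, and peptide labels are all hypothetical.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Group proteins by their identified peptides (simplified illustration)."""
    # Proteins that share exactly the same peptide set cannot be distinguished.
    by_peptides = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptides[frozenset(peptides)].append(protein)

    groups = [(sorted(names), pepset) for pepset, names in by_peptides.items()]

    result = []
    for names, pepset in groups:
        # Assumption: a group is subsumed when another group explains all of
        # its peptides plus at least one more (strict subset test).
        subsumed = any(pepset < other for _, other in groups)
        result.append(
            {"proteins": names, "peptides": sorted(pepset), "subsumed": subsumed}
        )
    return result


# Hypothetical example data: peptide labels stand in for real sequences.
example = {
    "protein_A": {"pep1", "pep2", "pep3"},
    "protein_B": {"pep1", "pep2", "pep3"},  # same evidence as protein_A
    "protein_C": {"pep1"},                  # fully explained by A and B
    "protein_D": {"pep4", "pep5"},
}

groups = group_proteins(example)
for group in groups:
    print(group["proteins"], "subsumed" if group["subsumed"] else "distinct evidence")
```

In this toy data, protein_A and protein_B end up in one group because they have identical peptide evidence, and protein_C is flagged as subsumed because its only peptide is also found in that group.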

11 of 25

Things to keep in Mind

  • A certain percentage of your matches may be incorrect. In Scaffold you can adjust this percentage with the filters at the top.

  • If you select a 95% protein threshold, every protein listed has at least a 95% probability of being correct, so the least confident protein on the list has up to a 5% chance of being false.

  • In our hands Scaffold tends to be conservative, so try lowering the peptide threshold to 90% or 80%. Do you see anything interesting? If so, just keep in mind that your interesting protein has a greater chance of being a false positive.
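As a rough illustration of what the threshold filters do, here is a small Python sketch. It is not how Scaffold is implemented; the class, function, and protein names and the probabilities are made up, and treating the thresholds as inclusive (>=) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    name: str
    protein_probability: float          # between 0.0 and 1.0
    peptide_probabilities: List[float]  # one value per identified peptide


def passes_filters(hit, protein_threshold=0.95, peptide_threshold=0.95,
                   min_peptides=1):
    """Return True if the hit survives the protein and peptide filters."""
    # Count only the peptides that meet the peptide threshold.
    confident = [p for p in hit.peptide_probabilities if p >= peptide_threshold]
    return (hit.protein_probability >= protein_threshold
            and len(confident) >= min_peptides)


# Hypothetical hits for illustration only.
hits = [
    ProteinHit("protein_A", 0.99, [0.97, 0.91]),
    ProteinHit("protein_B", 0.96, [0.88]),
    ProteinHit("protein_C", 0.93, [0.99]),
]

strict = [h.name for h in hits if passes_filters(h)]
relaxed = [h.name for h in hits if passes_filters(h, peptide_threshold=0.80)]

print(strict)   # ['protein_A']
print(relaxed)  # ['protein_A', 'protein_B']
```

Lowering the peptide threshold to 80% lets protein_B onto the list, but only on the strength of a single 88% peptide, which is exactly why such hits deserve extra caution.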

12 of 25

General rule of thumb from Mike Myers concerning complex proteomic data

"The greater the coverage, the more significant the match and the easier it will be to verify and perhaps make biological sense out of (because we are dealing with a really complex system, it may be impossible to make "biological sense" out of even the most significant matches). That said, if you already have the reagents in hand (antibodies, qPCR primers, etc.) then it may be worthwhile chasing down a weak hit. If you have to generate new reagents, then maybe you should think twice (or even thrice)."

13 of 25

Frequently asked Questions

  • Can I publish the data included in my Scaffold file?
    • Absolutely
  • Should I do this without contacting the facility?
    • Probably not. This is very complex data, so it is best to run it by someone who has a lot of experience with tandem mass spectrometry

14 of 25

Frequently asked Questions

  • Are all the protein identifications Scaffold is reporting correct?
    • Not necessarily. There is always a chance they are incorrect; Scaffold helps you calculate this probability so you can report it to others. The False Discovery Rate (FDR) can help you determine how good your list of proteins is.

  • Is the probability Scaffold is reporting correct?
    • Good question. Scaffold uses published algorithms to calculate this probability, but they are by no means the only ones out there. Other programs will probably output slightly different results.

  • Can't you give me a yes or no answer about my identification?
    • Nope, sorry, just probabilities.

15 of 25

Frequently asked Questions

If I give my tandem mass spectrometry data to someone else (or another facility) will they ID the same proteins?

  • Maybe... Believe it or not, that study was recently conducted. Different groups were given identical spectra and an identical database and told to identify the proteins in the sample. They could not all agree, although the results were pretty similar
  • See for yourself at http://www.abrf.org/iprg
  • The moral of this story is that this type of data is very tricky, and it can be hard to identify proteins once you have chopped them up into peptides, especially if there are protein isoforms. Different programs and people will report different results. The key is to document what you did so others can reproduce it.

16 of 25

Frequently asked Questions

I gave you an identical sample 1 (week, month, year) ago and the results are different. How can this be right?

Actually this is normal. There are many, many variables that must be controlled to compare proteomics data of this type correctly. It can be done, but you have to design the experiment correctly from the beginning.

Here are some nice papers on proteomics variability

17 of 25

The following slide details all the variables that should be controlled at each step of the experiment if you want to compare proteomics results across samples or experiments (courtesy of Proteome Software)

18 of 25

Protein ID Variability (diagram labels)

  • Hormones
  • Disulphide bonds
  • Enzymes in sample
  • Mass spectrometer
  • Peptide LC separation
  • Protein separation
  • Sample prep
  • Samples
  • Digestion
  • Ionization
  • Fragmentation
  • Search engine
  • Protein database
  • Splice variant
  • Protein not in DB
  • Polymorphism
  • Error in sequence
  • Wrong precursor charge
  • Ions considered
  • Modifications
  • Peak detection
  • Noise reduction artifacts
  • Fragment ion
  • De-isotoping errors
  • Precursor ion based on C13 not C12
  • Dynamic exclusion
  • Variable elution times
  • Peptide solubility
  • Auto switch to MS/MS
  • Saturation
  • Random fluctuations in ion intensity
  • Electronic noise
  • Poor fragmentation
  • Internal fragment ions
  • Dominant neutral loss
  • Side chain mediated fragmentation
  • Mass accuracy
  • Too few peaks
  • Charge more than +3
  • Ionization efficiency
  • Injection fluctuations
  • Detection range
  • Pump pressure fluctuations
  • Abundant protein depletion
  • Co-elution of peptides
  • Column overloading
  • Missed cleavages
  • Incomplete denaturing
  • Protein 3D structure
  • Trypsin self digestion
  • Non-tryptic peptides
  • Chymotrypsin
  • Gel spots incorrectly aligned
  • Mass or pI outside gel range
  • Protein solubility
  • Staining efficiencies
  • Carrier proteins
  • Chemical contaminants
  • Inaccurate peak finding
  • Degradation by freeze/thaw cycles
  • Incomplete purification
  • Stress of harvesting
  • Oxidation
  • Cell cycle
  • Immune status
  • Inflammation
  • Biological variation
  • Cell environment
  • Poor centroiding
  • Peptide too short
  • Charge +1 of non-peptides
  • Single dominant peak

19 of 25

Frequently asked Questions

I see a fair amount of keratin contamination. Is this from me or from you?

  • We do a great deal of work to reduce keratin contamination in our laboratory. It is possible that it is coming from us, but more often than not it comes from the submitted samples. Not to worry: most of the time keratin can be ignored if it is not present at a high level. We can work with you to reduce keratin contamination in future samples.

20 of 25

Frequently asked Questions

Why does my 1D gel band contain 30 proteins? I expected only one.

This is quite common; there can be a large number of proteins in any 1D gel band.

21 of 25

Frequently asked Questions

Do you offer classes to help with this?

  • We hold monthly classes to help people with their data analysis. To see the schedule, please click here:

UC Davis Proteomics Core Class Schedule

22 of 25

False Discovery Rates (FDR)

These are easy to understand but also easy to misinterpret. 

Essentially what we do is take the protein database we search and reverse all the sequences and then add them to the forward sequences. Database = (F sequences + R sequences).  

Matches to forward (target) sequences may be true or false positives; matches to reverse (decoy) sequences are known false positives.
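As an illustration of the forward-plus-reverse database idea, here is a minimal Python sketch. It is only an illustration, not the facility's actual pipeline: the function name, the REV_ prefix used to label decoys, and the example protein name and sequence are all hypothetical.

```python
def build_target_decoy_db(targets, decoy_prefix="REV_"):
    """Return a database holding every forward sequence plus its reverse."""
    database = dict(targets)  # start with the forward (target) sequences
    for name, sequence in targets.items():
        # The decoy entry is simply the forward sequence read backwards.
        database[decoy_prefix + name] = sequence[::-1]
    return database


# Hypothetical one-protein database for illustration.
forward = {"protein_A": "MKTAYIAK"}
database = build_target_decoy_db(forward)

print(database["protein_A"])      # MKTAYIAK
print(database["REV_protein_A"])  # KAIYATKM
```

Because a reversed sequence should not exist in your sample, any identification carrying the decoy label can be counted as a known false match.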

Then, from the number of reverse (decoy) sequences we identify, we can estimate the percentage of proteins on your list that may be false discoveries.

Scaffold calculates its False Discovery Rate (FDR) as FDR = Decoy/Target, as discussed in JPR 2008 p45-6
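To make the Decoy/Target arithmetic concrete, here is a minimal Python sketch. Scaffold does this calculation for you; the function name, the REV_ prefix assumed to mark decoy hits, and the protein names are made up for illustration.

```python
def protein_fdr(protein_names, decoy_prefix="REV_"):
    """Estimate the FDR as (number of decoy hits) / (number of target hits)."""
    decoys = sum(1 for name in protein_names if name.startswith(decoy_prefix))
    targets = len(protein_names) - decoys
    if targets == 0:
        raise ValueError("No target proteins in the list; FDR is undefined.")
    return decoys / targets


# Hypothetical protein list: 100 target hits and 5 decoy hits.
protein_list = [f"protein_{i}" for i in range(100)]
protein_list += [f"REV_protein_{i}" for i in range(5)]

fdr = protein_fdr(protein_list)
print(f"Protein FDR: {fdr:.1%}")  # Protein FDR: 5.0%
```

With 5 decoy hits among 100 target hits, the estimate is 5%, meaning roughly 5 of the 100 target proteins may be false, without saying which ones.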

Matrix Science has a good tutorial that goes into this in more detail (FDR Tutorial)

Here is another very good review tutorial

So if you see a protein FDR of 5%, this means that about 5% of the proteins on the list may be false. It does not tell you which ones are false (this can be confusing). You can use it to check the integrity of your list as a whole, not to check the probability that any one protein on the list is identified correctly. This information is located in the lower left-hand corner of Scaffold, and the decoy hits are highlighted in pink.

This example shows a decoy protein that Scaffold says has a 98% probability of being correct, yet it is a decoy (scary, huh?). Think of it as a double check on Scaffold's modeling statistics.

23 of 25

Further Reading

You should really read these:

  • Proteome Software questions and answers

  • Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies

  • Scaffold Frequently asked questions (please read)

24 of 25

Additional Protocols and Tutorials from UC Davis Proteomics

25 of 25

Final words

Please do not hesitate to contact us if you have any questions concerning the data distributed by this facility.

UC Davis Proteomics web site

Contact info

Brett S. Phinney
530-754-5298