Proteogenomics 2: Database Searching
Galaxy Training Network Smörgåsbord
February 18th, 2021
‘Omics Technologies In The Era Of Systems Biology
Mass Spectrometry-based Proteomics
Looking Beyond The Known Proteome
Mass spectrum
Reference Protein Database
from genomic annotation
Cancer / Disease related
Databases such as COSMIC, IARC p53, OMIM…
Deep genome sequencing data from ICGC, TCGA and CPTAC
RNASeq data
(Customized OR Combined)
6-frame DNA sequences.
3-frame cDNA sequences.
Identification of peptides corresponding to novel proteoforms.
Proteogenomics Workflows in Galaxy
Database Searching
Using MS/MS data
Proteogenomics Database Search Workflow
Database Search Workflow: Input files
Input files for Database Searching
Created with Biorender.com
Database Search Workflow: SearchGUI
SearchGUI matches MS/MS spectra to peptide sequences
Created with Biorender.com
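To make the idea of matching spectra to peptide sequences concrete, here is a minimal Python sketch of one early step that database search engines commonly perform: selecting candidate peptides whose theoretical precursor m/z falls within a tolerance of the observed value. This is not SearchGUI's implementation. The function names, the 10 ppm default tolerance, and the example peptides are assumptions made for illustration, and real search engines go on to score fragment ions as well.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of a proton, used for charged ions


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def precursor_mz(seq, charge):
    """Theoretical m/z of the peptide at the given positive charge."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def candidates(observed_mz, charge, peptides, tol_ppm=10.0):
    """Peptides whose theoretical m/z is within tol_ppm of the observed m/z.

    tol_ppm is a hypothetical default, not a SearchGUI setting.
    """
    hits = []
    for pep in peptides:
        theo = precursor_mz(pep, charge)
        if abs(observed_mz - theo) / theo * 1e6 <= tol_ppm:
            hits.append(pep)
    return hits
```

In a real workflow, the candidates that pass this filter would then be scored against the fragment ions of the MS/MS spectrum by the search engines that SearchGUI runs.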
Database Search Workflow: PeptideShaker
PeptideShaker filters the results of SearchGUI
Vaudel et al. Nat Biotechnol 33, 22–24 (2015).
Database Search Workflow: Data Filtration
Data Filtration
All PSMs identified in MS data
Novel PSMs
Novel Peptides
Contaminants
Normal Peptides
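The filtration idea above (start from all PSMs, drop those explained by contaminants or normal reference proteins, keep the rest as novel) can be sketched in Python. This is only an illustration, not the Galaxy tool's query: the PSM record layout (`peptide`, `proteins`) and the accession strings are hypothetical.

```python
def novel_peptides(psms, known_accessions):
    """Return sorted unique peptides whose PSMs map to no known protein.

    psms: iterable of dicts with a "peptide" string and a "proteins" list
          (hypothetical layout chosen for this sketch).
    known_accessions: set of accessions for reference proteins and contaminants.
    """
    novel = set()
    for psm in psms:
        # A PSM is treated as novel only if none of its proteins are known.
        if not any(acc in known_accessions for acc in psm["proteins"]):
            novel.add(psm["peptide"])
    return sorted(novel)


# Example with made-up records and accessions.
example_psms = [
    {"peptide": "NORMALPEPK", "proteins": ["sp|P1"]},
    {"peptide": "KERATINPEPR", "proteins": ["CONTAM_1"]},
    {"peptide": "NOVELPEPK", "proteins": ["variant_42"]},
    {"peptide": "NOVELPEPK", "proteins": ["variant_42"]},
]
example_known = {"sp|P1", "CONTAM_1"}
```

Calling `novel_peptides(example_psms, example_known)` keeps only the peptide that maps to neither a reference protein nor a contaminant, and collapses repeated PSMs of the same peptide into one entry.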
Database Search Workflow: Mz to SQLite
Mz to SQLite
Database Search Workflow: Tabular to FASTA
Tabular to FASTA
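As a rough illustration of what a tabular-to-FASTA conversion does, the Python sketch below turns tab-separated rows into FASTA records. It is not the Galaxy tool itself. The column positions (`title_col`, `seq_col`) and the example identifiers are assumptions for this sketch.

```python
def tabular_to_fasta(rows, title_col=0, seq_col=1):
    """Convert tab-separated lines into FASTA text.

    rows: iterable of strings, one tab-separated record per string.
    title_col / seq_col: zero-based column indexes (hypothetical defaults).
    """
    out = []
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue  # skip blank lines
        fields = row.split("\t")
        out.append(">" + fields[title_col])  # FASTA header line
        out.append(fields[seq_col])          # sequence line
    return "".join(line + "\n" for line in out)


# Example with made-up peptide identifiers.
example_rows = ["pep1\tPEPTIDEK\n", "pep2\tNOVELSEQR"]
```

Each input row becomes a header line starting with `>` followed by a sequence line, which is the form a downstream tool that expects FASTA input can read.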
Now let’s go through how to set up this workflow in Galaxy…
Other Galaxy-P Tutorials in the GTN Smörgåsbord
Custom Database Creation
James Johnson
Novel Peptide Analysis
Subina Mehta
Metaproteomics
Pratik Jagtap
Introduction to Proteogenomics
Tim Griffin
Supplementary tutorials for proteogenomics can be found at the Galaxy Training Network: https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.html