Protein Structure Analysis
Atul Nag
Kalinga Institute of Social Sciences
This Photo by Unknown Author is licensed under CC BY-SA
Outline
Structural Bioinformatics
Protein Primary Structures
Protein Secondary Structures
α Helix
β Sheet
Loop or Coil
Protein Tertiary Structure
Protein Quaternary Structure
Biological Knowledge from Structures
X-Ray Crystallography
Nuclear Magnetic Resonance (NMR)
Protein Data Bank (PDB)
Growth of PDB
Access to Structures through NCBI
MMDB (Molecular Modeling Database):
Cn3D (“see in 3D”): NCBI’s 3-D protein structure viewer.
VAST (Vector Alignment Search Tool): for direct comparison of 3-D protein structures.
Ramachandran Plot
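A Ramachandran plot graphs each residue's backbone torsion angles φ (C(i-1)-N-CA-C) and ψ (N-CA-C-N(i+1)) against each other. The sketch below computes a signed torsion angle from four atom positions with the standard atan2 formula. The helper names (`dihedral`, `phi_psi`) are illustrative, and the coordinates are assumed to be plain (x, y, z) tuples, for example read from a PDB file.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (range -180..180) defined by four atoms."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)  # successive bond vectors
    n1, n2 = _cross(u1, u2), _cross(u2, u3)                # normals of the two planes
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i (hypothetical helper)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Plotting the (φ, ψ) pairs for all residues gives the Ramachandran plot; residues falling outside the allowed regions are candidates for modeling or refinement errors.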
3D Visualization Tool - PyMOL
https://pymol.org/2/
Cn3D: NCBI’s Structure Viewer
Other 3-D Visualization Tools
Protein Structure Comparison
SCOP
An Example of the SCOP Hierarchy
CATH
An Example of the CATH Hierarchy
Protein Structure Comparison
Protein Structure Alignment
How to Compare Structures?
DALI
VAST
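DALI (Distance matrix ALIgnment) compares structures through their intramolecular Cα-Cα distance matrices, which do not change when a structure is rotated or translated, so no prior superposition is needed. The sketch below builds such a matrix. The `mean_abs_difference` function is a hypothetical, much simpler dissimilarity for two already-aligned, equal-length matrices; it is not DALI's actual scoring scheme.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances (Å) between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def mean_abs_difference(dm_a, dm_b):
    """Hypothetical dissimilarity of two equal-sized distance matrices (not DALI's score)."""
    n = len(dm_a)
    if n == 0 or n != len(dm_b):
        raise ValueError("matrices must be non-empty and the same size")
    total = sum(abs(dm_a[i][j] - dm_b[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)
```

Because distances are invariant to rigid-body motion, a translated or rotated copy of the same backbone gives a difference of zero.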
Secondary Structure Prediction
Machine Learning Approach
PHDsec
PSIPRED
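Machine-learning predictors of this kind classify each residue as helix, strand, or coil from a sliding window of neighbouring residues. The sketch below only shows how such window input vectors might be built, using one-hot residue codes; PHDsec and PSIPRED actually feed evolutionary profiles from multiple alignments into their neural networks. The window size of 15 and the function name are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SLOT = len(AMINO_ACIDS) + 1  # 20 residue types + 1 "no residue" flag

def window_features(seq, window=15):
    """One input vector per residue: one-hot codes for the residues in a centred window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    features = []
    for i in range(len(seq)):
        vec = []
        for j in range(i - half, i + half + 1):
            slot = [0] * SLOT
            if 0 <= j < len(seq) and seq[j] in AMINO_ACIDS:
                slot[AMINO_ACIDS.index(seq[j])] = 1
            else:
                slot[-1] = 1  # beyond a terminus, or a non-standard residue
            vec.extend(slot)
        features.append(vec)
    return features
```

Each vector (here 15 × 21 = 315 values) would then be passed to a trained classifier that outputs the state of the central residue.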
Prediction of 3-D Protein Structures
Sequence - Structure Relationship
Figure: an 80-residue stretch (yellow) with 40% sequence identity.
Homology Modeling
Generally the most accurate method for protein structure prediction, provided a template of known structure with sufficient sequence similarity is available.
Five different steps:
Homology Modeling
Accuracy of structure prediction depends on the percent amino acid sequence identity shared between the query and template.
For >50% sequence identity, the RMSD (Root Mean Square Deviation) of main-chain atoms is only about 1 Å, which is comparable to the accuracy of a medium-resolution NMR structure or a low-resolution X-ray structure.
Homology modeling should not be relied on for predicting protein structures if the sequence identity is less than 30%.
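RMSD summarises how far paired atoms in two structures are from each other: RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair of atoms. The minimal sketch below assumes the two structures have already been optimally superimposed (for example with the Kabsch algorithm, not shown) and that atoms are paired by list position.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of (x, y, z) atom coordinates.

    Assumes the structures are already superimposed and atoms are paired in order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

For a model built by homology, `coords_a` would hold the model's main-chain atoms and `coords_b` the corresponding atoms of the experimental structure.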
Threading
Threading takes a query sequence and passes (threads) it through the 3-D structure of each protein in a fold database (known structures).
As a sequence is threaded, the fit of the sequence to the fold is evaluated using a scoring function based on energy or packing efficiency.
Threading may find a common fold for proteins with essentially no detectable sequence similarity.
Structures predicted from threading techniques often are not of high quality (RMSD > 3 Å).
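The toy sketch below illustrates the threading idea: each fold in a hypothetical database is reduced to a string of buried (`B`) and exposed (`E`) positions, the query is slid along each fold without gaps, and placements are scored by a made-up environment preference (+1 for a hydrophobic residue buried or a polar residue exposed, -1 otherwise). Real threading programs allow gaps, align by dynamic programming, and use statistically derived potentials; every name and score here is an assumption.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set for this toy example

def position_score(residue, env):
    """Hypothetical environment preference for one residue at one fold position."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 1 if hydrophobic else -1
    return -1 if hydrophobic else 1

def thread(query, template_env):
    """Best gapless (offset, score) of query on one fold, or None if it does not fit."""
    best = None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(position_score(r, template_env[offset + k])
                    for k, r in enumerate(query))
        if best is None or score > best[1]:
            best = (offset, score)
    return best

def best_fold(query, fold_db):
    """Thread query through every fold; return (score, fold_name, offset) of the best fit."""
    hits = []
    for name, env in fold_db.items():
        hit = thread(query, env)
        if hit is not None:
            hits.append((hit[1], name, hit[0]))
    return max(hits) if hits else None
```

For example, `best_fold("LVIKE", {"foldA": "BBBEE", "foldB": "EEEBB"})` prefers `foldA`, where the three hydrophobic residues land on buried positions.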
Ab Initio Structure Prediction
Ab initio prediction can be used when a protein sequence has no detectable homologues in PDB.
Protein folding is modeled based on global free-energy minimization.
Since the protein folding problem has not yet been solved, the ab initio prediction methods are still experimental and can be quite unreliable.
One of the leading ab initio prediction methods, Rosetta, successfully predicted 61% of structures (80 of 131) to within 6.0 Å RMSD.
Comparing Structure Prediction Methods
Prediction of Solvent Accessibility
Solvent accessibility: the relative area of a residue’s surface that is exposed to the surrounding solvent.
The solvent-accessible residues may be part of an active site or a binding site, while the buried residues may play an important role in stabilizing the protein structure.
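Relative solvent accessibility (RSA) is usually computed as a residue's accessible surface area (ASA) divided by the maximum ASA that residue type can have. The sketch below assumes the ASA and maximum-ASA values are supplied from elsewhere (for example a structure-analysis program and a published reference table), and uses a 25% cut-off between buried and exposed, which is a common but arbitrary choice; the residue labels and numbers in the usage are hypothetical.

```python
def relative_accessibility(asa, max_asa):
    """Fraction of a residue's maximum possible surface that is exposed to solvent."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa

def label_residues(residues, threshold=0.25):
    """residues: list of (name, asa, max_asa); returns (name, rsa, 'exposed' or 'buried')."""
    labelled = []
    for name, asa, max_asa in residues:
        rsa = relative_accessibility(asa, max_asa)
        labelled.append((name, rsa, "exposed" if rsa >= threshold else "buried"))
    return labelled
```

Usage with made-up values: `label_residues([("LYS12", 120.0, 200.0), ("LEU45", 10.0, 200.0)])` marks LYS12 as exposed (RSA 0.6) and LEU45 as buried (RSA 0.05).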
Predicting Transmembrane Segments
Transmembrane segments share common biophysical features (e.g., hydrophobicity).
PHDhtm (http://www.predictprotein.org/):
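PHDhtm itself uses neural networks; the simpler, swapped-in technique sketched below shows how hydrophobicity alone can flag candidate transmembrane helices. It averages Kyte-Doolittle hydropathy values over a sliding window and reports windows whose mean exceeds a threshold. The window of 19 residues and threshold of 1.6 are the values commonly cited for this method, used here as assumptions.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; element i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]  # raises KeyError on non-standard residues
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows hydrophobic enough to span a membrane."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]
```

Overlapping hits usually mark a single segment; a 19-residue window roughly matches the length of an α helix that spans a lipid bilayer.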
Signal Peptide Prediction
Extracellular proteins or proteins targeted to subcellular compartments contain short signal peptides (often at the N-terminus).
PSORT (http://psort.ims.u-tokyo.ac.jp/): A rule-based expert system for predicting the subcellular localization of proteins from their amino acid sequences. It uses the k-nearest neighbors algorithm for reasoning.
SignalP (http://www.cbs.dtu.dk/services/SignalP/): predicts the presence and location of signal peptide cleavage sites using a combination of neural networks and HMMs.
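The k-nearest neighbors reasoning mentioned for PSORT can be sketched as follows: a query protein, represented as a numeric feature vector, receives the most common localization among its k closest training proteins. The two-number feature vectors and labels in the usage are hypothetical; PSORT's actual features and training data are not reproduced here.

```python
import math
from collections import Counter

def knn_predict(query, training, k=3):
    """Majority label among the k training examples closest to query.

    training: list of (feature_vector, label) pairs; features are hypothetical.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training examples")
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With made-up training points such as `((0.0, 0.0), "cytoplasm")` and `((5.0, 5.0), "secreted")`, a query near (5, 5) is assigned "secreted". Choosing an odd k reduces, but does not eliminate, ties between two classes.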