Machine Learning for Biochemical Applications
Lecture 7: Protein Structure Prediction
October 25th, 2023
Lecturers: Daryl Barth & Phillip Woolley
Why does protein structure matter?
https://byjus.com/biology/proteins-structure-and-functions/
Is amino acid sequence all you need?
https://en.wikipedia.org/wiki/Anfinsen's_dogma
Protein Structure Prediction Problem
CASP Competition drove protein structure prediction
https://predictioncenter.org/
CASP14 (2020) Results: Entrance of AlphaFold2
https://predictioncenter.org/
AlphaFold2: the dawn of a new age
https://www.nature.com/articles/s41586-021-03819-2
What is an MSA?
https://www.nature.com/articles/s41586-021-03819-2
MSA: Multiple Sequence Alignment
MSA captures evolutionary information for folding
https://www.nature.com/articles/s41586-021-03819-2
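One way to see why an MSA carries structural signal: positions that are in contact tend to mutate together. The sketch below is an illustration only, not AlphaFold2's featurization. It scores co-variation between alignment columns with mutual information. The toy sequences and the function names are assumptions made up for this example.

```python
import math
from collections import Counter

# Toy alignment (hypothetical sequences). Columns 1 and 3 co-vary:
# K at column 1 always pairs with E at column 3, and vice versa.
msa = [
    "AKLEG",
    "AKLEG",
    "AELKG",
    "AELKG",
    "AKVEG",
    "AEVKG",
]

def column(alignment, i):
    """Return the residues found at alignment column i."""
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    counts_i = Counter(column(alignment, i))
    counts_j = Counter(column(alignment, j))
    counts_ij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

mi_coupled = mutual_information(msa, 1, 3)    # co-varying pair of columns
mi_uncoupled = mutual_information(msa, 1, 2)  # statistically independent pair
print(f"columns 1,3: {mi_coupled:.2f} bits; columns 1,2: {mi_uncoupled:.2f} bits")
```

High mutual information between two columns hints that the positions constrain each other, which is the kind of evolutionary signal a deep MSA provides and a single sequence does not.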
Template matching instantiates pair representation
https://www.nature.com/articles/s41586-021-03819-2
Pair representations are learned contact features
https://www.nature.com/articles/s41586-021-03819-2
Transformer-like module updates representations
https://www.nature.com/articles/s41586-021-03819-2
Structure module converts representation to structure
https://www.nature.com/articles/s41586-021-03819-2
Recycling iteratively refines structure
https://www.nature.com/articles/s41586-021-03819-2
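The control flow of recycling can be sketched in a few lines: the model's previous output is fed back in as an extra input for a fixed number of passes. This is a schematic under stated assumptions, not the AlphaFold2 code. `predict_with_recycling`, `toy_model_step`, and the recycle count are hypothetical, and the "model" is a stand-in that simply moves an estimate halfway toward a target.

```python
def predict_with_recycling(model_step, features, num_recycles=3):
    """Run model_step once, then feed its output back in num_recycles times."""
    previous = None  # there is no previous prediction on the first pass
    for _ in range(num_recycles + 1):
        previous = model_step(features, previous)
    return previous

def toy_model_step(features, previous):
    # Hypothetical stand-in for a network: each pass closes half the
    # remaining gap between the current estimate and the target.
    estimate = 0.0 if previous is None else previous
    return estimate + 0.5 * (features["target"] - estimate)

features = {"target": 10.0}
no_recycling = predict_with_recycling(toy_model_step, features, num_recycles=0)
with_recycling = predict_with_recycling(toy_model_step, features, num_recycles=3)
print(no_recycling, with_recycling)
```

The same weights are reused on every pass, so recycling buys refinement with extra compute rather than extra parameters.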
How do we know it works?
https://phys.org/news/2020-11-ai-solution-year-old-protein.html
The 3Ps: pLDDT, PAE, and pTM
https://www.nature.com/articles/s41586-021-03819-2
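pTM is the model's own estimate of the TM-score, so it helps to see what the TM-score measures. The sketch below uses the standard length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 and assumes the two structures are already superposed. The full TM-score also maximizes over superpositions, which is omitted here. The function names and the toy coordinates are assumptions for illustration.

```python
import math

def tm_d0(length):
    """Length-dependent distance scale used by the TM-score."""
    if length < 19:
        # Below this length the formula gives a non-positive scale.
        raise ValueError("this sketch assumes at least 19 residues")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8

def tm_score_fixed_superposition(pred, ref):
    """TM-score sum for coordinates that are ALREADY superposed."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    d0 = tm_d0(len(ref))
    total = 0.0
    for p, r in zip(pred, ref):
        d = math.dist(p, r)  # per-residue deviation in angstroms
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / len(ref)

# Toy reference: 50 residues on a straight line, 3.8 angstroms apart.
ref = [(3.8 * i, 0.0, 0.0) for i in range(50)]
shift = tm_d0(len(ref))
shifted = [(x, y + shift, z) for x, y, z in ref]

tm_identical = tm_score_fixed_superposition(ref, ref)
tm_shifted = tm_score_fixed_superposition(shifted, ref)
print(tm_identical, round(tm_shifted, 3))
```

A residue that deviates by exactly d0 contributes one half, so the score rewards getting the overall fold right and is not dominated by a few badly placed residues.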
pLDDT and PAE visually
pLDDT: predicted Local Distance Difference Test | PAE: Predicted Aligned Error
https://www.rbvi.ucsf.edu/chimerax/data/pae-apr2022/pae.html
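pLDDT is a prediction of the lDDT score, so a minimal lDDT calculation shows what the confidence value refers to. The sketch below is a simplified C-alpha-only version, assuming one coordinate per residue, a 15 angstrom inclusion radius, and the usual 0.5, 1, 2, and 4 angstrom tolerance thresholds. The function name and toy coordinates are made up for this example.

```python
import math

def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT (0-100) from one C-alpha coordinate per residue."""
    n = len(ref)
    scores = []
    for i in range(n):
        preserved = 0
        considered = 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only neighbours in the reference structure count
            d_pred = math.dist(pred[i], pred[j])
            diff = abs(d_ref - d_pred)
            preserved += sum(diff < t for t in thresholds)
            considered += len(thresholds)
        scores.append(100.0 * preserved / considered if considered else 0.0)
    return scores

# Toy reference: 10 residues on a straight line, 3.8 angstroms apart.
ref = [(3.8 * i, 0.0, 0.0) for i in range(10)]
# Toy prediction: identical except that the last residue is misplaced by 20 angstroms.
pred = ref[:-1] + [(ref[-1][0], 20.0, 0.0)]

scores = lddt_ca(pred, ref)
print([round(s) for s in scores])
```

Because lDDT compares distances within each structure and never superposes them, one misplaced residue lowers the score only for itself and its neighbours. This locality is why pLDDT is read per residue, while PAE is read per residue pair.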
Disadvantages to AlphaFold
RoseTTAFold
David Baker, still in the game.
When RoseTTAFold was developed, AlphaFold2 was the SOTA but not yet open source (the code and model weights were not publicly available).
Mimicked AlphaFold2, featuring…
Accuracy approached AlphaFold2 while allowing for modularity
It’s not over till it’s over, AlphaFold
RoseTTAFold: Under the Hood
Updated Disadvantages to AlphaFold
Evolutionary Scale Modeling (ESM)
Meta project launched in 2022, cancelled in 2023 (revived later in 2023 as EvolutionaryScale)
Aimed to approximate AlphaFold’s prediction accuracy while increasing speed.
ESM and ESM2 are extremely lightweight.
Much easier to set up and use by comparison.
ELI5: Replaces the multiple sequence alignment used by AlphaFold2 with a protein language model (ESM1, later ESM2).
“leaner, simpler, cheaper”
Evolutionary Scale Modeling (ESM)
ESMAtlas
Updated Disadvantages to AlphaFold
ColabFold
Martin Steinegger enters the chat.
Why compromise with lower accuracy but faster prediction when you can have both?
Replaces the slow MSA search in AF2 with the much faster MMseqs2
Includes all AlphaFold2 models, including multimer
ELI5: AlphaFold2 but faster
Martin Steinegger! 😍
Updated Disadvantages to AlphaFold
Has protein folding been solved?
Let’s give it a try!
AlphaFold2 Evoformer
AlphaFold2 Structure Module