1 of 38

AlphaFold2 – Use and Applications

Clinton Lau, Carter Lab

12/04/2022

Thanks to: Sami Chaaban, Jake Grimmett, Toby Darling, Takanori Nakane, George Ghanim, Andrew Carter, Conny Yu, Markus Höpfler, Aaron Lewis, Joe Yeeles, Sean Munro, Manu Hegde, Manu Derivery, Kelly Nguyen, Anna Howes, Yohei Ohashi, Ketan Malhotra, Stan Yatskevich, Eva Absmeier, Mike Jenkyn Bedford, Roger Williams, Chris Johnson, Stephen McLaughlin (and others!)

2 of 38

Outline

What is AlphaFold2?

How to use AlphaFold2

Tricky proteins and other considerations

3 of 38

AlphaFold2: another tool for our protein structure toolbox

  • Only 17% of the human proteome has experimental structural data
  • AlphaFold2 is (sometimes scarily) accurate protein structure prediction software
    • Trained neural network that predicts protein structure from sequence
    • Monomers (great) and protein complexes (good)

Porta-Pardo et al. 2022

4 of 38

Inputs

Protein sequence

5 of 38

AlphaFold2: Under the hood

[Diagram, adapted from Jumper et al. 2021: Sequence → Multiple Sequence Alignment and PDB Templates → ‘Evoformer’ Module (48x) → ‘Structure’ Module (8x) → Predicted Structure, with Recycling (3x). Also labelled: Pairwise distance plot.]

6 of 38

Key outputs – Overview

  • Multiple sequence alignment metrics

  • Structure (each run gives 5 ranked structures by default)
    • Ranked by either average pLDDT or pTM

  • Per-residue confidence pLDDT

  • PAE plot (predicted alignment error)

7 of 38

Key outputs – Multiple sequence alignment metrics

  • The MSA should contain at least 30 sequences
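
The depth check above can be sketched in a few lines of Python. This is a minimal sketch, assuming the alignment is available as FASTA/A3M-style text with one ">" header per sequence (query included); the function names and the toy alignment are hypothetical, and the threshold of 30 is simply the rule of thumb from this slide.

```python
MIN_MSA_DEPTH = 30  # rule of thumb from the slide, not a hard limit


def count_msa_sequences(msa_text: str) -> int:
    """Count sequences in FASTA/A3M-style text (one '>' header per sequence)."""
    return sum(1 for line in msa_text.splitlines() if line.startswith(">"))


def msa_is_deep_enough(msa_text: str, minimum: int = MIN_MSA_DEPTH) -> bool:
    """Return True when the alignment holds at least `minimum` sequences."""
    return count_msa_sequences(msa_text) >= minimum


# Toy alignment (hypothetical): a query plus two hits.
example = ">query\nMKT\n>hit1\nMRT\n>hit2\nM-T\n"
print(count_msa_sequences(example), msa_is_deep_enough(example))
```

A shallow alignment is a hint to treat the prediction with more caution, not a reason to discard it outright.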

8 of 38

Key outputs – Structures

  • Ranked by either average pLDDT (monomer) or pTM (multimer)
    • Higher pLDDT/pTM is better

Example: Dynein LIC

[Figure labels: Ras-like domain, H (x4), N, C]

9 of 38

Key outputs – Per-residue confidence (stored in B-factor field)

  • pLDDT: predicted lDDT-Cα, a local confidence score (lDDT compares distances to atoms within 15 Å of each residue, at tolerances up to 4 Å)
  • Good predictor of secondary structure and disorder

pLDDT scoring
  • <50: Disordered/bad prediction
  • 50-70: Low quality
  • 70-90: Backbone probably correct
  • >90: High quality

[Figure labels: Ras-like domain, H (x4), N, C]
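
Because pLDDT is stored in the B-factor field, it can be read straight out of a predicted PDB file. The sketch below is illustrative only: it assumes standard fixed-column PDB ATOM records (B-factor in columns 61-66), the helper names are hypothetical, and the handling of scores that fall exactly on a band edge (50, 70, 90) is an assumption, since the slide does not say which band they belong to.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value to the bands on the slide (edge handling is assumed)."""
    if score > 90:
        return "High quality"
    if score >= 70:
        return "Backbone probably correct"
    if score >= 50:
        return "Low quality"
    return "Disordered/bad prediction"


def ca_plddt(pdb_text: str):
    """Return (chain, residue number, pLDDT) for every C-alpha ATOM record."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16, chain in 22, residue number in 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((line[21], int(line[22:26]), float(line[60:66])))
    return scores


def _demo_atom(serial, name, resname, chain, resseq, b):
    """Build a fixed-column ATOM record for the demo (coordinates are dummies)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")


demo_pdb = "\n".join([
    _demo_atom(1, " N  ", "MET", "A", 1, 95.12),
    _demo_atom(2, " CA ", "MET", "A", 1, 95.12),
    _demo_atom(3, " CA ", "GLU", "A", 2, 42.00),
])
for chain, resnum, score in ca_plddt(demo_pdb):
    print(chain, resnum, score, plddt_band(score))
```

Long runs of residues in the lowest band are the usual cue for a disordered region or a construct boundary.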

11 of 38

Key outputs – PAE plot (predicted alignment error)

  • If the predicted and true structures were aligned on residue 2, what is the expected error in the position of residue 1?
  • Lower is better

[Figure: PAE plot; axes: Residue 1, Residue 2; colour scale: Predicted error (Å); structure labels: Ras-like domain, H (x4), N, C]
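
One way to turn a PAE plot into a number is to average the off-diagonal block between two residue ranges (two domains, or two chains in a complex). The sketch below is a minimal illustration that assumes the PAE matrix has already been loaded as a square list of lists in Å; the function names and the toy matrix are hypothetical, and how the matrix is stored on disk differs between AlphaFold2 implementations, so loading is left out.

```python
def mean_block(pae, rows, cols):
    """Mean of pae[i][j] over the given row and column index ranges."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def mean_pae_between(pae, range_a, range_b):
    """Average PAE between two residue ranges, in both alignment directions.

    PAE is not symmetric, so both off-diagonal blocks are averaged.
    """
    return 0.5 * (mean_block(pae, range_a, range_b)
                  + mean_block(pae, range_b, range_a))


# Toy 4-residue example (hypothetical values): residues 0-1 and 2-3 are each
# confidently placed internally but poorly placed relative to each other.
toy_pae = [
    [0, 1, 20, 22],
    [1, 0, 21, 23],
    [19, 20, 0, 1],
    [22, 21, 1, 0],
]
dom_a, dom_b = range(0, 2), range(2, 4)
print(mean_block(toy_pae, dom_a, dom_a), mean_pae_between(toy_pae, dom_a, dom_b))
```

A low value within each range combined with a high value between them suggests the domains are individually well predicted but their relative arrangement is not.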

12 of 38

Outline

What is AlphaFold2?

How to use AlphaFold2

Tricky proteins and other considerations

13 of 38

How to use AlphaFold2 – Overview

  • The AlphaFold protein structure database
    • https://alphafold.ebi.ac.uk/

  • AlphaFold-related tools
    • ChimeraX and DALI

  • Running your own sequence
    • AlphaFold2 or ColabFold
      • Online and Local implementations
      • AlphaFold2 vs AlphaFold-Multimer (v1 or v2)

14 of 38

How to use AlphaFold2 – the AlphaFold protein structure database

  • The AlphaFold protein structure database
    • https://alphafold.ebi.ac.uk/
    • Integrated with UniProt
    • Contains monomer predictions only!

Arl3

BICDR1

15 of 38

How to use AlphaFold2 – AlphaFold assisted tools

  • UCSF ChimeraX
    • Tools to use AlphaFold2
  • DALI
    • Structural homology search against AlphaFold predictions

Hook3 Hook domain

16 of 38

How to use AlphaFold2 – Running your own sequence

  • DeepMind AlphaFold vs ColabFold
    • ColabFold generates the MSA much faster

  • Local ColabFold implementation on our Unix cluster
    • Alternatively online implementations are available
    • Extra PAE visualisation tool: pointpae (Sami Chaaban)

>1
SEGVLASFFNSLLSKKTGSPGSP
>2
MENEIFTPLLEQFMTSPLVTWVKTFGPLAAGNGTNLDEYVALVDGVFLNQVMLQINPKLESQRVNKKVNNDASLRMHNLSILVRQIKFYYQETLQQLIMMSLPNVLIIGKNPFSEQGTEEVKKLLLLLLGCAVQCQKKEEFIERIQGLDFDTKAAVAAHIQEVTHNQE
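
Before submitting a FASTA input like the one above, it can help to check that it parses cleanly and contains only standard amino acids. The following is a minimal sketch, not part of ColabFold: the function names are hypothetical, and restricting the alphabet to the 20 standard residues is an assumption (some tools also tolerate X or U).

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate header: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence found before any '>' header")
        else:
            records[name] += line.upper()
    return records


def invalid_residues(sequence: str) -> list:
    """Return the sorted characters that are not standard amino acids."""
    return sorted(set(sequence) - STANDARD_RESIDUES)


# First entry from the slide, used here only as demo input.
records = parse_fasta(">1\nSEGVLASFFNSLLSKKTGSPGSP\n")
for name, seq in records.items():
    print(name, len(seq), invalid_residues(seq))
```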

17 of 38

What is AlphaFold2 useful for?

  • Predicting novel protein structures & complexes
  • Models for cryo-EM and crystallography
  • Investigations into intrinsically disordered proteins (IDPs)
  • Guidance for protein construct design
  • Oligomerisation prediction
    • Predict dimer, trimer, tetramer, etc. and look for pTMmax

Advanced uses

  • Predicting alternative protein conformations
  • Predicting effects of mutation
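
The oligomerisation screen in the first list (predict dimer, trimer, tetramer, etc. and look for the maximum pTM) amounts to a simple comparison once the runs are finished. This is a sketch under stated assumptions: the pTM values below are made up for illustration, the function name is hypothetical, and collecting the scores from each run's output is left to the reader because it depends on the implementation used.

```python
def best_oligomeric_state(ptm_by_copies: dict):
    """Pick the copy number whose best model has the highest pTM.

    `ptm_by_copies` maps copy number (1 = monomer, 2 = dimer, ...) to the
    list of pTM scores of the models predicted for that stoichiometry.
    """
    best_per_state = {n: max(scores) for n, scores in ptm_by_copies.items()}
    top = max(best_per_state, key=best_per_state.get)
    return top, best_per_state[top]


# Hypothetical scores, for illustration only.
ptm_scores = {
    1: [0.40, 0.42],
    2: [0.78, 0.81],
    3: [0.55, 0.50],
    4: [0.30, 0.33],
}
copies, ptm = best_oligomeric_state(ptm_scores)
print(f"highest pTM at {copies} copies (pTM = {ptm})")
```

The highest pTM is a hypothesis to test, not proof of the oligomeric state; PAE plots and experimental data should still be checked.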

18 of 38

How “scarily accurate” can AlphaFold2 be?

7 Å map (Kai Zhang)

p150 ICD

Dynactin

AlphaFold2

19 of 38

Outline

What is AlphaFold2?

How to use AlphaFold2

Tricky proteins and other considerations

20 of 38

Trickier proteins

  • Large proteins (memory limits)
  • Coiled coils/elongated proteins
  • Proteins with disordered regions

21 of 38

Good example of a bad example: the whole of p150

[Figure: p150 domains (Cap-Gly, Coiled-coil (CC) 1A, CC1B, ICD, CC2, C terminal domain); p150 dimer projection; Dynactin]

22 of 38

Solution 1: Fragment-based approach

[Figure: p150 split into fragments 1-4; domains: Cap-Gly, Coiled-coil (CC) 1A, CC1B, ICD, CC2, C terminal domain; p150 projection]
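
A fragment-based run needs a list of residue ranges to predict. In practice the boundaries are better chosen from known domain edges, as on this slide; the sketch below shows a generic alternative, an overlapping sliding window, with a hypothetical function name and made-up window and overlap sizes. Overlaps give the neighbouring fragments shared residues that can be used to superpose them afterwards.

```python
def fragment_ranges(length: int, size: int, overlap: int):
    """Return 1-based inclusive (start, end) ranges covering a sequence.

    Consecutive fragments share `overlap` residues; the last fragment is
    shortened to end at the final residue.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    ranges = []
    start = 0
    while True:
        end = min(start + size, length)
        ranges.append((start + 1, end))
        if end >= length:
            break
        start += size - overlap
    return ranges


def fragments(sequence: str, size: int, overlap: int):
    """Slice a sequence into the overlapping fragments described above."""
    return [sequence[s - 1:e] for s, e in fragment_ranges(len(sequence), size, overlap)]


# Toy example: a 10-residue sequence in windows of 4 with 2 residues of overlap.
print(fragment_ranges(10, 4, 2))
print(fragments("ABCDEFGHIJ", 4, 2))
```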

23 of 38

A second example: LIC/BICDR1

[Figure: BICDR1 (CC1, CC2, N, C) and the LIC helix; also labelled: Dynactin, Dynein]

24 of 38

Solution 2: Experiment with “construct” length

[Figure: PAE plot for BICDR1 and LIC; labels: BICDR1 (CC1, CC2, N, C), LIC helix, Dynactin, Dynein]

26 of 38

Trickier proteins

  • Large proteins (memory limits)
  • Coiled coils/elongated proteins
  • Proteins with disordered regions

  • Membrane proteins
    • Transmembrane proteins ok; peripheral proteins less so
  • Proteins with ligands/nucleic acids/post-translational modifications
    • Can work, with holes instead of ligands
  • Antibody/antigen interactions
    • ABlooper
  • Proteins with very low homology (sequence or structural)
    • Custom MSAs may help

“Construct” design

27 of 38

It’s complicated: different versions of AlphaFold2

  • AlphaFold2 (original): Used by default for monomers
    • Can trick it into predicting complexes
    • Low false positive rate, higher false negative rate
    • colabfold_legacy on our cluster
  • AlphaFold-Multimer v1
    • Good update to predict complexes
    • Sometimes predicted clashing chains (proteins placed inside other proteins)
  • AlphaFold-Multimer v2
    • Important update to fix issues in v1
    • Good reports from Derivery Lab and others
    • Some predictions are slightly worse than with Multimer v1

28 of 38

AlphaFold2 vs AlphaFold-Multimer

AlphaFold2 (original)

AlphaFold-Multimer

Adaptor on wrong side!

29 of 38

p150 through the versions

[Figure: p150 dimer projection; panels: AlphaFold2 (original), AlphaFold-Multimer v1, AlphaFold-Multimer v2]

30 of 38

Problems – false positives in multimer?

Dynein LIC (yellow) and EB1 (blue) don’t bind

[Figure panels: AlphaFold2 (original), AlphaFold-Multimer v1, AlphaFold-Multimer v2; annotations: “too far for interaction”, “?”, “??”]

31 of 38

What to believe when predicting structures?

  • PAE plots are a very good place to start
  • Combine with other knowledge if possible…
    • Biochemical/biophysical knowledge (not sponsored by Biophysics)
    • Structures of related proteins

BICD2

LIC1(yellow):Girdin

33 of 38

PAEs don’t solve all your problems

Dynein LIC (yellow) and EB1 (blue) don’t bind

[Figure panels: AlphaFold2, AlphaFold-Multimer v1, AlphaFold-Multimer v2]

34 of 38

A list of tips and tricks

  • Sometimes full proteins give better results, sometimes truncations give better results
    • For interactions, try to use interacting regions only
    • For elongated coiled-coil structures: try breaking them into fragments
  • Try increasing the number of recycling steps (up to 15)
    • Add “--num_recycles=15”
  • Amber relax (full-atom optimisation) is rarely useful and greatly increases run time
    • Useful when building models for cryo-EM/crystallography
    • To enable it, add “--use_amber=True”

  • 5 runs (5 structures each) with random starting seeds (in development)
  • Provide templates or MSA (in development)

35 of 38

Summary

  • AlphaFold is a useful tool to help answer biological questions
    • Very low time cost

  • We can run AlphaFold through online or local implementations
    • Subscribe to alphafold-list for updates

  • Beware of the limitations, particularly with AlphaFold Multimer
    • AlphaFold2 has a low false positive rate
    • AlphaFold-Multimer may have a higher false positive rate?

36 of 38

Other resources

  • AlphaFold2 papers
    • Jumper et al. 2021; Evans et al. 2022
  • ColabFold
    • Mirdita et al. 2021

  • Talks/Seminars
    • “How to interpret AlphaFold structures” EMBL-EBI talk series
    • “ColabFold - Making protein folding accessible to all via Google Colab!”

  • Other software
    • RoseTTAFold (Baek et al. 2021) – available on ColabFold

37 of 38

Running locally – set up

  • You will need a SLURM account to run AlphaFold jobs on the cluster
  • Then ssh into the LMB Unix systems (e.g. hal/hex/max)
  • Add the following line to your startup config file (.cshrc)
    • source /cephfs/public/ColabFold/lmb/utils/sourcecolabfold.csh
  • Restart your terminal or run “source ~/.cshrc”
  • Type “colabfold” for instructions

38 of 38

AlphaFold overview

[Diagram, adapted from Jumper et al. 2021: Sequence → Multiple Sequence Alignment and PDB Templates → ‘Evoformer’ module → ‘Structure’ module → Predicted Structure, with Recycling (3x) and an Amber relaxation step. Also labelled: Pairwise distance plot.]