Case Study 1: Protein Function Annotation
Presenters: Bishnu Sarker, Sayane Shome
Date: 17-18 July, 2023
Learning Objectives of the next two sessions
To apply the concepts learnt in the previous sessions to practical problems such as protein function prediction and metal-binding site prediction in proteins.
Problem Definition
Given a protein sequence of length L, the objective is to assign functional terms such as Gene Ontology (GO) terms or Enzyme Commission (EC) numbers.
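Because one protein can carry several functional terms, the task is commonly treated as multi-label classification. The sketch below shows one possible way to turn annotations into binary target vectors. The protein identifiers and the toy annotation sets are hypothetical placeholders, not data from the tutorial.

```python
# Toy annotations: the protein identifiers are hypothetical placeholders,
# and the term assignments are made up for illustration only.
annotations = {
    "protein_A": {"GO:0003677", "GO:0046872"},
    "protein_B": {"GO:0005524"},
    "protein_C": {"GO:0005524", "GO:0046872"},
}


def build_label_matrix(annotations):
    """Return (protein_ids, terms, matrix); matrix[i][j] is 1 if protein i has term j."""
    protein_ids = sorted(annotations)
    terms = sorted({term for term_set in annotations.values() for term in term_set})
    term_index = {term: j for j, term in enumerate(terms)}
    matrix = []
    for protein_id in protein_ids:
        row = [0] * len(terms)
        for term in annotations[protein_id]:
            row[term_index[term]] = 1
        matrix.append(row)
    return protein_ids, terms, matrix


protein_ids, terms, label_matrix = build_label_matrix(annotations)
for protein_id, row in zip(protein_ids, label_matrix):
    print(protein_id, row)
```

Each row is the target vector for one protein, with one column per functional term seen in the dataset.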
Gene Ontologies
Background
Manual Annotation
Curators
Background
Automatic Annotation
Protein Function Annotation
Input Data and Data Sources
Protein Function Annotation
Output Data and Data Sources
Protein Function Annotation
Approach
1. Obtaining the protein sequence dataset and associated GO IDs/EC IDs from UniProt
2. Obtaining pretrained embeddings for the protein sequences
3. Using ML models to classify the sequences by GO IDs/EC IDs
4. Evaluating ML model performance using metrics
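The classification and evaluation steps can be sketched end to end on toy data. This is a minimal illustration, not the tutorial's notebook: a 1-nearest-neighbour annotation transfer stands in for the ML model, micro-averaged F1 stands in for the evaluation metric, and the three-dimensional embeddings and label sets are invented for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_terms(query, train_embeddings, train_labels):
    """Copy the label set of the most similar training protein (1-nearest neighbour)."""
    best = max(
        range(len(train_embeddings)),
        key=lambda i: cosine_similarity(query, train_embeddings[i]),
    )
    return set(train_labels[best])


def micro_f1(true_sets, predicted_sets):
    """Micro-averaged F1 over all (protein, term) pairs."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, predicted_sets):
        tp += len(true & pred)
        fp += len(pred - true)
        fn += len(true - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy embeddings and labels, for illustration only.
train_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_labels = [{"GO:0003677"}, {"GO:0005524"}, {"GO:0005524", "GO:0046872"}]
test_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]
test_labels = [{"GO:0003677"}, {"GO:0046872"}]

predictions = [
    predict_terms(e, train_embeddings, train_labels) for e in test_embeddings
]
score = micro_f1(test_labels, predictions)
print(f"micro-F1 = {score:.2f}")
```

In practice the embeddings would come from a pretrained protein language model and the classifier would be trained rather than a nearest-neighbour lookup; the evaluation function is independent of that choice.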
Protein Function Annotation
Future Challenges
Explainability
Computational Cost
Multi-omics Integration
Hands-on Tutorial
Google Colab notebook
Break!
We will reconvene in 15 minutes.
Up next: Hands-on tutorial on metal-binding site prediction
Case Study 2: Metal-Binding Site Prediction
Presenters: Bishnu Sarker, Sayane Shome
Date: 17-18 July, 2023
Problem Definition
Given a protein sequence of length L and the residue positions of its metal-binding sites, the objective is to predict which metal ions are most likely to bind at those sites.
We formulate this as a machine learning problem, which is the focus of this hands-on tutorial.
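One way to picture this formulation is as a set of labelled examples, each pairing a sequence and a site position with a metal ion. The sketch below is an assumption for illustration only: the `BindingSiteExample` class, the `residue_window` helper, the padding character, and the toy sequence are all hypothetical and not taken from the tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingSiteExample:
    """Hypothetical container for one training example."""
    sequence: str  # amino-acid sequence in one-letter codes
    position: int  # 1-based residue position of the binding site
    metal: str     # target label, e.g. "Zn"


def residue_window(sequence, position, flank=3, pad="X"):
    """Return the residues within `flank` positions of a 1-based site, padded at the ends."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    centre = position - 1  # convert to a 0-based index
    window = []
    for i in range(centre - flank, centre + flank + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else pad)
    return "".join(window)


# Toy example: the sequence and label are invented.
example = BindingSiteExample(sequence="MKCHHGDE", position=3, metal="Zn")
window = residue_window(example.sequence, example.position, flank=3)
print(window, "->", example.metal)
```

The window gives the local sequence context around a site; the model's task is then a multi-class prediction of the `metal` field.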
Metal-Binding Site Prediction
Input/Output Data and Data Sources
Input Data
Output Data
Metal-Binding Site Prediction
Approach
1. Obtaining the protein sequence dataset from UniProt and the associated pretrained embeddings
2. Obtaining positional encodings for the residue positions encompassing the binding sites
3. Using ML models to predict the metal ions binding at the sites
4. Evaluating ML model performance using metrics
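The positional-encoding step can be illustrated with a Transformer-style sinusoidal encoding concatenated onto a sequence embedding. This is a sketch under assumptions: the slides do not state which encoding scheme is used, so the sinusoidal form, the `dim` parameter, and the tiny embedding vector are all illustrative choices.

```python
import math


def sinusoidal_encoding(position, dim=8):
    """Sinusoidal encoding of a residue position: alternating sine and cosine terms."""
    if dim % 2:
        raise ValueError("dim must be even")
    encoding = []
    for k in range(dim // 2):
        # Wavelengths grow geometrically with k, as in the Transformer scheme.
        angle = position / (10000 ** (2 * k / dim))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


def site_features(sequence_embedding, position, dim=8):
    """Concatenate a sequence embedding with the encoding of one binding-site position."""
    return list(sequence_embedding) + sinusoidal_encoding(position, dim)


# Invented 3-dimensional embedding; real pretrained embeddings are much larger.
features = site_features([0.12, -0.40, 0.33], position=57, dim=4)
print(len(features), features)
```

The resulting feature vector can be passed to any classifier that predicts the metal ion for that site.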
Metal-Binding Site Prediction
Approach
[Figure: pipeline diagram with panels labelled Protein Sequence, Sequence embedding, Positional encoding, and Predicted metal ions]
Figure from: https://www.biorxiv.org/content/10.1101/2023.03.20.533488v1.full.pdf
Metal-Binding Site Prediction
Current and Future Challenges
Explainability
Computational Cost
Metal binding site integration
Hands-on Tutorial
Google Colab notebook
Acknowledgements
Thank you for joining us!
For questions about the materials and related topics: