Amino acid transfer learning for antibody binding prediction
Students:
Natalia Khotkina
(Bioinformatics Institute)
Supervisor:
Daria Balashova
(Amsterdam UMC)
Background
Transfer learning is used in tasks where a large amount of data cannot be collected. In this approach, knowledge learned on one task is transferred to the current task.
Multi-task models have several final layers, each of which solves a different task.
Goal
Minimize the size of the training dataset with transfer learning or other methods.
Methods
Transfer learning
Multi-task learning
Training on datasets of different sizes
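One way to build training sets of different sizes is sketched below. It assumes nested random subsets, so that every smaller subset is contained in every larger one. The poster does not say how its subsets were drawn, so this scheme, the function name `nested_subsets` and the chosen sizes are all hypothetical.

```python
import random

def nested_subsets(n_total, sizes, seed=0):
    """Return {size: list of indices}. Every subset is a prefix of one shuffled
    index order, so each smaller subset is contained in every larger one
    (nesting is an assumption of this sketch, not the poster's stated method)."""
    if any(s < 1 or s > n_total for s in sizes):
        raise ValueError("each size must be between 1 and n_total")
    rng = random.Random(seed)          # fixed seed -> reproducible subsets
    order = list(range(n_total))
    rng.shuffle(order)
    return {s: order[:s] for s in sorted(sizes)}

# Hypothetical sizes up to the 26K LY16 training set; rerunning with different
# seeds gives several independent learning curves.
subsets = nested_subsets(26_000, [1_000, 5_000, 13_000, 26_000], seed=42)
```

Nesting keeps the comparison between sizes cleaner: a larger set only adds examples and never swaps out the ones a smaller set already had.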
Results
As the training dataset grew, the ROC AUC score increased and then reached a plateau. The plateau was reached at about half of the training dataset size used by Taft and colleagues (J. M. Taft, et al., 2022). This suggests that part of the training data was redundant and that the dataset could be reduced without loss of performance.
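The learning-curve analysis relies on two computations: ROC AUC on held-out data, and the dataset size at which AUC stops improving. The sketch below computes ROC AUC as the probability that a randomly chosen binder scores higher than a randomly chosen non-binder, with ties counted as one half. This is the standard rank-based definition. The plateau rule ("the smallest size within `tol` of the best AUC") and the `tol` value are hypothetical choices, not criteria stated on the poster.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5. O(P * N) pairwise loop; fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def plateau_size(curve, tol=0.01):
    """Smallest training size whose AUC is within `tol` of the best AUC.
    `curve` maps training-set size -> ROC AUC (hypothetical plateau rule)."""
    best = max(curve.values())
    for size in sorted(curve):
        if curve[size] >= best - tol:
            return size
```

In practice `sklearn.metrics.roc_auc_score(y_true, y_score)` gives the same value more efficiently.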
Transfer learning from ACE2 dataset
The LY16 training dataset was reduced from 26K to 1K. The LSTM model from J. M. Taft, et al., 2022 was first pre-trained on the ACE2 dataset and then trained on the LY16 dataset.
Pre-training significantly increased the ROC AUC score compared with the same model trained without pre-training. This suggests that the ACE2-binding dataset and the neutralizing-antibody-binding dataset share some information, so pre-training on one of them can improve predictions on the other.
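The pre-training step can be illustrated with a much simpler model. The poster uses the LSTM of Taft et al. The sketch below swaps in a logistic regression on one-hot encoded amino acid sequences so that it shows only the transfer mechanism: the weights learned on the first dataset (ACE2) become the starting point for training on the second dataset (LY16). All function names, hyperparameters and the (sequence, label) data format are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a sequence into a one-hot vector of len(seq) * 20 features."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return vec

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(data, epochs, lr, weights=None, bias=0.0, seed=0):
    """SGD on binary cross-entropy over (sequence, 0/1 label) pairs.
    Passing `weights`/`bias` from an earlier run starts from them instead of
    from zeros: this is the transfer (pre-train, then fine-tune) step."""
    rng = random.Random(seed)
    w = list(weights) if weights is not None else [0.0] * len(one_hot(data[0][0]))
    b = bias
    data = list(data)                  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for seq, y in data:
            x = one_hot(seq)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y               # d(BCE)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, seq):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, one_hot(seq))) + b)

# Hypothetical usage (ace2_data, ly16_small are lists of (sequence, label)):
# w, b = train_logreg(ace2_data, epochs=5, lr=0.1)                        # pre-training
# w, b = train_logreg(ly16_small, epochs=5, lr=0.05, weights=w, bias=b)   # fine-tuning
```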
Figure: The pre-training scheme.
Figure: ROC AUC of the “basic” and the “pretrained” LSTM (n = 40, paired t-test).
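The comparison uses a paired t-test. A minimal sketch of the statistic follows. It assumes that each of the n = 40 runs yields one ROC AUC for each model and that the two scores from the same run form a pair. The function name `paired_t` is hypothetical. `scipy.stats.ttest_rel(a, b)` computes the same statistic together with a p-value.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic for matched scores a[i], b[i] (e.g. the ROC AUC of two
    models on the same run i). Returns (t, degrees_of_freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)       # sample SD, n - 1 in the denominator
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With 40 pairs, the test has 39 degrees of freedom.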
Multi-task model
The multi-task model predicts binding to each of the neutralizing antibodies simultaneously.
For the LY16 antibody with a 1K training dataset, the multi-task approach gave a significantly higher ROC AUC than the basic model from J. M. Taft, et al., 2022.
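The poster describes the multi-task model only as having several final layers, one per task. The sketch below assumes the following structure: a single shared hidden layer (a stand-in for the LSTM body) feeds one logistic head per antibody, so one forward pass returns a prediction for every antibody. It also assumes a joint loss that sums binary cross-entropy over the antibodies labelled for a given sequence and skips missing labels. The class, function and antibody names are hypothetical.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class MultiTaskModel:
    """Shared ReLU layer feeding one logistic head per antibody (toy stand-in)."""

    def __init__(self, shared_weights, heads):
        self.shared = shared_weights   # hidden_dim rows, each of length input_dim
        self.heads = heads             # {antibody: (head_weights, bias)}

    def embed(self, x):
        # shared representation used by every head: ReLU(W x)
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.shared]

    def predict(self, x):
        # one forward pass -> a binding probability for every antibody at once
        h = self.embed(x)
        return {ab: sigmoid(sum(w * hi for w, hi in zip(wts, h)) + b)
                for ab, (wts, b) in self.heads.items()}

def multitask_loss(model, x, labels, eps=1e-12):
    """Sum of binary cross-entropy over antibodies whose label is not None."""
    probs = model.predict(x)
    loss = 0.0
    for ab, y in labels.items():
        if y is None:                  # no measurement for this antibody: skip
            continue
        p = min(max(probs[ab], eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss
```

Because all heads share the hidden layer, labels for one antibody also shape the representation used by the others. This is the mechanism by which multi-task training can help an antibody with a small dataset.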
Figure: Training strategy for the multi-task model.
Figure: ROC AUC of the “basic” and the “multi-task” LSTM (n = 40, paired t-test).
Conclusion
Our results suggest that antibody escape can be predicted accurately with smaller training datasets, and that these predictions can be improved by transfer learning.
GitHub