An Adaptable, Flexible Deep Learning Based Molecular Prediction Framework that Predicts Drug Affinity for Target Proteins in the Disease Pathway
Introduction
Accurate prediction of drug-target interactions (DTI) is vital for identifying new drug candidates. The advent of Artificial Intelligence (AI) and Machine Learning (ML) has disrupted this landscape. However, the knowledge and tools that facilitate this process are either at a nascent stage or restricted to the pharmaceutical and biotech industry. The purpose of my project is to simplify and democratize this process by creating a framework that requires only minimal knowledge of AI/ML, together with a platform on which researchers can run their experiments. The aim is for this framework to accelerate innovation and help bring drugs to patients faster.
The framework uses an Encoder-Decoder deep learning neural network, with different embeddings for each task. It takes a Simplified Molecular-Input Line-Entry System (SMILES) string and a protein amino acid sequence as input and runs them through deep molecular encoders. The encoders map the compound and protein interaction and present it graphically. The accuracy of the drug-target interaction model has been tested on benchmark datasets such as DAVIS and KIBA and compared with other state-of-the-art models. Gradio, an open-source Python library, is used to build a graphical user interface (GUI) that runs on top of the AI/ML models. The framework also supports drug-drug interaction prediction for side effect prediction.
Background
- Discovering a new drug takes more than 10 years and costs more than $2.6 billion [1].
- Recently, many AI startups have successfully applied deep learning techniques to aid drug discovery research, greatly shortening timelines and reducing cost [2,3].
- The frameworks that exist today are not accessible to a wider audience, which creates a divide between industry and the everyday researcher.
- Existing toolkits are sparse and are not fully compatible with each other, which inhibits both usage and innovation.
Figure 1[7] Figure 2[8]
The Drug Discovery Landscape
- A disease is generally attributed to a certain target protein in the disease pathway.
- To combat the disease, a drug must be discovered that renders that target protein harmless, thereby countering the disease.
- A protein is a “lock” 🔒, and drug discovery is the search for the right “key” 🔑 to unlock the target (i.e., the right drug to modulate the protein) [4].
- The interaction and binding of a drug to the target protein can be expressed numerically as a Binding Affinity Score. This score is the basis of the primary application, drug-target interaction prediction.
- There are two types of drug screening using machine learning: Virtual Screening and Drug Repurposing.
Objective: Create a simple and accessible framework of machine learning models that can efficiently perform a wide variety of drug discovery tasks and, therefore, help get the drug to the market faster.
Conclusion
- Although the goal of this project has been accomplished, many improvements can still be made to make the framework more robust and user friendly.
- In the future, this framework will be shared with experienced professionals for their feedback. The goal of the project is to make the drug discovery process easier, and any comments on the framework or architecture would be helpful.
- Many existing frameworks are very hard to use, and they are also less accurate than the models developed here.
- With this framework, the drug discovery process can become faster and scientists will be able to get drugs to market sooner, thus saving lives.
- This project shows that machine learning doesn't always have to be complicated, and that there are ways to get everyone involved.
- All in all, this molecular prediction toolkit is good for beginners because it was made by a beginner.
References
[1] Mullard, A. New drugs cost US$2.6 billion to develop. Nature Reviews Drug Discovery (2014).
[2] Fleming, Nic. How artificial intelligence is changing drug discovery. Nature (2018).
[3] Smalley, E. AI-powered drug discovery captures pharma interest. Nature Biotechnology (2017).
[4] Gschwend DA, Good AC, Kuntz ID. Molecular docking towards drug discovery. Journal of Molecular Recognition: An Interdisciplinary Journal (1996).
[5] Mayr, Andreas, et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chemical science (2018).
[6] Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics (2018).
[7] Nguyen, Thin, Hang Le, and Svetha Venkatesh. GraphDTA: prediction of drug–target binding affinity using graph convolutional networks. BioRxiv (2019).
[8] Tsubaki, Masashi, Kentaro Tomii, and Jun Sese. Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics (2019).
[9] Lee, Ingoo, Jongsoo Keum, and Hojung Nam. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS computational biology (2019).
[10] Chen, Xing, et al. Drug–target interaction prediction: databases, web servers and computational models. Briefings in bioinformatics (2016).
[11] Huang K, Xiao C, Hoang T, Glass L, Sun J. CASTER: Predicting Drug Interactions with Chemical Substructure Representation. AAAI (2020).
Materials
For training and running the neural network:
- An Apple MacBook Pro
- Google Colaboratory
- TensorFlow
- PyTorch
- Gradio
Methodology
- Therapeutics Data Commons (TDC) is the first unifying framework to systematically access, evaluate, and benchmark machine learning methods across the entire range of therapeutics. TDC supports the development of novel ML methods and theory, with a strong bent towards developing the foundations of which ML algorithms are most suitable for drug discovery applications and why[5].
- BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules[6].
- The framework takes as input a pair consisting of a Simplified Molecular-Input Line-Entry System (SMILES) string and a protein amino acid sequence. The pair is fed into molecular encoders, each of which specifies a deep transformation function that maps the compound or protein to a numerical representation. The resulting predictions are then processed and displayed with the Python library PrettyTable.
- For drug-target interaction, the Encoder-Decoder system uses two convolutional encoders (one for the drug and one for the protein), as both the input and the output are numerical values.
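The input step described above can be sketched in a few lines of Python. This is a minimal illustration and not the framework's actual code: the symbol lists, the function names (`build_vocab`, `encode`, `encode_pair`), and the maximum lengths are all assumptions chosen for the example. It treats every character as one token, so two-letter atoms such as `Cl` are split into `C` and `l`.

```python
# Assumed symbol sets for illustration only; a real framework would fix
# its vocabularies from the training data.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
SMILES_SYMBOLS = "#()+-./=@[]\\0123456789BCFHINOPSclnors"  # assumed subset


def build_vocab(symbols):
    """Map each symbol to a positive integer id (0 is reserved)."""
    return {ch: i for i, ch in enumerate(symbols, start=1)}


def encode(sequence, vocab, max_len):
    """Turn a string into a fixed-length list of integer ids.

    Sequences longer than max_len are truncated, shorter ones are
    padded with 0. Unknown symbols are also mapped to 0.
    """
    ids = [vocab.get(ch, 0) for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))


DRUG_VOCAB = build_vocab(SMILES_SYMBOLS)
PROTEIN_VOCAB = build_vocab(AMINO_ACIDS)


def encode_pair(smiles, protein, drug_len=100, protein_len=1000):
    """Encode one (SMILES, protein sequence) pair for the two encoders."""
    return (
        encode(smiles, DRUG_VOCAB, drug_len),
        encode(protein, PROTEIN_VOCAB, protein_len),
    )


if __name__ == "__main__":
    # Ethanol and a short made-up peptide, used only as a demo input.
    drug_ids, protein_ids = encode_pair("CCO", "MKTAYIAK", 10, 12)
    print(drug_ids)
    print(protein_ids)
```

Fixed-length integer lists like these are what an embedding layer followed by a convolutional encoder would typically consume.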
The Neural Networks
- The goal of this project is to build a neural network that outputs numerical Binding Affinity values, giving the scientist or working professional an idea of how effective a drug is against the target protein they are working on.
- On top of the neural networks, there is a feature that lets scientists change the elements of a drug to see whether the modified drug is predicted to respond more favorably.
- The embeddings currently supported by this framework are the CNN-CNN and MPNN-CNN Encoder-Decoder systems.
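To make the CNN-CNN idea concrete, here is a small pure-Python sketch of how two convolutional encoders can feed one numerical output. It is an assumption-laden illustration, not the framework's implementation: the weights are random and untrained, the "decoder" is a single linear layer, and every name (`conv1d_global_max`, `predict_affinity`, `random_params`) is hypothetical. The MPNN-CNN variant is not shown.

```python
import random


def embed(ids, table):
    """Look up one embedding vector per integer id."""
    return [table[i] for i in ids]


def conv1d_global_max(embedded, kernels):
    """1D convolution, then ReLU, then a max over all positions.

    embedded: list of L vectors, each of dimension D.
    kernels:  list of filters, each a list of W weight vectors (dim D).
    Returns one feature per filter, so the output length does not
    depend on the input sequence length.
    """
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # ReLU floor
        for start in range(len(embedded) - width + 1):
            window = embedded[start:start + width]
            response = sum(
                w * x
                for w_vec, x_vec in zip(kernel, window)
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, response)
        features.append(best)
    return features


def predict_affinity(drug_ids, protein_ids, params):
    """Encode drug and protein separately, concatenate, apply a linear layer."""
    drug_vec = conv1d_global_max(
        embed(drug_ids, params["drug_embed"]), params["drug_kernels"])
    prot_vec = conv1d_global_max(
        embed(protein_ids, params["prot_embed"]), params["prot_kernels"])
    joint = drug_vec + prot_vec  # list concatenation
    return sum(w * x for w, x in zip(params["out_weights"], joint)) + params["out_bias"]


def random_params(drug_vocab=40, prot_vocab=21, dim=4, n_kernels=3, width=3, seed=0):
    """Untrained placeholder weights; a real model would learn these."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    def kernels():
        return [[vec(dim) for _ in range(width)] for _ in range(n_kernels)]

    return {
        "drug_embed": [vec(dim) for _ in range(drug_vocab)],
        "prot_embed": [vec(dim) for _ in range(prot_vocab)],
        "drug_kernels": kernels(),
        "prot_kernels": kernels(),
        "out_weights": vec(2 * n_kernels),
        "out_bias": 0.0,
    }


if __name__ == "__main__":
    params = random_params()
    # Arbitrary integer ids standing in for an encoded drug and protein.
    print(predict_affinity([3, 3, 7, 1, 0], [11, 9, 17, 1, 20, 8], params))
```

The global max step is what lets drugs and proteins of different lengths be compared through vectors of the same size.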
Results
- This machine learning framework has many possible use cases. Here, the algorithm is run on the SARS-CoV-2 3CL protease, which is the main target protein in the coronavirus. The goal is to neutralize or inhibit the protein to render the virus harmless.
- The closer a predicted Binding Score is to 0, the more effective the drug is predicted to be at neutralizing that target protein.
- The pretrained models were previously trained for more than 9,500 epochs.
- For patients with COVID-19, adding the antiviral agents sofosbuvir (SOF) and daclatasvir (DCV) to standard care demonstrated increased efficacy and safety, and the SOF-DCV combination decreased intensive care unit (ICU) admissions compared with standard care alone, according to results of a phase 3 study published in the Journal of Medical Virology[7]. Those two drugs were the framework's top two choices and have been in clinical trials.
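The way top candidates are picked from predicted scores can be sketched as below. This is a hedged illustration: the function names are hypothetical, the drug names and scores are made-up placeholders rather than model output, and it assumes, as stated in the results above, that a score closer to 0 is better.

```python
def rank_candidates(scores, top_k=2):
    """Return the top_k (name, score) pairs with the lowest predicted score.

    scores: dict mapping a drug name to its predicted binding score.
    Assumes lower scores mean stronger predicted binding.
    """
    return sorted(scores.items(), key=lambda item: item[1])[:top_k]


def format_table(rows):
    """Render ranked rows as a plain-text table."""
    lines = [f"{'Rank':<6}{'Drug':<12}{'Predicted score':>16}"]
    for rank, (name, score) in enumerate(rows, start=1):
        lines.append(f"{rank:<6}{name:<12}{score:>16.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for demonstration only, not real predictions.
    example_scores = {
        "drug_A": 412.7,
        "drug_B": 35.2,
        "drug_C": 980.1,
        "drug_D": 58.9,
    }
    print(format_table(rank_candidates(example_scores)))
```

The same ranking step applies to both screening modes: in virtual screening the candidate list is a compound library, while in drug repurposing it is a set of already approved drugs.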