- Research Question: Can Protein Transformers capture the biological intelligence embedded in protein sequences?
- Contributions:
  - Curate a scientific dataset with meaningful annotations, tailored specifically for protein function prediction
  - Devise a new computation-efficient Protein Transformer that removes the need for large-scale pre-training
  - Develop a novel explainable AI (XAI) technique for decoding the decision-making processes of Protein Transformers
- This work has explored the capabilities of Protein Transformers in capturing the biological intelligence residing in protein sequences.
- We introduced a high-quality, expert-annotated Protein-FN dataset, a computation-efficient Protein Transformer, and an XAI technique for decoding decision-making processes of Protein Transformers.
- Our models are efficient and effective on protein function predictions, and our XAI technique can help reveal biological intelligence captured by Protein Transformers.
Do Protein Transformers Have Biological Intelligence?
Fudong Lin¹, Wanrou Du², Jinchan Liu³, Tarikul Milon⁴, Shelby Meche⁴, Wu Xu⁴, Xiaoqi Qin², Xu Yuan¹
¹ University of Delaware, ² Beijing University of Posts and Telecommunications, ³ Yale University, ⁴ University of Louisiana at Lafayette
- Amino Acid Embedding: Directly encode biologically meaningful features, removing the need for extensive pre-training
- Flexible Positional Embedding: Accommodate proteins with post-translational modifications or disordered regions
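
As a rough illustration of what "directly encoding biologically meaningful features" could look like, the sketch below maps each residue to a small hand-built feature vector instead of a pre-trained embedding. The poster does not say which features SPT uses, so the choice of Kyte-Doolittle hydropathy and approximate side-chain charge, as well as the name `embed_sequence`, are assumptions made only for this example.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# ASSUMPTION: the feature set is illustrative, not the one used by SPT.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain charge at neutral pH (histidine treated as neutral).
CHARGE: Dict[str, float] = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def embed_sequence(sequence: str) -> List[List[float]]:
    """Map each residue to a fixed [hydropathy, charge] feature vector."""
    return [[HYDROPATHY[aa], CHARGE.get(aa, 0.0)] for aa in sequence]


features = embed_sequence("HKDL")
print(features)
```

In a full model, vectors like these would presumably pass through a learned projection before reaching the Transformer layers; that step is omitted here.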
Sequence Protein Transformers (SPT)
- Explainable AI (XAI): Decode the decision-making processes of deep neural networks (DNNs)
- Sequence Score: Given a decision of interest, our approach assigns each amino acid an importance score reflecting its actual contribution to that decision.
- Biological Intelligence: Discover meaningful biological patterns, which align with established domain knowledge
- Motif: Patterns of amino acids that are shared among different proteins
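
The idea of a per-residue importance score can be sketched with a simple occlusion test: mask one amino acid at a time and record how much the score for the decision of interest drops. This is a generic stand-in, not the Sequence Score technique itself, whose details the poster does not give. The scoring function `toy_zinc_score` and the example sequence are hypothetical placeholders for a trained model's output and a real protein.

```python
from typing import Callable, List


def occlusion_scores(
    sequence: str, score_fn: Callable[[str], float], mask: str = "X"
) -> List[float]:
    """Importance of each residue = drop in score when that residue is masked."""
    baseline = score_fn(sequence)
    scores = []
    for i in range(len(sequence)):
        occluded = sequence[:i] + mask + sequence[i + 1:]
        scores.append(baseline - score_fn(occluded))
    return scores


def toy_zinc_score(sequence: str) -> float:
    """HYPOTHETICAL stand-in for a model's class score: fraction of histidines."""
    return sequence.count("H") / len(sequence)


def top_residues(sequence: str, scores: List[float], k: int = 3) -> List[str]:
    """Return the k highest-scoring residues, labeled by 1-based position."""
    ranked = sorted(range(len(sequence)), key=lambda i: scores[i], reverse=True)[:k]
    return [f"{sequence[i]}{i + 1}" for i in sorted(ranked)]


seq = "MSHHWGYGKH"  # made-up sequence, not a real protein
scores = occlusion_scores(seq, toy_zinc_score)
print(top_residues(seq, scores))  # ['H3', 'H4', 'H10']
```

The highest-scoring positions form a candidate motif that can then be compared against known ones, in the same spirit as checking a model's explanation against a zinc-binding histidine triad.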
Interpret Biological Intelligence
Zinc-Binding Motif: “H94-H96-H119”
- Our models are efficient and effective on protein function predictions.
Comparison to Protein Transformers on our Protein-FN dataset
- Offer 9K annotated proteins, including their 1D amino acid sequences, 3D protein structures, and functional properties
- Useful for various biological tasks, e.g., protein function prediction and motif identification and discovery