
Research Question: Can Protein Transformers capture the biological intelligence embedded in protein sequences?
Contributions:
Curate a scientific dataset with meaningful annotations, tailored specifically to protein function prediction
Devise a new computation-efficient Protein Transformer that removes the need for large-scale pre-training
Develop a novel explainable AI (XAI) technique for decoding the decision-making processes of Protein Transformers

Motivation

This work has explored the capabilities of Protein Transformers in capturing the biological intelligence residing in protein sequences.

We introduced a high-quality, expert-annotated Protein-FN dataset, a computation-efficient Protein Transformer, and an XAI technique for decoding the decision-making processes of Protein Transformers.
Our models are efficient and effective at protein function prediction, and our XAI technique can help reveal the biological intelligence captured by Protein Transformers.

Conclusion

Do Protein Transformers Have Biological Intelligence?

¹University of Delaware, ²Beijing University of Posts and Telecommunications, ³Yale University, ⁴University of Louisiana at Lafayette

Fudong Lin¹, Wanrou Du², Jinchan Liu³, Tarikul Milon⁴, Shelby Meche⁴, Wu Xu⁴, Xiaoqi Qin², Xu Yuan¹

Amino Acid Embedding: Directly encode biologically meaningful features, removing the need for extensive pre-training
Flexible Positional Embedding: Capture proteins with post-translational modifications or disordered regions
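
To make the idea of encoding biologically meaningful features concrete, here is a minimal sketch that maps each residue to a small hand-crafted feature vector instead of a learned token embedding. The choice of features (Kyte-Doolittle hydropathy and a simplified side-chain charge) and the function name `embed_sequence` are assumptions for illustration only; the poster does not list the features SPT actually uses.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified formal side-chain charge near neutral pH.
# Assumption: histidine is treated as neutral; all unlisted residues are 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def embed_sequence(seq):
    """Map each residue to a [hydropathy, charge] feature vector (hypothetical helper)."""
    features = []
    for aa in seq.upper():
        if aa not in KD_HYDROPATHY:
            raise ValueError(f"unknown amino acid: {aa!r}")
        features.append([KD_HYDROPATHY[aa], float(CHARGE.get(aa, 0))])
    return features


if __name__ == "__main__":
    for residue, vec in zip("AKD", embed_sequence("AKD")):
        print(residue, vec)
```

Because such features are fixed by chemistry rather than learned, a model built on them can start from informative inputs without a pre-training stage, which is the motivation stated above.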

Our SPT Models

Sequence Protein Transformers (SPT)

Three Model Variants

Explainable AI (XAI): Decode the decision-making processes of deep neural networks (DNNs)
Sequence Score: Given a decision of interest, our approach assigns each amino acid an importance score reflecting its actual contribution to that decision.

Our Sequence Score

Importance Weight:

Importance Score:

Normalization:

Equations
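
As a rough illustration of how a per-residue importance pipeline of this shape (weight, score, normalization) can work, the sketch below applies generic gradient-times-input attribution to a toy linear model. It is not the Sequence Score defined by the authors. The model, the feature values, and the function names (`logit`, `importance_weights`, `importance_scores`, `normalize`) are assumptions, and min-max scaling stands in for whichever normalization the method uses.

```python
def logit(X, w):
    """Toy model: mean-pool per-residue features, then a linear score for one class."""
    n = len(X)
    return sum(sum(x_j * w_j for x_j, w_j in zip(x, w)) for x in X) / n


def importance_weights(X, w):
    """Gradient of the class logit w.r.t. each input feature.

    For this linear toy model, d logit / d X[i][j] = w[j] / n (analytic).
    """
    n = len(X)
    return [[w_j / n for w_j in w] for _ in X]


def importance_scores(X, w):
    """Gradient-times-input: one scalar contribution per residue."""
    grads = importance_weights(X, w)
    return [sum(g_j * x_j for g_j, x_j in zip(g, x)) for g, x in zip(grads, X)]


def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]  # made-up features, 3 residues
    w = [1.0, -1.0]                           # made-up class weights
    raw = importance_scores(X, w)
    print(normalize(raw))
```

For a linear model the raw scores sum exactly to the logit, so each residue's score is its actual share of the decision; for a deep Transformer that property holds only approximately under this generic technique.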

Biological Intelligence: Discover meaningful biological patterns that align with established domain knowledge
Motif: A pattern of amino acids shared among different proteins

Interpret Biological Intelligence

Zinc-Binding Motif: “H94-H96-H119”
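
One way to check whether per-residue importance scores highlight a known motif such as this one is to compare the top-scoring positions with the motif's residue numbers. The sketch below is a hypothetical helper for that comparison, not the authors' evaluation procedure; the function names, the top-k criterion, and the assumption that scores are indexed from residue number 1 are all illustrative.

```python
import re


def parse_motif(motif):
    """Parse 'H94-H96-H119' into [('H', 94), ('H', 96), ('H', 119)]."""
    parsed = []
    for token in motif.split("-"):
        match = re.fullmatch(r"([A-Z])(\d+)", token)
        if match is None:
            raise ValueError(f"cannot parse motif token: {token!r}")
        parsed.append((match.group(1), int(match.group(2))))
    return parsed


def top_k_positions(scores, k, first_residue_number=1):
    """Return the residue numbers of the k highest-scoring positions, sorted."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(i + first_residue_number for i in order[:k])


def motif_recovered(scores, motif, k, first_residue_number=1):
    """True if every motif position is among the top-k scored residues."""
    top = set(top_k_positions(scores, k, first_residue_number))
    return all(position in top for _, position in parse_motif(motif))


if __name__ == "__main__":
    print(parse_motif("H94-H96-H119"))
```

In practice the residue letters returned by `parse_motif` could also be checked against the sequence itself to catch numbering offsets between the sequence file and the structure.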

Experimental Results

Comparison to Protein Transformers on our Protein-FN dataset

Offers 9K annotated proteins, including their 1D amino acid sequences, 3D protein structures, and functional properties
Useful for various biological tasks, such as protein function prediction and motif identification and discovery

Our Protein-FN Dataset

1D Sequence

3D Structure

Dataset Overview