
  • Research Question: Can Protein Transformers capture the biological intelligence embedded in protein sequences?
  • Contributions:
    • Curate a scientific dataset with meaningful annotations, tailored specifically to protein function prediction
    • Devise a new computationally efficient Protein Transformer that removes the need for large-scale pre-training
    • Develop a novel explainable AI (XAI) technique for decoding the decision-making processes of Protein Transformers

Motivation

  • This work explores the ability of Protein Transformers to capture the biological intelligence residing in protein sequences.
  • We introduce a high-quality, expert-annotated Protein-FN dataset, a computationally efficient Protein Transformer, and an XAI technique for decoding the decision-making processes of Protein Transformers.
  • Our models are efficient and effective for protein function prediction, and our XAI technique helps reveal the biological intelligence captured by Protein Transformers.

Conclusion

Do Protein Transformers Have Biological Intelligence?

1 University of Delaware, 2 Beijing University of Posts and Telecommunications, 3 Yale University, 4 University of Louisiana at Lafayette

Fudong Lin1, Wanrou Du2, Jinchan Liu3, Tarikul Milon4, Shelby Meche4, Wu Xu4, Xiaoqi Qin2, Xu Yuan1

Dataset

Code

Paper

  • Amino Acid Embedding: Directly encode biologically meaningful features, removing the need for extensive pre-training
  • Flexible Positional Embedding: Capture proteins with post-translational modifications or disordered regions
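
To illustrate what "directly encoding biologically meaningful features" could look like, here is a minimal sketch that maps each residue to a small vector of physicochemical properties instead of a learned token embedding. The choice of features (Kyte-Doolittle hydropathy and an approximate side-chain charge at neutral pH) and the function name `embed_sequence` are assumptions made for illustration; the poster does not state which features SPT actually uses.

```python
import numpy as np

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain charge at neutral pH (histidine treated as neutral here).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def embed_sequence(seq):
    """Return an (L, 2) array: one [hydropathy, charge] row per residue.

    Hypothetical helper for illustration only, not the SPT embedding.
    """
    return np.array(
        [[HYDROPATHY[aa], CHARGE.get(aa, 0.0)] for aa in seq],
        dtype=float,
    )


features = embed_sequence("HKD")  # shape (3, 2)
```

In a real model such a feature vector would typically be projected to the model dimension by a learned linear layer; that step is omitted here.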

Our SPT Models

Sequence Protein Transformers (SPT)

Three Model Variants

  • Explainable AI (XAI): Decode the decision-making processes of deep neural networks (DNNs)
  • Sequence Score: Given a decision of interest, our approach assigns each amino acid an importance score reflecting its actual contribution to that decision.
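
The general idea of per-residue attribution can be sketched with a generic gradient-times-input scheme on a toy model. This is not the Sequence Score itself, whose equations are not reproduced in this text; the toy classifier (mean pooling followed by a linear layer), the function name `residue_importance`, and the min-max normalization are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def residue_importance(X, W, target):
    """Gradient-times-input importance for a toy mean-pool + linear classifier.

    X: (L, d) per-residue embeddings.
    W: (d, C) classifier weights, so logits = X.mean(axis=0) @ W.
    target: index of the class (decision) to explain.
    Hypothetical stand-in for illustration, not the poster's Sequence Score.
    """
    L = X.shape[0]
    # Importance weight: gradient of the target logit w.r.t. each residue
    # embedding. For this toy model it is W[:, target] / L for every residue.
    grad = np.tile(W[:, target] / L, (L, 1))  # (L, d)
    # Importance score: gradient times input, summed over features,
    # keeping only positive evidence for the target class.
    raw = np.maximum((grad * X).sum(axis=1), 0.0)  # (L,)
    # Normalization: min-max scale to [0, 1]; all zeros if scores are constant.
    span = raw.max() - raw.min()
    return (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])  # three residues, two features
W = np.array([[1.0], [-1.0]])                        # a single output class
scores = residue_importance(X, W, target=0)
```

For a Transformer, the gradient would be obtained by automatic differentiation rather than in closed form, but the weight, score, and normalization steps would follow the same pattern.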

Our Sequence Score

Importance Weight:

Importance Score:

Normalization:

Equations

 

  • Biological Intelligence: Discover meaningful biological patterns, which align with established domain knowledge
  • Motif: A pattern of amino acids that is shared among different proteins

Interpret Biological Intelligence

Zinc-Binding Motif: “H94-H96-H119”
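
One simple way to check whether importance scores highlight a known motif is to test whether the motif positions fall among the top-scoring residues. The sketch below assumes 1-indexed positions that match the sequence numbering; the helper names `parse_motif` and `motif_recovered` and the top-k criterion are hypothetical and not taken from the poster.

```python
import re


def parse_motif(motif):
    """Parse a motif string such as 'H94-H96-H119' into (residue, position) pairs."""
    return [(m.group(1), int(m.group(2))) for m in re.finditer(r"([A-Z])(\d+)", motif)]


def motif_recovered(scores, motif, top_k):
    """Return True if every motif position is among the top_k highest scores.

    scores: per-residue importance values, scores[0] belonging to position 1.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_positions = {i + 1 for i in ranked[:top_k]}
    return all(pos in top_positions for _, pos in parse_motif(motif))


toy_scores = [0.1, 0.9, 0.2, 0.8]
hit = motif_recovered(toy_scores, "A2-C4", top_k=2)
```

Structure files often use residue numbering that does not start at 1, so an offset may be needed before comparing positions in practice.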

  • Our models are efficient and effective for protein function prediction.

Experimental Results

Comparison to Protein Transformers on our Protein-FN dataset

  • Offers 9K annotated proteins, including their 1D amino acid sequences, 3D protein structures, and functional properties
  • Useful for various biological tasks, e.g., protein function prediction and motif identification and discovery

Our Protein-FN Dataset

1D Sequence

3D Structure

Dataset Overview