Adversarial Use of Protein Language Models for Modeling Escape
Sayantani B. Littlefield and Roy H. Campbell
HealthSec’25, Honolulu, HI
About the Authors
Roy H. Campbell
Professor Emeritus, Computer Science
University of Illinois Urbana-Champaign
Sayantani B. Littlefield
Computer Science PhD Candidate
University of Illinois Urbana-Champaign
On the job market!
Introduction
Overview of LLMs
Rives et al., PNAS 2019; Rao et al., bioRxiv 2020; Lin et al., bioRxiv 2022
Motivation - Security
Descriptive Statistics
Results (1)
Varied the number of mutations in the wildtype sequence and compared each mutated sequence with the original wildtype sequence
Orange line: the average number of SARS-CoV-2 mutations
Metric: cosine distance(wt_seq, mut_i_seq)
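The comparison on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `toy_embedding` is a hypothetical stand-in (an amino acid frequency vector) for a protein language model embedding, the wildtype sequence is an arbitrary example string, and `mutate` assumes uniformly random substitutions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, n_mutations, rng):
    """Return a copy of seq with substitutions at n_mutations distinct positions."""
    positions = rng.sample(range(len(seq)), n_mutations)
    chars = list(seq)
    for pos in positions:
        # Always substitute a residue different from the current one.
        choices = [a for a in AMINO_ACIDS if a != chars[pos]]
        chars[pos] = rng.choice(choices)
    return "".join(chars)


def toy_embedding(seq):
    """Hypothetical stand-in for a model embedding: amino acid frequencies."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


rng = random.Random(0)  # fixed seed so runs are repeatable
wt_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
wt_emb = toy_embedding(wt_seq)

# Distance between the wildtype and sequences with 1..10 substitutions.
for n in range(1, 11):
    mut_seq = mutate(wt_seq, n, rng)
    dist = cosine_distance(wt_emb, toy_embedding(mut_seq))
    print(f"{n:2d} mutations: cosine distance = {dist:.4f}")
```

To apply the same loop to a real model, `toy_embedding` would be replaced with a function that returns that model's sequence-level embedding; the distance computation itself would stay unchanged.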
Results (2)
Mutations from known SARS-CoV-2 variants: Alpha, Beta, Gamma, Delta, Omicron
Results (3)
Synthetically mutated sequences
The 3B-parameter model shows less noise than the 8M-parameter model
Results (4)
Synthetically mutated sequences
The 3B-parameter model shows less noise than the 8M-parameter model
Results (5) - Reviewer suggestions
Results (5) - Reviewer suggestions
Defensive Measures
Discussing vulnerabilities carries risk: attackers may use this information to further exploit protein language models
Conclusion
Acknowledgment
Thank you!