Computer Science meets Immunology: how computational analyses help to study diseases
Yana Safonova
postdoctoral researcher, PhD
University of California San Diego
University of Louisville School of Medicine
@yana_safonova_
Newly emerging and re-emerging diseases
SARS-CoV-2
Response to (re)emerging diseases
Introduction to molecular biology
Human genome ~ 3Gbp
The Library of Babel (J. L. Borges) contains all possible combinations of symbols, mostly meaningless ones
Genes take up 1-2% of the length of the human genome
Hou and Lin, PLoS ONE, 2009
The central dogma of biology
The birth of bioinformatics
Genome assembly
Human Genome Project
1990 – 2003
Gene prediction
Gene expression
Differential gene expression
Immune system = innate (or inherited) + adaptive (or acquired) immune systems
Adaptive immune system
Antibodies are agents of the adaptive immune system
Figure: V, D, and J gene segments on chr 14 (human genome); one V, one D, and one J segment are joined into an antibody gene
Why are antibodies so versatile if there are only 55×23×6 VDJ recombinations?
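The count in the question can be checked directly. The sketch below simply multiplies the segment counts quoted on the slide (55 V, 23 D, 6 J); the function and variable names are illustrative and not part of the talk.

```python
# Numbers of V, D, and J gene segments, as quoted on the slide.
NUM_V, NUM_D, NUM_J = 55, 23, 6


def count_vdj_combinations(num_v: int, num_d: int, num_j: int) -> int:
    """Number of distinct (V, D, J) triples when one segment of each type is chosen."""
    return num_v * num_d * num_j


print(count_vdj_combinations(NUM_V, NUM_D, NUM_J))  # 7590
```

A few thousand combinations is a small number compared with the variety of antigens, which is what motivates the question.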
The recombination process is imperfect and involves many random processes:
Antibody repertoire sequencing (Rep-seq)
Figure: recombined V, D, and J segments and a sequencing read
Length: ~360 nt
VDJ from DNA or RNA
Error-prone immunosequencing reads
Antibody repertoire is unique for an individual
frequent VDJ recombinations
Genomic variations
mother's DNA
father's DNA
A single nucleotide polymorphism (SNP) is associated with a position in the genome:
Altered genes produce altered proteins
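A minimal sketch of both ideas: finding the position where two aligned sequences differ, and seeing how that change alters the protein. The two sequences are hypothetical toy fragments, and the codon table is only the subset of the standard genetic code needed to translate them.

```python
# Subset of the standard genetic code, enough for the toy sequences below.
CODON_TO_AMINO_ACID = {"ACT": "T", "CCT": "P", "GAG": "E", "GTG": "V"}


def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in seq_a, base in seq_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def translate(seq: str) -> str:
    """Translate a coding sequence codon by codon using the subset table above."""
    return "".join(CODON_TO_AMINO_ACID[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Hypothetical toy gene fragments that differ at a single position.
maternal = "ACTCCTGAGGAG"
paternal = "ACTCCTGTGGAG"

print(find_snps(maternal, paternal))              # [(7, 'A', 'T')]
print(translate(maternal), translate(paternal))   # TPEE TPVE
```

Here one changed nucleotide turns the third amino acid from E into V. Not every SNP changes the protein, so this is an illustration of the altered case only.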
Genome-wide association studies (GWAS)
Genotype at a single SNP | A | A/G | G |
Patients | 15 | 10 | 8 |
Controls | 2 | 6 | 25 |
P-value (Fisher exact probability test) = 0.000026
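An exact test on a 2×3 table like the one above can be sketched in plain Python. The code below assumes the Freeman-Halton extension of Fisher's exact test: it enumerates every table with the same row and column totals and sums the probabilities of those that are no more likely than the observed one. Whether the slide used exactly this variant is an assumption, so the printed value may differ somewhat from the one quoted.

```python
from math import comb


def table_probability(row1, col_totals):
    """Probability of a 2x3 table with fixed margins (multivariate hypergeometric)."""
    numerator = 1
    for cell, col_total in zip(row1, col_totals):
        numerator *= comb(col_total, cell)
    return numerator / comb(sum(col_totals), sum(row1))


def fisher_exact_2x3(row1, row2):
    """Two-sided exact p-value for a 2x3 contingency table."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    n1 = sum(row1)
    p_observed = table_probability(row1, col_totals)
    p_value = 0.0
    # Enumerate all first rows (a, b, c) compatible with the margins.
    for a in range(min(n1, col_totals[0]) + 1):
        for b in range(min(n1 - a, col_totals[1]) + 1):
            c = n1 - a - b
            if c > col_totals[2]:
                continue
            p = table_probability((a, b, c), col_totals)
            # Small tolerance so that ties with the observed table are counted.
            if p <= p_observed * (1 + 1e-9):
                p_value += p
    return p_value


patients = [15, 10, 8]
controls = [2, 6, 25]
print(fisher_exact_2x3(patients, controls))
```

In a real GWAS this test is repeated for every SNP, so the resulting p-values need a multiple-testing correction before any SNP is called significant.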
Variants of IGHV1-69 shape Ab response to flu
Figure: V, D, and J segments; IGHV1-69 is a V gene
Allele | Position 54 |
IGHV1-69*01 | F |
IGHV1-69*02 | L |
IGHV1-69*03 | F |
IGHV1-69*14 | F |
IGHV1-69*04 | L |
... | ... |
Successful binding to hemagglutinin
Loss of binding properties
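As a small illustration, a person's two IGHV1-69 alleles can be reduced to a genotype at position 54. The allele-to-residue mapping below is copied from the slide; the function name and the F/F, F/L, L/L labels are illustrative choices, and alleles not listed on the slide are deliberately left out.

```python
# Residue at position 54 for the IGHV1-69 alleles listed on the slide.
RESIDUE_AT_54 = {
    "IGHV1-69*01": "F",
    "IGHV1-69*02": "L",
    "IGHV1-69*03": "F",
    "IGHV1-69*04": "L",
    "IGHV1-69*14": "F",
}


def genotype_at_54(allele_a: str, allele_b: str) -> str:
    """Return 'F/F', 'F/L', or 'L/L' for a pair of IGHV1-69 alleles."""
    residues = sorted(RESIDUE_AT_54[allele] for allele in (allele_a, allele_b))
    return "/".join(residues)


print(genotype_at_54("IGHV1-69*01", "IGHV1-69*02"))  # F/L
print(genotype_at_54("IGHV1-69*02", "IGHV1-69*04"))  # L/L
```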
Avnir et al., Sci Rep, 2016
Titers (= antibody counts) before and after immunization
GWAS of antibody responses
Challenges
Bioinformatics + immunology = immunoinformatics
Go I Know Not Whither and Fetch I Know Not What
3D organization
amino acid content and PTMs
allelic diversity
unusual VDJs
cross- and self-reactivity
antibody-dependent enhancement
non-coding variations
???
Error-prone immunosequencing reads
~ Flu ~
CDRs represent antigen-binding sites
Figure: CDR1, CDR2, and CDR3 marked along the V, D, and J segments
Anatomy of IGHV1-69-guided response to flu
Control Ab
Lingwood et al., Nature, 2012
Two flu-specific Abs:
Antibody titers vs IGHV1-69 genotype
Avnir et al., Sci Rep, 2016
Titers (= antibody counts) before and after immunization
The titer data suggest that the IGHV1-69 genotype shapes the response to flu and that other V genes do not fully compensate for the “bad” variant of IGHV1-69
~20% of people have L/L variants of IGHV1-69
Bioinformatics analysis of flu response
Ke, Nouri, Safonova, et al., in preparation
Linked variations of V genes
Safonova and Watson, in preparation
| A | A/G | G |
C | 15 | 10 | 8 |
A/C | 6 | 8 | 3 |
A | 2 | 5 | 17 |
P-value = 0.00967
The blue group is a set of V genes associated with the flu response
These genes have linked variations and perhaps similar specificities
~ HIV ~
HIV and antibody response
HIV infects immune cells:
B cells, which produce antibodies, are not affected by HIV but cannot fight it because of the high mutation rate of the HIV genome
Recognition of HIV
Verkoczy, Adv Immunol, 2017
Alter & Ackerman, Cell, 2014
Long CDR3s in response to HIV
Broadly neutralizing antibodies against HIV are characterized by extremely long CDR3s
Ultralong CDR3s in human antibodies are a result of VDDJ recombination:
Sok et al., Nature, 2017
Safonova and Pevzner, Front Immunol, 2019
Finding VDDJ recombinations
Safonova and Pevzner, Front Immunol, 2019
Most D-D pairs follow ordering in IGH locus
14 datasets
Safonova and Pevzner, Front Immunol, 2019
14 individuals
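One way to quantify "follow ordering in IGH locus" is to count how many D-D pairs keep the order the genes have in the locus. The sketch below assumes the naming convention visible in the D gene table later in the deck, where the number after the hyphen is the gene's rank in the locus (D3-9 comes before D3-10). The pairs are hypothetical, listed as (first D, second D) in the order they appear in a VDDJ sequence.

```python
def locus_position(d_gene: str) -> int:
    """'D3-10' -> 10: the number after the hyphen is taken as the rank in the locus."""
    return int(d_gene.split("-")[1])


def fraction_in_locus_order(dd_pairs) -> float:
    """Fraction of (first D, second D) pairs whose locus positions increase."""
    if not dd_pairs:
        raise ValueError("no D-D pairs given")
    ordered = sum(
        locus_position(first) < locus_position(second) for first, second in dd_pairs
    )
    return ordered / len(dd_pairs)


# Hypothetical D-D pairs, not taken from the 14 datasets.
pairs = [("D2-2", "D3-9"), ("D1-7", "D6-19"), ("D5-12", "D3-3"), ("D4-17", "D2-21")]
print(fraction_in_locus_order(pairs))  # 0.75
```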
Can we explain the most frequent pairs?
Recombination signal sequences
Figure: V, D, and J genes flanked by recombination signal sequences with 12 nt and 23 nt spacers; pairs of signals are labeled 12/23
The 12/23 rule forbids:
12 nt = 1 turn of DNA
23 nt = 2 turns of DNA
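The rule itself is easy to state in code: two recombination signals can be paired only if one has a 12 nt spacer and the other a 23 nt spacer. The sketch below applies it to the heavy chain locus, assuming the usual spacer layout there (23 nt next to V genes, 12 nt on both sides of D genes, 23 nt next to J genes); the names are illustrative.

```python
def follows_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """True if one signal has a 12 nt spacer and the other a 23 nt spacer."""
    return {spacer_a, spacer_b} == {12, 23}


# Assumed spacer lengths of the signals flanking each gene type in the IGH locus.
RSS_SPACER = {"V": 23, "D": 12, "J": 23}

for left, right in [("V", "D"), ("D", "J"), ("V", "J"), ("D", "D")]:
    allowed = follows_12_23_rule(RSS_SPACER[left], RSS_SPACER[right])
    print(f"{left}-{right}: {'allowed' if allowed else 'forbidden'}")
# V-D: allowed
# D-J: allowed
# V-J: forbidden
# D-D: forbidden
```

Under these assumptions a direct D-D join breaks the rule, which is why VDDJ recombinations need a separate explanation.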
Non-canonical recombination signals
Nucleotide | Pos 1 | Pos 2 | … |
A | 11/26 | 1/26 | … |
C | 1/26 | 25/26 | … |
G | 0/26 | 0/26 | … |
T | 14/26 | 0/26 | … |
Safonova and Pevzner, Genome Res, 2020
Prob(ACGTACGTA) = 11/26 * 25/26 * ...
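The calculation above is a profile (position frequency matrix) score: count each nucleotide at each position of a set of aligned signals, then multiply the per-position frequencies of the candidate sequence. A minimal sketch follows, using four hypothetical toy signals rather than the real ones (the denominators on the slide suggest 26 sequences were used there).

```python
from collections import Counter
from fractions import Fraction


def build_profile(signals):
    """Per-position nucleotide frequencies of equally long, aligned sequences."""
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    profile = []
    for column in zip(*signals):
        counts = Counter(column)
        profile.append({nt: Fraction(counts[nt], len(signals)) for nt in "ACGT"})
    return profile


def sequence_probability(profile, sequence):
    """Product of per-position frequencies, in the style of Prob(ACGTACGTA)."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile")
    prob = Fraction(1)
    for column, nt in zip(profile, sequence):
        prob *= column[nt]
    return prob


# Hypothetical toy signals, not the real recombination signals.
toy_signals = ["ACGT", "ACGA", "TCGT", "ACCT"]
profile = build_profile(toy_signals)
print(sequence_probability(profile, "ACGT"))  # 27/64
```

A nucleotide never seen at some position gives a probability of zero, as with G at positions 1 and 2 in the table. Practical implementations often add pseudocounts to soften this; that refinement is not shown on the slide.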
3 DNA turns
2 DNA turns
Figure: V, D, and J genes flanked by signals with 12, 23, and 34 nt spacers; pairs of signals are labeled 1-turn/2-turn and 1-turn/3-turn
Sequence alignment
Tandem repeats correlate with D-D fusions
Safonova and Pevzner, Genome Res, 2020
R1 | R2 | R3 | R4 |
– | D6-6 | D6-13 | D6-19 |
D1-1 | D1-7 | D1-14 | D1-20 |
D2-2 | D2-8 | D2-15 | D2-21 |
D3-3 | D3-9 D3-10 | D3-16 | D3-22 |
D4-4 | D4-11 | D4-17 | D4-23 |
D5-5 | D5-12 | D5-18 | D5-24 |
mouse
Common marmoset
pale spear-nosed bat
~ SARS-CoV-2 ~
Antibodies are subject to fast evolution
The immune system mutates and amplifies a binding antibody
The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome
One antibody = one antigen
Antibody repertoire is a set of clonal lineages
Severity of COVID-19 and Ab response
Safonova, Tieri, et al., in preparation
non-naive only (IgG)
naive and non-naive (IgM)
Immunogenomics approach to vaccine design
Watson, Glanville, Marasco, Trends in Immunol, 2017
Data science approach to predicting the efficiency of antibody response
Figure: numbered V, D, and J gene segments followed by a C gene
VDDJ | SHM rate | ... |
Acknowledgments
U of Louisville
William Gibson
Justin Kos
Oscar Rodriguez
David Tieri
Jun Yan
U of New South Wales
Andrew Collins
Katherine Jackson
UCSD
Massimo Franceschetti
Siavash Mirarab
Ramesh Rao
Andrey Bzikadze
Vinnu Bhardwaj
Chao Zhang
USDA
Tim Smith
Sung Bong Shin
Iowa State U
James Reecy
Luke Kramer
Scripps Institute
Raiees Andrabi
Vaughn Smider
Smithsonian Conservation Biology Institute
Klaus-Peter Koepfli
Institute for Bioorganic Chemistry
Ivan Zvyagin
Artem Mikelov
Mikhail Shugay
Pavel Pevzner
UCSD
Intersect fellowship for computational immunologists
Data Science postdoctoral fellowship
Corey Watson
U of Louisville
Harvard Medical School
Wayne Marasco
Hanzhong Ke
Yale University
Steven Kleinstein
Nima Nouri