Immunoinformatics:
immunology meets computing
Yana Safonova
Center for Information Theory and Applications
University of California at San Diego
Reconstructing Strings from Random Traces
Input. A collection of random subsequences (traces) of a string t, where each trace is obtained by deleting each symbol in the string with probability q
Output. The string t
Batu, Kannan, Khanna, McGregor, SODA 2004
T C G G G G G T T T T T
T C G G G G G T T T T T
T C G G G G G T T T T T
T C G G G G G T T T T T
T C G G G G G T T T T T
T C G G G G G T T T T T
T C G G G G G T T T T T
T G G G G T T T T
C G G G G T T T T
T G G G T T T T
G G G G T T T
T C G G G T T T
T G G G G G T T T
C G G G T T T T T
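The deletion channel in the problem statement can be simulated in a few lines. This is only an illustrative sketch: the deletion probability q = 0.2, the number of traces, and the random seed are assumptions, not values from the talk.

```python
import random


def make_trace(t: str, q: float, rng: random.Random) -> str:
    """Delete each symbol of t independently with probability q."""
    return "".join(c for c in t if rng.random() >= q)


def is_subsequence(trace: str, t: str) -> bool:
    """Check that trace can be obtained from t by deletions only."""
    it = iter(t)
    return all(c in it for c in trace)


rng = random.Random(0)           # fixed seed (assumption) for repeatability
t = "TCGGGGGTTTTT"               # the string shown on the slide
traces = [make_trace(t, 0.2, rng) for _ in range(7)]
for trace in traces:
    print(trace)
```

Every trace is a subsequence of t, but different positions are lost in different traces, which is what makes reconstruction non-trivial.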
DNA sequencing
Information Theory Meets Cancer Biology
Input. A collection of random subsequences (traces) of a string t, where each trace is obtained by deleting, inserting, or substituting each symbol in the string according to a complex probabilistic model with poorly understood parameters.
Output. The string t
tumor
development
Blackburn & Langenau. Disease Models & Mechanisms, 2014
Information Theory Meets Immunology
Input. A collection of random subsequences (traces) derived from a set of strings T, where each trace is obtained by deleting, inserting, or substituting each symbol in one of the strings in T according to a complex probabilistic model with poorly understood parameters.
Output. The set of strings T
The reality is even more complex: traces are derived not from a set of strings, but rather from some concatenates of some strings in an unknown set of strings T.
Time to learn Immunology 101!
Adaptive immune system
Antibodies
Specificity rule:
one antibody – one antigen
(not necessarily true)
Antibody
Antigen
Generation of antibodies
Before recombination, the genome of an antibody-producing cell (B cell) looks exactly like genomes of all other cells:
Immunoglobulin locus (Chr 14), length ~1.25 Mb
V: 165-305 nt, avg. 291 nt
D: 11-37 nt, avg. 24 nt
J: 48-63 nt, avg. 54 nt
Selection of J segment...
Left cleavage of J segment...
Selection of D segment...
Right cleavage of D segment...
Concatenation of D and J segments...
Newly created unique genomic region
Left cleavage of DJ fragment...
Selection of V segment...
Right cleavage of V segment...
VDJ concatenation (variable region of antibody)
360 nt of VDJ + 1000 nt of constant region
instead of original 1.25 Mb
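The recombination steps (select a segment, cleave it, concatenate) can be written as a small deterministic function. This is a sketch under stated assumptions: the toy segment strings and cleavage lengths are hypothetical, and random insertions at the junctions are left out.

```python
def recombine(v, d, j, v_right, d_left, d_right, j_left):
    """Cleave and concatenate V, D, J in the order the steps are listed.

    v_right, d_left, d_right, j_left are the numbers of nucleotides
    removed at each cleavage site (hypothetical parameter names).
    """
    if d_left + d_right > len(d):
        raise ValueError("cleavage removes more than the whole D segment")
    j_part = j[j_left:]                  # left cleavage of J
    d_part = d[:len(d) - d_right]        # right cleavage of D
    dj = d_part + j_part                 # concatenation of D and J
    dj = dj[d_left:]                     # left cleavage of DJ fragment
    v_part = v[:len(v) - v_right]        # right cleavage of V
    return v_part + dj                   # VDJ concatenation


# Toy segments, far shorter than real ones.
vdj = recombine("AAAAAA", "CCGGCC", "TTTTTT",
                v_right=1, d_left=2, d_right=1, j_left=2)
print(vdj)
```

Changing any of the four cleavage lengths yields a different variable region from the same three segments.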
Constant region
Variable region of antibodies contains antigen binding sites
Why are antibodies so diverse if there are only 55×23×6 VDJ recombinations?
Recombination process is imperfect and includes many random processes:
Recombined antibodies may undergo somatic mutagenesis
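The count 55×23×6 and the effect of random cleavage can be checked with simple arithmetic. The number of cleavage options per site used below is a made-up value for illustration; real cleavage length distributions are not given here.

```python
n_v, n_d, n_j = 55, 23, 6                 # segment counts quoted on the slide
vdj_combinations = n_v * n_d * n_j        # segment choices alone


def with_cleavage(base: int, options_per_site: int, sites: int = 4) -> int:
    """Upper bound on variants if each of the cleavage sites
    (right of V, both sides of D, left of J) has the given
    number of equally possible outcomes (hypothetical)."""
    return base * options_per_site ** sites


print(vdj_combinations)                   # segment combinations only
print(with_cleavage(vdj_combinations, 5))
```

Even a handful of cleavage outcomes per site multiplies the count by several hundred, before insertions and somatic mutagenesis are considered.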
From biological problems to computational challenges
VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody
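A minimal sketch of the classification idea: pick the known segment that shares the most k-mers with the antibody. This is a simplified stand-in, not the method of any published tool; the segment names, sequences, and k = 5 are hypothetical.

```python
def kmers(s: str, k: int) -> set:
    """All substrings of length k."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def classify(read: str, germline: dict, k: int = 5) -> str:
    """Return the name of the germline segment sharing most k-mers with read."""
    read_kmers = kmers(read, k)
    return max(germline,
               key=lambda name: len(read_kmers & kmers(germline[name], k)))


# Toy database of two V segments (hypothetical sequences).
germline_v = {"V1": "ACGTACGTTAGC", "V2": "TTGACCGATAGG"}
read = "ACGTACCTTAGCGGGTT"     # V1 with one substitution, plus extra bases
print(classify(read, germline_v))
```

The same call can be repeated with D and J databases; short, heavily cleaved D segments are the hardest case for any such scheme.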
Model organisms in immunology with still unknown sets of V, D, and J segments
VDJ reconstruction problem. Given antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets
VDJ reconstruction problem. Given error-prone reads representing antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets
VDJ classification problem is solved!
IMGT/V-QUEST (Brochet et al, Nucleic Acids Res, 2008)
IgBlast (Ye et al, Nucleic Acids Res, 2013)
iHMMune-align (Gaeta et al, Bioinformatics, 2007)
Antibody graph (Bonissone and Pevzner, RECOMB 2015)
includes database of V, D, J segments
Only if VDJ segments do not vary widely between individuals!
How do VDJ segments vary across the population?
VDJ variants problem. Given reference V, D, and J segments and antibody repertoire from an individual, reconstruct how V, D, and J segments in this individual differ from the reference and discover new V, D, and J segments.
How accurate is the database of human
V, D, and J segments?
Watson et al, AJHG, 2013
Watson et al, Genes & Immunity, 2014
Finding novel V segments
Germline V segment
[Figure: reads aligned to the germline V segment; groups of reads share the same sets of differences from it (✤ ✺ and ✪ ❃ ❇)]
Novel V segments:
Our analysis revealed:
Novel segments in rhesus macaques were discovered by
Corcoran et al., Nat Communications, 2017
Crossover causes genetic diversity
father chr.
mother chr.
child chr.
child chr.
Unequal crossover may produce new segments
Length of Ig loci: ~1.25 Mbp
Frequency of crossing over: 1 point per 1 Mbp
father chr.
mother chr.
child chr.
child chr.
Variations in Ig loci: alternative method for population analysis?
Ig loci reconstruction problem. Given error-prone reads representing antibodies and reads sampled from the genome, assemble the Ig loci
Antibody repertoire sequencing (Rep-seq)
V
D
J
Length: ~360 nt
Left read
Right read
VDJ from DNA or RNA
Error-prone immunosequencing reads
Turchaninova et al, Nat Protocols, 2016
Repertoire construction problem
[Figure: Rep-seq reads grouped into an antibody repertoire with abundances × 7, × 3, × 2, × 2, × 1]
Antibody repertoire is the set of VDJ sequences with their abundances
Colors of reads and positions of errors are unknown!
Repertoire construction problem combines
clustering and error-correction
pRESTO (Vander Heiden et al, Bioinformatics 2014)
MiXCR (Bolotin et al, Nat Methods 2015)
IgRepertoireConstructor (Safonova et al, Bioinformatics 2015)
What makes this problem difficult?
× 1 ~ 10^18
Similarity of antibody sequences: different antibodies may share V
Difficult to distinguish between errors and natural variations
Number of clusters is unknown
Uneven distribution of abundances
Difficulty of clustering problem
Small antibody repertoire
Six antibodies with large abundances and many singleton antibodies (real data)
Each antibody forms a dense subgraph in the ideal Hamming graph
Reads derived from the same antibody differ from
each other due to sequencing errors (Hamming distance=3)
Real Hamming graph
107 nodes, 1426 edges
Similarity of natural variation and sequencing errors makes clusters highly connected
Actually, the real Hamming graph is not colored
Repertoire construction problem. Given error-prone Rep-seq reads representing antibodies, reconstruct sequences of antibodies
Repertoire construction problem. Given a Hamming graph constructed from error-prone Rep-seq reads, find its dense subgraphs
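The Hamming graph itself is simple to build: reads are vertices, and two reads are joined when their Hamming distance does not exceed a threshold. The sketch below assumes equal-length reads and a hypothetical threshold `tau`; it compares all pairs, which is too slow for real datasets.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_graph(reads, tau):
    """Adjacency sets: i and j are adjacent if distance(reads[i], reads[j]) <= tau."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        same_length = len(reads[i]) == len(reads[j])
        if same_length and hamming(reads[i], reads[j]) <= tau:
            adj[i].add(j)
            adj[j].add(i)
    return adj


toy_reads = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TTTTCCCC"]  # toy data
graph = hamming_graph(toy_reads, tau=1)
print(graph)
```

In this toy graph, read 1 is adjacent to reads 0 and 2 although reads 0 and 2 are not adjacent to each other, a small instance of how errors and variations chain reads together.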
Finding dense subgraphs in the Hamming graph
Hamming graph HG
Triangulated Hamming graph THG
triangulated Hamming graph
original Hamming graph
Each cycle of length > 3 in a triangulated graph has a chord
Our Hamming graphs are “almost” triangulated
Minimum Triangulation Problem: find the minimum number of edge additions to convert a graph into a triangulated graph
The minimum triangulation problem has effective approximate solutions, e.g., METIS by Karypis and Kumar, SIAM J Sci Comput, 1999
Maximal cliques in a triangulated graph can be computed in polynomial time (Galinier et al, LNCS, 1995)
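One standard way to list maximal cliques of a triangulated graph is maximum cardinality search (MCS). The sketch below uses MCS as an illustration and is not necessarily the algorithm of the cited paper; the final subset filter is quadratic and kept only for brevity.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly visit the unvisited
    vertex with the most already-visited neighbors."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=weight.get)
        unvisited.remove(v)
        order.append(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return order


def maximal_cliques_chordal(adj):
    """Maximal cliques of a triangulated graph given as adjacency sets."""
    order = mcs_order(adj)
    rank = {v: i for i, v in enumerate(order)}
    candidates = []
    for v in order:
        earlier = {u for u in adj[v] if rank[u] < rank[v]}
        # In a triangulated graph the earlier neighbors form a clique.
        if any(b not in adj[a] for a in earlier for b in earlier if a != b):
            raise ValueError("graph is not triangulated")
        candidates.append(frozenset(earlier | {v}))
    # Keep only candidates that are not contained in a larger candidate.
    return {c for c in candidates if not any(c < d for d in candidates)}


# Toy graph: two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(maximal_cliques_chordal(toy))
```

The same search doubles as a test for triangulation: a chordless cycle of length four makes the function raise `ValueError`.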
Hamming graph HG → Triangulated Hamming graph THG → Maximal cliques in THG → Dense subgraphs in THG → Dense subgraphs in HG
[Figure: original Hamming graph, triangulated Hamming graph, and clique graph]
IgRepertoireConstructor in action
adjacency matrix
A closer look at a dense subgraph
Is it a single antibody or multiple similar antibodies?
Dense subgraphs in more detail
Dense subgraphs correspond to identical or very similar antibodies
Detection of variations within dense subgraphs helps to partition dense subgraphs into individual antibodies
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CCGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT
CTGAGACTTTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCTGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT
*.******.********.**********************.*******
Problem: how can we distinguish edges corresponding to variations from edges corresponding to sequencing errors?
Each edge in the dense subgraph corresponds either to an error or to a variation. Errors correspond to spurious positions in the multiple alignment, while variations correspond to solid positions
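The distinction between solid and spurious positions can be illustrated on the 14 reads of the alignment. This is a sketch, assuming a position is solid when at least two nucleotides are each supported by `min_support` reads; the threshold 3 is a hypothetical choice.

```python
from collections import Counter

A = "CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT"
B = "CTGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT"
# The 14 reads of the alignment, in the order shown.
reads = ([A] * 5
         + ["CCGAGACTCTCCTGTTCAGCCTCTGGATTCACCTTCAGTACCTATGAT", A,
            "CTGAGACTTTCCTGTTCAGCCTCTGGATTCACCTTCAGTAGCTATGAT"]
         + [B] * 3
         + ["CTGAGACTCTCCTGTTCTGCCTCTGGATTCACCTTCAGTAGCTATGAT"]
         + [B] * 2)


def solid_positions(reads, min_support):
    """0-based columns where >= 2 nucleotides each occur in >= min_support reads."""
    solid = []
    for i, column in enumerate(zip(*reads)):
        supported = [n for n, c in Counter(column).items() if c >= min_support]
        if len(supported) >= 2:
            solid.append(i)
    return solid


def split_by_solid(reads, solid):
    """Group reads by their nucleotides at the solid positions."""
    groups = {}
    for r in reads:
        groups.setdefault(tuple(r[i] for i in solid), []).append(r)
    return groups


def consensus(group):
    """Most frequent nucleotide in every column (corrects spurious positions)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*group))


solid = solid_positions(reads, min_support=3)
antibodies = {key: consensus(g) for key, g in split_by_solid(reads, solid).items()}
print(solid, antibodies)
```

With this threshold only one of the four variable columns is solid, so the dense subgraph splits into two antibodies, and the three isolated mismatches are treated as sequencing errors.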
Dense subgraph can combine similar antibodies
Final decomposition of the Hamming graph
35 trivial and 7 non-trivial clusters
IgRepertoireConstructor / IgReC
Tool for constructing antibody repertoire from Rep-seq reads:
Works on both barcoded and non-barcoded polyclonal datasets
Performs well for both antibody and TCR repertoires
Available at Illumina BaseSpace
Runs slowly on very complex Rep-seq datasets
IgQUAST: quality assessment of antibody repertoires
2017
Antibody maturation
[Figure labels: target antigen, mutation process]
Antibody gains better specificity to the target antigen
Good! Positive selection
Antibody gains specificity to an antigen of the host (self-reactivity)
Bad: Negative selection
Antibody loses specificity to the target antigen
Bad: Negative selection
Turning antibodies into vertices...
A directed edge connects parental antibody and its mutated copy
Antibody expansion...
The higher the specificity of an antibody, the more likely it is to be expanded
Evolutionary tree!
Evolutionary tree of antibodies = clonal tree
Evolution of antibody repertoire
Initially, antibody repertoire consists of many “naive” VDJ recombinations (without mutations)
[Figure: eight V-D-J recombinations]
Binding to antigens initiates selection
Clonal expansion and mutagenesis turn single antibodies into lineages
Each lineage is described by its clonal tree
Antibody repertoire is described by a set of clonal trees
Standard phylogenetics algorithms do not work here!
Two faces of clonal reconstruction
Clonal lineage assignment: given an antibody repertoire, decompose it into clonal lineages
Clonal tree construction: given a clonal lineage, construct its clonal tree
Change-O (Gupta et al, Bioinformatics 2015)
Clonify (Briney et al, Sci Rep 2016)
partis (Ralph and Matsen, PLOS CB 2016)
repgenHMM (Elhanati et al, Bioinformatics 2016)
Can we apply existing phylogenetic tree construction algorithms to solve this problem?
AntEvolo: clonal tree constructor
Application of clonal analysis
HIV, 94th week, largest tree
Haynes et al., Nat Biotech, 2012
Liao et al., Nature, 2013
Laserson et al., PNAS, 2014
Galson et al., Genome Med, 2016
During immune response, antibodies specific to an active antigen are expanded into huge clonal trees
Antibodies gain mutations during maturation
98% – random substitutions; 2% – short random indels
VDJ classification helps to reveal mutations
Closest germline segments from database
Mutations
Who is the ancestor here?
Nested mutations define evolutionary direction
New mutations
Who is the ancestor here?
Individual mutations 1
Individual mutations 2
Ancestral antibody is missing in the repertoire
What is the evolutionary tree?
Nested mutations define directions
Repetitive mutations complicate construction of the clonal tree
Homoplasy in antibodies
What is the structure of evolutionary tree connecting a, b, and c?
[Figure: alternative trees connecting a, b, and c, labeled "Parallel evolution of" and "Reverse of" a mutation]
We do not know which tree is correct
Analyzing homoplasy in antibodies using clonal graph
Edge a → b if mutations of a are nested into mutations of b
Connected component of clonal graph constructed from lymphoma repertoire
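The nesting rule translates directly into set inclusion. The sketch below assumes each antibody is described by its set of mutations relative to the germline, written as (position, nucleotide) pairs; the antibody names and mutations are made up.

```python
def clonal_graph(mutations):
    """Directed edges a -> b whenever the mutations of a are a proper
    subset of the mutations of b.

    mutations: dict mapping antibody name -> frozenset of (position, nucleotide).
    """
    return {a: {b for b in mutations if mutations[a] < mutations[b]}
            for a in mutations}


# Hypothetical antibodies from one lineage.
toy_mutations = {
    "a": frozenset({(10, "G")}),
    "b": frozenset({(10, "G"), (57, "T")}),
    "c": frozenset({(10, "G"), (57, "T"), (93, "A")}),
    "d": frozenset({(31, "C")}),
}
edges = clonal_graph(toy_mutations)
print(edges)
```

Note that the graph contains the shortcut edge a → c in addition to a → b → c, so it is generally not a tree; picking a tree inside it is where homoplasy creates ambiguity.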
Counting parallel mutations
| Dataset | # non-trivial SHMs | max frequency | # synonymous non-trivial SHMs | # RGYW/WRCY non-trivial SHMs |
| Lymphoma | 172 (72.52%) | 20 | 47 | 50 |
| HIV, 94th week | 544 (82.05%) | 79 | 120 | 132 |
| Plasma cells, positive to flu | 90 (90%) | 7 | 37 | 15 |
| Plasma cells, negative to flu | 3 (3.3%) | 2 | 3 | 1 |
Are non-trivial mutations important?
Hypothesis 1
Non-trivial mutations are fixed in antibodies after multiple encounters with antigens
Conclusion: Such mutations are very important for affinity and thus for drug design
Hypothesis 2
Non-trivial mutations occurred during multiple cell divisions after a single encounter with antigen
Conclusion: Such mutations are important for mutagenesis modeling
Largest tree for HIV, 94th week
Probably, products of a single encounter with an antigen
Probably, products of multiple encounters with antigens
Three types of antibody chains
Human antibodies have one type of heavy chain (IGH) and two types of light chains:
VDJ recombination randomly selects between IGK and IGL
Resulting antibody may be self-reactive
In this case, the immune system gives the B cell producing the self-reactive antibody a second chance
Receptor editing process is intended to fix self-reactive chains
Newly recombined antibody may be helpful
Sometimes receptor editing affects another light chain locus (isotypic inclusion)
B cell produces 2 antibodies
Self-reactive chain still works, but it is suppressed by the immune system
Receptor editing may also affect the alternative allele (allelic inclusion)
In the worst case, B cell may produce 6 different chains
It is still unknown how many antibodies can be produced by such a B cell and how many of them are self-reactive
Multichain effect corrupts results of clonal analysis
Without information about correspondence between chains, clonal analysis may be useless
Single-cell RNA-seq of adaptive immune repertoires
McDaniel et al., Nat Protocols, 2016
> 97% precision, 3% of collisions: a single cell barcode corresponds to several cells
Clonal analysis for paired antibody repertoire
AntEvolo → PairAntEvolo
PairAntEvolo
Information about correspondence between chains improves quality of clonal trees
Paired data & multi-chain effect
Cell collision
Antibody tree construction problem. Given an antibody repertoire, mutagenesis model, and cell barcoding, resolve cell collisions and construct antibody trees
Antibody prediction problem. Given an antibody tree, predict pairing of chains producing functional antibody
How accurate is the database of human
V, D, and J segments?
TGTGCGGGGGGTAGCAGTGGCTGGATTGACTACTGG
AGTGGCT
TAGCAGTGGCTGG
TAGCAG
[Figure labels: V, J, D1, D2, D3]
Cropped versions of D2?
Two approaches to finding new V, D, J segments
Genomes: Watson et al, AJHG, 2013; Watson et al, Genes & Immunity, 2014
Antibodies (Rep-seq): antibody repertoires from 5 pairs of twins, Rubelt et al, Nat Communications, 2016
CDRs (complementarity determining regions) are antigen-binding sites of an antibody
CDR1
CDR2
CDR3
V
D
J
Heavy chain
Antibody
CDR3 plots
Dots correspond to consecutive 10-mers extracted from CDR3
Frequency of 10-mer – number of times this 10-mer appeared in unique CDR3s from 5 * 2 twins
10-mer position
10-mer frequency
Points corresponding to 10-mers from D-segments are painted red
ATTACTATGG
TTACTATGGT
TACTATGGTT
ACTATGGTTC
CTATGGTTCG
TATGGTTCGG
Peaks in CDR3 plots reveal (parts of) D segments!
ATTACTATGGTTCGG
Germline D: GTATTACTATGGTTCGGGGAGTTATTATAAC
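The CDR3 plot idea can be sketched as k-mer counting over unique CDR3s, followed by merging the consecutive k-mers under a peak. The six 10-mers and the germline D sequence are taken from the slides; the three toy CDR3 strings are made up for illustration.

```python
from collections import Counter


def kmer_frequencies(cdr3s, k=10):
    """Count every k-mer occurrence across unique CDR3 sequences."""
    counts = Counter()
    for s in set(cdr3s):                      # unique CDR3s only
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def merge_consecutive(kmers):
    """Merge k-mers where each one overlaps the previous by k-1 symbols."""
    merged = kmers[0]
    for kmer in kmers[1:]:
        if merged[-(len(kmer) - 1):] != kmer[:-1]:
            raise ValueError("k-mers are not consecutive")
        merged += kmer[-1]
    return merged


peak = ["ATTACTATGG", "TTACTATGGT", "TACTATGGTT",
        "ACTATGGTTC", "CTATGGTTCG", "TATGGTTCGG"]
germline_d = "GTATTACTATGGTTCGGGGAGTTATTATAAC"

fragment = merge_consecutive(peak)
print(fragment, fragment in germline_d)

toy_counts = kmer_frequencies(["ATTACTATGGT", "ATTACTATGGT", "TTACTATGGTA"])
print(toy_counts.most_common(1))
```

The merged fragment is a substring of the germline D segment, which is the sense in which a peak reveals part of a D segment.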
Toward modeling recombination process
Some fragments tend to be cropped on the right side
Some fragments tend to be cropped on both sides
Some fragments tend not to be cropped
Finding novel segments
ACCACAGATTATATCGAGAGGGGATATGATGAAGGGGACTAC
Aligning antibody against V, D, and J segments
End of V
D
Start of J
D alignment generated by IgBlast looks suspiciously short
7-mer position
7-mer frequency
Novel D segment?
GGATAT
Our prediction does not match the D-alignment computed by IgBlast and may correspond to a novel D segment
One more CDR3 and VDJ alignment
ACAAGCGGGGGCGAGAGGTACTCTCATACTAATGGTTATCCAAACTACTTTGACTAC
End of V
D
Start of J
VD insertion
DJ insertion
Two D-segments in a single CDR3?
Double D segments were reported in HIV specific antibodies
Larimore et al, J. of Immun 2012
TACTAATGGT
First peak falls in VD junction, second peak confirms D-alignment computed by IgBlast
D1D2 insertion
D1
D2
Open problem: Are double D-segments common in healthy individuals?
Extension of a known D segment?
False positive reference D segment?
Triple D segment???
Modeling VDJ recombination
Probability of segment usage:
V1 | V2 | V3 | V4 |
p1 | p2 | p3 | p4 |
Probability of cleavage / palindromic insertion of length l:
V1:
-2 | -1 | 0 | 1 | 2 | 3 | 4 |
p(V1, –2) | p(V1, –1) | p(V1, 0) | p(V1, 1) | p(V1, 2) | p(V1, 3) | p(V1, 4) |
Probability of VD/DJ insertion:
1 nt | 2 nt | 3 nt | 4 nt | 5 nt |
p(1) | p(2) | p(3) | p(4) | p(5) |
Previous studies used incomplete database of V, D, J segments:
Murugan PNAS 2012; Elhanati Trans R Soc Biol Sci 2015; Ralph PLOS 2016
VDJ modeling problem. Given multiple antibody repertoires generated from a set of known and unknown V, D, and J segments, develop a more accurate statistical model of VDJ recombination
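The three tables of the model (segment usage, cleavage per segment, insertion length) can be combined into the probability of one recombination scenario. The sketch below assumes the three events are independent given the segment, and all probability values are invented for illustration.

```python
import math

# Hypothetical parameters shaped like the tables of the model.
model = {
    "usage": {"V1": 0.4, "V2": 0.3, "V3": 0.2, "V4": 0.1},
    # Cleavage / palindromic insertion length l for segment V1
    # (keys mirror the range -2..4 shown in the table).
    "cleavage": {"V1": {-2: 0.05, -1: 0.05, 0: 0.3, 1: 0.2, 2: 0.2, 3: 0.1, 4: 0.1}},
    # Length in nt of the VD insertion.
    "insertion": {1: 0.3, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1},
}


def scenario_probability(model, segment, cleavage, insertion_len):
    """P(segment) * P(cleavage | segment) * P(insertion length),
    assuming the events are independent (a simplifying assumption)."""
    return (model["usage"][segment]
            * model["cleavage"][segment][cleavage]
            * model["insertion"][insertion_len])


p = scenario_probability(model, "V1", cleavage=2, insertion_len=3)
print(p)
```

Estimating such tables from repertoires is the modeling problem itself; a database that misses segments distorts the usage and cleavage estimates for the segments it does contain.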