Immunoinformatics: application of algorithmic approaches to solving immunological problems
Center for Algorithmic Biotechnology
St. Petersburg State University
Yana Safonova
Outline
Innate & adaptive immune system
cell-mediated immune response
humoral immune response
Antibody & antigen
Antigen recognition
Antibody-antigen binding
1. Antigen neutralization
2. Destroying antigen by immune cells
Once you’ve met an antigen,
your adaptive immune system never forgets it!
This principle is used for vaccine design:
Real antigens
Vaccine
Where do antibodies live?
Antibody repertoires
About a billion B-cells circulate in human blood at any given moment (out of an estimated 10^18 possible antibodies)
Analysis of the concentrations of all antibodies in an organism (the antibody repertoire) is a fundamental problem in immunology
While generation of antibody repertoires provides a new avenue for antibody drug development, it remains unclear how to construct antibody repertoires from NGS data
V(D)J recombination
Antibodies are produced by B-cells, each with a unique genome:
IGH locus in the human genome (1 Mb long)
Antibody somatic recombination
Random
insertions/deletions
Antibody somatic recombination
Somatic recombination results in unique immunoglobulin genes encoding the amino acid sequences of antibodies
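A minimal simulation sketch of the recombination steps above: pick one V, one D, and one J segment, delete a few bases at the junction ends, and insert random nucleotides between the segments. The segment sequences, the trimming and insertion limits, and the uniform sampling are all assumptions made for illustration; they are not real IGH genes or measured rates.

```python
import random

# Toy germline segments (made up for illustration, not real IGH genes).
V_SEGMENTS = ["CAGGTGCAGCTGGTG", "GAGGTGCAGCTGTTG"]
D_SEGMENTS = ["GGTATAGCAGC", "GTATTACTATG"]
J_SEGMENTS = ["ACTACTTTGACTAC", "TGATGCTTTTGAT"]


def trim(segment, left, right):
    """Delete `left` bases from the 5' end and `right` bases from the 3' end."""
    return segment[left:len(segment) - right]


def random_insertion(rng, max_len):
    """Random nucleotides inserted at a junction (0 to max_len bases)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(rng, max_trim=3, max_insert=4):
    """Join one V, one D, and one J segment with random deletions and insertions."""
    v = trim(rng.choice(V_SEGMENTS), 0, rng.randint(0, max_trim))
    d = trim(rng.choice(D_SEGMENTS), rng.randint(0, max_trim), rng.randint(0, max_trim))
    j = trim(rng.choice(J_SEGMENTS), rng.randint(0, max_trim), 0)
    return v + random_insertion(rng, max_insert) + d + random_insertion(rng, max_insert) + j


rng = random.Random(0)
for _ in range(3):
    print(recombine(rng))
```

Even with two segments of each kind, the random junctions make almost every simulated sequence distinct.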
Antibody versus antigen
An antibody recognizes a foreign agent (antigen) using its antigen-binding site
Antigen binding site in antibody
The most diverse part of the antigen-binding site is complementarity-determining region 3 (CDR3)
Somatic hypermutations
Further optimization of antibody affinity is achieved through somatic hypermutations
...many somatic hypermutations
Architecture of antibodies
From biological problems to computational challenges
VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody
Many important model organisms in immunology still have unknown sets of V, D, and J segments
VDJ reconstruction problem. Given a set (millions) of antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets
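To make the VDJ classification problem concrete, here is a naive sketch: for each known segment, slide it along the antibody sequence without gaps, count matching positions, and report the best-scoring V, D, and J. The germline sequences and the read are toy data, and the ungapped scoring is a stand-in for real alignment, not the method of any published tool.

```python
def best_ungapped_score(read, segment):
    """Max number of matching positions over all ungapped placements of segment in read."""
    best = 0
    for offset in range(len(read) - len(segment) + 1):
        matches = sum(a == b for a, b in zip(read[offset:], segment))
        best = max(best, matches)
    return best


def classify(read, germline):
    """Return the best-scoring segment name for each of the V, D, and J sets."""
    return {
        kind: max(segments, key=lambda name: best_ungapped_score(read, segments[name]))
        for kind, segments in germline.items()
    }


# Toy germline sets (hypothetical names and sequences).
GERMLINE = {
    "V": {"V1": "CAGGTGCAGCTGGTG", "V2": "GAGGTCCAGCTGTTA"},
    "D": {"D1": "GGTATAGCAGC", "D2": "CTAACTGGGGA"},
    "J": {"J1": "ACTACTTTGACTAC", "J2": "TGATGCTTTTGATA"},
}

# A read built from V1, D1, and J1 with short junction insertions.
read = "CAGGTGCAGCTGGTG" + "TT" + "GGTATAGCAGC" + "A" + "ACTACTTTGACTAC"
print(classify(read, GERMLINE))
```

The reconstruction problem is harder because `GERMLINE` itself is unknown and has to be inferred from millions of such reads.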
Sequencing of antibody repertoire
Roche 454 (2005): low coverage, low accuracy, long reads; VDJ classification
Illumina HiSeq 2000 (2010): high coverage, high accuracy, short reads; + CDR3 classification
Illumina MiSeq (2013): medium coverage, high accuracy, long reads; + full-length classification
HiSeq Rapid SBS Kit v2 (2015): high coverage, high accuracy, long reads; + high throughput
Full-length antibody classification
(repertoire construction)
In contrast to the well-studied VDJ and CDR3 classification, full-length antibody classification takes into account the entire variable region of the antibody
MiGEC: Shugay et al., Nat Methods, 2014
MiXCR: Bolotin et al., Nat Methods, 2015
IMSEQ: Kuchenbecker et al., Bioinformatics, 2015
IgRepertoireConstructor: Safonova et al., Bioinformatics, 2015
Repertoire construction problem
What makes this clustering problem difficult?
× 10^18
High repetitiveness
High mutation rate
Huge repertoire size
Uneven distribution of abundances
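A naive sketch of repertoire construction as clustering, assuming equal-length reads and treating reads within a small Hamming distance as copies of the same antibody that differ by sequencing errors. The threshold `max_dist` and the all-pairs comparison are illustrative choices; the quadratic loop would not scale to the repertoire sizes listed above, and a single threshold cannot separate sequencing errors from true hypermutations.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1):
    """Group reads into connected components of the Hamming graph.

    Returns (representative read, cluster size) pairs, largest cluster first.
    """
    unique = sorted(set(reads))
    parent = {r: r for r in unique}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if len(a) == len(b) and hamming(a, b) <= max_dist:
                parent[find(a)] = find(b)

    clusters = {}
    for r in reads:
        clusters.setdefault(find(r), []).append(r)
    # Represent each cluster by its most abundant read.
    return sorted(
        ((Counter(members).most_common(1)[0][0], len(members)) for members in clusters.values()),
        key=lambda pair: -pair[1],
    )


reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 2
print(cluster_reads(reads))
```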
Secondary diversification of antibodies
Clonal analysis of antibody repertoire
Standard phylogenetic algorithms assume that all species are represented by leaves, so they should be adapted for clonal trees
Who is the ancestor here?
germline segments
1
2
Shared hypermutations
New hypermutations
Another example: who is the ancestor here?
Ancestral antibody may be missing…
Shared hypermutations
Individual hypermutations 1
Individual hypermutations 2
Ancestral antibody is not present in the repertoire
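A small sketch of the missing-ancestor case, assuming each antibody is described by its set of hypermutations relative to the germline. The shared mutations then describe the unobserved ancestor, and the rest are individual to each descendant. The `(position, new base)` encoding and the values are made up for illustration.

```python
def split_hypermutations(shms_1, shms_2):
    """Split two SHM sets into the shared part and the two individual parts."""
    shared = shms_1 & shms_2
    return shared, shms_1 - shared, shms_2 - shared


# SHMs as (position, new base) pairs; illustrative values only.
antibody_1 = {(31, "T"), (58, "G"), (77, "A")}
antibody_2 = {(31, "T"), (58, "G"), (102, "C")}

ancestor, individual_1, individual_2 = split_hypermutations(antibody_1, antibody_2)
print("inferred ancestor SHMs:", sorted(ancestor))
print("individual SHMs 1:", sorted(individual_1))
print("individual SHMs 2:", sorted(individual_2))
```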
What is the evolutionary tree?
9 antibody sequences share CDR3 and differ by SHMs in V segments
Hypermutations (SHMs) in V segment
Any tree reconstruction approach will work
[Figure: clonal tree with SHM count labels +1, +3, +1, +2, +2, +4, +3, +1]
Nested SHMs define directions of edges between antibodies in the clonal tree
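One way to read "nested SHMs define directions of edges" is sketched here: an antibody's parent is the antibody (or the germline) whose SHM set is the largest subset of its own. This assumes every antibody has a distinct SHM set and that all ancestors are present; the names and mutations are hypothetical.

```python
def clonal_tree(antibodies):
    """Build a clonal tree from nested SHM sets.

    antibodies: dict name -> frozenset of SHMs relative to the germline.
    Returns dict child -> (parent, number of new SHMs on the edge).
    Assumes all SHM sets are distinct.
    """
    nodes = {"germline": frozenset(), **antibodies}
    tree = {}
    for child, shms in antibodies.items():
        # Candidate parents carry only mutations that the child also has.
        candidates = [name for name, other in nodes.items() if name != child and other <= shms]
        parent = max(candidates, key=lambda name: len(nodes[name]))
        tree[child] = (parent, len(shms - nodes[parent]))
    return tree


antibodies = {
    "a": frozenset({(10, "T")}),
    "b": frozenset({(10, "T"), (25, "G")}),
    "c": frozenset({(10, "T"), (25, "G"), (40, "A"), (41, "C")}),
    "d": frozenset({(70, "C")}),
}
for child, (parent, new_shms) in clonal_tree(antibodies).items():
    print(f"{parent} -> {child} (+{new_shms})")
```

Observed antibodies can be internal nodes here (for example `a` and `b`), which is the difference from a standard phylogeny where all observed sequences are leaves.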
Repertoire construction step is very important for clonal analysis!
SHMs in V segments are easy to find
SHMs in CDR3 are difficult to identify
A more complex case: who is the ancestor?
[Figure: antibodies 1 and 2 with the CDR3 region marked; the ancestor is unknown (?)]
Information about VDJ scenarios allows us to make the right choice
Another puzzle
4 antibodies share SHMs in V segments but differ in CDR3s
Why do we need a VDJ probabilistic model?
To compute VDJ scenario, we need to:
Murugan et al., PNAS, 2012
V
D
J
Recombination events are not distributed uniformly
We need a probabilistic VDJ recombination model for a realistic description of these events
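A toy sketch of why non-uniform probabilities matter when choosing between VDJ scenarios that explain the same sequence. It assumes the factors (segment choices and insertion lengths) are independent, which is a simplification, and every number below is invented for illustration rather than estimated from data.

```python
import math

# Illustrative, made-up probabilities (not estimates from any dataset).
MODEL = {
    "V": {"V1": 0.7, "V2": 0.3},
    "D": {"D1": 0.5, "D2": 0.5},
    "J": {"J1": 0.9, "J2": 0.1},
    "insertions": {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1},  # P(insertion length)
}


def log_prob(scenario, model=MODEL):
    """Log-probability of a scenario under the independent-factor toy model."""
    return (
        math.log(model["V"][scenario["V"]])
        + math.log(model["D"][scenario["D"]])
        + math.log(model["J"][scenario["J"]])
        + math.log(model["insertions"][scenario["ins_vd"]])
        + math.log(model["insertions"][scenario["ins_dj"]])
    )


def most_likely(scenarios):
    """Pick the scenario with the highest probability."""
    return max(scenarios, key=log_prob)


# Two hypothetical scenarios explaining the same CDR3.
scenario_a = {"V": "V1", "D": "D1", "J": "J1", "ins_vd": 0, "ins_dj": 1}
scenario_b = {"V": "V1", "D": "D2", "J": "J1", "ins_vd": 3, "ins_dj": 3}
print(most_likely([scenario_a, scenario_b]))
```

Under a uniform model the two scenarios would be indistinguishable; here the long insertions make the second one much less likely.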
Why do we need an SHM probabilistic model?
Somatic hypermutation engages the AID enzyme, which mutates immunoglobulin genes to improve antibody affinity
Rogozin and Kolchanov, Biochimica et Biophysica Acta, 1992
SHM hotspots such as the degenerate 4-mer [A/T][A/G]C[C/T] (WRCY in IUPAC notation) trigger mutations in antibodies
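A short sketch for locating such hotspots in a sequence, assuming the degenerate 4-mer shown here is [A/T][A/G]C[C/T]. A lookahead is used so that overlapping occurrences are all reported; the example sequence is arbitrary.

```python
import re

# Degenerate 4-mer [A/T][A/G]C[C/T]; the lookahead allows overlapping matches.
HOTSPOT = re.compile(r"(?=([AT][AG]C[CT]))")


def find_hotspots(seq):
    """Return (0-based start, matched 4-mer) for every hotspot occurrence."""
    return [(m.start(), m.group(1)) for m in HOTSPOT.finditer(seq.upper())]


print(find_hotspots("GGAGCTAACC"))
```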
Building probabilistic SHM model
Yaari et al., Front Immunol, 2013
TCTCC 5-mer profiles for IGL, IGH, and IGK chains aggregated over 60 datasets
5-mer | Freq | A | C | G | T |
ACAAC | 83 | − | 0.24 | 0.48 | 0.28 |
GGCGT | 1742 | 0.22 | − | 0.12 | 0.66 |
CCGTC | 12 | 0.35 | 0.52 | − | 0.13 |
TCTCC | 516 | 0.32 | 0.54 | 0.14 | − |
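A sketch of how such a table can be used, assuming each row gives the substitution probabilities for the central base of the 5-mer (the "−" column matches the central base in every row). The four rows are copied from the table; the function names are hypothetical.

```python
import random

# Rows copied from the table: 5-mer -> (frequency, {new base: probability}).
SHM_MODEL = {
    "ACAAC": (83, {"C": 0.24, "G": 0.48, "T": 0.28}),
    "GGCGT": (1742, {"A": 0.22, "G": 0.12, "T": 0.66}),
    "CCGTC": (12, {"A": 0.35, "C": 0.52, "T": 0.13}),
    "TCTCC": (516, {"A": 0.32, "C": 0.54, "G": 0.14}),
}


def most_likely_substitution(kmer):
    """Most probable replacement for the central base of the 5-mer."""
    _, profile = SHM_MODEL[kmer]
    return max(profile, key=profile.get)


def mutate_center(kmer, rng):
    """Sample a mutated 5-mer by replacing the central base according to the profile."""
    _, profile = SHM_MODEL[kmer]
    bases, weights = zip(*profile.items())
    new_base = rng.choices(bases, weights=weights)[0]
    return kmer[:2] + new_base + kmer[3:]


rng = random.Random(0)
print(most_likely_substitution("GGCGT"))
print(mutate_center("TCTCC", rng))
```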
Time series
Laserson et al, PNAS, 2014
Clonal analysis in time
Clonal analysis of a time series of antibody repertoires allows one to estimate the efficiency of the immune response
Sequencing data provided by
before immunization
right after immunization
highest immune response
Clonal analysis for antibody repertoire
Sequencing data provided by
Clonal analysis for paired antibody repertoire
Sequencing data provided by
Light chain duality
co-expression of both kappa and lambda chains by a single B-cell
Pelanda et al., Cur Opin Immunol, 2014
Giachino et al., J Exp Med, 1995
Allelic inclusion
production of chains from both haplomes by B-cells
Casellas et al., J Exp Med, 2007
Beck-Engeser et al., PNAS, 1987
Duality + allelic inclusion
A single B-cell may express multiple chains due to allelic inclusions and/or light chain duality
Multi-chain effect
A B-cell can express up to 6 different chains. Which ones participate in the real pairing?
Multi-chain effect is common in healthy B-cells!
one heavy chain (IGM), IGK + IGL
one heavy chain (IGM + IGD), IGK + IGL
one heavy chain (IGA), IGK + IGL
25% (!) of B-cells with known pairing have allelic inclusions and/or light chain duality
two heavy chains and single light chain
two heavy chains and multiple light chains
Clonal analysis reveals true chain pairing
Example from AbVitro sequencing data
Cells 1, 2, and 3 express identical heavy (H1), kappa (K1), and lambda (L1) chains. Thus, 1, 2, and 3 are clones of the same B-cell
Which light chain contributes to the antibody: kappa or lambda?
Cell 4 shares heavy and kappa chains with cells 1, 2, and 3, but has a different lambda chain (L2)
Alignment of L1 and L2 reveals that L1 is an ancestor of L2
Thus, cell 4 is a descendant of cells 1, 2, and 3
Evolution of L1 into L2 provides evidence that cells 1, 2, 3, and 4 generate functional antibodies
But this contradicts the fact that H1 is non-productive
There are more B-cells to analyze!
Cell 5 expresses heavy (H2) and kappa (K2) chains
K2 and K1 originated from an unknown kappa chain K3 that is missing in the repertoire
We are not done yet…
Cell 6 expresses heavy (H3), kappa (K4), and lambda (L3) chains
Alignment reveals that H3 is an ancestor of H2
K4 is an ancestor of the virtual chain K3
L3 is an ancestor of L1
Evolutionary analysis helps to understand true chain pairing
H1 lineage is non-productive, so it does not participate in pairing
Lineage H3 → H2 is more likely to participate in chain pairing
The lambda lineage undergoes selection, so it is more likely to participate in chain pairing
Using information about clonal lineages for H, K and L chains and the SHM model, we can select the most likely chain pairing
H3 → H2
L3 → L1 → L2
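The lineage bookkeeping in this example can be sketched as follows. The ancestor relations and the non-productive status of H1 are taken from the slides; the data layout and function names are hypothetical. The sketch only reconstructs lineages and filters out non-productive heavy chains; deciding between the kappa and lambda lineages would additionally need the SHM model, which is not attempted here.

```python
# Ancestor relations stated in the example (parent -> children).
CHILDREN = {"H3": ["H2"], "K4": ["K3"], "K3": ["K1", "K2"], "L3": ["L1"], "L1": ["L2"]}
CHAINS = ["H1", "H2", "H3", "K1", "K2", "K3", "K4", "L1", "L2", "L3"]
NON_PRODUCTIVE = {"H1"}


def lineages(children, chains):
    """Return every root-to-leaf path among the given chains."""
    has_parent = {c for kids in children.values() for c in kids}
    roots = [c for c in chains if c not in has_parent]
    paths = []

    def walk(node, path):
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        for kid in kids:
            walk(kid, path + [kid])

    for root in roots:
        walk(root, [root])
    return paths


def productive_heavy_lineages(paths, non_productive):
    """Keep heavy-chain lineages that contain no non-productive chain."""
    return [p for p in paths if p[0].startswith("H") and not set(p) & non_productive]


paths = lineages(CHILDREN, CHAINS)
for path in paths:
    print(" -> ".join(path))
print("candidate heavy lineages:", productive_heavy_lineages(paths, NON_PRODUCTIVE))
```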
Alexander Shlemov
Andrey Bzikadze
Sergey Bankevich
Alla Lapidus
Pavel A. Pevzner
Timofey Prodanov
Andrey Slabodkin
Yana Safonova
Thank you!