Immunoinformatics
Yana Safonova
Center for Algorithmic Biotechnology
Saint-Petersburg State University
Innate & adaptive immune system
cell-mediated immune response
humoral immune response
Antibody repertoires
The human immune system generates 10^12 to 10^18 distinct antibodies
Analysis of concentrations of circulating antibodies (antibody repertoire) is a fundamental problem in immunology
While generation of antibody repertoires provides a new avenue for antibody drug development, it remains unclear how to construct antibody repertoires from massive NGS data
Antibody somatic recombination
Antibodies are produced by B-cells, each with a unique genome:
Antibody somatic recombination
Somatic recombination results in unique immunoglobulin genes encoding the amino acid sequences of antibodies
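The toy simulation below illustrates the recombination step: pick one V, one D, and one J segment, trim their junction ends, and join them with random non-templated (N) nucleotides. It is a sketch under stated assumptions: the segment sequences, names, trimming limit, and insertion lengths are all hypothetical, and P-nucleotides and somatic hypermutation are ignored.

```python
import random

# Hypothetical toy germline segments (real IGHV/IGHD/IGHJ genes are longer and more numerous).
V_GENES = {"V1": "CAGGTGCAGCTGGTGTGTGCGAGA", "V2": "GAGGTGCAGCTGTTGTGTGCGAAA"}
D_GENES = {"D1": "GGTATAGCAGTGGCTGGTAC", "D2": "GTGGATATAGTGGCTACGATTAC"}
J_GENES = {"J1": "GCTTTTGATATCTGGGGCCAAGGG", "J2": "TACTTTGACTACTGGGGCCAGGGA"}


def n_insertion(rng, max_len=6):
    """Random non-templated (N) nucleotides added at a junction."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(rng, max_trim=4):
    """Pick one V, D and J segment, trim their junction ends, join them with N insertions."""
    v_name, v = rng.choice(sorted(V_GENES.items()))
    d_name, d = rng.choice(sorted(D_GENES.items()))
    j_name, j = rng.choice(sorted(J_GENES.items()))
    v = v[: len(v) - rng.randint(0, max_trim)]                          # trim the V 3' end
    d = d[rng.randint(0, max_trim): len(d) - rng.randint(0, max_trim)]  # trim both D ends
    j = j[rng.randint(0, max_trim):]                                    # trim the J 5' end
    return (v_name, d_name, j_name), v + n_insertion(rng) + d + n_insertion(rng) + j


rng = random.Random(0)
for _ in range(3):
    print(recombine(rng))
```

Even with two segments of each kind, random trimming and N insertions make most generated junctions distinct, which is why every B-cell ends up with its own antibody sequence.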
Antibody somatic recombination
An antibody recognizes a foreign agent (antigen) using its antigen-binding site
Antibody somatic recombination
The most diverse part of the antigen-binding site is CDR3 (complementarity-determining region 3)
Antibody somatic recombination
Further optimization of antibody affinity is achieved through extensive mutations referred to as somatic hypermutations (SHMs)
[Figure: antibody variable region with CDR1, CDR2, and CDR3 marked]
Three approaches to clustering antibodies
V(D)J classification: what are the V, D, and J segments of an antibody?
CDR3 classification: what is the CDR3 region of an antibody?
Repertoire construction: what is an antibody?
V(D)J classification
decomposes each read into its V, D, and J gene segments
IMGT/V-QUEST
Brochet et al, Nucleic Acids Res, 2008
IgBLAST
Ye et al, Nucleic Acids Res, 2013
iHMMune-align
Gaeta et al, Bioinformatics, 2007
Antibody graph
Bonissone and Pevzner, RECOMB 2015
V gene segments
D gene segments
J gene segments
[Figure: V(D)J classification assigns reads to classes A, B, and C]
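To make "decomposes each read into its V, D, and J gene segments" concrete, here is a deliberately simplified sketch, not how IMGT/V-QUEST, IgBLAST, or the other cited tools work (they use proper alignment with indels and large reference sets). Assumptions: a hypothetical toy germline database, V assigned by ungapped identity of the read prefix, J by the read suffix, and D by the longest exact substring shared with the middle part.

```python
from difflib import SequenceMatcher

# Hypothetical toy germline databases (real tools use IMGT reference sets).
V_GENES = {"V1": "CAGGTGCAGCTGGTGTGTGCGAGA", "V2": "GAGGTGCAGCTGTTGTGTGCGAAA"}
D_GENES = {"D1": "GGTATAGCAGTGGCTGGTAC", "D2": "GTGGATATAGTGGCTACGATTAC"}
J_GENES = {"J1": "GCTTTTGATATCTGGGGCCAAGGG", "J2": "TACTTTGACTACTGGGGCCAGGGA"}


def identity(a, b):
    """Fraction of germline positions in b matched by a (ungapped, position by position)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(b), 1)


def longest_common_substring(a, b):
    """Length of the longest exact substring shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def classify_vdj(read):
    """Assign a read to (V, D, J): V by its prefix, J by its suffix, D by the middle part."""
    v = max(V_GENES, key=lambda g: identity(read[: len(V_GENES[g])], V_GENES[g]))
    j = max(J_GENES, key=lambda g: identity(read[-len(J_GENES[g]):], J_GENES[g]))
    middle = read[len(V_GENES[v]): len(read) - len(J_GENES[j])]
    d = max(D_GENES, key=lambda g: longest_common_substring(middle, D_GENES[g]))
    return v, d, j


read = V_GENES["V1"] + "GG" + "TATAGCAGTGGCTGG" + "T" + J_GENES["J2"]
print(classify_vdj(read))
```

Reads that receive the same (V, D, J) triple fall into the same class; the sketch does not handle trimmed V/J ends or an empty middle part.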
CDR3 classification
classifies each read according to its CDR3 (~30 nt long)
Robins et al, Sci Transl Med, 2009
Weinstein et al, Science, 2009
Warren et al, Genome Res, 2011
Bolotin et al, Eur J Immunol, 2012
[Figure: CDR3 classification splits reads into classes A1, A2, B, and C]
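A minimal sketch of CDR3 classification, assuming a simplified motif heuristic rather than the alignment-based methods of the cited works: the junction starts at a Cys codon (TGT/TGC) and ends at the first in-frame Trp codon (TGG) followed by a Gly codon (GGN), as in the conserved J-region W-G-x-G motif. The pattern and function names are hypothetical; an extra upstream Cys codon in the read would make this heuristic start too early.

```python
import re
from collections import defaultdict

# Cys codon, then whole codons (lazy), then a Trp codon followed by a Gly codon.
CDR3_PATTERN = re.compile(r"(TG[TC](?:[ACGT]{3})*?TGG)GG[ACGT]")


def extract_cdr3(read):
    """Return the Cys...Trp junction of a read, or None if the motif is absent."""
    match = CDR3_PATTERN.search(read)
    return match.group(1) if match else None


def group_by_cdr3(reads):
    """Classify reads by their CDR3; reads without a detectable CDR3 are skipped."""
    groups = defaultdict(list)
    for read in reads:
        cdr3 = extract_cdr3(read)
        if cdr3 is not None:
            groups[cdr3].append(read)
    return dict(groups)


r1 = "CAG" + "TGTGCGAGAGATAGCAGTGGCCGGTTTGACTACTGG" + "GGCCAGGGA"
r2 = "CAG" + "TGTGCGAGAGATAGCAGTGGCCGGTTTGACTACTGG" + "GGCCAAGGG"
r3 = "CAG" + "TGTGCGAGGGGCGGCAGCAGCTGGTTTGACTACTGG" + "GGCCAGGGA"
groups = group_by_cdr3([r1, r2, r3, "ACGTACGT"])
```

Here r1 and r2 differ outside the CDR3 and therefore land in the same class, which is exactly the information CDR3 classification discards.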
Full length antibody classification
(repertoire construction)
takes into account the entire variable region of an antibody
MiXCR, IMSEQ, IgRepertoireConstructor tools
[Figure: full-length classification splits reads into classes A11, A12, A2, B, C1, and C2]
Why is repertoire reconstruction important?
Repertoire construction is a prerequisite for analysis of antibody evolution
V(D)J and CDR3 classifications are not sufficient for understanding antibody variability
Many antibodies share the same CDR3
Sequencing of antibody repertoire
Platform | Coverage | Accuracy | Read length | Supported analyses
Roche 454 | low | low | long | V(D)J classification only
Illumina MiSeq | high | high | long | V(D)J and CDR3 classification, repertoire reconstruction
Illumina HiSeq | high | high | short | CDR3 classification only
Sequencing of antibody repertoire
Constructing an antibody repertoire from NGS data and analyzing it are important steps in antibody drug design and clinical studies
IgRepertoireConstructor
Natural variations look like sequencing errors in Ig-seq reads
Our tool reconstructs full-length antibody clones from Illumina reads. It constructs a graph representing similarity between Ig-seq reads and identifies the original antibody clones using clustering techniques
Works even on polyclonal datasets without barcoding
Shows high accuracy compared to existing tools
Will be available at Illumina BaseSpace
Hamming graph for a real antibody repertoire
6 antibodies with large abundances and a number of singleton antibodies
Each antibody forms a dense subgraph in the Hamming graph (distance = 3)
Reads derived from the same antibody differ from each other due to sequencing errors
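A minimal sketch of building the Hamming graph, assuming that "distance = 3" means reads are connected when their Hamming distance is at most 3 and that reads have equal length (the threshold name `tau` is illustrative). The all-pairs loop is quadratic and only suitable for small examples; it is not IgRepertoireConstructor's implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions; defined only for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def hamming_graph(reads, tau=3):
    """Adjacency sets: reads i and j are connected if their Hamming distance is <= tau."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        if len(reads[i]) == len(reads[j]) and hamming(reads[i], reads[j]) <= tau:
            adj[i].add(j)
            adj[j].add(i)
    return adj


reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "TTTTTTTT"]
adj = hamming_graph(reads)
```

The first three reads behave like error-containing copies of one antibody and form a triangle; the fourth read stays isolated, like a singleton antibody.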
What does the Hamming graph look like?
107 vertices, 1426 edges
Actually, the real Hamming graph is uncolored
Problem: extract the original dense subgraphs of the Hamming graph
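The slide states the problem but not the algorithm. As a deliberately naive stand-in (not IgRepertoireConstructor's method), one can take connected components of the Hamming graph, report each component's column-wise majority consensus as the antibody and its size as the abundance, and use edge density (1.0 for a clique) to flag components that probably merge several antibodies and need further splitting. The input format (adjacency sets over read indices) is an assumption.

```python
from collections import Counter, deque


def connected_components(adj):
    """Breadth-first search over an adjacency-set graph; returns sorted vertex lists."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            v = queue.popleft()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        comps.append(sorted(comp))
    return comps


def edge_density(comp, adj):
    """Fraction of vertex pairs in comp that are connected (1.0 = clique)."""
    n = len(comp)
    if n < 2:
        return 1.0
    members = set(comp)
    edges = sum(len(adj[v] & members) for v in comp) // 2
    return 2 * edges / (n * (n - 1))


def consensus(seqs):
    """Column-wise majority vote over equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def construct_repertoire(reads, adj):
    """One candidate antibody per component: (consensus, abundance, density)."""
    return [(consensus([reads[i] for i in comp]), len(comp), edge_density(comp, adj))
            for comp in connected_components(adj)]


reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "TTTTTTTT"]
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
repertoire = construct_repertoire(reads, adj)
```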
Results of repertoire construction
3.1M reads → 2.32M antibodies
2.32M = 2.27M trivial + 0.05M non-trivial
targets of biomedical follow-up
Multilayer MS search identifies ≈ 22% of spectra at 1% FDR
compared to ≈ 6% at 2% FDR in Cheung et al, Nat Biotech, 2012
target of repertoire analysis:
50,000 non-trivial clusters
206 with abundance > 500
largest cluster size is 33,000
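The slide reports identification rates at a fixed FDR but does not say how FDR was estimated. A common approach in MS/MS database search is target-decoy estimation, sketched below under these assumptions: one best-scoring match per spectrum, each labeled as a target or decoy hit, and FDR above a score threshold estimated as #decoys / #targets. The function name and input format are hypothetical.

```python
def identified_fraction(psms, n_spectra, max_fdr=0.01):
    """Fraction of spectra identified at the given FDR.

    psms: list of (score, is_decoy) pairs, one best match per spectrum.
    Walks down the score ranking and keeps the largest number of target matches
    at any threshold where the estimated FDR (#decoys / #targets) is <= max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best_targets = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_targets = targets
    return best_targets / n_spectra


psms = [(10.0, False)] * 50 + [(5.0, True)] + [(4.0, False)] * 49
print(identified_fraction(psms, n_spectra=200))
```

In this toy input the single decoy pushes the estimated FDR above 1% for every lower threshold, so only the 50 top-scoring targets count.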
Secondary diversification of B-cells
After successfully binding an antigen, a B-cell undergoes somatic hypermutation and clonal expansion
As a result, B-cells originating from the same clone form clonal families
Classification of B-cells
Limon and Fruman, Akt and mTOR in B cell activation and differentiation, Front Immunol, 2012
low SHM rate, high expression, short life cycle
high SHM rate, high expression, long life cycle
high SHM rate, medium expression, long life cycle
no SHMs, low expression, questionable life cycle
Evolutionary analysis of Ig repertoire
A B-cell family can be represented as a clonal tree
Intermediate clones may be either present in or missing from the repertoire
Evolutionary analysis of Ig repertoire
Standard phylogenetic approaches do not work here, since they assume that all species are located at the leaves
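One simple way to let observed sequences appear as internal nodes, sketched here as a stand-in and not as AntEvolo's algorithm, is a minimum spanning tree over the family's sequences under Hamming distance (Prim's algorithm), rooted at the sequence closest to the germline. Assumptions: aligned equal-length sequences, a known germline, and ties broken by input order.

```python
def hamming(a, b):
    """Mismatches between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def clonal_tree(seqs, germline):
    """Prim's minimum spanning tree rooted at the sequence closest to the germline.

    Returns {index: parent_index}; the root's parent is None.
    Observed sequences may become internal nodes, unlike leaf-only phylogenies.
    """
    root = min(range(len(seqs)), key=lambda i: hamming(seqs[i], germline))
    parent = {root: None}
    # best[i] = (distance to the closest tree node, that node)
    best = {i: (hamming(seqs[i], seqs[root]), root) for i in range(len(seqs)) if i != root}
    while best:
        v = min(best, key=lambda i: best[i][0])
        _, parent[v] = best.pop(v)
        for u in best:
            d = hamming(seqs[u], seqs[v])
            if d < best[u][0]:
                best[u] = (d, v)
    return parent


family = ["AAAAAT", "AAAATT", "AAATTT", "AAAACT"]
tree = clonal_tree(family, germline="AAAAAA")
```

In this toy family, the observed sequence AAAATT ends up as the parent of AAATTT, i.e. an intermediate clone present in the data.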
AntEvolo: basic principles
Results: clear situation
Antibody sequences that share a CDR3 can be unambiguously represented as a clonal tree
Results: unclear situation
This graph cannot be simplified without additional information
Let’s try to untangle it!
CDR3 sequences:
TGTGCGGGGGGTAGCAGTGGCTGGATTGACTACTGG
TGTGCGAGGGGCGGCAGCAGCTGGTTTGACTACTGG
TGTGCGAGAGATAGCAGTGGCCGGTTTGACTACTGG
TGTGCGAGAGACGCTAGTGGCCCCTTTGACTACTGG
Fragments aligned to D gene segments: AGTGGCT, TAGCAGTGGCTGG, TAGCAG
Best D gene hits computed by IgBLAST: IGHD5-12*01, IGHD6-19*01, IGHD6-6*01
D segment: it might be a palindrome
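The palindrome remark can be checked mechanically. In V(D)J recombination, palindromic (P) nucleotides are the reverse complement of the bases at an untrimmed segment end. The helpers below (hypothetical names) test whether a sequence equals its own reverse complement and how many leading junction bases could be P-nucleotides after a segment end; the `max_len` cap is an illustrative assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if seq equals its own reverse complement (e.g. GGCC)."""
    return seq == reverse_complement(seq)


def p_nucleotides_after(segment, junction, max_len=4):
    """Length of the longest prefix of `junction` that equals the reverse complement
    of the last bases of `segment` (candidate P-nucleotides); 0 if none."""
    for k in range(min(max_len, len(segment), len(junction)), 0, -1):
        if junction[:k] == reverse_complement(segment[-k:]):
            return k
    return 0


print(is_palindrome("GGCC"), is_palindrome("TAGCAG"))
print(p_nucleotides_after("TGTGCGAGAG", "CTCGGA"))
```

Short matches (1 or 2 bases) can also arise by chance, so such a check only flags candidates.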
Results: simplified unclear situation
To simplify such a clonal graph, we need more information about V(D)J recombination
Results: VERY unclear situation
Most situations cannot be simplified manually: we need to introduce CDR3 analysis as an algorithmic step
Analysis of SHMs based on a clonal tree
TGTGCCAGAGATCCTGGCAGCTCGTCTTACTGGTACTTCGATCTCTGG
synonymous mutations
meaningful mutations
V
D
J
The CDR3 contains no cleavage, only P and N insertions, which can be evidence that the B-cell is old
One of the longest paths contains only synonymous SHMs
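Labeling an SHM on a clonal tree edge as synonymous or meaningful (non-synonymous) comes down to translating the affected codon before and after the mutation. A minimal sketch, assuming aligned equal-length parent and child sequences read in a known frame, using the standard genetic code:

```python
# Standard genetic code; codons enumerated by first, second, third base over "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_shms(parent, child, frame=0):
    """Compare aligned, equal-length parent/child sequences codon by codon.

    Returns (codon_start, parent_codon, child_codon, kind) for every changed codon,
    where kind is 'synonymous' if the encoded amino acid is unchanged.
    """
    changes = []
    for start in range(frame, len(parent) - 2, 3):
        p, c = parent[start:start + 3], child[start:start + 3]
        if p != c:
            same = CODON_TABLE[p] == CODON_TABLE[c]
            changes.append((start, p, c, "synonymous" if same else "non-synonymous"))
    return changes


print(classify_shms("TGTGCGAGA", "TGCGCGAAA"))
```

Applying this to every edge of a clonal tree gives the synonymous/meaningful labeling, e.g. to find a path that carries only synonymous SHMs.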
Utilizing information about pairing
Heavy chain
Light chain
28415 - LC
28520 - LC
28449 - LC
Application of AntEvolo to analysis of immunization
before immunization
right after immunization
highest immune response
Clonal analysis of a time series of antibody repertoires allows one to estimate the efficiency of the immune response
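The slide does not specify how efficiency is measured. One hypothetical proxy, sketched below, compares each clone's frequency between two time points and reports clones that expanded at least `min_fold` times. Clone abundances are assumed to be given per time point, counts are normalized by sequencing depth, and a pseudocount lets clones absent before immunization be compared; both thresholds are illustrative.

```python
def expanded_clones(before, after, min_fold=10.0, pseudocount=1.0):
    """Clones whose frequency grew at least min_fold between two time points.

    before/after: {clone_id: read count}. Returns {clone_id: fold change}.
    """
    n_before, n_after = sum(before.values()), sum(after.values())
    expanded = {}
    for clone, count in after.items():
        f_after = (count + pseudocount) / n_after
        f_before = (before.get(clone, 0) + pseudocount) / n_before
        fold = f_after / f_before
        if fold >= min_fold:
            expanded[clone] = fold
    return expanded


before = {"A": 100, "B": 99, "C": 1}
after = {"A": 100, "C": 99, "D": 1}
print(expanded_clones(before, after))
```

Clone C, rare before immunization, dominates afterwards; tracking such clones across time points is one way to quantify the response.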
Analysis of SHMs using clonal trees
shared SHM
synonymous SHMs
Useless mutations:
Helpful mutation:
Deep analysis of SHMs in an antibody family using a clonal tree allows one to detect helpful SHMs
A combination of shared hypermutations can potentially improve antibody affinity
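A first step toward spotting candidate helpful SHMs is finding mutations shared by several members of a family. The sketch below assumes aligned, equal-length sequences and a known germline, and counts each (position, germline base, mutated base) across the family; it does not assess affinity, and the helper names and `min_members` threshold are hypothetical.

```python
from collections import Counter


def shm_set(germline, seq):
    """SHMs of one sequence as (position, germline_base, mutated_base) triples."""
    return {(i, g, s) for i, (g, s) in enumerate(zip(germline, seq)) if g != s}


def shared_shms(germline, family, min_members=2):
    """SHMs observed in at least min_members sequences of the family, with counts."""
    counts = Counter()
    for seq in family:
        counts.update(shm_set(germline, seq))
    return {shm: n for shm, n in counts.items() if n >= min_members}


family = ["AAAATA", "AAAATT", "ACAATA"]
print(shared_shms("AAAAAA", family))
```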
Thank you!