Clonal analysis of antibody repertoires
Yana Safonova
Data Science Postdoctoral Fellow
Antibodies are agents of the adaptive immune system
[Figure: V, D and J gene segments on chr 14 recombine into an antibody gene (V-D-J)]
CDRs represent antigen-binding sites
[Figure: antibody gene (V, D, J) annotated with FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4]
The antibody repertoire is unique to each individual
Genome
Microbiome
Immunome
Antibodies are not as versatile as we think
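Most broadly neutralizing antibodies (bnAbs) to influenza are made of IGHV1-69
[Figure: V-D-J antibody gene with the V segment labelled IGHV1-69]
IGHV1-69 has 14 allelic variations!
Amino acid at position 55 by allele:
IGHV1-69*01 | 55: Phe |
IGHV1-69*02 | 55: Leu |
IGHV1-69*03 | 55: Phe |
IGHV1-69*04 | 55: Leu |
...
IGHV1-69*14 | 55: Phe |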
Only 50% of alleles are specific to influenza
[Figure: the IGHV1-69 alleles (55: Phe or 55: Leu) split into two groups, "Successful binding to hemagglutinin" and "Loss of binding properties"]
Importance of repertoire studies
Known associations:
Influenza + IGHV1-69 (Lingwood et al., 2012)
Cytomegalovirus + IGHV3-30, IGKV3-11 (Thomson et al., 2008)
Rheumatic heart disease + IGHV4-61 (Parks et al., 2017)
Antibody repertoire + GWAS
Future directions: allergy
Standard vaccination
[Figure: axis with ticks from 0 to 70; remaining content not recoverable]
Personalized future of vaccinations
State-of-the-art vaccination studies
Collecting antibody repertoire
Finding traits separating the positive and negative groups
IGHV1-69*01 55, Phe | IGHV1-69*01 55, Phe | IGHV1-69*02 55, Leu | IGHV1-69*01 55, Phe | IGHV1-69*04 55, Leu |
Antibody repertoire sequencing (Rep-seq)
[Figure: a V-D-J region covered by a left read and a right read]
Length: ~360 nt
VDJ from DNA or RNA
Error-prone immunosequencing reads
Turchaninova et al., Nat Protocols, 2016
Repertoire construction problem
[Figure: Rep-seq reads collapsed into an antibody repertoire with abundances × 7, × 3, × 2, × 2, × 1]
Antibody repertoire is the set of antibody sequences with their abundances
Tools: pRESTO (Vander Heiden et al., 2014), MiXCR (Bolotin et al., 2015), IgRepertoireConstructor (Safonova et al., 2015)
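As a minimal sketch of the definition above, identical reads can be collapsed into distinct sequences with abundances. This deliberately ignores sequencing errors, which the tools listed above have to handle; the function name `naive_repertoire` and the toy reads are illustrative assumptions, not part of any of those tools.

```python
from collections import Counter

def naive_repertoire(reads):
    """Collapse identical reads into (sequence, abundance) pairs, most abundant first."""
    return Counter(reads).most_common()

# Hypothetical toy reads; abundances mirror the x7, x3, x2, x2, x1 picture
reads = ["ACGT"] * 7 + ["ACGA"] * 3 + ["TCGA"] * 2 + ["GGGT"] * 2 + ["CCCA"]
repertoire = naive_repertoire(reads)
for sequence, abundance in repertoire:
    print(f"{sequence} x {abundance}")
```

With error-prone reads, this naive collapse splits one true antibody into many near-identical sequences, which is the problem the rest of the talk addresses.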
Quality assessment of antibody repertoires
Constructed repertoire vs. reference repertoire
Precision and sensitivity at different cluster size thresholds:
Cluster size threshold | Precision | Sensitivity |
1 | 6 / 10 | 6 / 9 |
3 | 5 / 8 | 5 / 9 |
5 | 4 / 5 | 4 / 9 |
10, 100 | 2 / 2 | 2 / 9 |
[Figure: precision/sensitivity plot (both axes from 0 to 1) with points labelled by threshold 1, 3, 5, 10, 100; the optimal threshold is marked]
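The fractions on these slides can be reproduced with a short sketch. It assumes that a constructed sequence counts as correct only if it matches a reference sequence exactly, and that a cluster is kept when its size is at least the threshold; both are assumptions, as is the toy data, which was chosen only so that the counts agree with the slides.

```python
from fractions import Fraction

def precision_sensitivity(constructed, reference, min_cluster_size):
    """constructed: dict sequence -> cluster size; reference: set of sequences.

    Assumption: exact sequence match, clusters kept if size >= min_cluster_size.
    """
    kept = {seq for seq, size in constructed.items() if size >= min_cluster_size}
    matched = kept & reference
    precision = Fraction(len(matched), len(kept)) if kept else None
    sensitivity = Fraction(len(matched), len(reference))
    return precision, sensitivity

# Hypothetical data: s* are in the reference, x* are erroneous, m* are missed
constructed = {"s1": 150, "s2": 120, "s3": 7, "s4": 6, "x1": 5,
               "s5": 4, "x2": 3, "x3": 3, "x4": 2, "s6": 1}
reference = {"s1", "s2", "s3", "s4", "s5", "s6", "m1", "m2", "m3"}

for threshold in (1, 3, 5, 10, 100):
    precision, sensitivity = precision_sensitivity(constructed, reference, threshold)
    print(threshold, float(precision), float(sensitivity))
```

Raising the threshold trades sensitivity for precision, which is why an intermediate threshold is marked as optimal.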
Quality assessment of repertoires
Repertoire of a healthy individual
Repertoire of a vaccinated individual
Shlemov et al., 2017
Antibodies are subjects of fast evolution
The immune system mutates and amplifies a binding antibody
The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome
One antibody = one antigen
Antibody repertoire is a set of clonal lineages
Antibody repertoire is a set of unknown clonal lineages
Quality assessment of real repertoires
Repertoire of a healthy individual:
Repertoire of a vaccinated individual:
Shlemov et al., 2017
Highly abundant antibodies create artificial diversity
[Figure: sequences A to E with abundances B (3), A (1), C (1), D (1), E (1), joined by edges of weight 1, 1, 2, 1]
Hamming graph reveals similarities between reads
[Figure: complete Hamming graph on sequences A to E (B has abundance 3); edge weights 1, 1, 2, 1, 2, 3, 3, 2, 2, 3]
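A Hamming graph of this kind can be sketched as follows, assuming all sequences have the same length. The function names and the `max_distance` cutoff are illustrative assumptions; the slides do not state which cutoff, if any, is used.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_graph(sequences, max_distance):
    """Return edges (i, j, distance) for all pairs within max_distance."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(sequences), 2):
        distance = hamming(a, b)
        if distance <= max_distance:
            edges.append((i, j, distance))
    return edges

# Hypothetical toy sequences
toy = ["AAAA", "AAAT", "AATT", "TTTT"]
print(hamming_graph(toy, max_distance=1))
```

The all-pairs loop is quadratic in the number of distinct sequences, so it only suits small examples.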
MST reveals artificial antibodies
[Figure, in steps: the complete Hamming graph on A to E; its minimum spanning tree with edge weights 1, 1, 2, 1; the final panel retains B (3)]
Problem 1&2: simultaneous error correction and clonal reconstruction
Constructing similarity graph on distinct antibody sequences & decomposing into clonal lineages
Maximum parsimony phylogenetic tree: evolutionary history with the minimum number of changes
Cleaning sequencing errors and other sample preparation artifacts
[Figure: sequences A to E joined by edges of weight 1, 1, 2, 1]
Finding MST in the constructed graph
Removing leaves in the constructed tree
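The two steps above can be sketched with Kruskal's algorithm followed by one pass of leaf removal. The edge weights use the values shown in the figures, but which pair carries which weight is an assumption made for illustration. The rule that a leaf is removed when its abundance is at most `max_leaf_abundance` is also an assumption, since the slides only say that leaves are removed.

```python
def kruskal_mst(nodes, edges):
    """Return one minimum spanning tree as a list of (u, v, weight) edges."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

def remove_low_abundance_leaves(nodes, tree, abundance, max_leaf_abundance=1):
    """Drop tree leaves whose abundance is at most max_leaf_abundance (assumed rule)."""
    degree = {v: 0 for v in nodes}
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return [v for v in nodes
            if not (degree[v] == 1 and abundance[v] <= max_leaf_abundance)]

nodes = ["A", "B", "C", "D", "E"]
abundance = {"A": 1, "B": 3, "C": 1, "D": 1, "E": 1}
# Hypothetical assignment of the figure's weights to pairs
edges = [("A", "B", 1), ("B", "C", 1), ("B", "E", 1), ("B", "D", 2),
         ("A", "C", 2), ("A", "E", 2), ("C", "E", 2),
         ("A", "D", 3), ("C", "D", 3), ("D", "E", 3)]

tree = kruskal_mst(nodes, edges)
kept = remove_low_abundance_leaves(nodes, tree, abundance)
print(sorted(tree), kept)
```

In this toy graph the tree is a star around B, so every low-abundance sequence is a leaf and only B survives.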
Choice of MST is ambiguous
Edges with the same weight lead to many minimum spanning trees
Different MSTs have different numbers of leaves
[Figure: alternative MSTs of the same graph, vertices coloured by abundance (<10, 10–100, 100–1k, >1k)]
6 lowly abundant sequences were not removed from the first MST
Simple MST vs max leaf MST
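To illustrate why the choice matters, the sketch below enumerates every minimum spanning tree of a tiny graph by brute force and counts leaves in each. Exhaustive enumeration is exponential and is used here only for illustration; it is not the method behind the slides, and the graph (four nodes, all edges of weight 1) is hypothetical.

```python
from itertools import combinations

def is_spanning_tree(nodes, edges):
    """True if the edges are acyclic and there are exactly len(nodes) - 1 of them."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v, _ in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # adding this edge would close a cycle
        parent[root_u] = root_v
    return len(edges) == len(nodes) - 1

def all_msts(nodes, edges):
    """All spanning trees of minimum total weight (assumes a connected graph)."""
    trees = [t for t in combinations(edges, len(nodes) - 1)
             if is_spanning_tree(nodes, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    return [t for t in trees if sum(w for _, _, w in t) == best]

def leaf_count(nodes, tree):
    degree = {v: 0 for v in nodes}
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree.values() if d == 1)

nodes = ["a", "b", "c", "d"]
edges = [(u, v, 1) for u, v in combinations(nodes, 2)]  # all weights tie
msts = all_msts(nodes, edges)
leaf_counts = [leaf_count(nodes, t) for t in msts]
max_leaf_mst = max(msts, key=lambda t: leaf_count(nodes, t))
print(len(msts), sorted(set(leaf_counts)), max_leaf_mst)
```

Here all 16 spanning trees are minimum, yet paths have 2 leaves and stars have 3, so leaf removal deletes a different number of sequences depending on which MST is picked.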
Error correction + clonal reconstruction
Constructing similarity graph on distinct antibody sequences & decomposing into clonal lineages
Maximum parsimony phylogenetic tree: evolutionary history with the minimum number of changes
Cleaning sequencing errors and other sample preparation artifacts
[Figure: sequences A to E joined by edges of weight 1, 1, 2, 1]
Finding the max leaf MST in the constructed graph
Removing leaves in the constructed tree
From clonal tree to clonal graph
Usage of V genes
Usage of gene V = # reads aligned to gene V / total # reads * 100
Reflects highly abundant clonal lineages
Does not reflect VDJ recombinations
Four donors vaccinated against flu
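The usage formula above translates directly into code. This is a minimal sketch that assumes each read has already been assigned its best-matching V gene; the function name and the toy assignments are hypothetical.

```python
from collections import Counter

def v_gene_usage(read_v_genes):
    """Percentage of reads aligned to each V gene."""
    counts = Counter(read_v_genes)
    total = sum(counts.values())
    return {gene: 100 * n / total for gene, n in counts.items()}

# Hypothetical per-read V gene assignments
read_v_genes = ["IGHV1-69"] * 6 + ["IGHV3-30"] * 3 + ["IGHV4-61"]
usage = v_gene_usage(read_v_genes)
print(usage)
```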
Clonal usage of V genes
Four donors vaccinated against flu
Clonal usage of gene V = # lineages derived from gene V / total # lineages * 100
Reflects VDJ recombinations
Makes differences between individuals more visible
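The difference between read-level and clonal usage can be seen on a toy example in which one expanded lineage dominates the reads. This is a sketch under the assumption that every read already carries a lineage label and a V gene; the names and data are hypothetical.

```python
from collections import Counter

def clonal_v_gene_usage(reads):
    """reads: iterable of (lineage_id, v_gene). Each lineage is counted once."""
    lineage_gene = dict(reads)  # one V gene per lineage
    counts = Counter(lineage_gene.values())
    total = len(lineage_gene)
    return {gene: 100 * n / total for gene, n in counts.items()}

# Hypothetical data: lineage L1 is highly expanded, L2 to L4 are singletons
reads = [("L1", "IGHV1-69")] * 7 + [("L2", "IGHV3-30"),
                                    ("L3", "IGHV3-30"),
                                    ("L4", "IGHV3-30")]
clonal_usage = clonal_v_gene_usage(reads)
print(clonal_usage)
```

At the read level IGHV1-69 would account for 70% of this sample, while its clonal usage is 25%, because a single recombination event produced all seven reads.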
Influenza + IGHV1-69 (Lingwood et al., 2012)
Alignment of naive Abs to IGHV1-69
ACGCGATCGATCGATCGATC
ACGCGATCGATCGATCGATC
ACGCGATCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGATCGATGGATCGATC
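[Figure: the reads aligned to IGHV1-69, with positions 6 and 12 marked; a bar plot with a percentage axis (0%, 50%, 100%) accompanies the alignment]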
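The per-position percentages can be sketched by counting, for each position, the reads that differ from the germline. The example assumes the reads are already aligned without gaps and that the first sequence on the slide is the IGHV1-69 germline; both are assumptions made for illustration.

```python
from fractions import Fraction

def mismatch_frequencies(germline, reads):
    """Map 1-based position -> fraction of reads that differ from the germline."""
    frequencies = {}
    for position, germline_base in enumerate(germline, start=1):
        mismatches = sum(read[position - 1] != germline_base for read in reads)
        if mismatches:
            frequencies[position] = Fraction(mismatches, len(reads))
    return frequencies

germline = "ACGCGATCGATCGATCGATC"  # assumed germline (first sequence on the slide)
reads = (["ACGCGATCGATCGATCGATC"] * 3
         + ["ACGCGGTCGATCGATCGATC"] * 3
         + ["ACGCGATCGATGGATCGATC"])
frequencies = mismatch_frequencies(germline, reads)
print(frequencies)
```

Only positions 6 and 12 vary in these reads, matching the positions marked on the slide.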
Codons observed in four donors at IGHV1-69 positions 50, 55, 57 and 74 (AP = amino acid position, NP = nucleotide position, AA = amino acid; two values in a cell are written as X/Y):
AP 50
NP | 148 | 149 | 150 | AA |
Dnr4 | G | G | G | G |
Dnr5 | A | G | G | R |
Dnr6 | G | G | G | G |
Dnr8 | G | G | G | G |
AP 55
NP | 163 | 164 | 165 | AA |
Dnr4 | T | T | T | F |
Dnr5 | C | T | T | L |
Dnr6 | C/T | T | T | L/F |
Dnr8 | T | T | T | F |
AP 57
NP | 169 | 170 | 171 | AA |
Dnr4 | A | C | A | T |
Dnr5 | A | T | A | I |
Dnr6 | A | C/T | A | T/I |
Dnr8 | A | C | A | T |
AP 74
NP | 220 | 221 | 222 | AA |
Dnr4 | A/G | A | A | K/E |
Dnr5 | A | A | A | K |
Dnr6 | A/G | A | A | K/E |
Dnr8 | G | A | A | E |
[Figure: percentage axis 0%, 50%, 100%]
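The relation between the positions and codons in the donor table can be checked with a short sketch. It reads AP as amino acid position and NP as nucleotide position (an interpretation of the abbreviations) and uses the standard genetic code; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_positions(aa_position):
    """1-based nucleotide positions of the codon at a 1-based amino acid position."""
    last = 3 * aa_position
    return (last - 2, last - 1, last)

# Codons read off the donor table, position 55
codons_at_55 = {"Dnr4": ["TTT"], "Dnr5": ["CTT"], "Dnr6": ["CTT", "TTT"], "Dnr8": ["TTT"]}
residues_at_55 = {donor: [CODON_TABLE[c] for c in codons]
                  for donor, codons in codons_at_55.items()}
print(codon_positions(55), residues_at_55)
```

A donor with two codons at one position, such as Dnr6 here, is consistent with carrying two different alleles.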
SHMs in position 55, IGHV1-69
[Figure legend: F / L / others]
SHMs in position 74, IGHV1-69
[Figure legend: K / E / others]
IGHV1-69: summary
[Table: donor codons at positions 50, 55, 57 and 74, identical to the table on the alignment slide above]
Other genes with differences in clonal usage
Positional clonal graph
[Figure: Position 57 in CDR2]
[Figure: Position 34 in FR2]
Mutability plot
Finding Ag binding sites
Positions with high mutability likely correspond to antigen binding sites
SHM analysis helps to reveal non-canonical binding sites
In addition to CDR1-CDR3, FR3 may also contain an Ag binding site, called CDR4
[Figure: CDR1, CDR2, CDR3 and a putative CDR4]
CDRs ≠ antigen-binding sites
Sela-Culang et al., Frontiers Immunol, 2013
Antibody humanization
Santos et al., Braz J Pharm Sci, 2018
[Figure: loops H1, H2, H3 shown in two panels]
Clonal analysis + humanization
Hypothetical pipeline:
Algorithm limitations
Biological follow-up projects
Estimating efficacy of vaccines in cow population
Population-wide study of human antibody repertoires
Cattle antibodies with ultra-long CDR3s
Wang et al., Cell, 2013
Cattle IGH locus
Stanfield et al., Adv Immunol, 2018
IGHD8-2, shown in three reading frames (ORF = 3, ORF = 2, ORF = 1):
GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC
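The three reading frames of IGHD8-2 can be translated with the standard genetic code, as in the sketch below. It assumes that frame k starts at nucleotide k; the ORF numbering on the slide may follow a different convention. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(sequence):
    """Translate complete codons from the start of the sequence."""
    return "".join(CODON_TABLE[sequence[i:i + 3]]
                   for i in range(0, len(sequence) - 2, 3))

def three_frames(sequence):
    """Translations for frames 1 to 3 (assumption: frame k starts at base k)."""
    return {frame: translate(sequence[frame - 1:]) for frame in (1, 2, 3)}

IGHD8_2 = ("GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTG"
           "TTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTAC"
           "GAATATAC")

for frame, protein in three_frames(IGHD8_2).items():
    print(frame, protein, "Cys count:", protein.count("C"))
```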
Cys: TGT, TGC
Codons one nucleotide substitution away from Cys: AGC, CGC, GGC, TAC, TCC, TTC, TGA, TGG, AGT, CGT, GGT, TAT, TCT, TTT
Cys-induced diversity of cattle antibodies
Wang et al., Cell, 2013
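The codons listed on this slide are exactly those that a single nucleotide substitution can turn into a cysteine codon, which can be verified by enumeration. The function name below is hypothetical.

```python
CYS_CODONS = {"TGT", "TGC"}

def one_substitution_neighbours(codon):
    """All codons that differ from the given codon at exactly one position."""
    return {codon[:i] + base + codon[i + 1:]
            for i in range(3)
            for base in "ACGT"
            if base != codon[i]}

# Non-Cys codons reachable from a Cys codon (and vice versa) by one substitution
near_cys = set().union(*(one_substitution_neighbours(c) for c in CYS_CODONS)) - CYS_CODONS
print(sorted(near_cys))
```

The IGHD8-2 sequence is rich in codons from this set (for example GGT, TAT and AGT), so single point mutations can readily introduce new cysteines.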
Acknowledgments
UCSD Data Science postdoctoral fellowships
AAI fellowship for computational immunologists
UCSD: Vinnu Bhardwaj, Andrey Bzikadze, Chao Zhang, Massimo Franceschetti, Siavash Mirarab, Ramesh Rao
USDA: Tim Smith, Sung Bong Shin
Digital Proteomics: Stefano Bonissone, Natalie Castellana
Stanford U: Scott Boyd
U of Louisville: William Gibson, Justin Kos
Pavel Pevzner (UCSD)
Corey Watson (U of Louisville)