Immunogenomics
Yana Safonova
University of California San Diego
University of Louisville School of Medicine
Population studies of IG loci
From biological problems to computational challenges
VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody
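A minimal sketch of the classification step, assuming a toy database and a very simple score (the best ungapped match count of each known segment inside the read). The names `TOY_DB`, `best_ungapped_score`, and `classify` are hypothetical and only illustrate the problem statement; real tools use proper alignment and handle segments of different lengths.

```python
def best_ungapped_score(read: str, segment: str) -> int:
    """Maximum number of matching positions over all ungapped placements of segment in read."""
    best = 0
    for offset in range(len(read) - len(segment) + 1):
        window = read[offset:offset + len(segment)]
        matches = sum(a == b for a, b in zip(window, segment))
        best = max(best, matches)
    return best


def classify(read: str, database: dict) -> dict:
    """For each segment type (V, D, J), return the name of the best-scoring known segment."""
    return {
        kind: max(segments, key=lambda name: best_ungapped_score(read, segments[name]))
        for kind, segments in database.items()
    }


# Hypothetical toy database of known segments (not real germline sequences).
TOY_DB = {
    "V": {"V1": "ACGTACGTAC", "V2": "ACGTTCGAAC"},
    "D": {"D1": "GGGTAT", "D2": "CCCTAA"},
    "J": {"J1": "TTTGACTA", "J2": "TTAGGCTA"},
}

# Toy antibody: V2 + one inserted nucleotide + D2 + one inserted nucleotide + J1.
read = "ACGTTCGAAC" + "A" + "CCCTAA" + "G" + "TTTGACTA"
result = classify(read, TOY_DB)
print(result)
```

The score compares raw match counts, so it is only meaningful here because segments of the same type have equal length in the toy database.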
Model organisms in immunology with still unknown sets of V, D, and J segments
VDJ reconstruction problem. Given antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets
VDJ classification problem is solved!
IMGT/V-QUEST (Brochet et al., Nucleic Acids Res, 2008)
IgBlast (Ye et al., Nucleic Acids Res, 2013)
iHMMune-align (Gaeta et al., Bioinformatics, 2007)
Antibody graph (Bonissone and Pevzner, RECOMB 2015)
includes database of V, D, J segments
Only if VDJ segments do not vary widely between individuals!
How do V, D, and J segments vary across the population?
VDJ variants problem. Given reference V, D, and J segments and antibody repertoire from an individual, reconstruct how V, D, and J segments in this individual differ from the reference and discover new V, D, and J segments.
Finding novel V segments
[Figure: a germline V segment shown with sequences that carry recurring sets of variant positions; sequences sharing the same set are grouped under "Novel V segments".]
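A minimal sketch of the idea of finding novel V segments, assuming reads are already aligned to the germline V segment without gaps. It reports mismatches that are shared by a large fraction of reads, which is more consistent with a germline variant than with sporadic somatic mutations or errors. The function name and the `min_fraction` threshold are assumptions for illustration, not the method of any particular tool.

```python
from collections import Counter


def candidate_novel_variants(germline: str, reads: list, min_fraction: float = 0.25) -> dict:
    """Return {position: nucleotide} for mismatches shared by at least min_fraction of reads."""
    mismatch_counts = Counter()
    for read in reads:
        for pos, (g, r) in enumerate(zip(germline, read)):
            if g != r:
                mismatch_counts[(pos, r)] += 1
    threshold = min_fraction * len(reads)
    return {
        pos: nt
        for (pos, nt), count in sorted(mismatch_counts.items())
        if count >= threshold
    }


# Toy data: position 6 (G->C) recurs in four reads; positions 3 and 8 are sporadic.
germline = "ACGTACGTAC"
reads = [
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
    "ACGTACCTAC",
    "ACGTACCTAC",
    "ACGTACCTTC",
]
variants = candidate_novel_variants(germline, reads)
print(variants)
```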
Chromosome recombination is a source of genetic diversity
[Figure: chromosome copies, each carrying genes V1, V2, V3, V4, before and after recombination.]
Chromosome recombination changes the copy number of existing genes...
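[Figure: recombination between chromosomes carrying V1, V2, V3, V4 with Alu repeats between the genes; the products shown are V1, V3, V4 (V2 lost) and V1, V2, V2, V3, V4 (V2 duplicated).]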
… and creating novel genes
[Figure: recombination between chromosomes carrying V1, V2, V3, V4; the products include rearranged copies of V2 and V3.]
Chromosome recombination may result in changing IGH structure
[Figure: IGH locus with genes V1, V2, D1, D2 and Alu repeats, before and after recombination.]
… and possible VDJ recombinations
[Figure: rearranged locus with genes V1, V2, D1, D2 and Alu repeats.]
Immune genes accumulate many mutations
[Figure: locus with genes V1, V2, V3, D1, D2, D3 and Alu repeats.]
VDJ recombination is no longer able to recombine V3 and D1
Repeat composition of IGH locus
Alu
MIR
LINE1
LINE2
Retrotransposons
retroviral and other LTRs
DNA transposons
medium frequency repetitive sequences
simple repeats
Matsuda et al., J Exp Med, 1998
Mutations in IGHV genes matter!
Lingwood et al. Structural and genetic basis for development of broadly neutralizing influenza antibodies. Nature. 2012
Loss of binding ability was shown experimentally!
Three types of neutralizing immunoglobulins (grey: wild type; colored: mutant)
Mutations in IGHV genes matter!
Avnir et al, Scientific Reports, 2016
Other known associations
Population-based paradigm of vaccination
Watson et al, Trends in Immunology, 2017
Approaches for finding IGH variants
Assembly of IGH locus from WGS data
(+) easy to find non-mutated V genes
(–) assembly is very challenging due to repeats
(–) it is unclear how to identify expressed genes
Analysis of Rep-seq data (based on RNA)
(+) easy to identify expressed genes
(–) sequences might contain SHMs and errors
(–) even for non-mutated sequences, inference of novel genes can be tricky
Assembly of IGH locus
What are pros / cons of these approaches?
Another attempt to assemble IGH locus
Watson et al, The American Journal of Human Genetics, 2013
Summary of SVs in IGH locus
De novo inference of immunoglobulin genes using Rep-seq
Antibodies are not as versatile as we think
Most broadly neutralizing antibodies (bnAbs) against influenza use IGHV1-69
[Figure: antibody heavy chain composed of V, D, and J segments, with the V segment labeled IGHV1-69.]
IGHV1-69 has 14 allelic variants!
Residue at position 54 by allele:
IGHV1-69*01: Phe
IGHV1-69*02: Leu
IGHV1-69*03: Phe
IGHV1-69*04: Leu
...
IGHV1-69*14: Phe
Only 50% of alleles are specific to influenza
[Figure: IGHV1-69 alleles labeled by residue 54 (Phe or Leu), each marked as either "Successful binding to hemagglutinin" or "Loss of binding properties".]
Main idea
Computational challenges
Alignment shows potential variations
Unknown genes can also be similar
[Figure: real V genes compared with the closest known gene; label: 50%.]
Inference of novel immunoglobulin genes
Problem 1: Given a database of known V, D, J genes and a Rep-seq sample, compute individual V, D, and J genes
Problem 2: Given a database of known V, D, J genes and a Rep-seq sample, compute individual alleles of V, D, and J genes
Inference of novel alleles of V & J genes
TIgGER tool: Gadala-Maria D, Yaari G, Uduman M, Kleinstein SH. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci U S A. 2015 Feb 24;112(8):E862-70.
Inference of novel V and J genes
Corcoran et al., 2016: Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity
Similar bioinformatics problems: Alu repeats
Price AL, Eskin E, Pevzner PA. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 2004 Nov;14(11):2245-52.
Inference of D genes
[Figure: antibody sequences, each composed of a V part, a D gene, and a J part.]
Let’s trim V prefixes and J suffixes
[Figure: trimmed sequences, each retaining its D gene.]
Set S*
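A minimal sketch of the trimming step, assuming we know which germline V and J parts overlap each CDR3. It removes the longest CDR3 prefix that matches the V part and the longest suffix that matches the J part; the remainders play the role of the set S*. The function names and the toy sequences are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trim_cdr3(cdr3: str, v_part: str, j_part: str) -> str:
    """Remove the longest prefix shared with v_part and the longest suffix shared with j_part."""
    left = common_prefix_len(cdr3, v_part)
    right = common_prefix_len(cdr3[left:][::-1], j_part[::-1])
    return cdr3[left:len(cdr3) - right]


# Toy example: V-derived prefix + insertion + D-derived part + insertion + J-derived suffix.
v_part = "TGTGCGAGA"      # hypothetical end of a germline V segment
j_part = "TTTGACTACTGG"   # hypothetical start of a germline J segment
cdr3 = "TGTGCGA" + "CC" + "GGTATAGC" + "A" + "TGACTACTGG"

s_star = {trim_cdr3(cdr3, v_part, j_part)}
print(s_star)
```

Inserted nucleotides that happen to match the germline V or J part are trimmed as well, so this sketch can remove slightly more than the true V and J contributions.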
D genes are the most abundant parts of CDR3s
A k-mer is common if it is present in at least 0.1% of CDR3s
Safonova and Pevzner, 2019
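A minimal sketch of selecting common k-mers, assuming each CDR3 counts at most once per k-mer and that "common" means present in at least the given fraction of CDR3s. The function name and the toy parameters (k = 4, fraction 0.5) are chosen only to keep the example small; the slide uses 15-mers and 0.1%.

```python
from collections import Counter


def common_kmers(cdr3s: list, k: int = 15, min_fraction: float = 0.001) -> dict:
    """Return {k-mer: number of CDR3s containing it} for k-mers in >= min_fraction of CDR3s."""
    presence = Counter()
    for cdr3 in cdr3s:
        # A set ensures each CDR3 contributes at most once per k-mer.
        kmers = {cdr3[i:i + k] for i in range(len(cdr3) - k + 1)}
        presence.update(kmers)
    threshold = min_fraction * len(cdr3s)
    return {kmer: n for kmer, n in presence.items() if n >= threshold}


# Toy CDR3s: three of four share the substring GGTAT.
toy_cdr3s = ["AAGGTATC", "CGGTATAA", "TTGGTATG", "ACACACAC"]
common = common_kmers(toy_cdr3s, k=4, min_fraction=0.5)
print(common)
```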
Let’s look at the most abundant 15-mer...
We extend the k-mer by a nucleotide if the information content (IC) of that position exceeds 0.5
IC = [formula given on the slide as an image]
...and try to reconstruct the corresponding D gene
Safonova and Pevzner, 2019
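A minimal sketch of the extension step. The IC formula is not recoverable from the slide text, so this example assumes IC = 1 - H/2, where H is the Shannon entropy (in bits) of the nucleotides observed at the neighbouring position; this gives 1 for a fully conserved position and 0 for a uniform one. All function names are hypothetical, and a practical version would also require a minimum number of supporting sequences.

```python
import math
from collections import Counter


def information_content(column) -> float:
    """ASSUMED definition: 1 - H/2, with H the Shannon entropy in bits of the column."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / 2.0


def neighbours(seqs, kmer: str, side: str) -> list:
    """Nucleotides directly right or left of the first occurrence of kmer in each sequence."""
    column = []
    for s in seqs:
        i = s.find(kmer)
        if i == -1:
            continue
        j = i + len(kmer) if side == "right" else i - 1
        if 0 <= j < len(s):
            column.append(s[j])
    return column


def extend(seqs, kmer: str, ic_threshold: float = 0.5) -> str:
    """Greedily extend kmer to the right, then to the left, while the IC exceeds the threshold."""
    for side in ("right", "left"):
        while True:
            column = neighbours(seqs, kmer, side)
            if not column or information_content(column) <= ic_threshold:
                break
            nt = Counter(column).most_common(1)[0][0]
            kmer = kmer + nt if side == "right" else nt + kmer
    return kmer


# Toy sequences: a shared core GGTATAGCAGC with a different flanking nucleotide in each sequence.
core = "GGTATAGCAGC"
toy_seqs = ["TA" + core + "AT", "AC" + core + "CA", "CG" + core + "GG", "GT" + core + "TC"]
reconstructed = extend(toy_seqs, "TATAGC")
print(reconstructed)
```

In the toy data the extension stops exactly at the borders of the shared core, because the flanking positions are maximally diverse.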
Novel variations of D genes
Safonova and Pevzner, Front Immunol, 2019
D1*01
D1*02
Inferred V gene
Novel variant of V gene
Heterozygous D genes help to reconstruct haplotypes of the IGH locus
Reconstructing haplotypes
[Figure series: real V genes (V1*1, V1*2, V2*1, V2*2; then V1*1, V1*2, V2, V3; then V1, V2*1, V2*2, V3; then V1*1, V1*2, V1*2D, V2) shown with inferred D genes D1*01 and D1*02 and usage fractions of 50%, 33%, and 66%.]
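A minimal sketch of haplotype assignment with a heterozygous D gene, assuming each read has already been labeled with its V allele and its D allele. Because V and D segments joined in one rearrangement come from the same chromosome, a V allele seen almost exclusively with one D allele is placed on that chromosome. The function name, the `min_share` threshold, and the toy counts are assumptions for illustration.

```python
from collections import Counter, defaultdict


def assign_haplotypes(vd_pairs, min_share: float = 0.8) -> dict:
    """Map each V allele to the D allele it mostly co-occurs with, or 'both' if neither dominates."""
    counts = defaultdict(Counter)
    for v_allele, d_allele in vd_pairs:
        counts[v_allele][d_allele] += 1
    assignment = {}
    for v_allele, d_counts in counts.items():
        d_allele, n = d_counts.most_common(1)[0]
        share = n / sum(d_counts.values())
        assignment[v_allele] = d_allele if share >= min_share else "both"
    return assignment


# Toy read labels: V1*1 pairs mostly with D1*01, V1*2 with D1*02, V2 equally with both.
toy_pairs = (
    [("V1*1", "D1*01")] * 9 + [("V1*1", "D1*02")] * 1
    + [("V1*2", "D1*02")] * 10
    + [("V2", "D1*01")] * 5 + [("V2", "D1*02")] * 5
)
haplotypes = assign_haplotypes(toy_pairs)
print(haplotypes)
```

A V gene assigned to "both" is consistent with the same allele being present on both chromosomes.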
Reconstructing IG haplotypes on real data
Kirik et al. Data on haplotype-supported immunoglobulin germline gene inference. 2017
Inference of haplotypes
Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping
Chain pairing & single cell Rep-seq data
RT-PCR linkage
Separating B cells
Single-cell RNA-seq of adaptive immune repertoires
McDaniel et al., Nat Protocols, 2016
> 97% precision; 3% of collisions (a single-cell barcode corresponds to several cells)
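A minimal sketch of chain pairing by cell barcode, assuming each record is a hypothetical `(barcode, chain_type, sequence_id)` tuple with `chain_type` equal to "heavy" or "light". Barcodes with exactly one heavy and one light chain are paired; all others are set aside as unresolved, since they may reflect a collision, a missing chain, or a cell that expresses several chains.

```python
from collections import defaultdict


def pair_chains(records):
    """Return ({barcode: (heavy_id, light_id)}, sorted list of unresolved barcodes)."""
    by_barcode = defaultdict(lambda: {"heavy": set(), "light": set()})
    for barcode, chain_type, sequence_id in records:
        by_barcode[barcode][chain_type].add(sequence_id)
    pairs, unresolved = {}, []
    for barcode, chains in by_barcode.items():
        if len(chains["heavy"]) == 1 and len(chains["light"]) == 1:
            pairs[barcode] = (next(iter(chains["heavy"])), next(iter(chains["light"])))
        else:
            unresolved.append(barcode)
    return pairs, sorted(unresolved)


# Toy records: bc1 is a clean pair, bc2 has two heavy chains, bc3 has no light chain.
toy_records = [
    ("bc1", "heavy", "H1"), ("bc1", "light", "K1"),
    ("bc2", "heavy", "H2"), ("bc2", "heavy", "H3"), ("bc2", "light", "L1"),
    ("bc3", "heavy", "H4"),
]
pairs, unresolved = pair_chains(toy_records)
print(pairs, unresolved)
```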
The Chromium Single Cell Immune Profiling Solution
Single cell repertoire sequencing
Allelic inclusions in B cells
VDJ recombination randomly selects between IGK and IGL
Resulting antibody may be self-reactive
In this case, the immune system gives the B cell producing a self-reactive antibody a second chance
Receptor editing process is intended to fix self-reactive chains
Newly recombined antibody may be helpful
Sometimes receptor editing affects another light chain locus (isotypic inclusion)
The B cell produces 2 antibodies
Self-reactive chain still works, but its expression is suppressed
Receptor editing may also affect the alternative allele (allelic inclusion)
In the worst case, a B cell may produce 6 different chains
It is still unknown how many antibodies such a B cell can produce and how many of them are self-reactive
Other immunogenomics directions
Immunogenomics
T cell receptors
HLA / MHC
KIR
Somatic recombination
Cell mediated response
Cancer immunotherapy
B cell activation
Recognition of self and non-self
Killer-cell immunoglobulin-like receptors
T cell receptor
T cell receptor loci
TRB locus on chromosome 7
TRA/TRD locus on chromosome 14
TRG locus on chromosome 7
TCR structure
Thank you!