1 of 87

Immunogenomics

Yana Safonova

University of California San Diego

University of Louisville School of Medicine

2 of 87

Population studies of IG loci

3 of 87

From biological problems to computational challenges

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

4 of 87

From biological problems to computational challenges

Model organisms in immunology with still unknown sets of V, D, and J segments

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

5 of 87

From biological problems to computational challenges

VDJ reconstruction problem. Given antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

6 of 87

VDJ classification problem is solved!

IMGT/V-QUEST

Brochet et al, Nucleic Acids Res, 2008

IgBlast

Ye et al, Nucleic Acids Res, 2013

iHMMune-align

Gaeta et al, Bioinformatics, 2007

Antibody graph

Bonissone and Pevzner, RECOMB 2015

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

Each tool includes a database of V, D, and J segments

7 of 87

VDJ classification problem is solved!

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

Only if VDJ segments do not vary widely between individuals!

8 of 87

How do VDJ segments vary across the population?

VDJ variants problem. Given reference V, D, and J segments and antibody repertoire from an individual, reconstruct how V, D, and J segments in this individual differ from the reference and discover new V, D, and J segments.

9 of 87

Finding novel V segments

[Figure: germline V segment]

10 of 87

Finding novel V segments

[Figure: germline V segment compared with novel V segments]

11 of 87

Finding novel V segments

[Figure: germline V segment compared with novel V segments]

12 of 87

Finding novel V segments

[Figure: germline V segment compared with novel V segments]

  • Novel genes can be found using both WGS and Rep-seq data

  • Recent studies suggest that the composition of IGHV genes is unique to each individual
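A minimal sketch of the idea behind these slides, assuming reads are already aligned to one germline V segment without gaps and have equal length: a mismatch shared by most reads at the same position is more likely a novel germline variant than an SHM or a sequencing error. The function names and the `min_fraction` threshold are hypothetical; this is not the algorithm of TIgGER or any other published tool.

```python
from collections import Counter


def candidate_novel_positions(germline, reads, min_fraction=0.8):
    """Return {position: base} where most reads disagree with the germline.

    Assumes (hypothetically) that reads are gap-free, equal-length, and aligned
    to the germline V segment; min_fraction is an illustrative threshold.
    """
    variants = {}
    for i, germline_base in enumerate(germline):
        column = Counter(read[i] for read in reads)
        base, count = column.most_common(1)[0]
        # A shared, dominant mismatch suggests a germline variant rather than SHM/error.
        if base != germline_base and count / len(reads) >= min_fraction:
            variants[i] = base
    return variants


def apply_variants(germline, variants):
    """Build a candidate novel V segment by substituting the variant bases."""
    seq = list(germline)
    for i, base in variants.items():
        seq[i] = base
    return "".join(seq)


germline = "CAGGTGCAGCTG"  # toy sequence, not a real IGHV gene
reads = ["CAGGTGCAACTG", "CAGGTGCAACTG", "CAGGTGCAACTA", "CAGGTGCAACTG"]
variants = candidate_novel_positions(germline, reads)
print(variants)                            # {8: 'A'}
print(apply_variants(germline, variants))  # CAGGTGCAACTG
```

The mismatch at the last position of one read is ignored because it is not shared; on real Rep-seq data, highly mutated reads would also need to be filtered out first.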

13 of 87

Chromosome recombination is a source of genetic diversity

[Figure: two copies of the gene cluster V1, V2, V3, V4]

14 of 87

Chromosome recombination is a source of genetic diversity

[Figure: four copies of the gene cluster V1, V2, V3, V4]

15 of 87

Chromosome recombination can change copy numbers (CNVs) of existing genes...

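[Figure: two copies of the V1 to V4 cluster flanked by Alu repeats; recombination between misaligned Alu repeats yields V1, V3, V4 (loss of V2) and V1, V2, V2, V3, V4 (duplication of V2)]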
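A toy model of the unequal crossing over shown on this slide: haplotypes are lists of gene names, and misaligned Alu repeats are represented only by choosing different breakpoints in the two haplotypes. The function and the breakpoint indices are illustrative assumptions; sequence-level recombination is not modeled.

```python
def unequal_crossover(hap_a, hap_b, i, j):
    """Swap tails of two haplotypes after the first i genes of hap_a and the first j genes of hap_b.

    With i == j this is an ordinary crossover; with i != j (e.g., when homologous
    Alu repeats misalign the chromosomes) one product gains genes and the other loses them.
    """
    product_1 = hap_a[:i] + hap_b[j:]
    product_2 = hap_b[:j] + hap_a[i:]
    return product_1, product_2


haplotype = ["V1", "V2", "V3", "V4"]
duplicated, deleted = unequal_crossover(haplotype, haplotype, 2, 1)
print(duplicated)  # ['V1', 'V2', 'V2', 'V3', 'V4']
print(deleted)     # ['V1', 'V3', 'V4']
```

The total number of genes is conserved across the two products, which is why a duplication on one chromosome is paired with a deletion on the other.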

16 of 87

… and creating novel genes

[Figure: two copies of the V1 to V4 gene cluster and their recombination products]

17 of 87

Chromosome recombination may change the structure of the IGH locus

[Figure: IGH haplotypes with genes V1, V2, D1, D2 and Alu repeats, before and after recombination]

18 of 87

… and possible VDJ recombinations

[Figure: rearranged IGH haplotype with genes V1, V2, D1, V2, D1, D2 and Alu repeats]

19 of 87

Immune genes accumulate many mutations

[Figure: IGH haplotype with genes V1, V2, V3, D1, D2, D3 and Alu repeats]

20 of 87

Immune genes accumulate many mutations

[Figure: IGH haplotype with genes V1, V2, V3, D1, D2, D3 and Alu repeats]

VDJ recombination is no longer able to recombine V3 and D1

21 of 87

Repeat composition of IGH locus

[Figure: repeat composition of the IGH locus. Legend: Alu, MIR, LINE1, LINE2, retrotransposons, retroviral and other LTRs, DNA transposons, medium-frequency repetitive sequences, simple repeats]

Matsuda et al., J Exp Med, 1998

22 of 87

Mutations in IGHV genes matter!

  • IGHV1-69 is responsible for the formation of bnAbs to influenza A hemagglutinin (HA)
  • Antibodies without SHMs have limited ability to bind to the antigen!

  • In unmutated CDR1, Phe 29 is buried in the ‘canonical’ CDR1 conformation

  • The somatically mutated CDR1 flips the hydrophobic residue Phe 29 out, placing this residue in contact with HA.

Lingwood et al. Structural and genetic basis for development of broadly neutralizing influenza antibodies. Nature. 2012

23 of 87

Mutations in IGHV genes matter!

  • IGHV1-69 is responsible for the formation of bnAbs to influenza A hemagglutinin (HA)
  • 14 alleles of IGHV1-69 can be differentiated by the presence of either a phenylalanine (F) or leucine (L) at amino acid position 54

  • Two amino acids at the tip of CDR H2, Ile 53 and Phe 54, appear to act as an anchor by which germline IGHV1-69 antibodies may attach to HA

  • Mutating these two amino acids to Ala abolished germline binding to HA for all IGHV1-69 antibodies

Lingwood et al. Structural and genetic basis for development of broadly neutralizing influenza antibodies. Nature. 2012

24 of 87

The loss of binding abilities was shown experimentally!

Lingwood et al. Structural and genetic basis for development of broadly neutralizing influenza antibodies. Nature. 2012

Three types of neutralizing immunoglobulins (grey: wild type; colored: mutant)

25 of 87

Mutations in IGHV genes matter!

  • IGHV1-69 is responsible for the formation of bnAbs to influenza A hemagglutinin
  • 14 alleles of IGHV1-69 can be differentiated by the presence of either a phenylalanine (F) or leucine (L) at amino acid position 54
  • Replacement of Phe54 by Leu54 has been shown to dramatically reduce binding affinities

Avnir et al, Scientific Reports, 2016

26 of 87

Other known associations

  • Thomson CA, Bryson S, McLean GR, Creagh AL, Pai EF, Schrader JW. Germline V-genes sculpt the binding site of a family of antibodies neutralizing human cytomegalovirus. EMBO J. 2008 Oct 8;27(19):2592-602.
  • Parks T, Mirabel MM, Kado J, Auckland K, Nowak J, Rautanen A, Mentzer AJ, Marijon E, Jouven X, Perman ML, Cua T, Kauwe JK, Allen JB, Taylor H, Robson KJ, Deane CM, Steer AC, Hill AVS; Pacific Islands Rheumatic Heart Disease Genetics Network. Association between a common immunoglobulin heavy chain allele and rheumatic heart disease risk in Oceania. Nat Commun. 2017 May 11;8:14946.
  • Bryson S, et al. Structures of Preferred Human IgV Genes-Based Protective Antibodies Identify How Conserved Residues Contact Diverse Antigens and Assign Source of Specificity to CDR3 Loop Variation. J Immunol. 2016;196:4723–4730.

27 of 87

Population-based paradigm of vaccination

Watson et al, Trends in Immunology, 2017

28 of 87

Approaches for finding IGH variants

Assembly of IGH locus from WGS data

(+) easy to find non-mutated V genes

(–) assembly is very challenging due to repeats

(–) it is unclear how to identify expressed genes

Analysis of Rep-seq data (based on RNA)

(+) easy to identify expressed genes

(–) sequences might contain SHMs and errors

(–) even for non-mutated sequences, inference of novel genes can be tricky

29 of 87

Assembly of IGH locus

  • The IGH locus is a ~1.25 Mbp region on chromosome 14

  • The entire human genome is ~3 Gbp long

  • There are two ways to assemble the IGH locus:
    • Assemble the whole genome and identify contigs corresponding to the IGH locus
    • Design a capture panel for the IGH locus and assemble it from targeted sequencing data

What are pros / cons of these approaches?
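As a rough illustration of one trade-off, the arithmetic below uses the slide's approximate sizes. The 30x sequencing depth is a hypothetical value, and coverage is assumed to be uniform, even though repeats in the IGH locus make real coverage uneven.

```python
IGH_LENGTH_BP = 1.25e6    # ~1.25 Mbp (slide value)
GENOME_LENGTH_BP = 3e9    # ~3 Gbp (slide value)

# Share of the genome occupied by the IGH locus.
fraction = IGH_LENGTH_BP / GENOME_LENGTH_BP
print(f"IGH fraction of genome: {fraction:.4%}")  # 0.0417%

coverage = 30  # hypothetical WGS depth
wgs_bases = coverage * GENOME_LENGTH_BP  # total bases sequenced
igh_bases = coverage * IGH_LENGTH_BP     # bases landing in IGH (uniform coverage assumed)
print(f"bases sequenced: {wgs_bases:.2e}, of which in IGH: {igh_bases:.2e}")
```

Under these assumptions, nearly all WGS effort is spent outside IGH, which is one argument for targeted capture; the whole-genome route, in turn, does not require designing a panel.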

30 of 87

Another attempt to assemble the IGH locus

Watson et al, The American Journal of Human Genetics, 2013

31 of 87

Summary of SVs in the IGH locus

32 of 87

De novo inference of immunoglobulin genes using Rep-seq

33 of 87

Antibodies are not as versatile as we think

34 of 87

Most bnAbs to influenza are derived from IGHV1-69

[Figure: V, D, and J segments, with IGHV1-69 as the V segment]

35 of 87

IGHV1-69 has 14 alleles!

[Figure: V, D, and J segments; IGHV1-69 alleles and the amino acid at position 54]

IGHV1-69*01 (54: Phe)

IGHV1-69*02 (54: Leu)

IGHV1-69*03 (54: Phe)

IGHV1-69*04 (54: Leu)

IGHV1-69*14 (54: Phe)

...

36 of 87

Only 50% of alleles are specific to influenza

[Figure: V, D, and J segments; IGHV1-69 alleles and the amino acid at position 54]

IGHV1-69*01 (54: Phe)

IGHV1-69*02 (54: Leu)

IGHV1-69*03 (54: Phe)

IGHV1-69*04 (54: Leu)

IGHV1-69*14 (54: Phe)

...

Phe 54: successful binding to hemagglutinin

Leu 54: loss of binding properties

37 of 87

Main idea

  • Let’s assume that we have found some antibodies of interest
    • E.g., antibodies specific to some antigen

  • We want to characterize these antibodies

  • What traits can we compute for a given antibody sequence?

38 of 87

Main idea

  • Let’s assume that we have found some antibodies of interest
    • E.g., antibodies specific to some antigen

  • We want to characterize these antibodies

  • What traits can we compute for a given antibody sequence?
    • V, D, J genes participating in VDJ recombination
    • VDJ scenario including VD and DJ insertions

39 of 87

Computational challenges

  • V genes (and, likewise, D genes and J genes) are VERY similar to each other
    • Aligning a sequence to V, D, and J genes typically yields several hits
    • It is unclear how to choose between them
    • Even if the alignment tool finds a single perfectly matching gene, that gene can have many alleles
  • Some genes can be unknown
    • The IMGT database is not population-specific
    • Currently, there are no comprehensive population studies of the diversity of immunoglobulin loci
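A toy illustration of why alignment often returns several hits and why alleles complicate the choice. The gene names and sequences are hypothetical (not real IMGT entries), and ungapped percent identity stands in for the scoring used by real tools such as IgBlast or IMGT/V-QUEST.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, gap-free sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_hits(read, germlines):
    """Return (all germline names tied for the highest identity, that identity)."""
    scores = {name: identity(read, seq) for name, seq in germlines.items()}
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if s == best), best


# Hypothetical germline database: two alleles of one gene and a closely related gene.
germlines = {
    "GENE_A*01": "CAGGTGCAGCTGGTG",
    "GENE_A*02": "CAGGTGCAGCTGGTA",
    "GENE_B*01": "CAGGTCCAGCTGGTG",
}
read = "CAGGTGCAGCTGGTC"  # differs from both GENE_A alleles at the last position
print(best_hits(read, germlines))
```

Both alleles of the hypothetical GENE_A tie, so the read alone cannot tell which allele it came from, and if the individual carries an allele missing from the database, neither hit is correct.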

40 of 87

Alignment shows potential variations

41 of 87

Unknown genes can also be similar

[Figure: real V genes and the closest known gene (50%)]

42 of 87

Inference of novel immunoglobulin genes

Problem 1: Given a database of known V, D, and J genes and a Rep-seq sample, infer the individual's V, D, and J genes

Problem 2: Given a database of known V, D, and J genes and a Rep-seq sample, infer the individual's alleles of V, D, and J genes

43 of 87

Inference of novel alleles of V & J genes

TigGER tool: Gadala-Maria D, Yaari G, Uduman M, Kleinstein SH. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci U S A. 2015 Feb 24;112(8):E862-70.

44 of 87

Inference of novel V and J genes

Corcoran et al., 2016: Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity

45 of 87

Similar bioinformatics problems: Alu repeats

Price AL, Eskin E, Pevzner PA. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 2004 Nov;14(11):2245-52.

46 of 87

Inference of D genes

  • D genes are much shorter than V and J genes
  • As a result, aligning D genes is harder than aligning V and J genes

47 of 87

Inference of D genes

[Figure: CDR3s, each with a D gene between the V and J segments]

48 of 87

Let’s trim V prefixes and J suffixes

[Figure: the sequences left after trimming V prefixes and J suffixes, each containing a D gene; together they form the set S*]
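A minimal sketch of the trimming step, assuming (hypothetically) that for each CDR3 we already know which V and J germline genes it came from: `v_tail` is the V germline from the CDR3 start to the end of the V gene, and `j_head` is the J germline from its start to the CDR3 end. Exact matching stands in for real alignment, so a mutation near the V or J end stops trimming early, and an inserted nucleotide that happens to match the germline is trimmed too. Applied to many CDR3s, the remaining middle parts would form a set like S*.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trim_cdr3(cdr3, v_tail, j_head):
    """Remove the CDR3 prefix matching the V gene end and the suffix matching the J gene start."""
    left = common_prefix_len(cdr3, v_tail)
    # Compare reversed strings to find the longest common suffix with the J gene.
    right = common_prefix_len(cdr3[::-1], j_head[::-1])
    if left + right >= len(cdr3):
        return ""  # nothing left between V and J
    return cdr3[left:len(cdr3) - right]


# Toy example: V part "TGTGCG", middle "GGTATAGCA", J part "TGGGGC".
print(trim_cdr3("TGTGCGGGTATAGCATGGGGC", "TGTGCGAGA", "ACTACTGGGGC"))  # GGTATAGCA
```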

49 of 87

D genes are the most abundant parts of CDR3s

  • Known k-mers belong to known D genes

  • Allelic k-mers belong to known allelic variants of D genes

  • Partially known k-mers contain known (k-2)-mers

  • All other k-mers are foreign

A k-mer is common if it is present in at least 0.1% of CDR3s

Safonova and Pevzner, 2019
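A minimal sketch of the k-mer categories above, under stated assumptions: each CDR3 counts once per k-mer, "0.1% of CDR3s" is read as "at least 0.1%", and "known (k-2)-mers" are taken from the known D genes. The D sequences are toy strings, and this is not the implementation used by Safonova and Pevzner.

```python
from collections import Counter


def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(cdr3s, k, min_fraction=0.001):
    """k-mers present in at least min_fraction of CDR3s (0.1% by default)."""
    counts = Counter()
    for cdr3 in cdr3s:
        counts.update(kmers(cdr3, k))  # each CDR3 contributes at most once per k-mer
    threshold = min_fraction * len(cdr3s)
    return {kmer for kmer, c in counts.items() if c >= threshold}


def classify_kmer(kmer, known_d, allelic_d):
    """Label a k-mer as known, allelic, partially known, or foreign."""
    k = len(kmer)
    if any(kmer in d for d in known_d):
        return "known"
    if any(kmer in d for d in allelic_d):
        return "allelic"
    known_short = set().union(*(kmers(d, k - 2) for d in known_d))
    if kmers(kmer, k - 2) & known_short:
        return "partially known"
    return "foreign"


known_d = ["GGTATAGCAGCAGCTGG"]    # toy "known D gene"
allelic_d = ["GGTATAGCAGCGGCTGG"]  # toy "known allelic variant"
for kmer in ["TATAGC", "GCGGCT", "TAGCTT", "CCCCCC"]:
    print(kmer, classify_kmer(kmer, known_d, allelic_d))
```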

50 of 87

Let’s look at the most abundant 15-mer...

Safonova and Pevzner, 2019

51 of 87

Let’s look at the most abundant 15-mer...

IC =

Safonova and Pevzner, 2019

52 of 87

Let’s look at the most abundant 15-mer...

We extend the k-mer by a nucleotide if its IC > 0.5

IC =

Safonova and Pevzner, 2019
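The IC formula on these slides did not survive extraction, so the sketch below assumes the standard information content of a nucleotide column (2 minus the Shannon entropy, in bits), which may differ from the definition Safonova and Pevzner use. It extends a seed k-mer to the right only, looks at the first occurrence of the current sequence in each CDR3, and uses the slide's threshold of 0.5; function names and data are hypothetical.

```python
import math
from collections import Counter


def information_content(bases):
    """Standard information content (bits) of a nucleotide column: 2 - Shannon entropy."""
    counts = Counter(bases)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy


def extend_right(seed, cdr3s, threshold=0.5):
    """Greedily extend seed by the most frequent next nucleotide while IC > threshold."""
    current = seed
    while True:
        next_bases = []
        for cdr3 in cdr3s:
            pos = cdr3.find(current)  # first occurrence only (simplification)
            if pos != -1 and pos + len(current) < len(cdr3):
                next_bases.append(cdr3[pos + len(current)])
        if not next_bases or information_content(next_bases) <= threshold:
            return current
        current += Counter(next_bases).most_common(1)[0][0]


cdr3s = ["AAGTACGG", "CCGTACGT", "TTGTACGA"]  # toy CDR3s
print(extend_right("GTA", cdr3s))  # GTACG
```

The loop always terminates: each step lengthens `current` by one nucleotide, and once it no longer fits in any CDR3 there are no next bases to examine.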

53 of 87

...and try to reconstruct the corresponding D gene

Safonova and Pevzner, 2019

54 of 87

Novel variations of D genes

Safonova and Pevzner, Front Immunol, 2019

[Figure: alleles D1*01 and D1*02 and an inferred V gene]

55 of 87

Novel variations of D genes

Safonova and Pevzner, Front Immunol, 2019

[Figure: alleles D1*01 and D1*02, an inferred V gene, and a novel variant of a V gene]

Heterozygous D genes help to reconstruct haplotypes of the IGH locus

56 of 87

Reconstructing haplotypes

[Figure: real V genes (50%) and inferred D genes D1*01 and D1*02]

57 of 87

Reconstructing haplotypes

[Figure: real V genes V1*1, V1*2, V2*1, V2*2; inferred D genes D1*01 and D1*02; frequencies of 50%]

58 of 87

Reconstructing haplotypes

[Figure: real V genes V1*1, V1*2, V2, V3; inferred D genes D1*01 and D1*02; frequencies of 33% and 66%]

59 of 87

Reconstructing haplotypes

[Figure: real V genes V1, V2*1, V2*2, V3; inferred D genes D1*01 and D1*02; frequencies of 33% and 66%]

60 of 87

Reconstructing haplotypes

[Figure: real V genes V1*1, V1*2, V1*2D, V2; inferred D genes D1*01 and D1*02; frequencies of 50%, 33%, and 66%]
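The haplotyping idea in these slides can be sketched as follows, assuming the V and D genes used in one VDJ recombination lie on the same chromosome. If an individual is heterozygous for D1 (D1*01 on one haplotype, D1*02 on the other), a V allele seen almost exclusively with D1*01 likely lies on the D1*01 haplotype, and one seen with both D alleles is consistent with being present on both haplotypes. The `min_fraction` cutoff and function name are hypothetical; this is not the method of Kirik et al. or of Bayesian haplotyping tools.

```python
from collections import Counter, defaultdict


def assign_haplotypes(pairs, d_hap1="D1*01", d_hap2="D1*02", min_fraction=0.9):
    """Assign V alleles to haplotypes via co-occurrence with a heterozygous D gene.

    pairs: (v_allele, d_allele) tuples, one per VDJ sequence.
    Assumes V and D in one recombination come from the same chromosome.
    min_fraction is a hypothetical cutoff.
    """
    counts = defaultdict(Counter)
    for v, d in pairs:
        if d in (d_hap1, d_hap2):  # only the heterozygous D gene is informative
            counts[v][d] += 1
    result = {}
    for v, c in counts.items():
        frac_hap1 = c[d_hap1] / (c[d_hap1] + c[d_hap2])
        if frac_hap1 >= min_fraction:
            result[v] = "hap1"
        elif frac_hap1 <= 1 - min_fraction:
            result[v] = "hap2"
        else:
            result[v] = "both"  # seen with both D alleles
    return result


pairs = ([("V1*1", "D1*01")] * 10 + [("V1*2", "D1*02")] * 10
         + [("V2", "D1*01")] * 5 + [("V2", "D1*02")] * 5)
print(assign_haplotypes(pairs))  # {'V1*1': 'hap1', 'V1*2': 'hap2', 'V2': 'both'}
```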

61 of 87

Reconstructing IG haplotypes on real data

Kirik et al. Data on haplotype-supported immunoglobulin germline gene inference. 2017

62 of 87

Inference of haplotypes

Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping

63 of 87

Chain pairing & single-cell Rep-seq data

64 of 87

RT-PCR linkage

65 of 87

Separating B cells

66 of 87

Single-cell RNA-seq of adaptive immune repertoires

McDaniel et al., Nat Protocols, 2016

67 of 87

Single-cell RNA-seq of adaptive immune repertoires

McDaniel et al., Nat Protocols, 2016

68 of 87

Single-cell RNA-seq of adaptive immune repertoires

McDaniel et al., Nat Protocols, 2016

69 of 87

Single-cell RNA-seq of adaptive immune repertoires

McDaniel et al., Nat Protocols, 2016

> 97% precision; 3% collisions (a single-cell barcode corresponds to several cells)

70 of 87

The Chromium Single Cell Immune Profiling Solution

71 of 87

Single cell repertoire sequencing

  • Preserves information about chain pairing (HC + LC)

  • Inherits all shortcomings of molecular barcoding

  • Has limited throughput (~100k cells)

  • Can be very tricky to analyze because of allelic inclusion in B cells
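A minimal sketch of chain pairing by cell barcode, assuming a hypothetical record format of (barcode, chain type, assembled sequence) rather than the output of any specific pipeline. A barcode with extra chains is ambiguous: it may reflect allelic or isotypic inclusion, a barcode collision, or a sequencing artifact, which is part of why such data are tricky to analyze.

```python
from collections import defaultdict


def chains_per_barcode(records):
    """Group distinct chain sequences by barcode and chain type ("IGH", "IGK", "IGL")."""
    cells = defaultdict(lambda: defaultdict(set))
    for barcode, chain_type, seq in records:
        cells[barcode][chain_type].add(seq)
    return cells


def label_barcodes(cells):
    """Label each barcode as paired, unpaired, or carrying multiple chains."""
    labels = {}
    for barcode, chains in cells.items():
        n_heavy = len(chains["IGH"])
        n_light = len(chains["IGK"]) + len(chains["IGL"])
        if n_heavy > 1 or n_light > 1:
            labels[barcode] = "multiple chains"  # inclusion, collision, or artifact
        elif n_heavy == 1 and n_light == 1:
            labels[barcode] = "paired"
        else:
            labels[barcode] = "unpaired"
    return labels


records = [
    ("AAA", "IGH", "h1"), ("AAA", "IGK", "k1"),
    ("CCC", "IGH", "h2"), ("CCC", "IGK", "k2"), ("CCC", "IGL", "l2"),
    ("GGG", "IGH", "h3"),
]
print(label_barcodes(chains_per_barcode(records)))
```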

72 of 87

Allelic inclusions in B cells

73 of 87

VDJ recombination randomly selects between IGK and IGL

74 of 87

The resulting antibody may be self-reactive

In this case, the immune system gives the B cell producing the self-reactive antibody a second chance

75 of 87

The receptor editing process is intended to fix self-reactive chains

76 of 87

The newly recombined antibody may be helpful

77 of 87

Sometimes receptor editing affects another light chain locus (isotypic inclusion)

The B cell produces 2 antibodies

78 of 87

Self-reactive chain still works, but its expression is suppressed

79 of 87

Receptor editing may also affect the alternative allele (allelic inclusion)

80 of 87

In the worst case, a B cell may produce 6 different chains

It is still unknown how many antibodies such a B cell can produce and how many of them are self-reactive

81 of 87

References

  • Liu S, Velez MG, Humann J, Rowland S, Conrad FJ, Halverson R, Torres RM, Pelanda R. Receptor editing can lead to allelic inclusion and development of B cells that retain antibodies reacting with high avidity autoantigens. J Immunol. 2005 Oct 15;175(8):5067-76.

  • Casellas R, Zhang Q, Zheng NY, Mathias MD, Smith K, Wilson PC. Igkappa allelic inclusion is a consequence of receptor editing. J Exp Med. 2007 Jan 22;204(1):153-60.

82 of 87

Other immunogenomics directions

83 of 87

Immunogenomics

T cell receptors

HLA / MHC

KIR

Somatic recombination

Cell-mediated response

Cancer immunotherapy

B cell activation

Recognition of self and non-self

Killer-cell immunoglobulin-like receptors

84 of 87

T cell receptor

  • VDJ recombination
  • Non-genomic insertions
  • No SHMs!

85 of 87

T cell receptor loci

TRB locus on chromosome 7

TRA/TRD locus on chromosome 14

TRG locus on chromosome 7

86 of 87

TCR structure

87 of 87

Thank you!