1 of 75

Clonal analysis of antibody repertoires

Yana Safonova

Data Science Postdoctoral Fellow

2 of 75

Antibodies are agents of the adaptive immune system

  • Antibodies are proteins that bind to an antigen and neutralize it

  • Antibodies are not encoded directly in the genome; they result from somatic recombination of gene segments

[Figure: V, D, and J gene segments on chromosome 14]
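
A toy Python sketch can make the idea of somatic recombination concrete. It assumes made-up V, D, and J segments (`V_GENES`, `D_GENES`, `J_GENES`, and `recombine` are hypothetical names, and the sequences are not real IGHV/IGHD/IGHJ genes) and models only three ingredients: choosing one segment of each type, trimming segment ends, and adding random non-templated nucleotides at the two junctions. Real recombination has further features, such as palindromic nucleotides, that are omitted here.

```python
import random

# Hypothetical toy segments; these are NOT real IGHV/IGHD/IGHJ sequences.
V_GENES = {"V1": "CAGGTGCAGCTG", "V2": "GAGGTGCAGCTG"}
D_GENES = {"D1": "GGTATAGC", "D2": "TACTATGA"}
J_GENES = {"J1": "TGGGGCCAG", "J2": "TGGGGCAAG"}

def random_nucleotides(n, rng):
    """n random non-templated nucleotides."""
    return "".join(rng.choice("ACGT") for _ in range(n))

def recombine(rng, max_trim=2, max_insert=3):
    """Simplified V(D)J recombination: pick one V, D and J segment, trim their
    ends, and join them with random insertions at the V-D and D-J junctions."""
    v_name, v = rng.choice(sorted(V_GENES.items()))
    d_name, d = rng.choice(sorted(D_GENES.items()))
    j_name, j = rng.choice(sorted(J_GENES.items()))
    v = v[:len(v) - rng.randint(0, max_trim)]                          # trim 3' end of V
    d = d[rng.randint(0, max_trim):len(d) - rng.randint(0, max_trim)]  # trim both ends of D
    j = j[rng.randint(0, max_trim):]                                   # trim 5' end of J
    n1 = random_nucleotides(rng.randint(0, max_insert), rng)
    n2 = random_nucleotides(rng.randint(0, max_insert), rng)
    return (v_name, d_name, j_name), v + n1 + d + n2 + j

genes, seq = recombine(random.Random(0))
print(genes, seq)
```

Even with only two segments of each type, trimming and insertions make most recombined sequences distinct, which is why antibodies cannot be read off the genome directly.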

3 of 75

Antibodies are agents of the adaptive immune system

  • The diversity of antibody genes is extremely high
  • The set of produced antibodies (the antibody repertoire) is unique to each individual

[Figure: V, D, and J segments on chromosome 14 recombine into an antibody gene]

4 of 75

CDRs represent antigen-binding sites

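[Figure: antibody gene (V, D, J) divided into framework regions FR1, FR2, FR3, FR4 and complementarity-determining regions CDR1, CDR2, CDR3]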
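
To locate these regions in real data, one option is a fixed numbering scheme. The minimal sketch below assumes sequences are already gapped according to IMGT unique numbering (for example, by an aligner such as IMGT/V-QUEST); the boundaries are the IMGT ones, while other schemes (Kabat, Chothia) place them differently. `split_regions` is a hypothetical helper.

```python
# IMGT unique numbering for a V domain: (region, first position, last position).
IMGT_REGIONS = [
    ("FR1", 1, 26), ("CDR1", 27, 38), ("FR2", 39, 55), ("CDR2", 56, 65),
    ("FR3", 66, 104), ("CDR3", 105, 117), ("FR4", 118, 128),
]

def split_regions(gapped_aa: str) -> dict:
    """Split an IMGT-gapped amino acid sequence (IMGT position p stored at
    index p - 1, gaps written as '.') into FR and CDR pieces without gaps."""
    return {
        name: gapped_aa[start - 1:end].replace(".", "")
        for name, start, end in IMGT_REGIONS
    }
```

A CDR3 longer than 13 residues uses IMGT insertion positions (such as 111.1) and does not fit this fixed-width layout, so the sketch only covers the simple case.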

5 of 75

Antibody repertoire is unique to each individual

Genome

6 of 75

Antibody repertoire is unique to each individual

Genome

Microbiome

7 of 75

Antibody repertoire is unique to each individual

Genome

Microbiome

Immunome

8 of 75

Antibodies are not as versatile as we think

9 of 75

Most influenza bnAbs (broadly neutralizing antibodies) are derived from IGHV1-69

[Figure: V, D, and J segments; the V segment is IGHV1-69]

10 of 75

IGHV1-69 has 14 allelic variants!

[Figure: IGHV1-69 alleles and the amino acid at position 55]
  • IGHV1-69*01 (55: Phe)
  • IGHV1-69*02 (55: Leu)
  • IGHV1-69*03 (55: Phe)
  • IGHV1-69*04 (55: Leu)
  • IGHV1-69*14 (55: Phe)
  • ...

11 of 75

Only 50% of IGHV1-69 alleles are specific to influenza

[Figure: IGHV1-69 alleles grouped by the amino acid at position 55]
  • IGHV1-69*01, *03, *14 (55: Phe): successful binding to hemagglutinin
  • IGHV1-69*02, *04 (55: Leu): loss of binding properties
  • ...

12 of 75

Importance of repertoire studies

Known associations:
  • Influenza + IGHV1-69 (Lingwood et al., 2012)
  • Cytomegalovirus + IGHV3-30, IGKV3-11 (Thomson et al., 2008)
  • Rheumatic heart disease + IGHV4-61 (Parks et al., 2017)

13 of 75

Antibody repertoire + GWAS

Known associations:
  • Influenza + IGHV1-69 (Lingwood et al., 2012)
  • Cytomegalovirus + IGHV3-30, IGKV3-11 (Thomson et al., 2008)
  • Rheumatic heart disease + IGHV4-61 (Parks et al., 2017)

Future directions:
  • Allergy

14 of 75

Standard vaccination

15 of 75

Standard vaccination

  • No existing vaccine is 100% effective
  • This might be caused by genomic variation in the immunoglobulin loci

16 of 75

Personalized future of vaccinations

17 of 75

Personalized future of vaccinations

18 of 75

Personalized future of vaccinations

19 of 75

Personalized future of vaccinations

20 of 75

State-of-the-art vaccination studies

21 of 75

State-of-the-art vaccination studies

Collecting antibody repertoire

22 of 75

State-of-the-art vaccination studies

Collecting antibody repertoire

Finding traits separating the positive and negative groups

[Figure: IGHV1-69 alleles of the compared individuals: *01 (55, Phe), *01 (55, Phe), *02 (55, Leu), *01 (55, Phe), *04 (55, Leu)]

23 of 75

Antibody repertoire sequencing (Rep-seq)

[Figure: a VDJ region from DNA or RNA (length ~360 nt) is read from both ends as a left read and a right read, producing error-prone immunosequencing reads]

Turchaninova et al., Nat Protocols, 2016
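
When the left and right reads overlap in the middle of the VDJ region, they are typically merged into one sequence before further analysis. A minimal sketch, assuming the right read comes from the opposite strand and ignoring base quality scores (real tools use them to resolve disagreeing bases); `merge_pair`, `min_overlap`, and `max_mismatch_rate` are hypothetical names and thresholds.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(left: str, right: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.1):
    """Merge two paired reads using the longest suffix/prefix overlap whose
    mismatch rate is acceptable. Returns None if no overlap qualifies.
    On a mismatch inside the overlap, the base of the left read is kept."""
    right_rc = reverse_complement(right)
    for overlap in range(min(len(left), len(right_rc)), min_overlap - 1, -1):
        a, b = left[-overlap:], right_rc[:overlap]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_rate * overlap:
            return left + right_rc[overlap:]
    return None
```

Scanning from the longest overlap down prefers the merge that explains the most sequence; a quality-aware version would compare base qualities instead of always trusting the left read.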

24 of 75

Repertoire construction problem

[Figure: Rep-seq reads collapsed into an antibody repertoire of sequences with abundances ×7, ×3, ×2, ×2, ×1]

Antibody repertoire is the set of antibody sequences with their abundances

Repertoire construction tools:
  • pRESTO (Vander Heiden et al., 2014)
  • MiXCR (Bolotin et al., 2015)
  • IgRepertoireConstructor (Safonova et al., 2015)
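
The simplest possible version of repertoire construction only collapses identical reads and counts them. This sketch is deliberately naive: the tools listed above also correct sequencing errors, because otherwise each erroneous read would appear as a separate low-abundance antibody. `naive_repertoire` is a hypothetical helper.

```python
from collections import Counter

def naive_repertoire(reads):
    """Collapse identical reads into (sequence, abundance) pairs, sorted by
    decreasing abundance with ties broken alphabetically. No error correction."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(naive_repertoire(["AC", "GT", "AC", "AC", "GT", "TT"]))  # [('AC', 3), ('GT', 2), ('TT', 1)]
```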

25 of 75

Quality assessment of antibody repertoires

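[Figure: a constructed repertoire compared with a reference repertoire]

Precision = # constructed sequences that match reference sequences / # constructed sequences

Sensitivity = # reference sequences that match constructed sequences / # reference sequences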

26 of 75

Cluster size threshold: 1

[Figure: constructed vs. reference repertoire and a precision-sensitivity plot (both axes from 0 to 1)]

Precision = 6 / 10

Sensitivity = 6 / 9

27 of 75

Cluster size threshold: 3

[Figure: constructed vs. reference repertoire and the precision-sensitivity plot with thresholds 1 and 3 marked]

Precision = 5 / 8

Sensitivity = 5 / 9

28 of 75

Cluster size threshold: 5

[Figure: constructed vs. reference repertoire and the precision-sensitivity plot with thresholds 1, 3, and 5 marked]

Precision = 4 / 5

Sensitivity = 4 / 9

29 of 75

Cluster size threshold: 10, 100

[Figure: constructed vs. reference repertoire and the precision-sensitivity curve with thresholds 1, 3, 5, 10, and 100; the optimal threshold is marked]

Precision = 2 / 2

Sensitivity = 2 / 9
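
The threshold sweep above can be reproduced in a few lines. This sketch assumes that a constructed sequence counts as correct only when it exactly matches a reference sequence and that "cluster size" is the abundance attached to each constructed sequence; `precision_sensitivity` is a hypothetical helper, and the data are invented.

```python
def precision_sensitivity(constructed, reference, threshold):
    """constructed: dict sequence -> cluster size; reference: set of sequences.
    Only constructed sequences with cluster size >= threshold are kept."""
    kept = {seq for seq, size in constructed.items() if size >= threshold}
    matched = len(kept & reference)
    precision = matched / len(kept) if kept else float("nan")  # undefined if nothing is kept
    sensitivity = matched / len(reference) if reference else float("nan")
    return precision, sensitivity

# Invented toy data: "X" and "Y" play the role of erroneous sequences.
constructed = {"A": 12, "B": 6, "C": 2, "X": 1, "Y": 3}
reference = {"A", "B", "C", "D"}
for t in (1, 3, 5, 10):
    print(t, precision_sensitivity(constructed, reference, t))
```

Raising the threshold discards erroneous low-abundance sequences (precision goes up) but also discards rare correct ones (sensitivity goes down), which is the trade-off the curve shows.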

30 of 75

Quality assessment of repertoires

Repertoire of a healthy individual

Repertoire of a vaccinated individual

Shlemov et al., 2017

31 of 75

Antibodies are subject to fast evolution

32 of 75

The immune system mutates and amplifies a binding antibody

The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome

33 of 75

One antibody = one antigen

34 of 75

Antibody repertoire is a set of clonal lineages

35 of 75

Antibody repertoire is a set of unknown clonal lineages

36 of 75

Quality assessment of real repertoires

Repertoire of a healthy individual:

  • High VDJ diversity
  • Small clonal lineages

Repertoire of a vaccinated individual:

  • Low VDJ diversity
  • Large clonal lineages

Shlemov et al., 2017

37 of 75

Highly abundant antibodies create artificial diversity

38 of 75

Highly abundant antibodies create artificial diversity

39 of 75

Highly abundant antibodies create artificial diversity

[Figure: sequence B (abundance 3) and sequences A, C, D, E (abundance 1 each), with edge labels 1, 1, 2, 1]

40 of 75

Hamming graph reveals similarities between reads

[Figure: Hamming graph on sequences A, B (abundance 3), C, D, E (abundance 1 each); edge weights 1, 1, 2, 1, 2, 3, 3, 2, 2, 3]
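
A Hamming graph connects distinct sequences that differ by a few substitutions. The sketch below compares all pairs, which is fine for toy inputs but says nothing about how the published tool scales to hundreds of thousands of reads; `hamming_graph` and `max_dist` are hypothetical names. Sequences of different lengths are never connected, since Hamming distance ignores indels.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_graph(seqs, max_dist):
    """Edges (i, j, d) between sequences i < j of equal length whose Hamming
    distance d is at most max_dist. Quadratic in the number of sequences."""
    edges = []
    for i, j in combinations(range(len(seqs)), 2):
        if len(seqs[i]) == len(seqs[j]):
            d = hamming(seqs[i], seqs[j])
            if d <= max_dist:
                edges.append((i, j, d))
    return edges

print(hamming_graph(["ACGT", "ACGA", "TCGA", "GGGG"], max_dist=2))  # [(0, 1, 1), (0, 2, 2), (1, 2, 1)]
```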

41 of 75

MST reveals artificial antibodies

[Figure: Hamming graph on sequences A, B (abundance 3), C, D, E (abundance 1 each); edge weights 1, 1, 2, 1, 2, 3, 3, 2, 2, 3]

42 of 75

MST reveals artificial antibodies

[Figure: minimum spanning tree on A, B (abundance 3), C, D, E with edge weights 1, 1, 2, 1]

43 of 75

MST reveals artificial antibodies

[Figure: only B (abundance 3) remains; A, C, D, E are revealed as artificial]

44 of 75

Problems 1 & 2: simultaneous error correction and clonal reconstruction

  1. Constructing a similarity graph on distinct antibody sequences and decomposing it into clonal lineages
  2. Finding an MST in the constructed graph (maximum parsimony phylogenetic tree: the evolutionary history with the minimum number of changes)
  3. Removing leaves in the constructed tree (cleaning sequencing errors and other sample preparation artifacts)

[Figure: the example graph on sequences A, B, C, D, E with edge weights 1, 1, 2, 1]
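
The three steps can be sketched with Kruskal's algorithm for the MST and a single pass of leaf removal. Assumptions: an abundance cutoff (`min_abundance`, hypothetical) decides which leaves are artifacts, and leaves are removed only once; the published procedure may use different criteria or iterate. The toy edges are illustrative, loosely modeled on the A to E example rather than read off the figure.

```python
def kruskal_mst(n, edges):
    """Minimum spanning forest on vertices 0..n-1 from weighted edges (u, v, w),
    using Kruskal's algorithm with a union-find structure."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # joins two components, so it cannot create a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def remove_low_abundance_leaves(n, tree, abundance, min_abundance):
    """Vertices left after one pass that drops every leaf (degree 1 in the tree)
    whose abundance is below min_abundance."""
    degree = [0] * n
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return [x for x in range(n) if not (degree[x] == 1 and abundance[x] < min_abundance)]

# Toy graph: vertex 1 plays the role of the abundant sequence B.
abundance = [1, 3, 1, 1, 1]
edges = [(0, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 1),
         (0, 2, 2), (0, 3, 3), (2, 3, 3), (2, 4, 2), (0, 4, 2), (3, 4, 3)]
tree = kruskal_mst(5, edges)
print(remove_low_abundance_leaves(5, tree, abundance, min_abundance=2))  # [1]
```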

45 of 75

Choice of MST is ambiguous

Edges with the same weight lead to many minimum spanning trees

[Figure legend, abundances: <10, 10–100, 100–1k, >1k]

46 of 75

Choice of MST is ambiguous

Different MSTs have different numbers of leaves

[Figure legend, abundances: <10, 10–100, 100–1k, >1k]

47 of 75

Choice of MST is ambiguous

[Figure legend, abundances: <10, 10–100, 100–1k, >1k]

Six low-abundance sequences were not removed from the first MST
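
Because only leaves can be cleaned, the choice among tied MSTs matters. The sketch below builds two equally minimal trees of one toy graph with different leaf counts, using one simple tie-breaking heuristic: among equal-weight edges, process first those touching abundant sequences. This heuristic is an assumption for illustration, not necessarily how the max leaf MST on the next slide is computed; `kruskal` and `count_leaves` are hypothetical helpers.

```python
from collections import Counter

def kruskal(n, edges, key):
    """Kruskal's algorithm; `key` sets the order in which edges are processed."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=key):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def count_leaves(tree):
    """Number of degree-1 vertices in a tree given as (u, v, w) edges."""
    degree = Counter()
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree.values() if d == 1)

# Illustrative graph: all weights equal, vertex 1 is highly abundant.
abundance = [1, 1000, 1, 1]
edges = [(0, 1, 1), (2, 3, 1), (1, 2, 1), (1, 3, 1)]

plain = kruskal(4, edges, key=lambda e: e[2])  # ties broken by input order
hub_first = kruskal(4, edges, key=lambda e: (e[2], -max(abundance[e[0]], abundance[e[1]])))
print(count_leaves(plain), count_leaves(hub_first))  # 2 3
```

Both trees have total weight 3, but the second attaches every low-abundance sequence directly to the abundant one, so all of them become removable leaves.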

48 of 75

Simple MST vs max leaf MST

  • A healthy donor vaccinated against flu
  • 226,164 distinct reads
  • 1,082 clonal lineages
  • The largest lineage consists of 123,437 reads before cleaning

49 of 75

Error correction + clonal reconstruction

  1. Constructing a similarity graph on distinct antibody sequences and decomposing it into clonal lineages
  2. Finding the max leaf MST in the constructed graph (maximum parsimony phylogenetic tree: the evolutionary history with the minimum number of changes)
  3. Removing leaves in the constructed tree (cleaning sequencing errors and other sample preparation artifacts)

[Figure: the example graph on sequences A, B, C, D, E with edge weights 1, 1, 2, 1]

50 of 75

From clonal tree to clonal graph

  1. Translate nucleotide sequences in the cleaned MST
  2. Collapse identical AA sequences
  3. Construct a clonal graph on AA sequences
  4. Two AA sequences v and w are adjacent if some of their corresponding nucleotide sequences were adjacent in the MST
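
The four steps above fit in a short function. Assumptions: sequences are in reading frame 1, the cleaned MST is given as a list of vertex-index pairs, and the standard genetic code applies; `translate` and `clonal_graph` are hypothetical helpers (Biopython's `Seq.translate` would be a library alternative for the translation step).

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(nt: str) -> str:
    """Translate in frame 1; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))

def clonal_graph(nt_seqs, mst_edges):
    """Vertices: distinct AA translations. An edge joins two AA sequences if
    some pair of their nucleotide sequences was adjacent in the cleaned MST."""
    aa = [translate(s) for s in nt_seqs]
    edges = set()
    for u, v in mst_edges:
        if aa[u] != aa[v]:  # synonymous neighbors collapse into one vertex
            edges.add(tuple(sorted((aa[u], aa[v]))))
    return sorted(set(aa)), sorted(edges)

print(clonal_graph(["TTTAAA", "TTCAAA", "TTAAAA"], [(0, 1), (1, 2)]))
# (['FK', 'LK'], [('FK', 'LK')])
```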

51 of 75

Usage of V genes

Usage of gene V = # reads aligned to gene V / total # reads * 100

Reflects highly abundant clonal lineages

Does not reflect VDJ recombination events

Four donors vaccinated against flu

52 of 75

Clonal usage of V genes

Four donors vaccinated against flu

Clonal usage of gene V = # lineages derived from gene V / total # lineages * 100

Reflects VDJ recombination events

Makes differences between individuals more visible
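
The two usage formulas differ only in what is counted: reads or lineages. A minimal sketch, assuming each clonal lineage is summarized as a (V gene, number of reads) pair; `v_gene_usage` is a hypothetical helper, and the gene names are real IGHV genes but the counts are invented.

```python
from collections import Counter

def v_gene_usage(lineages):
    """lineages: one (v_gene, number_of_reads) pair per clonal lineage.
    Returns (usage, clonal_usage) as dicts mapping V gene -> percent."""
    reads, clones = Counter(), Counter()
    for gene, n_reads in lineages:
        reads[gene] += n_reads  # read-based usage: weighted by abundance
        clones[gene] += 1       # clonal usage: each lineage counts once
    total_reads, total_clones = sum(reads.values()), sum(clones.values())
    usage = {g: 100 * r / total_reads for g, r in reads.items()}
    clonal_usage = {g: 100 * c / total_clones for g, c in clones.items()}
    return usage, clonal_usage

# Invented counts: one large IGHV1-69 lineage dominates the read-based usage.
lineages = [("IGHV1-69", 900), ("IGHV3-23", 50), ("IGHV3-23", 30), ("IGHV4-34", 20)]
usage, clonal_usage = v_gene_usage(lineages)
print(usage)         # {'IGHV1-69': 90.0, 'IGHV3-23': 8.0, 'IGHV4-34': 2.0}
print(clonal_usage)  # {'IGHV1-69': 25.0, 'IGHV3-23': 50.0, 'IGHV4-34': 25.0}
```

A single expanded lineage inflates read-based usage, while clonal usage counts it as one recombination event.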

53 of 75

Clonal usage of V genes

Four donors vaccinated against flu

Clonal usage of gene V = # lineages derived from gene V / total # lineages * 100

Reflects VDJ recombination events

Makes differences between individuals more visible

Influenza + IGHV1-69 (Lingwood et al., 2012)

54 of 75

Alignment of naive Abs to IGHV1-69

[Figure: reads aligned to IGHV1-69 with mismatches at positions 6 and 12, and the mismatch frequency at each position (0% to 100%)]

ACGCGATCGATCGATCGATC
ACGCGATCGATCGATCGATC
ACGCGATCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGGTCGATCGATCGATC
ACGCGATCGATGGATCGATC

  • Select mismatches with high frequency
  • Compare nucleotides at the selected positions in all individuals
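
In naive (unmutated) antibodies, a mismatch shared by a large fraction of reads presumably reflects an allele that differs from the reference rather than a sequencing error. A minimal sketch of the selection step, assuming reads are aligned to the germline without indels; treating the first toy sequence above as the germline is also an assumption, and `min_freq` is a hypothetical cutoff.

```python
def mismatch_frequencies(germline, reads):
    """Fraction of reads that differ from the germline at each 1-based position."""
    freqs = {}
    for pos, base in enumerate(germline, start=1):
        covering = [r for r in reads if len(r) >= pos]
        if covering:
            freqs[pos] = sum(r[pos - 1] != base for r in covering) / len(covering)
    return freqs

def frequent_mismatches(germline, reads, min_freq=0.3):
    """Positions whose mismatch frequency is at least min_freq."""
    return sorted(p for p, f in mismatch_frequencies(germline, reads).items() if f >= min_freq)

GERMLINE = "ACGCGATCGATCGATCGATC"
READS = ["ACGCGATCGATCGATCGATC", "ACGCGATCGATCGATCGATC",
         "ACGCGGTCGATCGATCGATC", "ACGCGGTCGATCGATCGATC", "ACGCGGTCGATCGATCGATC",
         "ACGCGATCGATGGATCGATC"]
print(frequent_mismatches(GERMLINE, READS))  # [6]
```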

55 of 75

Alignment of naive Abs to IGHV1-69

  • Find amino acids corresponding to the selected mismatches
  • AP: amino acid position
  • NP: nucleotide position

Donor  AP 50 (NP 148-150)  AP 55 (NP 163-165)  AP 57 (NP 169-171)  AP 74 (NP 220-222)
Dnr4   GGG (G)             TTT (F)             ACA (T)             [A/G]AA (K/E)
Dnr5   AGG (R)             CTT (L)             ATA (I)             AAA (K)
Dnr6   GGG (G)             [C/T]TT (L/F)       A[C/T]A (T/I)       [A/G]AA (K/E)
Dnr8   GGG (G)             TTT (F)             ACA (T)             GAA (E)

X/Y: two nucleotides (or amino acids) observed in the same donor
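
The NP and AP columns follow the usual 1-based relation: codon k spans nucleotides 3k-2 to 3k, so AP 55 corresponds to NP 163-165 and NP 216 falls in AP 72. A donor carrying two alleles shows two bases at one position, so the codon may encode two amino acids. A minimal sketch, assuming positions are counted from the first nucleotide of the V gene in reading frame 1; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_span(aa_pos):
    """1-based nucleotide positions of the codon for 1-based amino acid aa_pos."""
    return (3 * aa_pos - 2, 3 * aa_pos - 1, 3 * aa_pos)

def aa_position(nt_pos):
    """1-based amino acid position that contains 1-based nucleotide nt_pos."""
    return (nt_pos + 2) // 3

def possible_amino_acids(codon_alleles):
    """codon_alleles: observed bases at the three codon positions, e.g.
    ["CT", "T", "T"] for a donor with C/T at the first position. With two
    variable positions the phase is unknown, so all combinations are returned."""
    return {CODON_TABLE["".join(c)] for c in product(*codon_alleles)}

print(codon_span(55), aa_position(216))                # (163, 164, 165) 72
print(sorted(possible_amino_acids(["CT", "T", "T"])))  # ['F', 'L']
```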

56 of 75

Alignment of naive Abs to IGHV1-69

  • Clonal usage of IGHV1-69 is consistent with the F / L alleles
  • We detected three more mismatches that may be associated with the flu response

Donor  AP 50 (NP 148-150)  AP 55 (NP 163-165)  AP 57 (NP 169-171)  AP 74 (NP 220-222)
Dnr4   GGG (G)             TTT (F)             ACA (T)             [A/G]AA (K/E)
Dnr5   AGG (R)             CTT (L)             ATA (I)             AAA (K)
Dnr6   GGG (G)             [C/T]TT (L/F)       A[C/T]A (T/I)       [A/G]AA (K/E)
Dnr8   GGG (G)             TTT (F)             ACA (T)             GAA (E)

X/Y: two nucleotides (or amino acids) observed in the same donor

57 of 75

SHMs at position 55, IGHV1-69

F / L / others

58 of 75

SHMs at position 74, IGHV1-69

K / E / others

59 of 75

IGHV1-69: summary

  • Position 50 is conserved in all trees
  • Position 55 is conserved in most trees
  • Position 57 is conserved in most trees
  • Position 74 is not conserved

Donor  AP 50 (NP 148-150)  AP 55 (NP 163-165)  AP 57 (NP 169-171)  AP 74 (NP 220-222)
Dnr4   GGG (G)             TTT (F)             ACA (T)             [A/G]AA (K/E)
Dnr5   AGG (R)             CTT (L)             ATA (I)             AAA (K)
Dnr6   GGG (G)             [C/T]TT (L/F)       A[C/T]A (T/I)       [A/G]AA (K/E)
Dnr8   GGG (G)             TTT (F)             ACA (T)             GAA (E)

X/Y: two nucleotides (or amino acids) observed in the same donor

60 of 75

Other genes with differences in clonal usage

  • The only meaningful nucleotide position is 216
  • It corresponds to AA position 72
  • R is preserved at this position in all clonal graphs
  • Positions 164 and 234 are conserved in all clonal graphs specific to flu
  • Positions 164 and 234 are also conserved in mucosal repertoires

61 of 75

Positional clonal graph

  • Select a clonal graph
  • Select a position in the amino acid sequence
  • Color vertices of the graph according to amino acids at the selected position
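
Coloring a positional clonal graph amounts to grouping its vertices by the residue at one position. A minimal sketch, assuming positions index the ungapped amino acid sequence directly; if the slides use a numbering scheme such as IMGT, sequences would need renumbering first. `color_by_position` is hypothetical, and the drawing itself (for example, with a graph library) is omitted.

```python
from collections import defaultdict

def color_by_position(aa_sequences, position):
    """Group clonal-graph vertices (AA sequences) by the residue at a 1-based
    position; each group would receive one color. '-' marks sequences that
    are too short to have that position."""
    groups = defaultdict(list)
    for seq in aa_sequences:
        residue = seq[position - 1] if len(seq) >= position else "-"
        groups[residue].append(seq)
    return dict(groups)

print(color_by_position(["ARK", "AIK", "ARE", "AR"], 3))
# {'K': ['ARK', 'AIK'], 'E': ['ARE'], '-': ['AR']}
```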

Position 57 in CDR2

62 of 75

Positional clonal graph

  • Select a clonal graph
  • Select a position in the amino acid sequence
  • Color vertices of the graph according to amino acids at the selected position

Position 34 in FR2

63 of 75

Mutability plot

  1. Compute the SHM graph
  2. Remove low-abundance leaves in the SHM graph
  3. Recount SHMs
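
Step 3 can be sketched by comparing the sequences at the two ends of every edge of the cleaned tree. Assumptions: the SHM graph is given as a tree of (parent, child) vertex pairs over equal-length aligned sequences, and low-abundance leaves were already removed in step 2; `shm_counts` and `mutability` are hypothetical names.

```python
from collections import Counter

def shm_counts(sequences, tree_edges):
    """For each 1-based position, count the tree edges whose endpoints differ
    there. Counting along edges records each mutation once, even if it is
    inherited by many descendant sequences."""
    counts = Counter()
    for parent, child in tree_edges:
        for pos, (a, b) in enumerate(zip(sequences[parent], sequences[child]), start=1):
            if a != b:
                counts[pos] += 1
    return counts

def mutability(sequences, tree_edges):
    """Share of all counted SHMs that fall at each position."""
    counts = shm_counts(sequences, tree_edges)
    total = sum(counts.values())
    return {pos: n / total for pos, n in sorted(counts.items())} if total else {}

seqs = ["AAAA", "AAGA", "TAGA", "AAGC"]  # toy data; vertex 0 is the root
edges = [(0, 1), (1, 2), (1, 3)]
print(mutability(seqs, edges))
```

Position 3 is counted once here, although three sequences differ from the root at that position; counting against the root instead would overweight mutations that occurred early in the lineage.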

64 of 75

Finding Ag binding sites

Positions with high mutability likely correspond to antigen-binding sites

SHM analysis helps to reveal non-canonical binding sites

In addition to CDR1, CDR2, and CDR3, FR3 may also contain an antigen-binding site, called CDR4

[Figure: mutability along the antibody sequence with regions labeled CDR1, CDR2, CDR3, and CDR4?]

65 of 75

CDRs ≠ antigen-binding sites

Sela-Culang et al., Frontiers Immunol, 2013

66 of 75

CDRs ≠ antigen-binding sites

Sela-Culang et al., Frontiers Immunol, 2013

67 of 75

Antibody humanization

Santos et al., Braz J Pharm Sci, 2018

[Figure: heavy-chain CDRs H1, H2, and H3 in the original and the humanized antibody]

68 of 75

Clonal analysis + humanization

Hypothetical pipeline:

  • Identify important positions using clonal analysis:
    • Ag binding sites
    • Positions that are important for Ab stability
  • Report the set of (position, amino acid) pairs that should be preserved in the humanized antibody
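
The reporting step of this hypothetical pipeline could start from positions that stay unchanged across a clonal graph. Treating conservation as a proxy for importance (for example, for stability) is an assumption made for this sketch; highly mutable positions, which the previous slides link to antigen binding, would need a separate criterion. `conserved_pairs` and `min_fraction` are hypothetical.

```python
from collections import Counter

def conserved_pairs(aa_sequences, min_fraction=1.0):
    """(position, amino acid) pairs carried by at least min_fraction of the
    equal-length AA sequences of one clonal graph; positions are 1-based."""
    n = len(aa_sequences)
    pairs = []
    for pos, column in enumerate(zip(*aa_sequences), start=1):
        residue, count = Counter(column).most_common(1)[0]
        if count / n >= min_fraction:
            pairs.append((pos, residue))
    return pairs

print(conserved_pairs(["ARKW", "AIKW", "ARKF"]))  # [(1, 'A'), (3, 'K')]
```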

69 of 75

Algorithm limitations

  • The current version handles mismatches only and ignores indels
    • This affects rabbit and bird repertoires

  • The clonal lineage assignment procedure sometimes merges similar but unrelated antibodies
    • IGHV6-1 cluster
    • Short CDR3s in rodent species

70 of 75

Biological follow-up projects

Estimating the efficacy of vaccines in a cattle population

  • Rep-seq of 200 cattle
  • Each individual was vaccinated with 5 antiviral components
  • Four time points are available for each individual

Population-wide study of human antibody repertoires

  • Rep-seq and WGS of 114 healthy human individuals
  • All isotypes are available for each individual: IgM, IgG, IgD, IgA, and IgE
  • Metadata were collected for each individual: ethnicity, age, sex, blood type

71 of 75

Cattle antibodies with ultra-long CDR3s

Wang et al., Cell, 2013

72 of 75

Cattle IGH locus

Stanfield et al., Adv Immunol, 2018

73 of 75

IGHD8-2

ORF = 3

GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC

ORF = 2

GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC

ORF = 1

GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC

Cys codons: TGT, TGC

Codons one substitution away from a Cys codon: AGC, CGC, GGC, TAC, TCC, TTC, TGA, TGG, AGT, CGT, GGT, TAT, TCT, TTT
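
The codon list above contains exactly the two Cys codons and the 14 codons one substitution away from them. A sketch for counting such codons in each reading frame of a sequence such as IGHD8-2, assuming "ORF = 1, 2, 3" denotes reading frames starting at the first, second, and third nucleotide; the helper names are hypothetical.

```python
CYS_CODONS = {"TGT", "TGC"}

def one_substitution_neighbors(codon):
    """All codons that differ from `codon` at exactly one position."""
    return {codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in "ACGT" if b != codon[i]}

def near_cys_codons():
    """Non-Cys codons that a single substitution turns into a Cys codon."""
    near = set()
    for codon in CYS_CODONS:
        near |= one_substitution_neighbors(codon)
    return near - CYS_CODONS

def count_near_cys(seq, frame):
    """Number of codons one substitution away from Cys in reading frame 1, 2 or 3."""
    near = near_cys_codons()
    return sum(seq[i:i + 3] in near for i in range(frame - 1, len(seq) - 2, 3))

print(len(near_cys_codons()))  # 14
```

Running `count_near_cys` on the IGHD8-2 sequence for frames 1, 2, and 3 compares how many potential Cys-creating positions each frame offers.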

74 of 75

Cys-induced diversity of cattle antibodies

Wang et al., Cell, 2013

75 of 75

Acknowledgements

UCSD Data Science postdoctoral fellowships

AAI fellowship for computational immunologists

UCSD: Pavel Pevzner, Vinnu Bhardwaj, Andrey Bzikadze, Chao Zhang, Massimo Franceschetti, Siavash Mirarab, Ramesh Rao

USDA: Tim Smith, Sung Bong Shin

Digital Proteomics: Stefano Bonissone, Natalie Castellana

Stanford U: Scott Boyd

U of Louisville: Corey Watson, William Gibson, Justin Kos