1 of 61

Immunoinformatics

Yana Safonova

Center for Algorithmic Biotechnology

Saint-Petersburg State University

2 of 61

Innate & adaptive immune system

cell-mediated immune response

humoral immune response

3 of 61

Antibody repertoires

The human immune system can generate 10^12 to 10^18 distinct antibodies

Analyzing the concentrations of circulating antibodies (the antibody repertoire) is a fundamental problem in immunology

While generating antibody repertoires provides a new avenue for antibody drug development, it remains unclear how to construct antibody repertoires from massive NGS data

4 of 61

Antibody somatic recombination

Antibodies are produced by B-cells, each with a unique genome:

16 of 61

Antibody somatic recombination

Somatic recombination produces unique immunoglobulin genes that encode the amino acid sequences of antibodies
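To make somatic recombination concrete, here is a toy simulation. It is a minimal sketch under loud assumptions: the V, D and J "genes" below are made-up short strings (not real IGH alleles), trimming and N-insertion lengths are drawn uniformly, and P-nucleotides and hypermutation are ignored.

```python
import random

# Made-up, shortened toy segments (NOT real IGHV/IGHD/IGHJ alleles).
V_GENES = {"V1": "CAGGTGCAGCTGTGTGCGAGA", "V2": "GAGGTGCAGCTGTGTGCGAAA"}
D_GENES = {"D1": "GGTATAGCAGCGGCTAC", "D2": "GTGGATACAGCTATGGT"}
J_GENES = {"J1": "ACTACTTTGACTACTGGGGCCAG", "J2": "GCTTTTGATATCTGGGGCCAA"}


def random_nucleotides(rng, n):
    """Non-templated (N) nucleotides inserted at a junction."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def recombine(rng, max_trim=3, max_n=4):
    """Assemble one toy heavy-chain variable region: V + N1 + D + N2 + J.

    Segment ends are trimmed by 0..max_trim nucleotides to mimic cleavage;
    P-nucleotides and somatic hypermutation are ignored."""
    v_name, v = rng.choice(sorted(V_GENES.items()))
    d_name, d = rng.choice(sorted(D_GENES.items()))
    j_name, j = rng.choice(sorted(J_GENES.items()))
    v = v[: len(v) - rng.randint(0, max_trim)]                          # trim V 3' end
    d = d[rng.randint(0, max_trim): len(d) - rng.randint(0, max_trim)]  # trim both D ends
    j = j[rng.randint(0, max_trim):]                                    # trim J 5' end
    n1 = random_nucleotides(rng, rng.randint(0, max_n))
    n2 = random_nucleotides(rng, rng.randint(0, max_n))
    return (v_name, d_name, j_name), v + n1 + d + n2 + j


rng = random.Random(0)
for _ in range(3):
    print(*recombine(rng))
```

Even with two choices per segment, random trimming and insertions make most products distinct, which is the point the slide makes about unique immunoglobulin genes.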

17 of 61

Antibody somatic recombination

An antibody recognizes a foreign agent (antigen) using its antigen-binding site

18 of 61

Antibody somatic recombination

The most diverse part of the antigen-binding site is CDR3 (complementarity-determining region 3)

CDR3

19 of 61

Antibody somatic recombination

Antibody affinity is further optimized through extensive mutations referred to as somatic hypermutations (SHMs)

CDR3

20 of 61

Antibody somatic recombination

CDR3

21 of 61

Antibody somatic recombination

22 of 61

Antibody somatic recombination

CDR3

CDR2

CDR1

23 of 61

Three approaches to clustering antibodies

V(D)J classification: what are the V, D, and J segments of an antibody?

CDR3 classification: what is the CDR3 region of an antibody?

Repertoire construction: what is an antibody?

24 of 61

V(D)J classification

decomposes each read into its V, D, and J gene segments

IMGT/V-QUEST (Brochet et al, Nucleic Acids Res, 2008)

IgBlast (Ye et al, Nucleic Acids Res, 2013)

iHMMune-align (Gaeta et al, Bioinformatics, 2007)

Antibody graph (Bonissone and Pevzner, RECOMB 2015)

V gene segments

D gene segments

J gene segments
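A minimal sketch of the V/J part of V(D)J classification: compare the start of a read with each V germline and its end with each J germline, and keep the most identical one. The germline strings and gene names are hypothetical; real tools such as IgBlast and IMGT/V-QUEST use gapped alignment against full allele databases and also assign D genes, which this ungapped toy skips.

```python
# Hypothetical toy germline "alleles" (not real IMGT sequences).
V_GERMLINE = {"V1": "CAGGTGCAGCTGGTGCAGTCT", "V2": "GAGGTGCAGCTGTTGGAGTCT"}
J_GERMLINE = {"J1": "TGGGGCCAGGGAACCCTGGTC", "J2": "TGGGGCAAAGGGACCACGGTC"}


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    assert len(a) == len(b), "ungapped comparison needs equal lengths"
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify_read(read):
    """Assign the closest V gene (compared with the start of the read) and the
    closest J gene (compared with the end of the read), without gaps."""
    v_scores = {g: identity(read[: len(s)], s) for g, s in V_GERMLINE.items()}
    j_scores = {g: identity(read[-len(s):], s) for g, s in J_GERMLINE.items()}
    v_best = max(v_scores, key=v_scores.get)
    j_best = max(j_scores, key=j_scores.get)
    return v_best, v_scores[v_best], j_best, j_scores[j_best]


# V1 with one mismatch at its 3' end, a made-up CDR3-like stretch, then J2.
read = "CAGGTGCAGCTGGTGCAGTCA" + "GCGAGAGATAGCAGT" + "TGGGGCAAAGGGACCACGGTC"
print(classify_read(read))
```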

26 of 61

CDR3 classification

classifies each read according to its CDR3 (~30 nt long)

Robins et al, Sci Transl Med, 2009

Weinstein et al, Science, 2009

Warren et al, Genome Res, 2011

Bolotin et al, Eur J Immunol, 2012
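CDR3 classification boils down to extracting a CDR3 from each read and grouping reads by it. A minimal sketch under a simplifying assumption: the heavy-chain junction (CDR3 plus its flanking Cys and Trp codons) is taken to run in frame from a TGT/TGC codon to a TGG codon that is immediately followed by a Gly codon (GGN). The cited tools locate these anchors far more robustly; the regex here is only illustrative.

```python
import re
from collections import defaultdict

# Assumed junction pattern: Cys codon, whole codons (lazy), Trp codon TGG,
# followed by (but not including) a Gly codon GGN.
JUNCTION = re.compile(r"TG[TC](?:[ACGT]{3})*?TGG(?=GG[ACGT])")


def extract_junction(read):
    """Return the first junction found in the read, or None."""
    match = JUNCTION.search(read)
    return match.group(0) if match else None


def group_by_cdr3(reads):
    """Map each junction sequence to the indices of reads carrying it.
    Reads without a detectable junction are skipped."""
    groups = defaultdict(list)
    for i, read in enumerate(reads):
        junction = extract_junction(read)
        if junction is not None:
            groups[junction].append(i)
    return dict(groups)
```

Grouping by exact CDR3 is what makes this approach fast, but, as noted below, many distinct antibodies share the same CDR3 and end up in one group.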

28 of 61

Full-length antibody classification

(repertoire construction)

takes into account the entire variable region of an antibody

Tools: MiXCR, IMSEQ, IgRepertoireConstructor

30 of 61

Why is repertoire construction important?

Repertoire construction is a prerequisite for analysis of antibody evolution

V(D)J and CDR3 classifications are not sufficient for understanding antibody variability

Many antibodies share the same CDR3

31 of 61

Sequencing of antibody repertoire

Roche 454: long reads, low accuracy, low coverage; VDJ classification only

Illumina MiSeq: long reads, high accuracy, high coverage; VDJ, CDR3 classifications and repertoire reconstruction

Illumina HiSeq: short reads, high accuracy, high coverage; CDR3 classification only

32 of 61

Sequencing of antibody repertoire

Constructing antibody repertoires from NGS data and analyzing them are important steps in antibody drug design and clinical studies

33 of 61

IgRepertoireConstructor

Natural variations look like sequencing errors in Ig-seq reads

Our tool reconstructs full-length antibody clones from Illumina reads. It constructs a graph showing similarities between Ig-seq reads and identifies the original antibody clones using clustering techniques

Works even on polyclonal datasets without barcoding

Shows high accuracy compared with existing tools

Will be available at Illumina BaseSpace

34 of 61

Hamming graph for the real antibody repertoire

6 highly abundant antibodies and a number of singleton antibodies

35 of 61

Each antibody forms a dense subgraph in the Hamming graph (distance = 3)

Reads derived from the same antibody differ from each other due to sequencing errors
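A minimal sketch of building a Hamming graph, assuming (our reading of "distance = 3") that two equal-length reads are joined by an edge when they differ in at most `tau = 3` positions. This brute-force version compares all pairs, which is fine for a toy but far too slow for millions of reads; practical tools need an indexing strategy to avoid the all-pairs scan.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def hamming_graph(reads, tau=3):
    """Adjacency sets over read indices: i and j are adjacent when the reads
    have equal length and differ in at most `tau` positions."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        if len(reads[i]) == len(reads[j]) and hamming(reads[i], reads[j]) <= tau:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```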

36 of 61

What does the Hamming graph look like?

107 vertices, 1426 edges

37 of 61

In reality, the Hamming graph is uncolored

Problem: extract the original dense subgraphs of the Hamming graph
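One simple heuristic for this problem (not necessarily what IgRepertoireConstructor does) exploits the fact that an edge inside a dense subgraph has many common neighbours, while an edge bridging two antibodies has few. The sketch drops weak edges and takes connected components of what remains. The `min_common` threshold is a hypothetical parameter; note that a cluster of only two reads has no common neighbours and would be split into singletons.

```python
from collections import deque


def split_into_dense_clusters(adj, min_common=1):
    """Drop edges whose endpoints share fewer than `min_common` neighbours,
    then return the connected components of the remaining graph.
    `adj` maps each vertex to the set of its neighbours (no self-loops)."""
    strong = {v: {u for u in adj[v] if len(adj[u] & adj[v]) >= min_common}
              for v in adj}
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search over strong edges only
            v = queue.popleft()
            for u in strong[v]:
                if u not in seen:
                    seen.add(u)
                    component.add(u)
                    queue.append(u)
        clusters.append(component)
    return clusters
```

For example, two 4-cliques joined by a single edge are separated, because the joining edge has no common neighbours while every clique edge has two.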

38 of 61

Results of repertoire construction

3.1M reads → 2.32M antibodies

2.32M = 2.27M trivial + 0.05M non-trivial

targets of biomedical follow-up

Multilayer MS search identifies ≈22% of spectra at 1% FDR, compared to ≈6% at 2% FDR in Cheung et al, Nat Biotech, 2012

targets of repertoire analysis:

50,000 non-trivial clusters

206 with abundance > 500

largest cluster size is 33,000
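Collapsing clusters of reads into antibodies can be sketched as follows, under two assumptions of ours: each cluster's antibody sequence is the column-wise majority consensus of its equal-length reads (which cancels out scattered sequencing errors), and a "trivial" antibody is one supported by a single read.

```python
from collections import Counter


def consensus(reads):
    """Column-wise majority vote over equal-length reads.
    Ties are broken by the order A, C, G, T."""
    result = []
    for column in zip(*reads):
        counts = Counter(column)
        result.append(max("ACGT", key=lambda base: counts[base]))
    return "".join(result)


def collapse_clusters(reads, clusters):
    """Turn clusters of read indices into (consensus, abundance) records and
    count trivial (single-read, by assumption) versus non-trivial ones."""
    antibodies = [(consensus([reads[i] for i in sorted(c)]), len(c)) for c in clusters]
    trivial = sum(1 for _, abundance in antibodies if abundance == 1)
    return antibodies, trivial, len(antibodies) - trivial
```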

39 of 61

Secondary diversification of B-cells

After successfully binding an antigen, a B-cell undergoes somatic hypermutation and clonal expansion

As a result, B-cells originating from the same clone form families

40 of 61

Classification of B-cells

Limon and Fruman. Akt and mTOR in B cell activation and differentiation. Front. Immunol. 2012

low SHM rate, high expression, short life cycle

high SHM rate, high expression, long life cycle

high SHM rate, medium expression, long life cycle

no SHMs, low expression, questionable life cycle

41 of 61

Evolutionary analysis of Ig repertoire

A B-cell family can be represented as a clonal tree

Intermediate clones may be either present in or missing from the repertoire

42 of 61

Evolutionary analysis of Ig repertoire

Standard phylogenetic approaches do not work here, since they assume that all species are located in the leaves
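A tree in which observed sequences may sit at internal nodes can be illustrated with a minimum spanning tree over Hamming distances. This is a generic stand-in (Prim's algorithm), not AntEvolo's method; it assumes equal-length aligned sequences and a known root, for example a germline-like sequence.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def clonal_tree(seqs, root=0):
    """Prim's minimum spanning tree over Hamming distances, grown from `root`.

    Returns parent[i] (None for the root). Every observed sequence is a node,
    so a sequenced ancestor can become an internal node of the tree."""
    n = len(seqs)
    parent = [None] * n
    best = [float("inf")] * n   # cheapest known edge from the tree to each node
    in_tree = [False] * n
    best[root] = 0
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        for u in range(n):
            if not in_tree[u]:
                d = hamming(seqs[v], seqs[u])
                if d < best[u]:
                    best[u], parent[u] = d, v
    return parent
```

With sequences AAAA (root), AAAT, AATT and CAAA, the observed AAAT becomes the parent of AATT instead of both being leaves.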

43 of 61

AntEvolo: basic principles

46 of 61

Results - clear situation

Antibody sequences that share a CDR3 can be unambiguously represented as a clonal tree

47 of 61

Results - unclear situation

This graph cannot be simplified without additional information

48 of 61

Let’s try to untangle it!

TGTGCGGGGGGTAGCAGTGGCTGGATTGACTACTGG

TGTGCGAGGGGCGGCAGCAGCTGGTTTGACTACTGG

TGTGCGAGAGATAGCAGTGGCCGGTTTGACTACTGG

TGTGCGAGAGACGCTAGTGGCCCCTTTGACTACTGG

AGTGGCT

TAGCAGTGGCTGG

TAGCAG

V

J

IGHD5-12*01

IGHD6-19*01

IGHD6-6*01

Best hits computed by IgBlast
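At the core of finding which D gene a CDR3 uses is locating long exact matches between the CDR3 and each D germline. A minimal sketch using the classic longest-common-substring dynamic program; the fragment TAGCAGTGGCTGG from the slide stands in for a D germline here (an assumption: real D alleles are longer, and IgBlast uses its own alignment scoring).

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common substring.
    cur[j] holds the length of the common suffix of a[:i] and b[:j]."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


d_fragment = "TAGCAGTGGCTGG"  # stand-in D germline taken from the slide
cdr3s = ["TGTGCGGGGGGTAGCAGTGGCTGGATTGACTACTGG",
         "TGTGCGAGGGGCGGCAGCAGCTGGTTTGACTACTGG",
         "TGTGCGAGAGATAGCAGTGGCCGGTTTGACTACTGG",
         "TGTGCGAGAGACGCTAGTGGCCCCTTTGACTACTGG"]
for cdr3 in cdr3s:
    print(cdr3, longest_common_substring(cdr3, d_fragment))
```

Different CDR3s in the same tangled graph retain D matches of very different lengths, which is one reason their common origin is hard to decide from the sequences alone.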

D

it might be a palindrome

54 of 61

Results - simplified unclear situation

To simplify such a clonal graph, we need more information about V(D)J recombination

55 of 61

Results - VERY unclear situation

Most situations cannot be simplified manually: CDR3 analysis needs to be introduced as a step of the algorithm

56 of 61

Analysis of SHMs based on clonal tree

TGTGCCAGAGATCCTGGCAGCTCGTCTTACTGGTACTTCGATCTCTGG

synonymous mutations

meaningful mutations

V

D

J

The CDR3 contains no cleavage (no nucleotides trimmed from the segment ends), only P and N insertions, which may indicate that the B-cell is old

One of the longest paths in the tree contains only synonymous SHMs
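Separating synonymous from "meaningful" mutations along a tree edge amounts to translating the affected codon before and after the mutation. A minimal sketch, assuming "meaningful" means amino-acid-changing (non-synonymous), that parent and child are aligned and equal-length, and that the caller knows the reading frame (the slide's CDR3 is in frame 0).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first base slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_shms(parent, child, frame=0):
    """Label each substitution between two aligned, equal-length sequences
    as 'synonymous' or 'non-synonymous'.

    `frame` is the offset of the first complete codon. Substitutions outside
    complete codons are skipped; several substitutions in one codon are
    judged together by comparing whole codons."""
    labels = []
    for pos, (a, b) in enumerate(zip(parent, child)):
        if a == b:
            continue
        start = frame + (pos - frame) // 3 * 3
        if start < frame or start + 3 > len(parent):
            continue
        same = CODON_TABLE[parent[start:start + 3]] == CODON_TABLE[child[start:start + 3]]
        labels.append((pos, a, b, "synonymous" if same else "non-synonymous"))
    return labels


parent = "TGTGCCAGAGATCCTGGCAGCTCGTCTTACTGGTACTTCGATCTCTGG"
print(classify_shms(parent, parent[:5] + "T" + parent[6:]))  # GCC -> GCT, both Ala
```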

57 of 61

Utilizing information about pairing

Heavy chain

Light chain

28415 - LC

28520 - LC

28449 - LC

58 of 61

Application of AntEvolo to the analysis of immunization

before immunization

right after immunization

highest immune response

Clonal analysis of a time series of antibody repertoires allows one to estimate the efficiency of the immune response
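One crude way to quantify a response from such a time series is to compare each lineage's relative frequency between two time points. A minimal sketch; the lineage IDs, the pseudocount (so that lineages absent at one time point do not cause division by zero) and the fold-change threshold are hypothetical choices, not AntEvolo parameters.

```python
def clonal_expansion(before, after, pseudocount=1.0, min_fold=4.0):
    """Return {lineage: fold change} for lineages whose relative frequency
    grew at least `min_fold` times. `before` and `after` map lineage IDs
    to read counts at two time points."""
    lineages = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(lineages)
    total_after = sum(after.values()) + pseudocount * len(lineages)
    expanded = {}
    for lineage in sorted(lineages):
        freq_before = (before.get(lineage, 0) + pseudocount) / total_before
        freq_after = (after.get(lineage, 0) + pseudocount) / total_after
        fold = freq_after / freq_before
        if fold >= min_fold:
            expanded[lineage] = fold
    return expanded
```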

59 of 61

Analysis of SHMs using clonal trees

shared SHM

synonymous SHMs

Useless mutations:

Helpful mutation:

Deep analysis of SHMs in an antibody family using a clonal tree allows one to detect helpful SHMs

A combination of shared hypermutations can potentially improve antibody affinity
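Candidate shared SHMs can be found by counting how many family members carry each difference from the germline. A minimal sketch, assuming aligned equal-length sequences and treating an SHM as a (position, base) pair; unlike a clonal-tree analysis, it ignores tree topology, so it cannot tell whether a shared SHM was inherited from one ancestor or arose independently in several branches.

```python
from collections import Counter


def shm_profile(germline, family):
    """Count the family members carrying each SHM, where an SHM is a
    (position, base) difference from the germline."""
    counts = Counter()
    for seq in family:
        counts.update((i, b) for i, (g, b) in enumerate(zip(germline, seq)) if g != b)
    return counts


def shared_shms(germline, family, min_carriers=2):
    """SHMs carried by at least `min_carriers` family members, sorted by position."""
    counts = shm_profile(germline, family)
    return sorted(m for m, c in counts.items() if c >= min_carriers)
```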

60 of 61

61 of 61

Thank you!
