1 of 103

Immunoinformatics: application of algorithmic approaches to solving immunological problems

Center for Algorithmic Biotechnology

St. Petersburg State University

Yana Safonova

2 of 103

Outline

  • Introduction
  • Repertoire construction problem
  • Evolutionary analysis of antibodies
  • Analysis of immune response dynamics
  • Analysis of paired antibody repertoires & new biological insights from analysis of paired repertoires

3 of 103

Innate & adaptive immune system

cell-mediated immune response

humoral immune response

4 of 103

Antibody & antigen

Antigen recognition

5 of 103

Antibody & antigen

Antigen recognition

Antibody - antigen binding

6 of 103

Antibody & antigen

Antigen recognition

Antibody - antigen binding

1. Antigen

neutralization

7 of 103

Antibody & antigen

Antigen recognition

2. Destroying antigen by

immune cells

Antibody - antigen binding

1. Antigen

neutralization

8 of 103

Once you’ve met an antigen,

your adaptive immune system never forgets it!

9 of 103

This principle is used for vaccine design:

Real antigens

Once you’ve met an antigen,

your adaptive immune system never forgets it!

10 of 103

This principle is used for vaccine design:

Real antigens

Vaccine

Once you’ve met an antigen,

your adaptive immune system never forgets it!

11 of 103

12 of 103

Where do antibodies live?

13 of 103

Antibody repertoires

About a billion B-cells circulate in human blood at any given moment (out of an estimated 10^18 antibodies)

Analysis of concentrations of all antibodies in the organism (antibody repertoire) is a fundamental problem in immunology

While generation of antibody repertoires provides a new avenue for antibody drug development, it remains unclear how to construct antibody repertoires from NGS data

14 of 103

V(D)J recombination

Antibodies are produced by B-cells, each with a unique genome:

IGH locus in the human genome (about 1 Mb long)

15 of 103

Antibody somatic recombination

Antibodies are produced by B-cells, each with a unique genome:

19 of 103

Antibody somatic recombination

Antibodies are produced by B-cells, each with a unique genome:

Random insertions/deletions

25 of 103

Antibody somatic recombination

Antibodies are produced by B-cells, each with a unique genome:

Random insertions/deletions

26 of 103

Antibody somatic recombination

Somatic recombination results in unique immunoglobulin genes that encode the amino acid sequences of antibodies

27 of 103

Antibody versus antigen

An antibody recognizes a foreign agent (antigen) using its antigen-binding site

28 of 103

Antigen binding site in antibody

The most variable part of the antigen-binding site is complementarity-determining region 3 (CDR3)

CDR3

29 of 103

Somatic hypermutations

Further optimization of antibody affinity is achieved through somatic hypermutations

CDR3

30 of 103

...many somatic hypermutations

CDR3

31 of 103

Architecture of antibodies

32 of 103

From biological problems to computational challenges

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

34 of 103

From biological problems to computational challenges

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

For many important model organisms in immunology, the sets of V, D, and J segments are still unknown

35 of 103

From biological problems to computational challenges

VDJ classification problem. Given an antibody generated from a known set of V, D, and J segments, identify what specific V, D, and J segments generated this antibody

VDJ reconstruction problem. Given a set (millions) of antibodies generated from an unknown set of V, D, and J segments, reconstruct these sets
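The VDJ classification problem stated above can be illustrated with a minimal sketch. It assumes gapless comparison and toy germline sequences: the segment names, the sequences, and the helpers `mismatches`, `classify_v`, and `classify_j` are hypothetical and are not taken from any tool mentioned in this talk. The V segment is matched against the start of the read and the J segment against its end; the D segment and indels are ignored for brevity.

```python
def mismatches(a, b):
    """Count mismatching positions over the common prefix of a and b."""
    n = min(len(a), len(b))
    return sum(a[i] != b[i] for i in range(n))


def classify_v(read, v_segments):
    """Pick the germline V segment closest to the start of the read."""
    return min(v_segments, key=lambda name: mismatches(read, v_segments[name]))


def classify_j(read, j_segments):
    """Pick the germline J segment closest to the end of the read.

    Reversing both strings turns suffix comparison into prefix comparison.
    """
    return min(
        j_segments,
        key=lambda name: mismatches(read[::-1], j_segments[name][::-1]),
    )


# Toy germline database (hypothetical sequences, for illustration only).
V_SEGMENTS = {"V1": "ACGTACGTAC", "V2": "ACGTTTGTAC"}
J_SEGMENTS = {"J1": "GGATCC", "J2": "GGTTCC"}

# A read built from V1 (with one mutation), a short junction, and J1.
read = "ACGTACGAAC" + "TTT" + "GGATCC"

v_hit = classify_v(read, V_SEGMENTS)
j_hit = classify_j(read, J_SEGMENTS)
print(v_hit, j_hit)
```

Real classifiers use proper alignment with gaps; the sketch only shows the "closest known segment" idea behind the problem statement.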

36 of 103

Outline

  • Introduction
  • Repertoire construction problem
  • Evolutionary analysis of antibodies
  • Analysis of immune response dynamics
  • Analysis of paired antibody repertoires & new biological insights from analysis of paired repertoires

37 of 103

Sequencing of antibody repertoire

  • Roche 454 (2005): low coverage, low accuracy, long reads; enables VDJ classification

38 of 103

Sequencing of antibody repertoire

  • Roche 454 (2005): low coverage, low accuracy, long reads; enables VDJ classification
  • Illumina HiSeq 2000 (2010): high coverage, high accuracy, short reads; adds CDR3 classification

39 of 103

Sequencing of antibody repertoire

  • Roche 454 (2005): low coverage, low accuracy, long reads; enables VDJ classification
  • Illumina HiSeq 2000 (2010): high coverage, high accuracy, short reads; adds CDR3 classification
  • Illumina MiSeq (2013): medium coverage, high accuracy, long reads; adds full-length classification

40 of 103

Sequencing of antibody repertoire

  • Roche 454 (2005): low coverage, low accuracy, long reads; enables VDJ classification
  • Illumina HiSeq 2000 (2010): high coverage, high accuracy, short reads; adds CDR3 classification
  • Illumina MiSeq (2013): medium coverage, high accuracy, long reads; adds full-length classification
  • HiSeq Rapid SBS Kit v2 (2015): high coverage, high accuracy, long reads; adds high throughput

41 of 103

Full-length antibody classification

(repertoire construction)

In contrast to the well-studied VDJ and CDR3 classification problems, full-length antibody classification takes into account the entire variable region of the antibody

MiGEC: Shugay et al., Nat Methods, 2014
MiXCR: Bolotin et al., Nat Methods, 2015
IMSEQ: Kuchenbecker et al., Bioinformatics, 2015
IgRepertoireConstructor: Safonova et al., Bioinformatics, 2015

42 of 103

Repertoire construction problem

  • Giant read clustering problem
  • Giant error correction problem

43 of 103

What makes this clustering problem difficult?

x 10^18

High repetitiveness

High mutation rate

Huge repertoire size

Uneven distribution of abundances

  • Global coverage threshold cannot be used for error correction
  • Sequencing errors often look like natural variations
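To make the clustering problem concrete, here is a minimal sketch of the most naive approach: connect reads that differ in at most `tau` positions and report connected components. This is not the algorithm of any tool cited in this talk; the function names and the threshold `tau` are assumptions made for illustration. It also shows why the problem is hard: a single global threshold cannot tell a sequencing error from a natural variation that differs by one nucleotide.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, tau):
    """Group reads into connected components of the Hamming graph.

    Two reads are connected if they have equal length and differ in at most
    tau positions. Uses a small union-find structure over read indices.
    """
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        same_length = len(reads[i]) == len(reads[j])
        if same_length and hamming(reads[i], reads[j]) <= tau:
            parent[find(i)] = find(j)

    clusters = {}
    for i, read in enumerate(reads):
        clusters.setdefault(find(i), []).append(read)
    return list(clusters.values())


# Toy reads (hypothetical): two groups that differ by single mismatches.
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTTTCCCC", "TTTTCCCG"]
clusters = cluster_reads(reads, tau=1)
print(sorted(len(c) for c in clusters))
```

The all-pairs comparison is quadratic in the number of reads, so it would not scale to a real repertoire without an index over candidate pairs.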

44 of 103

Outline

  • Introduction
  • Repertoire construction problem
  • Evolutionary analysis of antibodies
  • Analysis of immune response dynamics
  • Analysis of paired antibody repertoires & new biological insights from analysis of paired repertoires

45 of 103

Secondary diversification of antibodies

46 of 103

Clonal analysis of antibody repertoire

  • B-cell lineages reflect evolutionary development of antibodies

47 of 103

Clonal analysis of antibody repertoire

  • B-cell lineages reflect evolutionary development of antibodies
  • Lineage can be represented as a clonal tree

48 of 103

Clonal analysis of antibody repertoire

  • B-cell lineages reflect evolutionary development of antibodies
  • Lineage can be represented as a clonal tree
  • Some intermediate clones may be missing in the repertoire

49 of 103

Clonal analysis of antibody repertoire

Standard phylogenetic algorithms assume that all species are represented by leaves, so they should be adapted for clonal trees, where observed antibodies can also be internal nodes

50 of 103

Who is the ancestor here?

germline segments

51 of 103

Who is the ancestor here?

1

2

New hypermutations

52 of 103

Who is the ancestor here?

1

2

Shared hypermutations

New hypermutations

53 of 103

Another example: who is the ancestor here?

54 of 103

Another example: who is the ancestor here?

Individual hypermutations 1

Individual hypermutations 2

55 of 103

Ancestral antibody may be missing…

Shared hypermutations

1

2

Ancestral antibody is not present in the repertoire

Individual hypermutations 1

Individual hypermutations 2

56 of 103

What is the evolutionary tree?

9 antibody sequences share CDR3 and differ by SHMs in V segments

Hypermutations (SHMs) in V segment

57 of 103

Any tree reconstruction approach will work

Edge labels in the tree: +1, +3, +1, +2, +2, +4, +3, +1

Nested SHMs define directions of edges between antibodies in the clonal tree
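The nesting idea can be sketched in a few lines, assuming each antibody is described by the set of SHMs it carries relative to the germline V segment. The function `build_clonal_tree`, the antibody names, and the mutation encoding `(position, new_base)` are hypothetical. The parent of an antibody is taken to be the antibody with the largest SHM set that is a proper subset of its own; ties and missing intermediate clones are not handled.

```python
def build_clonal_tree(shms):
    """Return {antibody: parent} using nested SHM sets.

    shms maps an antibody name to a frozenset of (position, new_base) pairs.
    An antibody with no nested predecessor gets parent None (a root).
    """
    parents = {}
    for name, muts in shms.items():
        # Candidate ancestors carry a strict subset of this antibody's SHMs.
        candidates = [other for other, m in shms.items() if m < muts]
        # The closest ancestor is the candidate with the most shared SHMs.
        parents[name] = max(
            candidates, key=lambda other: len(shms[other]), default=None
        )
    return parents


# Toy lineage (hypothetical): A is unmutated, B adds one SHM, C adds another
# on top of B, and D mutates independently at a different position.
shms = {
    "A": frozenset(),
    "B": frozenset({(10, "T")}),
    "C": frozenset({(10, "T"), (25, "G")}),
    "D": frozenset({(40, "A")}),
}

parents = build_clonal_tree(shms)
print(parents)
```

The direction of each edge follows from set inclusion: the antibody with fewer SHMs is placed closer to the germline.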

58 of 103

Repertoire construction step is very important for clonal analysis!

60 of 103

SHMs in V segments are easy to find

D

J

  • One can easily identify mutations in the V segment using alignment against the template (germline V segment)
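A minimal sketch of this step, assuming the V part of the read is already aligned to the germline V segment without gaps (real data would need a proper alignment that handles indels). The function name `find_shms` and the toy sequences are hypothetical.

```python
def find_shms(read_v, germline_v):
    """List SHMs as (position, germline_base, read_base) tuples.

    Assumes a gapless alignment: position i of the read corresponds to
    position i of the germline V segment.
    """
    return [
        (i, g, r)
        for i, (g, r) in enumerate(zip(germline_v, read_v))
        if g != r
    ]


germline_v = "ACGTACGTAC"  # toy germline V segment
read_v = "ACGTTCGTAA"      # toy read with two substitutions

shm_list = find_shms(read_v, germline_v)
print(shm_list)
```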

61 of 103

SHMs in CDR3 are difficult to identify

  • One can easily identify mutations in the V segment using alignment against the template (germline V segment)
  • But there is no template for CDR3!

62 of 103

SHMs in CDR3 are difficult to identify

  • One can easily identify mutations in the V segment using alignment against the template (germline V segment)
  • But there is no template for CDR3!
    • deletions in gene segments
    • non-genomic VD and DJ insertions
    • addition of palindromes

63 of 103

A more complex case: who is the ancestor?

CDR3

64 of 103

A more complex case: who is the ancestor?

CDR3

?

65 of 103

A more complex case: who is the ancestor?

1

2

Information about VDJ scenarios allows us to make a choice:

  • Antibodies 1 and 2 belong to the same lineage

CDR3

?

66 of 103

A more complex case: who is the ancestor?

Information about VDJ scenarios allows us to make the right choice:

  • Antibodies 1 and 2 belong to the same lineage
  • Antibodies 1 and 2 are not related

CDR3

?

1

2

67 of 103

Another puzzle

4 antibodies share SHMs in V segments but differ in CDR3s

68 of 103

Another puzzle

  • It is unclear how to select direction between two similar CDR3s
  • It is unclear whether two similar CDR3s belong to a single clonal tree or not

69 of 103

Why do we need a VDJ probabilistic model?

To compute VDJ scenario, we need to:

  • perform VDJ classification to find germline segments (well-studied problem)
  • specify deletions in gene segments
  • specify non-genomic insertions
  • specify addition of palindromes

Murugan et al., PNAS, 2012

V

D

J

70 of 103

Why do we need a VDJ probabilistic model?

To compute VDJ scenario, we need to:

  • perform VDJ classification to find germline segments (well-studied problem)
  • specify deletions in gene segments
  • specify non-genomic insertions
  • specify addition of palindromes

Murugan et al., PNAS, 2012

V

D

J

Recombination events are not distributed uniformly

71 of 103

Why do we need a VDJ probabilistic model?

To compute VDJ scenario, we need to:

  • perform VDJ classification to find germline segments (well-studied problem)
  • specify deletions in gene segments
  • specify non-genomic insertions
  • specify addition of palindromes

Murugan et al., PNAS, 2012

V

D

J

We need a probabilistic VDJ recombination model for a realistic description of these events

Recombination events are not distributed uniformly
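As a rough illustration of what a probabilistic recombination model provides, the sketch below scores a VDJ scenario as a product of event probabilities and compares two candidate scenarios. It is a deliberately simplified assumption: events are treated as independent, D and J deletions and palindromes are omitted, and every number in `MODEL` is made up. It is not the model of Murugan et al.

```python
from math import prod

# Hypothetical event probabilities (not estimated from any real data).
MODEL = {
    "V": {"V1": 0.7, "V2": 0.3},             # choice of V segment
    "D": {"D1": 0.5, "D2": 0.5},             # choice of D segment
    "J": {"J1": 0.6, "J2": 0.4},             # choice of J segment
    "del_V": {0: 0.5, 2: 0.3, 5: 0.2},       # nucleotides deleted from V end
    "ins_VD": {0: 0.2, 3: 0.5, 6: 0.3},      # non-genomic VD insertion length
    "ins_DJ": {0: 0.3, 2: 0.7},              # non-genomic DJ insertion length
}


def scenario_probability(scenario, model):
    """Probability of a scenario under an independent-events assumption."""
    return prod(model[event][scenario[event]] for event in model)


# Two hypothetical scenarios that could explain the same antibody sequence.
scenario_a = {"V": "V1", "D": "D1", "J": "J1",
              "del_V": 0, "ins_VD": 3, "ins_DJ": 2}
scenario_b = {"V": "V2", "D": "D1", "J": "J1",
              "del_V": 5, "ins_VD": 6, "ins_DJ": 2}

p_a = scenario_probability(scenario_a, MODEL)
p_b = scenario_probability(scenario_b, MODEL)
best = max((scenario_a, scenario_b),
           key=lambda s: scenario_probability(s, MODEL))
print(p_a, p_b)
```

Because the events are not uniformly distributed, two scenarios that explain the same sequence can differ in probability by an order of magnitude, which is what lets the model choose between them.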

72 of 103

Why do we need an SHM probabilistic model?

Somatic hypermutagenesis engages the AID enzyme, which changes immunoglobulin genes to improve antibody affinity

Rogozin and Kolchanov, Biochimica et Biophysica Acta, 1992

SHM hotspots, such as the degenerate 4-mer WRCY ([A/T][A/G]C[C/T]), trigger mutations in antibodies

73 of 103

Building probabilistic SHM model

  • The SHM model takes into account both the mutated nucleotide and its neighbours
  • It detects new hotspots and compares SHMs across IG chains

Yaari et al., Front Immunol, 2013

5-mer    Freq    A      C      G      T
ACAAC    83      -      0.24   0.48   0.28
GGCGT    1742    0.22   -      0.12   0.66
CCGTC    12      0.35   0.52   -      0.13
TCTCC    516     0.32   0.54   0.14   -

(the central nucleotide of each 5-mer is the mutated one, so its own column is empty)

74 of 103

Building probabilistic SHM model

  • The SHM model takes into account both the mutated nucleotide and its neighbours
  • It detects new hotspots and compares SHMs across IG chains

Yaari et al., Front Immunol, 2013

TCTCC 5-mer profiles for IGL, IGH, and IGK chains aggregated over 60 datasets

5-mer    Freq    A      C      G      T
ACAAC    83      -      0.24   0.48   0.28
GGCGT    1742    0.22   -      0.12   0.66
CCGTC    12      0.35   0.52   -      0.13
TCTCC    516     0.32   0.54   0.14   -

(the central nucleotide of each 5-mer is the mutated one, so its own column is empty)
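A 5-mer table like the one on this slide can be used to score observed substitutions. The sketch below is an illustration only: it assumes that the three probabilities in each row belong to the three bases other than the central nucleotide, listed in A, C, G, T order, and it ignores the Freq column. The function `substitution_log_likelihood` and the toy germline sequence are hypothetical.

```python
import math

# Substitution probabilities for the central base of each 5-mer, read off
# the slide under the column-assignment assumption stated above.
SUBSTITUTION = {
    "ACAAC": {"C": 0.24, "G": 0.48, "T": 0.28},
    "GGCGT": {"A": 0.22, "G": 0.12, "T": 0.66},
    "CCGTC": {"A": 0.35, "C": 0.52, "T": 0.13},
    "TCTCC": {"A": 0.32, "C": 0.54, "G": 0.14},
}


def substitution_log_likelihood(germline, mutations):
    """Sum of log-probabilities of the observed substitutions.

    mutations is a list of (position, new_base); the 5-mer context is the
    germline window centred on the mutated position.
    """
    total = 0.0
    for pos, new_base in mutations:
        kmer = germline[pos - 2:pos + 3]
        total += math.log(SUBSTITUTION[kmer][new_base])
    return total


# Toy germline containing GGCGT (centre at index 2) and TCTCC (centre at 6).
germline = "GGCGTCTCC"
observed = [(2, "T"), (6, "C")]

log_lik = substitution_log_likelihood(germline, observed)
print(math.exp(log_lik))
```

Scores of this kind let two candidate ancestor-descendant directions be compared: the direction whose substitutions are more probable under the model is preferred.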

75 of 103

Outline

  • Introduction
  • Repertoire construction problem
  • Evolutionary analysis of antibodies
  • Analysis of immune response dynamics
  • Analysis of paired antibody repertoires & new biological insights from analysis of paired repertoires

76 of 103

Time series

Laserson et al., PNAS, 2014

77 of 103

Clonal analysis in time

Clonal analysis of a time series of antibody repertoires allows one to estimate the efficiency of the immune response

Sequencing data provided by

before immunization

right after immunization

highest immune response

78 of 103

Outline

  • Introduction
  • Repertoire construction problem
  • Evolutionary analysis of antibodies
  • Analysis of immune response dynamics
  • Analysis of paired antibody repertoires & new biological insights from analysis of paired repertoires

79 of 103

Clonal analysis for antibody repertoire

Sequencing data provided by

80 of 103

Clonal analysis for paired antibody repertoire

Sequencing data provided by

81 of 103

Clonal analysis for antibody repertoire

  • utilizes information about chain pairing to construct paired clonal tree
  • reveals that, contrary to previous views, B-cells often co-express multiple heavy and light chains.

Sequencing data provided by

82 of 103

Light chain duality

co-expression of both kappa and lambda chains by a single B-cell

Pelanda et al., Curr Opin Immunol, 2014

Giachino et al., J Exp Med, 1995

83 of 103

Allelic inclusion

production of chains from both haplomes by B-cells

Casellas et al., J Exp Med, 2007

Beck-Engeser et al., PNAS, 1987

84 of 103

Duality + allelic inclusion

A single B-cell may express multiple chains due to allelic inclusions and/or light chain duality

85 of 103

Multi-chain effect

A single B-cell may express multiple chains due to allelic inclusions and/or light chain duality

Multi-chain effect: B-cell can express up to 6 different chains:

86 of 103

Multi-chain effect

A single B-cell may express multiple chains due to allelic inclusions and/or light chain duality

?

?

which ones participate in the real pairing?

Multi-chain effect: B-cell can express up to 6 different chains:

87 of 103

Multi-chain effect is common in healthy B-cells!

one heavy chain (IGM)

IGK + IGL

one heavy chain (IGM + IGD)

IGK + IGL

one heavy chain (IGA)

IGK + IGL

25% (!) of B-cells with known pairing have allelic inclusions and/or light chain duality

two heavy chains and single light chain

two heavy chains and multiple light chains

88 of 103

Clonal analysis reveals true chain pairing

K1

H1

L1

Cells 1, 2, and 3 express identical heavy, kappa and lambda chains. Thus, 1, 2, and 3 are clones of the same B-cell

Which light chain contributes to the antibody: kappa or lambda?

1

2

3

Example from AbVitro sequencing data
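The first step of this example, recognizing that cells expressing identical chains are clones of the same B-cell, can be sketched as a simple grouping. The cell identifiers and chain labels follow the slide (H1, K1, L1, and the L2 chain of cell 4 introduced on the next slide); the function `group_clones` and the data layout are assumptions made for illustration.

```python
from collections import defaultdict


def group_clones(cells):
    """Group cell ids that express exactly the same set of chains."""
    groups = defaultdict(list)
    for cell_id, chains in cells.items():
        groups[frozenset(chains)].append(cell_id)
    return list(groups.values())


# Chains expressed by each cell, as in the slide's example.
cells = {
    1: {"H1", "K1", "L1"},
    2: {"H1", "K1", "L1"},
    3: {"H1", "K1", "L1"},
    4: {"H1", "K1", "L2"},
}

clone_groups = group_clones(cells)
print(clone_groups)
```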

89 of 103

Clonal analysis reveals true chain pairing

K1

H1

L1

L2

1

2

3

Cell 4 shares heavy and kappa chains with cells 1, 2, and 3, but has a different lambda chain (L2)

4

90 of 103

Clonal analysis reveals true chain pairing

K1

H1

L1

L2

1

2

3

Alignment of L1 and L2 reveals that L1 is an ancestor of L2

Thus, cell 4 is a descendant of cells 1, 2, and 3

4

Cell 4 shares heavy and kappa chains with cells 1, 2, and 3, but has a different lambda chain (L2)

91 of 103

Clonal analysis reveals true chain pairing

K1

H1

L1

L2

1

2

3

4

Alignment of L1 and L2 reveals that L1 is an ancestor of L2

Thus, cell 4 is a descendant of cells 1, 2, and 3

Evolution of L1 into L2 provides evidence that cells 1, 2, 3, and 4 generate functional antibodies

92 of 103

Clonal analysis reveals true chain pairing

K1

H1

L1

L2

1

2

3

4

But this contradicts the fact that H1 is non-productive

Evolution of L1 into L2 provides evidence that cells 1, 2, 3, and 4 generate functional antibodies

Alignment of L1 and L2 reveals that L1 is an ancestor of L2

Thus, cell 4 is a descendant of cells 1, 2, and 3

93 of 103

There are more B-cells to analyze!

K1

H1

L1

L2

K2

H2

1

2

3

4

Cell 5 expresses heavy and kappa chains

5

94 of 103

There are more B-cells to analyze!

K1

H1

L1

L2

K2

K3

H2

1

2

3

4

5

K2 and K1 have originated from an unknown kappa chain K3 that is missing in the repertoire

95 of 103

We are not done yet…

K1

H1

L1

L2

K2

K3

H2

H3

K4

L3

1

2

3

4

5

Cell 6 expresses heavy, kappa and lambda chains

6

96 of 103

We are not done yet…

K1

H1

L1

L2

K2

K3

H2

H3

K4

L3

1

2

3

4

5

Alignment reveals that H3 is an ancestor of H2

6

97 of 103

We are not done yet…

K1

H1

L1

L2

K2

K3

H2

H3

K4

L3

1

2

3

4

5

K4 is an ancestor of a virtual chain K3

6

98 of 103

We are not done yet…

K1

H1

L1

L2

K2

K3

H2

H3

K4

L3

1

2

3

4

5

L3 is an ancestor of L1

6

99 of 103

Evolutionary analysis helps to understand true chain pairing

H1 lineage is non-productive, so it does not participate in pairing

Lineage H3 → H2 is more likely to participate in chain pairing

100 of 103

Evolutionary analysis helps to understand true chain pairing

  • The lambda lineage contains synonymous mutations

  • Mutations in the lambda lineage are grouped into CDRs
  • Mutations in the kappa chain are distributed randomly along the variable region

The lambda lineage undergoes selection, so it is more likely to participate in chain pairing
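One ingredient of such an analysis is telling synonymous mutations from non-synonymous ones. The sketch below counts both between an ancestor and a descendant sequence, assuming in-frame, equal-length sequences and counting each changed codon once. It uses the standard genetic code; the function name `count_codon_changes` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def count_codon_changes(ancestor, descendant):
    """Return (synonymous, nonsynonymous) counts of changed codons.

    Both sequences must be in frame and of equal length; a codon with
    several changed bases is still counted once (a simplification).
    """
    synonymous = nonsynonymous = 0
    for i in range(0, len(ancestor) - 2, 3):
        old, new = ancestor[i:i + 3], descendant[i:i + 3]
        if old == new:
            continue
        if CODON_TABLE[old] == CODON_TABLE[new]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


ancestor = "GCTAAAGGT"    # Ala-Lys-Gly
descendant = "GCCAGAGGT"  # Ala-Arg-Gly: one silent and one amino acid change

syn, nonsyn = count_codon_changes(ancestor, descendant)
print(syn, nonsyn)
```

Combined with the positions of the changed codons (inside or outside CDRs), counts like these give a first indication of whether a lineage shows signs of selection.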

101 of 103

Evolutionary analysis helps to understand true chain pairing

Using information about clonal lineages for H, K and L chains and the SHM model, we can select the most likely chain pairing

H3 → H2

L3 → L1 → L2

102 of 103

Alexander Shlemov

Andrey Bzikadze

Sergey Bankevich

Alla Lapidus

Pavel A. Pevzner

Timofey Prodanov

Andrey Slabodkin

Yana Safonova

103 of 103

Thank you!
