1 of 71

Computer Science meets Immunology: how computational analyses help to study diseases

Yana Safonova

postdoctoral researcher, PhD

University of California San Diego

University of Louisville School of Medicine

@yana_safonova_

2 of 71

Newly emerging and re-emerging diseases

3 of 71

Newly emerging and re-emerging diseases

SARS-CoV-2

4 of 71

Response to (re)emerging diseases

6 of 71

Introduction to molecular biology

7 of 71

Introduction to molecular biology

Human genome: ~3 Gbp

8 of 71

Introduction to molecular biology

The Library of Babel (J. L. Borges) contains every possible combination of symbols, most of them meaningless

Human genome: ~3 Gbp

Protein-coding sequences make up only ~1-2% of the human genome

9 of 71

Introduction to molecular biology

Hou and Lin, PLoS ONE, 2009

The Library of Babel (J. L. Borges) contains every possible combination of symbols, most of them meaningless

Human genome: ~3 Gbp

Protein-coding sequences make up only ~1-2% of the human genome

10 of 71

Introduction to molecular biology

Hou and Lin, PLoS ONE, 2009

The Library of Babel (J. L. Borges) contains every possible combination of symbols, most of them meaningless

Human genome: ~3 Gbp

Protein-coding sequences make up only ~1-2% of the human genome

The central dogma of biology
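As a minimal illustration of the central dogma (DNA → RNA → protein), the Python sketch below transcribes a coding-strand DNA sequence into mRNA and translates it with the standard genetic code. The function names `transcribe` and `translate` are our own, and the sketch deliberately ignores splicing, reading-frame detection, and alternative genetic codes.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in T, C, A, G order with the
# first base varying slowest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA (coding strand) -> mRNA: thymine (T) is replaced by uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
print(translate("ATGGCCTAA"))   # MA
```

Translating from the DNA coding strand (rather than from the mRNA) keeps a single codon table; the mRNA only differs by U in place of T.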

11 of 71

The birth of bioinformatics

Genome assembly

Human Genome Project (1990–2003)

12 of 71

The birth of bioinformatics

Genome assembly

Gene prediction

13 of 71

The birth of bioinformatics

Genome assembly

Gene prediction

Gene expression

14 of 71

The birth of bioinformatics

Genome assembly

Gene prediction

Gene expression

Differential gene expression

15 of 71

Immune system = innate (or inherited) + adaptive (or acquired) immune systems

16 of 71

Adaptive immune system

  • The variety of threats to the human body is huge and unpredictable

  • The genome is too small to encode defences against all of these threats

  • The immune system can adapt to diverse threats using agents (e.g., antibodies) that are not encoded in the genome

17 of 71

Antibodies are agents of the adaptive immune system

  • Antibodies are proteins that bind to an antigen and neutralize it

  • Antibodies are not encoded in the genome directly; they result from somatic recombination of V, D, and J gene segments

Figure: V, D, and J gene segments in the IGH locus, chr 14 (human genome)

18 of 71

Antibodies are agents of the adaptive immune system

  • The diversity of antibody genes is extremely high
  • The set of produced antibodies (antibody repertoire) is unique to each individual
  • Antibodies are proteins that bind to an antigen and neutralize it

  • Antibodies are not encoded in the genome directly; they result from somatic recombination of V, D, and J gene segments

Figure: V, D, and J segments from the IGH locus, chr 14 (human genome), are joined into an antibody gene

19 of 71

Antibodies are agents of the adaptive immune system

  • The diversity of antibody genes is extremely high
  • The set of produced antibodies (antibody repertoire) is unique to each individual
  • Antibodies are proteins that bind to an antigen and neutralize it

  • Antibodies are not encoded in the genome directly; they result from somatic recombination of V, D, and J gene segments

Figure: different combinations of V, D, and J segments from the IGH locus, chr 14 (human genome), form different antibody genes

20 of 71

Why are antibodies so versatile if there are only 55×23×6 VDJ recombinations?
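The arithmetic behind the question: if every recombination picks exactly one of 55 V, 23 D, and 6 J segments, there are 55 × 23 × 6 = 7,590 combinations. A minimal Python sketch that enumerates them; the segment names are hypothetical placeholders, not real IGHV/IGHD/IGHJ gene names.

```python
from itertools import product
from math import prod

# Segment counts from the slide; the names are placeholders for illustration.
N_V, N_D, N_J = 55, 23, 6
v_genes = [f"V{i}" for i in range(1, N_V + 1)]
d_genes = [f"D{i}" for i in range(1, N_D + 1)]
j_genes = [f"J{i}" for i in range(1, N_J + 1)]

# Each recombination chooses exactly one V, one D, and one J segment.
combinations = list(product(v_genes, d_genes, j_genes))
print(len(combinations))       # 7590
print(prod([N_V, N_D, N_J]))   # 7590
```

A few thousand combinations alone are far too few to account for the diversity of antibody repertoires, which is the point of this question.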

21 of 71

Why are antibodies so versatile if there are only 55×23×6 VDJ recombinations?

The recombination process is imperfect and involves several random events:

  • Palindromic insertions

22 of 71

Why are antibodies so versatile if there are only 55×23×6 VDJ recombinations?

The recombination process is imperfect and involves several random events:

  • Palindromic insertions
  • Segment cleavage

24 of 71

Why are antibodies so versatile if there are only 55×23×6 VDJ recombinations?

The recombination process is imperfect and involves several random events:

  • Palindromic insertions
  • Segment cleavage
  • Non-genomic insertions
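The three random events above can be mimicked with a toy simulation. The Python sketch below is a simplified model, not a validated one: the toy segment sequences, the limits on trimming, P-nucleotide, and N-insertion lengths, and all function names are hypothetical. Following the usual description of P-nucleotides, it adds them only to ends that were not trimmed, and it joins D to J before joining V to the DJ product.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def join(left: str, right: str, rng: random.Random,
         max_trim: int = 4, max_p: int = 2, max_n: int = 6) -> str:
    """Join two gene segments with a randomly imperfect junction (toy model)."""
    # Segment cleavage: remove 0..max_trim nucleotides from each coding end.
    trim_left = rng.randint(0, min(max_trim, len(left)))
    trim_right = rng.randint(0, min(max_trim, len(right)))
    left_end = left[:len(left) - trim_left]
    right_end = right[trim_right:]

    # Palindromic insertions: an untrimmed end may gain the reverse complement
    # of its own terminal nucleotides (P-nucleotides).
    p_left = p_right = ""
    if trim_left == 0:
        k = rng.randint(0, min(max_p, len(left_end)))
        p_left = reverse_complement(left_end[len(left_end) - k:])
    if trim_right == 0:
        k = rng.randint(0, min(max_p, len(right_end)))
        p_right = reverse_complement(right_end[:k])

    # Non-genomic (N) insertions: random nucleotides not copied from a template.
    n_insert = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))
    return left_end + p_left + n_insert + p_right + right_end


def recombine(v: str, d: str, j: str, rng: random.Random) -> str:
    """D is joined to J first, then V is joined to the DJ product."""
    return join(v, join(d, j, rng), rng)


# Toy segments for illustration only (not real germline sequences).
V_TOY = "GCCGTGTATTACTGTGCGAGA"
D_TOY = "GGTATAGCAGCAGCTGG"
J_TOY = "ACTACTTTGACTACTGGGGC"

rng = random.Random(1)
for _ in range(3):
    print(recombine(V_TOY, D_TOY, J_TOY, rng))
```

Even with three fixed segments, repeated runs produce different junctions, which is how a few thousand V-D-J combinations expand into a vastly larger set of sequences.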

25 of 71

Antibody repertoire sequencing (Rep-seq)

Figure: a VDJ sequence (~360 nt) obtained from DNA or RNA is sequenced, producing error-prone immunosequencing reads
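One common way to suppress random sequencing errors is to group reads that come from the same molecule and take a column-wise majority vote. A minimal Python sketch, assuming the reads are already grouped and aligned to equal length; real pipelines must also handle indels and the grouping itself, and the function name `consensus` is hypothetical.

```python
from collections import Counter
from typing import List


def consensus(reads: List[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Toy reads of the same molecule; two of them carry a single error each.
reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(consensus(reads))  # ACGTAC
```

The majority vote works only when errors are rare and independent across reads; with too few reads per molecule, a shared error can still win.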

26 of 71

Antibody repertoire is unique for an individual

27 of 71

Antibody repertoire is unique for an individual

  • VDJ sequences are extremely diverse
  • If a VDJ sequence is shared between two individuals, it is likely a frequent recombination rather than a functionally important sequence
  • We cannot study antibody responses just by comparing VDJ sequences

frequent VDJ recombinations

28 of 71

Genomic variations

29 of 71

Genomic variations

mother's DNA

father's DNA

30 of 71

Genomic variations

mother's DNA

father's DNA

A single nucleotide polymorphism (SNP) is a position in the genome where individuals may carry different nucleotides. For an A/G SNP, each person has one of three genotypes:

  • A – both chromosomes have A
  • A/G – one chromosome has A, the other has G
  • G – both chromosomes have G

31 of 71

Altered genes produce altered proteins

A single nucleotide polymorphism (SNP) is a position in the genome where individuals may carry different nucleotides. For an A/G SNP, each person has one of three genotypes:

  • A – both chromosomes have A
  • A/G – one chromosome has A, the other has G
  • G – both chromosomes have G

33 of 71

Genome-wide association studies (GWAS)

Genotype counts for a single SNP:

  • Patients: A = 15, A/G = 10, G = 8
  • Controls: A = 2, A/G = 6, G = 25

P-value (Fisher's exact test) = 0.000026
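For a 2 × 3 genotype table like this one, Fisher's exact test generalizes to the Freeman-Halton test: enumerate every table with the same row and column totals and add up the probabilities of those that are no more likely than the observed table. A minimal pure-Python sketch; it is not necessarily the tool used for this slide, we have not checked that it reproduces the exact p-value shown, and the function name is hypothetical.

```python
from itertools import product
from math import comb
from typing import Sequence


def fisher_exact_2xk(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher (Freeman-Halton) exact test for a 2 x k contingency table."""
    top, bottom = table
    columns = [a + b for a, b in zip(top, bottom)]
    n, top_total = sum(columns), sum(top)

    def weight(row: Sequence[int]) -> int:
        # Numerator of the hypergeometric probability: prod_j C(columns[j], row[j]).
        w = 1
        for column_total, x in zip(columns, row):
            w *= comb(column_total, x)
        return w

    observed = weight(top)
    # Enumerate every first row compatible with the column totals and keep the
    # tables that are no more likely than the observed one. Integer weights
    # make the "no more likely" comparison exact.
    extreme = 0
    for row in product(*(range(c + 1) for c in columns)):
        if sum(row) == top_total:
            w = weight(row)
            if w <= observed:
                extreme += w
    return extreme / comb(n, top_total)


patients = [15, 10, 8]  # genotype counts A, A/G, G from the slide
controls = [2, 6, 25]
print(fisher_exact_2xk([patients, controls]))
```

The enumeration grows with the product of the column totals, so for larger tables or genome-wide scans a statistics package is the practical choice.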

34 of 71

Genome-wide association studies (GWAS)

35 of 71

Variants of IGHV1-69 shape Ab response to flu

IGHV1-69 is a V gene whose alleles differ at amino acid position 54:

  • IGHV1-69*01 – 54: F
  • IGHV1-69*02 – 54: L
  • IGHV1-69*03 – 54: F
  • IGHV1-69*04 – 54: L
  • IGHV1-69*14 – 54: F
  • ...

F at position 54: successful binding to hemagglutinin

L at position 54: loss of binding properties

Avnir et al., Sci Rep, 2016

Titers (≈ antibody levels) before and after immunization

36 of 71

GWAS of antibody responses

Challenges

  • A single IG locus includes many V, D, and J genes; if one gene loses its function, other genes can usually replace it
  • Recent studies report a lack of associations between genomic variations outside IG loci and adaptive immune responses
  • Many other factors (age, diet, environment) influence adaptive immune responses
  • Features of antibody repertoires go far beyond genomic variations

37 of 71

Bioinformatics + immunology = immunoinformatics

Go I Know Not Whither and Fetch I Know Not What

  • 3D organization
  • amino acid content and post-translational modifications (PTMs)
  • allelic diversity
  • unusual VDJs
  • cross- and self-reactivity
  • antibody-dependent enhancement
  • non-coding variations
  • ???

38 of 71

Bioinformatics + immunology = immunoinformatics

Go I Know Not Whither and Fetch I Know Not What

Error-prone immunosequencing reads

40 of 71

~ Flu ~

41 of 71

CDRs represent antigen-binding sites

Figure: V, D, and J segments of an antibody gene with complementarity-determining regions CDR1, CDR2, and CDR3

42 of 71

Anatomy of IGHV1-69-guided response to flu

Control Ab

Lingwood et al., Nature, 2012

IGHV1-69 alleles and their residue at amino acid position 54:

  • IGHV1-69*01 – 54: F
  • IGHV1-69*02 – 54: L
  • IGHV1-69*03 – 54: F
  • IGHV1-69*04 – 54: L
  • IGHV1-69*14 – 54: F
  • ...

Two flu-specific Abs:

43 of 71

Antibody titers vs IGHV1-69 genotype

Avnir et al., Sci Rep, 2016

Titers (≈ antibody levels) before and after immunization

The titer data suggest that the IGHV1-69 genotype shapes the response to flu and that other V genes do not fully compensate for the “bad” variant of IGHV1-69

~20% of people have the L/L genotype of IGHV1-69 (L variants only)
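The F/F, F/L, and L/L genotypes follow directly from the residue at position 54 of a person's two IGHV1-69 alleles. A minimal Python sketch using the alleles listed in this talk; the function name is hypothetical, and the sketch assumes exactly two IGHV1-69 copies, whereas real genotyping must also handle copy-number variation at this gene.

```python
# Residue at amino acid position 54 for the IGHV1-69 alleles listed in this talk.
RESIDUE_54 = {
    "IGHV1-69*01": "F",
    "IGHV1-69*02": "L",
    "IGHV1-69*03": "F",
    "IGHV1-69*04": "L",
    "IGHV1-69*14": "F",
}


def genotype_54(allele_a: str, allele_b: str) -> str:
    """Return 'F/F', 'F/L', or 'L/L' for an individual carrying the two given alleles."""
    residues = sorted(RESIDUE_54[allele] for allele in (allele_a, allele_b))
    return "/".join(residues)


print(genotype_54("IGHV1-69*02", "IGHV1-69*01"))  # F/L
```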

44 of 71

Bioinformatics analysis of flu response

Ke, Nouri, Safonova, et al., in preparation

  • Usage of gene G = the fraction of VDJ sequences derived from G
  • An individual antibody repertoire can be described as a vector of usages of all genes
  • We can compare usage vectors of F/F, F/L, and L/L individuals:
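The usage definition translates directly into code. A minimal Python sketch, assuming each VDJ sequence has already been assigned a V gene; the function names and the toy repertoires are hypothetical and carry no biological meaning.

```python
from collections import Counter
from typing import Dict, List


def usage_vector(v_genes: List[str], gene_order: List[str]) -> List[float]:
    """Usage of gene G = fraction of VDJ sequences whose V gene is G."""
    counts = Counter(v_genes)
    return [counts[gene] / len(v_genes) for gene in gene_order]


def mean_usage(vectors: List[List[float]]) -> List[float]:
    """Average usage vector over a group of individuals (e.g., all L/L donors)."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


# Toy data only: the V gene assigned to each VDJ sequence, per individual.
GENES = ["IGHV1-69", "IGHV3-23", "IGHV4-34"]
repertoires: Dict[str, List[List[str]]] = {
    "F/F": [["IGHV1-69", "IGHV1-69", "IGHV3-23", "IGHV4-34"]],
    "L/L": [["IGHV1-69", "IGHV3-23", "IGHV3-23", "IGHV4-34"]],
}
for genotype, donors in repertoires.items():
    vectors = [usage_vector(sequences, GENES) for sequences in donors]
    print(genotype, mean_usage(vectors))
```

Fixing `gene_order` once for everyone keeps the vectors aligned, so they can be averaged per genotype group or compared with any vector distance.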

45 of 71

Linked variations of V genes

Safonova and Watson, in preparation

Genotype counts (columns: A, A/G, G; rows: C, A/C, A):

  • C: 15, 10, 8
  • A/C: 6, 8, 3
  • A: 2, 5, 17

P-value = 0.00967

The blue group is a set of V genes associated with the flu response

These genes have linked variations and possibly similar specificities

46 of 71

~ HIV ~

47 of 71

HIV and antibody response

HIV infects immune cells:

  • T cells
  • Macrophages
  • Dendritic cells

48 of 71

HIV and antibody response

HIV infects immune cells:

  • T cells
  • Macrophages
  • Dendritic cells

B cells, which produce antibodies, are not directly infected by HIV, but they cannot fight it effectively because the HIV genome mutates rapidly

49 of 71

Recognition of HIV

Verkoczy, Adv Immunol, 2017

Alter & Ackerman, Cell, 2014

50 of 71

CDRs represent antigen-binding sites

Figure: V, D, and J segments of an antibody gene with complementarity-determining regions CDR1, CDR2, and CDR3

51 of 71

Long CDR3s in response to HIV

Broadly neutralizing antibodies against HIV are characterized by extremely long CDR3s

Ultralong CDR3s in human antibodies can result from VDDJ recombination:

Sok et al., Nature, 2017

Safonova and Pevzner, Front Immunol, 2019

52 of 71

Finding VDDJ recombinations

Safonova and Pevzner, Front Immunol, 2019

54 of 71

Most D-D pairs follow the order of D genes in the IGH locus

14 datasets

Safonova and Pevzner, Front Immunol, 2019

14 individuals
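Checking whether a D-D pair follows the locus order can be sketched as below. The sketch assumes IMGT-style gene names in which the number after the dash (e.g., the 10 in IGHD3-10) gives the gene's position in the locus, increasing toward the J genes, and that the first D in a VDDJ is the V-proximal one. The function names and the example pairs are hypothetical.

```python
import re
from typing import List, Tuple


def locus_position(d_gene: str) -> int:
    """Number after the dash in an IMGT-style name, e.g. 'IGHD3-10' -> 10."""
    match = re.search(r"-(\d+)$", d_gene)
    if match is None:
        raise ValueError(f"unexpected D gene name: {d_gene}")
    return int(match.group(1))


def follows_locus_order(first_d: str, second_d: str) -> bool:
    """True if the V-proximal D of a VDDJ also comes first in the locus numbering."""
    return locus_position(first_d) < locus_position(second_d)


def fraction_in_order(pairs: List[Tuple[str, str]]) -> float:
    return sum(follows_locus_order(a, b) for a, b in pairs) / len(pairs)


# Hypothetical D-D pairs, as they might be read off VDDJ sequences.
pairs = [("IGHD3-10", "IGHD3-22"), ("IGHD2-2", "IGHD6-19"), ("IGHD2-21", "IGHD1-1")]
print(fraction_in_order(pairs))
```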

55 of 71

Most D-D pairs follow the order of D genes in the IGH locus

14 datasets

Safonova and Pevzner, Front Immunol, 2019

14 individuals

Can we explain the most frequent pairs?

56 of 71

Recombination signal sequences

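Figure: recombination signal sequences (RSSs) with 12-nt and 23-nt spacers flanking the V, D, and J segments

Recombination joins a segment whose RSS has a 12-nt spacer only with a segment whose RSS has a 23-nt spacer. In the IGH locus, V and J segments carry 23-nt spacers and D segments carry 12-nt spacers on both sides, so the 12/23 rule forbids:

  • V-J pairs
  • D-D pairs
  • V-V pairs
  • J-J pairs

12 nt ≈ 1 turn of DNA

23 nt ≈ 2 turns of DNA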
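The 12/23 rule amounts to a one-line check. A minimal Python sketch, assuming the simplified IGH layout described above (23-nt spacers for V and J, 12-nt spacers for D); the names `SPACER_NT`, `helical_turns`, and `allowed_by_12_23_rule` are hypothetical. `helical_turns` converts a spacer length into approximate DNA turns at ~10.5 bp per turn.

```python
# Spacer length (nt) of the RSS facing the recombination partner, simplified
# for the human IGH locus: D segments carry 12-nt spacers on both sides.
SPACER_NT = {"V": 23, "D": 12, "J": 23}

BP_PER_TURN = 10.5  # approximate helical repeat of B-DNA


def helical_turns(spacer_nt: int) -> int:
    """Approximate number of DNA turns spanned by a spacer."""
    return round(spacer_nt / BP_PER_TURN)


def allowed_by_12_23_rule(a: str, b: str) -> bool:
    """Two segments may be joined only if one RSS has a 12-nt and the other a 23-nt spacer."""
    return sorted((SPACER_NT[a], SPACER_NT[b])) == [12, 23]


for pair in [("V", "D"), ("D", "J"), ("V", "J"), ("D", "D")]:
    print(pair, allowed_by_12_23_rule(*pair))
```

Expressing spacers in turns also makes it natural to describe 34-nt spacers as 3 turns, which is how non-canonical 1-turn/3-turn pairings can be named.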

57 of 71

Non-canonical recombination signals

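Nucleotide frequencies at positions 1 and 2 (out of 26 sequences):

  • A: Pos 1 = 11/26, Pos 2 = 1/26
  • C: Pos 1 = 1/26, Pos 2 = 25/26
  • G: Pos 1 = 0/26, Pos 2 = 0/26
  • T: Pos 1 = 14/26, Pos 2 = 0/26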

58 of 71

Non-canonical recombination signals

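Safonova and Pevzner, Genome Res, 2020

Nucleotide frequencies at positions 1 and 2 (out of 26 sequences):

  • A: Pos 1 = 11/26, Pos 2 = 1/26
  • C: Pos 1 = 1/26, Pos 2 = 25/26
  • G: Pos 1 = 0/26, Pos 2 = 0/26
  • T: Pos 1 = 14/26, Pos 2 = 0/26

Prob(ACGTACGTA) = 11/26 × 25/26 × ...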
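The probability on this slide treats positions independently: Prob(sequence) is the product of the per-position frequencies of its nucleotides. A minimal Python sketch using only the two positions tabulated on the slide (frequencies for the remaining positions are not given here, so the sketch scores just a 2-nt prefix); the function name is hypothetical. `Fraction` keeps the result exact.

```python
from fractions import Fraction
from typing import Dict, List

# Nucleotide counts at positions 1 and 2, out of 26 sequences (from the slide).
COUNTS: List[Dict[str, int]] = [
    {"A": 11, "C": 1, "G": 0, "T": 14},  # position 1
    {"A": 1, "C": 25, "G": 0, "T": 0},   # position 2
]


def profile_probability(seq: str, counts: List[Dict[str, int]]) -> Fraction:
    """Product over positions i of the frequency of seq[i] at position i."""
    if len(seq) != len(counts):
        raise ValueError("sequence length must match the profile length")
    prob = Fraction(1)
    for nucleotide, column in zip(seq, counts):
        prob *= Fraction(column[nucleotide], sum(column.values()))
    return prob


print(profile_probability("AC", COUNTS))  # 275/676, i.e. 11/26 * 25/26
```

Any nucleotide with a zero count (here G at either position) makes the whole probability zero; practical profiles often add pseudocounts to avoid this.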

59 of 71

Non-canonical recombination signals

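Safonova and Pevzner, Genome Res, 2020

Nucleotide frequencies at positions 1 and 2 (out of 26 sequences):

  • A: Pos 1 = 11/26, Pos 2 = 1/26
  • C: Pos 1 = 1/26, Pos 2 = 25/26
  • G: Pos 1 = 0/26, Pos 2 = 0/26
  • T: Pos 1 = 14/26, Pos 2 = 0/26

Prob(ACGTACGTA) = 11/26 × 25/26 × ...

Figure: 3 DNA turns vs. 2 DNA turns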

60 of 71

Non-canonical recombination signals

Safonova and Pevzner, Genome Res, 2020

Figure: recombination signal spacers of 12, 23, and 34 nt (1, 2, and 3 DNA turns) flanking the V, D, and J segments; segment pairs are labeled 1-turn/2-turn or 1-turn/3-turn

61 of 71

Sequence alignment
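As a minimal sketch of global sequence alignment, the dynamic program below computes the edit (Levenshtein) distance: the minimum number of substitutions, insertions, and deletions that turn one sequence into another. It uses unit costs as an illustrative assumption; alignment tools for antibody sequences typically use richer scoring schemes.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    previous: List[int] = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Keeping only two rows of the dynamic-programming matrix reduces memory to O(len(b)) while giving the same distance.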

62 of 71

Tandem repeats correlate with D-D fusions

Safonova and Pevzner, Genome Res, 2020

Figure: tandem repeats R1-R4 of the IGHD locus (D genes D1-1, D1-7, D1-14, D1-20; D2-2, D2-8, D2-15, D2-21; D3-3, D3-9, D3-10, D3-16, D3-22; D4-4, D4-11, D4-17, D4-23; D5-5, D5-12, D5-18, D5-24; D6-6, D6-13, D6-19), shown alongside mouse, common marmoset, and pale spear-nosed bat

63 of 71

~ SARS-CoV-2 ~

64 of 71

Antibodies are subject to fast evolution

65 of 71

Immune system mutates and amplifies a binding antibody

The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome

66 of 71

One antibody = one antigen

67 of 71

An antibody repertoire is a set of clonal lineages
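A common heuristic for partitioning a repertoire into clonal lineages groups sequences that share V gene, J gene, and CDR3 length, then links sequences whose CDR3s differ at no more than a fixed fraction of positions. The Python sketch below follows that idea with single-linkage clustering (union-find); it is not necessarily the method used in this work, and the 15% threshold, the names, and the toy CDR3s are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Rearrangement(NamedTuple):
    v_gene: str
    j_gene: str
    cdr3: str


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def clonal_lineages(seqs: List[Rearrangement], max_fraction: float = 0.15) -> List[List[int]]:
    """Group sequence indices into putative clonal lineages by single linkage."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Only sequences with the same V, J, and CDR3 length can be clonally related.
    buckets: Dict[Tuple[str, str, int], List[int]] = {}
    for idx, s in enumerate(seqs):
        buckets.setdefault((s.v_gene, s.j_gene, len(s.cdr3)), []).append(idx)

    for members in buckets.values():
        for pos, i in enumerate(members):
            for k in members[pos + 1:]:
                if hamming(seqs[i].cdr3, seqs[k].cdr3) <= max_fraction * len(seqs[i].cdr3):
                    parent[find(i)] = find(k)

    groups: Dict[int, List[int]] = {}
    for idx in range(len(seqs)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())


reps = [
    Rearrangement("IGHV1-69", "IGHJ4", "ARDYYGMDV"),
    Rearrangement("IGHV1-69", "IGHJ4", "ARDYYGLDV"),
    Rearrangement("IGHV1-69", "IGHJ4", "AKGGSWFDP"),
    Rearrangement("IGHV3-23", "IGHJ4", "ARDYYGMDV"),
]
print(clonal_lineages(reps))  # [[0, 1], [2], [3]]
```

Single linkage lets a lineage grow through chains of similar sequences, which mirrors how somatic mutations accumulate step by step within a clone.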

68 of 71

Severity of COVID-19 and Ab response

Safonova, Tieri, et al., in preparation

non-naive only (IgG)

naive and non-naive (IgM)

69 of 71

Immunogenomics approach to vaccine design

Watson, Glanville, Marasco, Trends in Immunol, 2017

70 of 71

Data science approach to predicting the efficiency of antibody response

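Figure: candidate features of an antibody repertoire, including usage of individual V, D, J, and C genes, VDDJ recombinations, SHM (somatic hypermutation) rate, ...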
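Features such as the SHM rate can be computed per sequence before being fed to a predictive model. A minimal Python sketch, assuming each antibody sequence is already aligned without gaps to its germline V gene; the function name `shm_rate` and the toy sequences are hypothetical.

```python
def shm_rate(observed: str, germline: str) -> float:
    """Fraction of positions where an aligned antibody sequence differs from its germline."""
    if not germline or len(observed) != len(germline):
        raise ValueError("sequences must be non-empty, aligned, and of equal length")
    mismatches = sum(o != g for o, g in zip(observed, germline))
    return mismatches / len(germline)


print(shm_rate("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
```

Averaging this value over the sequences of a repertoire (or of a clonal lineage) yields one numeric feature per individual that can sit alongside gene usage and VDDJ counts.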

71 of 71

Acknowledgments

U of Louisville

William Gibson

Justin Kos

Oscar Rodriguez

David Tieri

Jun Yan

U of New South Wales

Andrew Collins

Katherine Jackson

UCSD

Massimo Franceschetti

Siavash Mirarab

Ramesh Rao

Andrey Bzikadze

Vinnu Bhardwaj

Chao Zhang

USDA

Tim Smith

Sung Bong Shin

Iowa State U

James Reecy

Luke Kramer

Scripps Institute

Raiees Andrabi

Vaughn Smider

Smithsonian Conservation Biology Institute

Klaus-Peter Koepfli

Institute for Bioorganic Chemistry

Ivan Zvyagin

Artem Mikelov

Mikhail Shugay

Pavel Pevzner

UCSD

Intersect fellowship for computational immunologists

Data Science postdoctoral fellowship

Corey Watson

U of Louisville

Harvard Medical School

Wayne Marasco

Hanzhong Ke

Yale University

Steven Kleinstein

Nima Nouri