1 of 55

Analysis of histone modifications from ChIP-seq data

2 of 55

Lesson plan

  1. Histone marks and ENCODE
  2. The ChIP-seq experiment
  3. Experimental design
  4. Bioinformatic analysis

3 of 55

The histone code: mark nomenclature

What do the marks H3K4me1, H3K36me, H3K36ac, and H3K4me0 mean in chemical terms?

Can the following marks be present on the same nucleosome?

H3K36me and H3K36ac

H3K36ac and H3K4me1

H3K4me1 and H3K4me3

H3K4me1 and H3K4ac

H3K4me1, H3K9ac, H3K14ac, H3K27me3, H3K36me3, H3K79me2, H4K12ac?
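The naming convention can be made concrete with a small parser: a mark name reads as histone, one-letter amino acid code, residue position, modification type, and (optionally) the number of groups added. The sketch below is a hypothetical helper written for this slide, not part of any library, and its regular expression covers only a few modification types. The `same_residue` check flags pairs that target the same residue; a single residue on one histone tail carries one modification state at a time, but a nucleosome contains two copies of each core histone, so such a pair needs a closer look and is not ruled out automatically.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for names such as "H3K4me1"; only a few
# modification types (me, ac, ph, ub) are covered in this sketch.
MARK_RE = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ph|ub)(\d?)$")


class HistoneMark(NamedTuple):
    histone: str            # e.g. "H3"
    residue: str            # one-letter amino acid code, e.g. "K" for lysine
    position: int           # residue number in the histone sequence
    modification: str       # "me" = methylation, "ac" = acetylation, ...
    degree: Optional[int]   # number of groups (me1/me2/me3); None if not given


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name into its components."""
    m = MARK_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised mark name: {name}")
    histone, residue, pos, mod, degree = m.groups()
    return HistoneMark(histone, residue, int(pos), mod,
                       int(degree) if degree else None)


def same_residue(a: str, b: str) -> bool:
    """True if both marks refer to the same residue of the same histone."""
    ma, mb = parse_mark(a), parse_mark(b)
    return (ma.histone, ma.residue, ma.position) == \
           (mb.histone, mb.residue, mb.position)


print(parse_mark("H3K4me1"))
print(same_residue("H3K36me", "H3K36ac"))   # same lysine, different states
print(same_residue("H3K36ac", "H3K4me1"))   # different lysines
```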

4 of 55

ENCODE project: a catalog of all functional elements in the human genome (enhancers, promoters, etc.)

5 of 55

ENCODE project: a catalog of all functional elements in the human genome (enhancers, promoters, etc.)

6 of 55

ENCODE project: the matrix of all experiments

7 of 55

ENCODE project: ChIP-seq data

(cell lines)

8 of 55

ENCODE project: ChIP-seq data

(primary cells and organs)

9 of 55

ENCODE project: visualization in the UCSC Genome Browser

10 of 55

ENCODE project: visualization in the UCSC Genome Browser

11 of 55

ChIP-seq for transcription factors

(Transcription Factors, TFs)

Chromatin

Immuno

Precipitation

12 of 55

ChIP-seq

Precipitation: the formation of a solid precipitate in a solution during a chemical reaction, for example on addition of suitable reagents. The chemical that causes the solid to form is called the "precipitant".

13 of 55

ChIP-seq

Immunoprecipitation: a method for isolating a protein from complex mixtures, such as cell lysates, sera, and tissue homogenates, using antibodies specific to that protein.

https://link.springer.com/protocol/10.1007/978-1-0716-3362-5_15

14 of 55

ChIP-seq

Immunoprecipitation: a method for isolating a protein from complex mixtures, such as cell lysates, sera, and tissue homogenates, using antibodies specific to that protein.

Chromatin immunoprecipitation (ChIP) is a type of immunoprecipitation experiment used to study protein-DNA interactions in the cell.

15 of 55

ChIP-seq for TF

Nature Reviews Genetics 13, 840-852

a | Chromatin immunoprecipitation followed by sequencing (ChIP–seq) for DNA-binding proteins such as transcription factors. Recent variations on the standard protocol include using endonuclease digestion instead of sonication (ChIP–exo) to increase the resolution of binding-site detection and to eliminate contaminating DNA, and DNA amplification after ChIP for samples with limited cells.

b | ChIP–seq for histone modifications uses micrococcal nuclease (MNase) digestion to fragment DNA and can also now be run on low-quantity samples when combined with the additional post-ChIP amplification.

16 of 55

ChIP-seq technology

  1. First, the DNA-binding protein is crosslinked to DNA in vivo by treating cells with formaldehyde.
  2. Then the chromatin is sheared by sonication into small fragments, generally in the 200–600 bp range.
  3. Then an antibody specific to the protein of interest is used to immunoprecipitate the DNA-protein complex.
  4. Finally, the crosslinks are reversed and the released DNA is assayed to determine the sequences bound by the protein.

In construction of a sequencing library, the immunoprecipitated DNA is subjected to size selection (typically in the ~150–300 bp range), although there appears to be a bias toward shorter fragments in sequencing.

17 of 55

sc-ChIP-seq

18 of 55

Library Preparation

A sufficient amount of starting material is needed, because the ChIP enriches for only a small proportion of it

Ideally, the starting material for one ChIP is 10^7 cells from culture

19 of 55

Crosslink proteins to DNA

20 of 55

Fragment

The DNA is sheared into small fragments - usually 200-500 bp in length

Check by running on a gel

21 of 55

Protein specific antibody

The sheared protein-bound DNA is immunoprecipitated using a specific antibody

22 of 55

Immunoprecipitate

The antibody binds primarily to the protein of interest but there may be cross reactivity with other proteins with similar epitopes

23 of 55

Reverse crosslink and purify DNA

24 of 55

Where do antibodies come from?

25 of 55

http://www.slideshare.net/nasagusto/monoclonal-antibodies-14851287

26 of 55

Let's return to the steps of the experiment and think about potential problems

27 of 55

Impact of sequencing depth

H3K4me3

Adapted from Jung et al (2014). NAR.

28 of 55

Impact of sequencing depth

H3K27me3

Adapted from Jung et al (2014). NAR.

29 of 55

Why are controls necessary?

  • Signal depends on the number of active binding sites, the number of starting genomes, and IP efficiency
  • Open chromatin regions fragment more easily than closed regions
  • Repetitive sequences might seem to be enriched
  • Uneven distribution of sequence tags across the genome
  • Hyper-ChIPable regions
  • Allows us to compare with the same region in a matched control
  • ENCODE also provides a “Black List”
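As a minimal sketch of how a blacklist is typically applied, the example below drops every peak that overlaps a blacklisted interval. The coordinates are made-up toy values, the function names are hypothetical, and intervals are assumed to be half-open, BED-style `(chromosome, start, end)` tuples; it is not the procedure of any particular pipeline.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_blacklisted(peaks: List[Interval],
                     blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no blacklisted region (simple O(n*m) scan)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Toy data, invented for illustration.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 150, 300)]

kept = drop_blacklisted(peaks, blacklist)
print(kept)
```

For genome-scale peak sets an interval index or a sorted sweep would be more efficient than the quadratic scan used here.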

30 of 55

ChIP-Seq Controls

Workflow: crosslink proteins to DNA → shear DNA (sonication) → immunoprecipitation → reverse crosslink → size selection and PCR

Samples prepared in parallel (biological samples / library preparation):

  • Specific antibody (ChIP enrichment)
  • Non-specific antibody (IgG “mock IP”)
  • No IP (Input DNA)

31 of 55

Comparing signal and background ("noise")

https://bioinformatics-core-shared-training.github.io/cruk-autumn-school-2017/ChIP/Materials/Lectures/Lecture5_Peak%20Calling_SS.pdf
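One simple way to compare signal against background in a candidate region is to scale the control read count to the ChIP library size and then ask how surprising the ChIP count is under a Poisson background. The sketch below assumes exactly that model; the function names, the pseudocount, and the read counts are illustrative assumptions and do not reproduce any specific peak caller.

```python
import math
from typing import Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the sum of P(X = 0..k-1).

    Fine for a sketch; very small tail probabilities lose precision this way.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment(chip_count: int, control_count: int,
               chip_total: int, control_total: int,
               pseudocount: float = 1.0) -> Tuple[float, float]:
    """Return (fold enrichment, Poisson p-value) for one region."""
    scale = chip_total / control_total        # library-size normalisation
    expected = control_count * scale          # background reads expected in ChIP
    fold = (chip_count + pseudocount) / (expected + pseudocount)
    p_value = poisson_sf(chip_count, max(expected, pseudocount))
    return fold, p_value


# Toy numbers: 50 ChIP reads vs 10 control reads, control sequenced twice as deep.
fold, p_value = enrichment(50, 10, 1_000_000, 2_000_000)
print(fold, p_value)
```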

32 of 55

Replicates and reproducibility

  • Biological replicates are essential to understand variation and for differential binding analysis
  • More replicates are often preferable to greater depth
  • It is better to sequence a high-quality sample at lower depth than a low-quality sample at higher depth

Figure (Venn diagram): Sox2 Replicate 1 (4605), Sox2 Replicate 2 (2382); 2090; labels: All binding, High confidence
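A minimal sketch of how replicate agreement can be checked: keep the peaks from one replicate that overlap at least one peak in the other. The peak coordinates are invented toy values and `reproducible_peaks` is a hypothetical name; this does not reproduce the numbers in the figure.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Peaks from replicate 1 that are supported by any peak in replicate 2."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


# Toy peak sets, invented for illustration.
rep1 = [("chr1", 100, 200), ("chr1", 400, 500), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]

shared = reproducible_peaks(rep1, rep2)
print(len(shared), "of", len(rep1), "replicate 1 peaks are supported by replicate 2")
```

Note that the result is not symmetric: counting from replicate 2 can give a different number when one peak overlaps several peaks in the other set.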

33 of 55

Bioinformatics analysis

Nature Protocols 7, 45–61 (2012)

34 of 55

Peak calling

The tag density around a true binding site should show a bimodal enrichment pattern (or paired peaks).

MACS first scans the whole dataset searching for highly significant enriched regions.

Given a sonication size (bandwidth) and a high-confidence fold-enrichment (mfold), MACS slides a window of width 2 × bandwidth across the genome to find regions with tags more than mfold enriched relative to a random genome-wide tag distribution.

MACS randomly samples 1,000 of these high-quality peaks, separates their positive and negative strand tags, and aligns them by the midpoint between their centers.

The distance between the modes of the two peaks in the alignment is defined as ‘d’ and represents the estimated fragment length. MACS shifts all the tags by d/2 toward the 3’ ends to the most likely protein-DNA interaction sites.
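The shifting step can be illustrated on a single toy peak. The sketch below is a strong simplification and not the MACS implementation: it takes the most frequent 5' position on each strand as the mode, sets d to the distance between the two modes, and moves every tag d/2 toward its 3' end (plus-strand tags to the right, minus-strand tags to the left). The tag positions are invented, and real data would need the sampling and alignment of many peaks described above.

```python
from collections import Counter
from typing import List


def mode(positions: List[int]) -> int:
    """Most frequent position (ties resolved by first occurrence)."""
    return Counter(positions).most_common(1)[0][0]


def estimate_d(plus_tags: List[int], minus_tags: List[int]) -> int:
    """Distance between the minus-strand and plus-strand tag modes."""
    return mode(minus_tags) - mode(plus_tags)


def shift_tags(plus_tags: List[int], minus_tags: List[int], d: int) -> List[int]:
    """Shift each tag by d/2 toward its 3' end and merge both strands."""
    half = d // 2
    shifted = [p + half for p in plus_tags] + [m - half for m in minus_tags]
    return sorted(shifted)


# Toy 5' tag positions around one binding site (invented values).
plus_tags = [90, 100, 100, 100, 110]
minus_tags = [290, 300, 300, 300, 310]

d = estimate_d(plus_tags, minus_tags)
merged = shift_tags(plus_tags, minus_tags, d)
print(d, mode(merged))
```

After shifting, the two strand-specific peaks collapse onto a single position, which is the estimated protein-DNA interaction site in this toy example.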

35 of 55

ChIP-seq peak-calling software

36 of 55

37 of 55

Peak callers

  • Variability in number of peaks called
  • Tend to agree on the strongest signals

Wilbanks & Facciotti (2010). PLoS ONE.

38 of 55

ENCODE project: visualization in the UCSC Genome Browser

39 of 55

Downstream analysis

  • Detecting differential enrichment across samples
    • Steinhauser et al, Brief Bioinform. (2016)

Figure 4. Proportion of true and false positives for each tool on the simulated FoxA1 data set (A, B) and H3K36me3 data (C, D)

Sharp ChIP-seq signal: FoxA1

Broad ChIP-seq signal: H3K36me3

Panels: single-replicate tools, multiple-replicate tools

40 of 55

Decision tree indicating the proper choice of tool depending on the data set: shape of the signal (sharp peaks or broad enrichments), presence of replicates and presence of an external set of regions of interest [Steinhauser, et al, 2016].

41 of 55

Downstream analysis

  • Annotation of peaks - distance from TSS
    • ChIPseeker, Homer, ChiLin
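The core of this annotation step is a nearest-neighbour lookup. The sketch below finds the signed distance from a peak summit to the closest TSS on one chromosome using binary search. It is an illustration only: the function name and coordinates are hypothetical, gene strand is ignored (so a positive value just means the summit has the larger coordinate), and the tools listed above work from full gene annotations.

```python
import bisect
from typing import List


def nearest_tss_distance(summit: int, tss_sorted: List[int]) -> int:
    """Signed distance (summit - TSS) to the closest TSS on the same chromosome.

    `tss_sorted` must be sorted in ascending order and non-empty.
    """
    if not tss_sorted:
        raise ValueError("no TSS positions given")
    i = bisect.bisect_left(tss_sorted, summit)
    # Only the TSS just before and just after the summit can be the nearest one.
    candidates = tss_sorted[max(0, i - 1): i + 1]
    return min((summit - t for t in candidates), key=abs)


# Toy TSS coordinates on one chromosome (invented values).
tss_positions = [1000, 5000, 9000]

for summit in (500, 5200, 9500):
    print(summit, nearest_tss_distance(summit, tss_positions))
```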

42 of 55

Downstream analysis

  • Annotation of peaks - genomic context
    • ChIPseeker, Homer, ChiLin

43 of 55

Downstream analysis

  • Functional enrichment analysis
    • ChIPseeker, GREAT, Homer, ChiLin
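One common form of functional enrichment is a gene-based hypergeometric test: given the genes near peaks, is a functional category over-represented compared with the whole gene set? The sketch below assumes that setup. The function name and the numbers are illustrative, and the tools listed above each apply their own statistics and annotation sources.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) under the hypergeometric distribution.

    M: total number of genes considered
    n: genes annotated with the category
    N: genes near peaks (the "drawn" set)
    k: genes near peaks that are also in the category
    """
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i)
                     for i in range(max(k, 0), upper + 1))
    return favourable / total


# Toy example: 10 genes in total, 3 in the category, 4 near peaks,
# and all 3 category genes are among those 4.
p_value = hypergeom_sf(3, 10, 3, 4)
print(p_value)
```

When many categories are tested, the resulting p-values need a multiple-testing correction before interpretation.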

44 of 55

Downstream analysis

  • Motif discovery
    • MEME suite, ChiLin, Homer

45 of 55

TF binding motif: PWM
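A position weight matrix can be built from a set of aligned binding sites and then used to score candidate sequences. The sketch below is a minimal log-odds version that assumes a uniform background (0.25 per base) and a pseudocount of 1. The sites and the scanned sequence are toy values, and `build_pwm`, `score`, and `best_hit` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds weight for each base at each position of the aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of position-specific weights for a sequence as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))


def best_hit(pwm: List[Dict[str, float]], sequence: str) -> Tuple[float, int]:
    """Best-scoring window in the sequence (forward strand only): (score, offset)."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Toy aligned sites and a toy sequence to scan.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
pwm = build_pwm(sites)
hit_score, hit_offset = best_hit(pwm, "CCCTGACTCAGGG")
print(hit_score, hit_offset)
```

A real scan would also score the reverse complement and compare hits against a score threshold.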

46 of 55

If you need good motifs to compare...

47 of 55

De novo motif search

Nature Biotechnology 24, 959 - 961 (2006)

48 of 55

Software for de novo motif search: MEME suite

49 of 55

ChIP-seq for histone modifications

50 of 55

Proteins that write and erase the histone code

The EpiFactors database contains information on 69 protein complexes

(the list has not been updated for several years)

51 of 55

Where are the instructions for what to write, and where, in epigenetics?

Tools: protein complexes that can add or remove specific epigenetic modifications

Result of their work: the observed epigenetic changes/states in different cells of the organism

Instructions? Who specifies what to write or erase, and where?

52 of 55

Long non-coding RNAs (lncRNA)

53 of 55

ncRNA MEG3: triplex formation

54 of 55

ncRNA CHASERR: co-transcriptional RNA-RNA interactions

55 of 55

CRISPR technology also uses an ncRNA (gRNA)