1 of 44

Transcriptomics and epigenomics

Spring 2024

(Based on slides made by Ivan Antonov and Bareket Dassa)

2 of 44

Part 1. Biological background: Regulatory regions

3 of 44

In the previous session we studied ChIP-seq.

Question: when can a transcription factor bind to DNA?

4 of 44

Learning objectives

  • What kinds of regulatory regions do you know?
  • Promoters, enhancers, insulators: what is the difference?
  • The concept of 'open chromatin'

5 of 44

Simple eukaryotic transcriptional unit (a) vs complex metazoan transcriptional control module (b)

6 of 44

Regulatory elements

  • 'Open chromatin' in the region
  • Nucleosome depletion on the site
  • Regulatory protein (transcription factor) binding

7 of 44

Transcription factor

- A protein that binds to specific DNA sequences

- Usually in a complex

- Promotes (as an activator) or blocks (as a repressor) the recruitment of RNA polymerase

- More than 1,500 TFs in humans

8 of 44

Regulatory elements: promoter

A regulatory region of DNA, usually located upstream of (or around) the transcription start site (TSS), that provides a control point for regulated initiation of gene transcription

9 of 44

Promoter elements

Not all of these elements are present in all promoters and many of these elements are present in lineage-specific genes.

Genes & Dev. 2002. 16: 2583-2592

10 of 44

Enhancer

A relatively distant regulatory element: can be proximal (~300 bp) or distal (up to megabases away)

11 of 44

Promoter vs enhancer

12 of 44

Insulator

CTCF

13 of 44

CTCF function

Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers (2021)

14 of 44

Part 2. Next Generation Sequencing: DNA accessibility analysis (DNase-seq and ATAC-seq)

15 of 44

Regulation of transcription

ENCODE project

16 of 44

DNase-seq

Nature Methods 11, 39–40 (2014)

Identification of the regulatory regions, based on the genome-wide sequencing of regions sensitive to cleavage by DNase I.

17 of 44

DNase protocol

The same software as for ChIP-seq can be used for the analysis

The only difference is that we are looking for the edge of the fragment (the cleavage site) rather than its center

Solution: shift reads upstream by ½ of the fragment length
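The read shift described above can be sketched as follows. This is a minimal illustration, not the behaviour of any particular peak caller: the function name `cut_site_window`, the 0-based half-open coordinates, and the 200 bp extension size are assumptions made for this example.

```python
def cut_site_window(read_start, read_end, strand, ext_size=200):
    """Return a (start, end) window of ext_size bp centered on the read's 5' end.

    Coordinates are 0-based, half-open (BED-like). The 5' end of the read
    marks the DNase I cut site, so the read is moved half of ext_size
    upstream and then extended by ext_size in its 3' direction.
    """
    half = ext_size // 2
    if strand == "+":
        cut = read_start          # 5' end of a forward-strand read
    elif strand == "-":
        cut = read_end            # 5' end of a reverse-strand read
    else:
        raise ValueError("strand must be '+' or '-'")
    # Upstream shift plus downstream extension gives a symmetric window;
    # the start is clipped so it never falls before the contig start.
    return max(0, cut - half), cut + half


forward = cut_site_window(1000, 1050, "+")
reverse = cut_site_window(1000, 1050, "-")
print(forward, reverse)
```

With MACS2, a comparable effect is commonly obtained with `--nomodel --shift -100 --extsize 200`, where the negative shift is half of the extension size.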

18 of 44

Footprinting

19 of 44

Footprinting

(Neph 2012)

20 of 44

DNase-specific protocol: ENCODE

21 of 44

DNase-specific software: HotSpots

Hotspots (height = score): enrichment of tags relative to a local background model based on the number of tags in a surrounding 50 kb window
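The idea of scoring a small window against its 50 kb surroundings can be sketched as a z-score. The binomial background model, the 250 bp small window, and the function name `local_z_score` are assumptions of this sketch; the slide only states that the background is based on the tag count in the surrounding 50 kb window.

```python
import math


def local_z_score(tags_small, tags_background, small_bp=250, background_bp=50_000):
    """Z-score of the tag count in a small window versus a local background.

    Assumes (for illustration) that each of the tags in the background
    window falls into the small window with probability small_bp / background_bp.
    """
    p = small_bp / background_bp
    expected = tags_background * p
    sd = math.sqrt(tags_background * p * (1 - p))
    if sd == 0:
        return 0.0                # no background tags: nothing to score
    return (tags_small - expected) / sd


# 1000 tags in the 50 kb window give an expectation of 5 tags per 250 bp.
print(local_z_score(5, 1000))     # window at background level
print(local_z_score(20, 1000))    # window clearly enriched over background
```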

22 of 44

Different Types of DNase-Seq

  • (1) Without size selection: looking for all cleavage sites; algorithms that search for very short regions, or even hotspots, should be used
  • (2) With size selection: looking for regions where cleavage sites are no more than 50-100 bp apart; can be processed as ChIP-seq data with shifted reads (MACS2, HOMER, etc.)

23 of 44

ATAC-seq

ATAC-seq stands for Assay for Transposase-Accessible Chromatin with high-throughput sequencing. ATAC-seq employs a mutated, hyperactive Tn5 transposase. Its high activity allows highly efficient cutting of exposed DNA and simultaneous ligation of sequencing adapters.

24 of 44

Transposase and transposons

  • Transposase is an enzyme that binds to the end of a transposon and catalyses its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism.

25 of 44

Nobel Prize for the discovery of transposons

26 of 44

DNase-seq vs ATAC-seq

27 of 44

Why use ATAC-seq?

ATAC-seq provides genome-wide information on open chromatin regions at nucleotide resolution

What information?

  • Profile regulatory elements (promoters, enhancers) that are accessible to the transcription machinery
  • Nucleosome positioning and chromatin compaction
  • Characterize genome-wide DNA-protein interactions (TF, RNA polymerase)

28 of 44

Why use ATAC-seq?

  • Rapid assay preparation time, less experimental calibration
  • Protocol requires a small input (500-50,000 cells)
  • To quantify differences in cellular response to treatment or disease
  • To gain mechanistic insight into gene regulation by TF footprinting

Is chromatin accessibility indicative of active/functional regulatory regions?

“Patterns of reads in open chromatin regions result from a complex interplay of experimental effects with TF binding and nucleosome occupancy, among other biological factors”

He, H. H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nature Methods (2014).

29 of 44

ATAC-seq compared with ChIP-seq

https://www.sciencedirect.com/science/article/pii/S1350946216300301#fig6

30 of 44

Specifics of peak calling

  • DNase-seq and ATAC-seq:
    • search for any kind of regulatory region
    • cannot identify the regulator itself, only its target regions
    • the cleavage site is at the end of the fragment.

31 of 44

ATAC-seq analysis workflow

  • Remove adaptors & quality trimming, quality control (cutadapt, FastQC)
  • Reads mapping to the genome (bowtie2)
  • Remove mitochondrial reads, reduce PCR duplicates (grep, samtools, Picard tools)
  • Select nucleosome-free fragments (awk, samtools)

Why?

32 of 44

Why should we remove reads that map to mitochondrial DNA?

ATAC-seq samples may contain ~20–80% of mitochondrial sequencing reads, depending on the cell type

Regions of open chromatin of interest are usually located in the nuclear genome
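A minimal sketch of the mitochondrial filter on SAM-formatted text lines, which also reports the mitochondrial fraction. The function name `split_mito` and the contig names `chrM` / `MT` are assumptions; the actual name depends on the reference genome, and in practice this step is usually done with samtools.

```python
MITO_CONTIGS = {"chrM", "MT"}  # assumed names; check your reference genome


def split_mito(sam_lines, mito_contigs=MITO_CONTIGS):
    """Return (kept_lines, mito_fraction) for an iterable of SAM lines."""
    kept, n_reads, n_mito = [], 0, 0
    for line in sam_lines:
        if line.startswith("@"):      # header lines are passed through
            kept.append(line)
            continue
        n_reads += 1
        contig = line.split("\t")[2]  # column 3 of SAM is RNAME
        if contig in mito_contigs:
            n_mito += 1
        else:
            kept.append(line)
    fraction = n_mito / n_reads if n_reads else 0.0
    return kept, fraction


# Toy alignments, invented for this example.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchrM\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
    "r4\t0\tMT\t400\t60\t50M\t*\t0\t0\t*\t*",
]
nuclear, mito_fraction = split_mito(example)
print(len(nuclear), mito_fraction)
```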

Reduce PCR duplicates

From: softgenetics.com

33 of 44

ATAC-seq analysis workflow

  • Remove adaptors & quality trimming, quality control (cutadapt, FastQC)
  • Reads mapping to the genome, paired-end (bowtie2)
  • Remove mitochondrial reads, reduce PCR duplicates (grep, samtools, Picard tools)
  • Select nucleosome-free fragments (awk, samtools)

How?

34 of 44

ATAC-seq provides genome-wide information on chromatin compaction

Buenrostro (2013) Nature Methods

  • The insert size distribution of sequenced fragments shows a periodicity of ~200 bp
  • Selecting fragments shorter than the length generally protected by a nucleosome (~147 bp) yields nucleosome-free fragments
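The selection of nucleosome-free fragments can be sketched as a filter on the fragment length stored in the TLEN column of paired-end SAM records. The 100 bp cutoff and the function name `is_nucleosome_free` are assumptions for this example; the cutoff should be chosen from the insert size distribution of the actual data.

```python
def is_nucleosome_free(sam_line, max_len=100):
    """True if the paired-end fragment is shorter than max_len bp.

    Column 9 of a SAM record (TLEN) holds the signed fragment length;
    it is 0 when the length is not available (e.g. unpaired reads).
    """
    fields = sam_line.split("\t")
    fragment_len = abs(int(fields[8]))
    return 0 < fragment_len < max_len


# Toy records, invented for this example.
records = [
    "f1\t99\tchr1\t100\t60\t40M\t=\t130\t70\t*\t*",    # 70 bp fragment
    "f2\t99\tchr1\t500\t60\t50M\t=\t640\t190\t*\t*",   # ~mononucleosome
    "f3\t147\tchr1\t130\t60\t40M\t=\t100\t-70\t*\t*",  # mate of f1
    "f4\t0\tchr1\t900\t60\t50M\t*\t0\t0\t*\t*",        # no fragment length
]
short_fragments = [r for r in records if is_nucleosome_free(r)]
print([r.split("\t")[0] for r in short_fragments])
```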

35 of 44

ATAC-seq analysis workflow (2)

  • Peak calling (macs2)
  • Peaks visualization (NGSplot, IGV)
  • Peak annotation, motif discovery, TF footprinting (Homer, GREAT, CEAS)

36 of 44

Peak calling and visualization in a genome browser

37 of 44

Identify ATAC-seq bias

  • Controlling for the enzymatic cleavage bias with a “naked DNA” control:

The Tn5 transposase cleaves DNA in a sequence-dependent manner: it cuts some DNA sequences more efficiently than others.

Chung, H.-R. et al. The effect of micrococcal nuclease digestion on nucleosome positioning data. PLoS ONE (2010).

  • Avoiding high read redundancy:

Filter out duplicate reads to avoid calling false peaks (i.e. reads at exactly the same genomic location and on the same strand, when their number exceeds the expected redundancy).

  • Adjusting for sequencing depth:

High levels of coverage are needed for an informative experiment
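The redundancy filter from the list above can be sketched as a cap on the number of reads kept per (chromosome, 5' position, strand). The function name `cap_duplicates` and the default cap of one read are assumptions for this example; dedicated tools such as Picard use more information (for example both mates of a pair) when marking duplicates.

```python
from collections import defaultdict


def cap_duplicates(reads, max_per_position=1):
    """Keep at most max_per_position reads per (chrom, position, strand).

    reads is an iterable of (chrom, five_prime_position, strand) tuples;
    the input order of the retained reads is preserved.
    """
    seen = defaultdict(int)
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_per_position:
            kept.append(read)
    return kept


# Toy reads, invented for this example: three identical forward reads,
# one reverse read at the same position, and one read on another contig.
reads = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-"), ("chr2", 100, "+")]
deduplicated = cap_duplicates(reads)
print(deduplicated)
```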

ENCODE consortium’s Standards, Guidelines and Best Practices: https://www.encodeproject.org/atac-seq/

38 of 44

Identify ATAC-seq bias

“Better methods are also needed to compare chromatin profiles between treatment groups and to account for variability in sample quality, enrichment level, batch effect and read depth”

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Clifford A. Meyer & X. Shirley Liu, Nature Reviews Genetics 2014

http://www.nature.com/nrg/journal/v15/n11/full/nrg3788.html#t1

Omni-ATAC, an improved ATAC-seq protocol with a better signal-to-background ratio, identifies more accessible regions while using fewer cells and less transposase

(Corces et al, Nature Methods 2017)

39 of 44

Additional applications of ATAC-seq

Genome-wide information on nucleosome positioning in regulatory regions

40 of 44

Integrating ATAC-seq data and other epigenetic information

For studying the relationship between genome structure and changes in global patterns of gene expression

For more reading:

Integration of ATAC-seq and RNA-seq to generate dynamic gene regulatory networks:

A Transcriptional Time Course of Myeloid Differentiation

(Ramirez et al., 2017, Cell Systems)

Combining ATAC-Seq with RNA-Seq:

https://link.springer.com/protocol/10.1007%2F978-1-4939-8618-7_15

41 of 44

“The chromatin accessibility landscape of primary human cancers”

Science (2018)

  • Generated ATAC-seq data in 410 tumor samples from TCGA across 23 cancer types.
  • Predicted interactions between DNA regulatory elements and gene promoters by genome-wide correlation of gene expression and chromatin accessibility, integrated with other omics data available for the tumor samples.
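The principle of linking a regulatory element to a gene by correlating accessibility with expression across samples can be sketched as a Pearson correlation. This is not the method of the cited study, only an illustration of the idea; the function name `pearson` and the toy values are invented for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0                # a constant vector carries no signal
    return cov / math.sqrt(var_x * var_y)


# One value per sample: accessibility of a peak and expression of a gene.
peak_accessibility = [1.0, 2.0, 3.0, 4.0]
gene_expression = [2.0, 4.1, 5.9, 8.0]
r = pearson(peak_accessibility, gene_expression)
print(round(r, 3))
```

A high correlation across many samples suggests, but does not prove, that the element regulates the gene.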

42 of 44

Applications of ATAC-seq

Infer footprints of DNA-protein binding (genome-wide factor occupancy)

43 of 44

High-throughput single-cell ATAC-seq

Toward Single-cell “Regulomics”?

Single-cell chromatin accessibility can reveal cell-type-specific epigenomic variability

Experimental design for using ATAC-seq to identify epigenomic states of multiple cell types from human donors:

Schematic of the ATAC-seq workflow on the SMARTer ICELL8 system:

44 of 44

ATAC-seq lecture summary

  • Using NGS to profile genome-wide epigenetic modifications
  • ATAC-seq captures open and accessible regions of chromatin
  • The Tn5 transposase cuts exposed DNA and simultaneously ligates sequencing adapters
  • A bioinformatic workflow based on nucleosome-free fragments is available
  • ATAC-seq enables profiling promoters, enhancers, or other regulatory elements accessible to transcription machinery
  • Application in high-throughput single-cell ATAC-seq