1 of 55

Transcriptomics and epigenomics

Spring 2025

2 of 55

Part 1. Biological background: Regulatory regions

3 of 55

In the previous class we covered ChIP-seq.

Question: when can a transcription factor bind to DNA?

4 of 55

Goals

  • What types of regulatory regions do you know?
  • Promoters, enhancers, insulators: what is the difference?
  • The concept of "open chromatin"

5 of 55

Simple eukaryotic transcriptional unit (a) vs complex metazoan transcriptional control module (b)

6 of 55

Regulatory elements

  • 'Open chromatin' in the region
  • Nucleosome depletion on the site
  • Regulatory protein (transcription factor) binding

7 of 55

Transcription factor

- A protein that binds to specific DNA sequences

- Usually acts as part of a complex

- Promotes (as an activator) or blocks (as a repressor) the recruitment of RNA polymerase

- More than 1,500 TFs in humans

8 of 55

Regulatory elements: promoter

A regulatory region of DNA, usually located upstream of (or around) the transcription start site (TSS), that provides a control point for regulated initiation of gene transcription

9 of 55

Promoter elements

Not all of these elements are present in all promoters and many of these elements are present in lineage-specific genes.

Genes & Dev. 2002. 16: 2583-2592

10 of 55

  • Synthetic transcription factor (synTF): typically an exogenous TF containing a DBD whose binding specificity is orthogonal to the organism or cell type in which it is used to drive an inducible promoter.
  • Transgene: a gene encoding a natural or synthetic protein or RNA product that is synthetically introduced into a cell.
  • TAD: transcriptional activation domain
  • DBD: DNA-binding domain

Artificial promoters in transgenes

11 of 55

Enhancer

Relatively distant regulatory element: can be proximal (~300 bp) or distal (up to the Mb scale)

12 of 55

Promoter vs enhancer

13 of 55

Insulator

CTCF

14 of 55

CTCF function

Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers (2021)

15 of 55

Part 2. Next Generation Sequencing: DNA accessibility analysis (DNase-seq and ATAC-seq)

16 of 55

Regulation of transcription

ENCODE project

17 of 55

Chromatin accessibility changes in cancer

18 of 55

Chromatin accessibility changes in cancer

19 of 55

DNase-seq

Nature Methods 11, 39–40 (2014)

Identification of the regulatory regions, based on the genome-wide sequencing of regions sensitive to cleavage by DNase I.

20 of 55

DNase protocol

The same software used for ChIP-seq can be used for the analysis

The only difference is that the signal of interest is the edge of the fragment (the cut site) rather than its center

Solution: shift reads upstream by ½ of the fragment length
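The shift can be sketched as follows. This is a minimal illustration, assuming 0-based coordinates, reads described only by their 5' position and strand, and a window length `ext` chosen for the example; the function name `centered_window` is hypothetical and does not belong to any peak caller.

```python
from typing import Tuple


def centered_window(five_prime: int, strand: str, ext: int = 200) -> Tuple[int, int]:
    """Return a half-open window of length `ext` that contains the cut site near its middle.

    A ChIP-seq style caller extends a read downstream from its 5' end.
    Moving the start upstream by ext/2 before extending places the 5' end
    (the DNase cut site) in the middle of the window instead of at its edge.
    """
    half = ext // 2
    if strand == "+":
        start = five_prime - half        # upstream means smaller coordinates
        end = start + ext
    elif strand == "-":
        end = five_prime + half + 1      # upstream means larger coordinates
        start = end - ext
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end            # clip at the chromosome start
```

For a read whose 5' end is at position 1000, both strands give a 200 bp window around position 1000 rather than a window that starts or ends there.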

21 of 55

Types of DNase-Seq

  • Without size selection: searches for all cleavage sites; use algorithms that look for very short regions or even hotspots
  • With size selection: searches for regions where cleavage sites are no more than 50–100 bp apart; the data can be processed like ChIP-seq data with simply shifted reads (MACS, HOMER, etc.)
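The second case, regions in which neighboring cut sites are at most 50–100 bp apart, can be illustrated with a simple gap-based grouping. This is only a sketch under assumptions: cut sites are given as integer positions on a single chromosome, and `cluster_cut_sites` and `max_gap` are names invented for the example. Real peak callers add statistical scoring on top of this idea.

```python
from typing import Iterable, List, Tuple


def cluster_cut_sites(positions: Iterable[int], max_gap: int = 100) -> List[Tuple[int, int, int]]:
    """Group cut sites into regions where consecutive sites are <= max_gap apart.

    Returns a list of (first_site, last_site, number_of_sites) tuples.
    """
    regions: List[Tuple[int, int, int]] = []
    start = prev = None
    count = 0
    for pos in sorted(positions):
        if prev is not None and pos - prev <= max_gap:
            prev = pos                   # extend the current region
            count += 1
        else:
            if prev is not None:
                regions.append((start, prev, count))
            start = prev = pos           # open a new region
            count = 1
    if prev is not None:
        regions.append((start, prev, count))
    return regions
```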

22 of 55

Footprinting

DNase cannot cut where a protein is bound

When the fragments are separated by gel electrophoresis, bands are missing at the protected positions
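In sequencing data the same protection shows up as a dip in per-base cut counts over the bound site relative to its flanks. The sketch below computes one simple depth score for that dip. It assumes a list of per-base cut counts and a known site interval; the name `footprint_depth` and the flank width are choices made for this example, not a published footprinting algorithm.

```python
from typing import Sequence


def footprint_depth(cuts: Sequence[int], site_start: int, site_end: int, flank: int = 10) -> float:
    """Mean cut count in the flanks minus mean cut count inside the site.

    `cuts` holds per-base cut counts; the site is the half-open interval
    [site_start, site_end). A clearly positive value suggests protection.
    """
    if flank <= 0 or site_end <= site_start:
        raise ValueError("flank must be positive and the site non-empty")
    if site_start - flank < 0 or site_end + flank > len(cuts):
        raise ValueError("flanks extend beyond the profile")
    center = cuts[site_start:site_end]
    flanks = list(cuts[site_start - flank:site_start]) + list(cuts[site_end:site_end + flank])
    return sum(flanks) / len(flanks) - sum(center) / len(center)
```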

23 of 55

Footprinting


(Neph 2012)

24 of 55

DNase-specific protocol: ENCODE

25 of 55

DNase-specific software: HotSpots2

Hotspots (height = score): enrichment of tags relative to a local background model based on the number of tags in a surrounding 50 kb window
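One simple way to express enrichment against a local background is a binomial z-score: if tags were spread uniformly over the 50 kb window, how surprising is the count in a small window inside it? The sketch below is an illustration of that idea only, and it is an assumption that this resembles the tool's scoring; the 250 bp small window and the function name are chosen for the example.

```python
import math


def local_enrichment_z(n_small: int, n_large: int, small_bp: int = 250, large_bp: int = 50_000) -> float:
    """Binomial z-score of the tag count in a small window against its surroundings.

    n_small: tags in the small window; n_large: tags in the surrounding window.
    Under a uniform background each tag lands in the small window with
    probability p = small_bp / large_bp.
    """
    p = small_bp / large_bp
    expected = n_large * p
    sd = math.sqrt(n_large * p * (1 - p))
    if sd == 0:
        return 0.0                       # no tags, so no evidence either way
    return (n_small - expected) / sd
```

With 1,000 tags in the 50 kb window, about 5 are expected in a 250 bp window, so a count of 50 gives a large positive score while a count of 5 gives a score near zero.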

26 of 55

ATAC-seq

ATAC-seq stands for Assay for Transposase-Accessible Chromatin with high-throughput sequencing. ATAC-seq employs a mutated, hyperactive Tn5 transposase. Its high activity allows highly efficient cutting of exposed DNA and simultaneous ligation of sequencing adapters.

27 of 55

Transposase and transposons

  • Transposase is an enzyme that binds to the end of a transposon and catalyses its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism.

28 of 55

Cut & Paste transposition

29 of 55

Nobel Prize for the discovery of transposons

30 of 55

DNase-seq vs ATAC-seq

31 of 55

FAIRE-seq

Formaldehyde-Assisted Isolation of Regulatory Elements

32 of 55

MNase-seq

Micrococcal nuclease

33 of 55

The steps in CUT&Tag.

  1. Added antibody (green) binds to the target chromatin protein (blue) between nucleosomes (gray ovals) in the genome, and the excess is washed away.
  2. A second antibody (orange) is added and enhances tethering of pA-Tn5 transposome (gray boxes) at antibody-bound sites.
  3. After washing away excess transposome, addition of Mg++ activates the transposome and integrates adapters (red) at chromatin protein binding sites.
  4. After DNA purification genomic fragments with adapters at both ends are enriched by PCR.

Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG, Ahmad K, Henikoff S. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019 Apr 29;10(1):1930. doi: 10.1038/s41467-019-09982-5. PMID: 31036827; PMCID: PMC6488672.

CUT&Tag

ChIP-seq alternative

34 of 55

35 of 55

Comparison of methods

36 of 55

| Feature | ATAC-seq | DNase-seq | MNase-seq | CUT&Tag or related ChIC techniques |
|---|---|---|---|---|
| Enzyme type | Tn5 | Endonuclease | Endo- and exonuclease | Tn5 conjugated to an antibody via protein A |
| Sequence bias? | Yes; complex, Tn5 insertion bias, with preference for A/T in insertion site and C/G flanking | Yes; complex, partially dependent on enzyme concentration and on methylation status of CpGs | Yes; preferential cutting upstream of A/T compared with G/C | Yes; dictated by antibody used to guide Tn5 and by Tn5 bias |
| No. of input cells or nuclei for bulk | 500–50,000 | 1–10 million | 10,000–100,000 | 100,000–500,000 |
| Low-input/single-cell | Yes; commercial solutions available | Yes | Yes | Yes |
| Sample type | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. Formaldehyde cross-linked or FFPE samples | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues. Formaldehyde cross-linked samples | Fresh or cryopreserved cells or nuclei. Fresh or frozen tissues |
| Library preparation time | ~10 h for 12 samples (this protocol) | 1–3 d | ~2 d | 1–2 d |
| Technical considerations | Library quality is highly dependent on cell viability. Protocol alterations are required for use on fixed cells, and data quality is often reduced for those samples | Enzyme concentration and digestion duration may need to be optimized to sample type. Size of fragments selected affects downstream analysis | Enzyme concentration and digestion duration may need to be optimized to sample type. Apparent nucleosome occupancy is a function of MNase concentration | The amount of antibody used must be titrated for the cell type or sample. This will be a function of the strength of the antibody and the abundance of the target protein. The assay is as specific as the primary antibody used. Additionally, this is a targeted technique, so additional libraries must be made of each modification or protein tested |
| Sequencing type | Paired-end | Single-end | Single-end | Single-end or paired-end |
| Sequencing depth | Low; 10 million read pairs per sample with Omni-ATAC | Medium/high; 20–50 million uniquely mapping reads per sample; 200 million for TF footprinting | High; 150–200 million reads per sample (human) | Very low; 3 million read pairs per sample |
| Data produced | Tn5-accessible chromatin | DNase-accessible chromatin; TF footprinting | Nucleosome positioning, inaccessible chromatin | Location of target on DNA |
| Major advantage | Links labeling of accessible regions and next-generation sequencing library preparation, making preparation of library straightforward | Footprinting analysis | Method of choice for nucleosome positioning and quantitative nucleosome dynamics | Enables mapping of specific TF or histone modification in low cell numbers. Some histone modifications, like H3K27ac, can be used to look for active enhancers |

37 of 55

Why use ATAC-seq?

ATAC-seq provides genome-wide information on open chromatin regions at nucleotide resolution

What information?

  • Profiles of regulatory elements (promoters, enhancers) that are accessible to the transcription machinery
  • Nucleosome positioning and chromatin compaction
  • Genome-wide characterization of DNA-protein interactions (TF, RNA polymerase)

38 of 55

Why use ATAC-seq?

  • Rapid assay preparation time, less experimental calibration
  • Protocol requires a small input (500-50,000 cells)
  • To quantify differences in cellular response to treatment or disease
  • To gain mechanistic insight into gene regulation by TF footprinting

Is chromatin accessibility indicative of active/functional regulatory regions?

“Patterns of reads in open chromatin regions result from a complex interplay of experimental effects with TF binding and nucleosome occupancy, among other biological factors”

He, H. H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nature Methods (2014).

39 of 55

ATAC-seq compared with ChIP-seq

https://www.sciencedirect.com/science/article/pii/S1350946216300301#fig6

40 of 55

Specifics of peak calling

  • DNase-seq and ATAC-seq:
    • search for any kind of regulatory region
    • cannot identify the regulator, only its target regions
    • the cleavage site is at the end of the fragment.
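Because the cleavage site sits at the fragment ends, each paired-end ATAC-seq fragment reports two insertion events. The sketch below converts a fragment into two insertion positions using the commonly applied +4/−5 bp Tn5 offset. The offsets are exposed as parameters because exact conventions differ by a base between tools; the function name and the 0-based half-open coordinates are assumptions of this example.

```python
from typing import Tuple


def tn5_insertion_sites(frag_start: int, frag_end: int,
                        plus_shift: int = 4, minus_shift: int = 5) -> Tuple[int, int]:
    """Return the two Tn5 insertion positions for one fragment.

    The fragment is the 0-based half-open interval [frag_start, frag_end).
    Its 5' ends are frag_start (forward read) and frag_end - 1 (reverse read);
    each is moved inward by the assumed offset.
    """
    if frag_end - frag_start <= plus_shift + minus_shift:
        raise ValueError("fragment too short for the requested offsets")
    left = frag_start + plus_shift
    right = (frag_end - 1) - minus_shift
    return left, right
```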

41 of 55

ATAC-seq analysis workflow

  1. Remove adaptors and quality trimming (cutadapt), quality control (FastQC)
  2. Read mapping to the genome (bowtie2)
  3. Remove mitochondrial reads, reduce PCR duplicates (grep, samtools, Picard tools)
  4. Select nucleosome-free fragments (awk, samtools)

Why?

42 of 55

Why should we remove reads that map to mitochondrial DNA?

ATAC-seq samples may contain ~20–80% of mitochondrial sequencing reads, depending on the cell type

Regions of open chromatin of interest are usually located in the nuclear genome
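A minimal sketch of this filtering step, working directly on SAM text lines so that it needs no external library. It assumes the mitochondrial contig is named `chrM` or `MT` (this depends on the reference genome) and that the reference name is in the third tab-separated column, as in the SAM format; the function name is hypothetical. In practice the same step is usually done with grep or samtools.

```python
from typing import Iterable, List, Tuple


def remove_mito_reads(sam_lines: Iterable[str],
                      mito_names: Tuple[str, ...] = ("chrM", "MT")) -> Tuple[List[str], float]:
    """Drop alignments to the mitochondrial contig.

    Returns the kept lines (header lines are passed through unchanged)
    and the fraction of mapped reads that were mitochondrial.
    """
    kept: List[str] = []
    mapped = 0
    mito = 0
    for line in sam_lines:
        if line.startswith("@"):         # header line
            kept.append(line)
            continue
        rname = line.split("\t")[2]      # reference sequence name
        if rname != "*":                 # "*" marks an unmapped read
            mapped += 1
        if rname in mito_names:
            mito += 1
            continue
        kept.append(line)
    fraction = mito / mapped if mapped else 0.0
    return kept, fraction
```

Reporting the fraction alongside the filtered reads makes it easy to check whether a sample falls in the expected range.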

Reduce PCR duplicates

From: softgenetics.com

43 of 55

ATAC-seq analysis workflow

  1. Remove adaptors and quality trimming (cutadapt), quality control (FastQC)
  2. Read mapping to the genome, paired-end (bowtie2)
  3. Remove mitochondrial reads, reduce PCR duplicates (grep, samtools, Picard tools)
  4. Select nucleosome-free fragments (awk, samtools)

How?

44 of 55

ATAC-seq provides genome-wide information on chromatin compaction

Buenrostro (2013) Nature Methods

  • The insert size distribution of sequenced fragments shows a periodicity of ~200 bp
  • Selecting fragments shorter than the length generally protected by a nucleosome (~147 bp) yields nucleosome-free fragments
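The selection can be sketched from the template length (TLEN, the ninth column of a SAM record) of properly paired reads. The cutoffs below (under 100 bp for nucleosome-free, 180–247 bp for mononucleosomal) are illustrative defaults assumed for this example and should be adjusted to the observed insert size distribution; the function names are hypothetical.

```python
from typing import Iterable, List


def fragment_class(tlen: int, nfr_max: int = 100,
                   mono_min: int = 180, mono_max: int = 247) -> str:
    """Classify a fragment by its absolute template length."""
    size = abs(tlen)                     # TLEN is negative for the reverse mate
    if size == 0:
        return "unpaired"                # TLEN of 0 means no usable pair information
    if size < nfr_max:
        return "nucleosome_free"
    if mono_min <= size <= mono_max:
        return "mononucleosome"
    return "other"


def select_nucleosome_free(sam_lines: Iterable[str], nfr_max: int = 100) -> List[str]:
    """Keep header lines and alignments from nucleosome-free fragments."""
    kept: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        tlen = int(line.split("\t")[8])  # column 9 of a SAM record
        if fragment_class(tlen, nfr_max=nfr_max) == "nucleosome_free":
            kept.append(line)
    return kept
```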

45 of 55

ATAC-seq analysis workflow (2)

  1. Peak calling (macs2)
  2. Peak visualization (NGSplot, IGV)
  3. Peak annotation, motif discovery, TF footprinting (Homer, GREAT, CEAS)

46 of 55

Peak calling and visualization in a genome browser

47 of 55

Identify ATAC-seq bias

[Figure labels: treatment, control]

  • Controlling for enzymatic cleavage bias with a “naked DNA” control: the Tn5 transposase cleaves DNA in a sequence-dependent manner, cutting some sequences more efficiently than others.

Chung, H.-R. et al. The effect of micrococcal nuclease digestion on nucleosome positioning data. PLoS ONE (2010).

  • Avoiding high read redundancy: filter out duplicate reads (reads at exactly the same genomic location and on the same strand) when their number exceeds the expected redundancy, to avoid calling false peaks.
  • Adjusting for sequencing depth: high coverage is needed for an informative experiment.

ENCODE consortium’s Standards, Guidelines and Best Practices: https://www.encodeproject.org/atac-seq/
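The redundancy and sequencing depth points above can be illustrated with two small helpers. This is a sketch under assumptions: reads are reduced to `(chromosome, position, strand)` tuples, `max_copies` stands in for the expected redundancy, and counts-per-million is used as one simple depth normalization. The function names are invented for the example, and dedicated tools such as Picard handle paired-end duplicates more carefully.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[str, int, str]  # (chromosome, 5' position, strand)


def filter_duplicates(reads: Iterable[Read], max_copies: int = 1) -> List[Read]:
    """Keep at most `max_copies` reads per identical location and strand."""
    seen: Counter = Counter()
    kept: List[Read] = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_copies:
            kept.append(read)
    return kept


def counts_per_million(count: int, library_size: int) -> float:
    """Scale a raw count by library size so samples of different depth are comparable."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return count * 1_000_000 / library_size
```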

48 of 55

Identify ATAC-seq bias

“Better methods are also needed to compare chromatin profiles between treatment groups and to account for variability in sample quality, enrichment level, batch effect and read depth”

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Clifford A. Meyer & X. Shirley Liu, Nature Reviews Genetics 2014

http://www.nature.com/nrg/journal/v15/n11/full/nrg3788.html#t1

Omni-ATAC, an improved ATAC-seq protocol, improves the signal-to-background ratio and identifies more accessible regions while using fewer cells and less transposase

(Corces et al, Nature Methods 2017)

49 of 55

Additional applications of ATAC-seq

Genome-wide information on nucleosome positioning in regulatory regions

50 of 55

Integrating ATAC-seq data and other epigenetic information

For studying the relationship between genome structure and changes in global patterns of gene expression

For more reading:

Integration of ATAC-seq and RNA-seq to generate dynamic gene regulatory networks: A Transcriptional Time Course of Myeloid Differentiation (Ramirez et al., 2017, Cell Systems)

Combining ATAC-Seq with RNA-Seq: https://link.springer.com/protocol/10.1007%2F978-1-4939-8618-7_15

51 of 55

“The chromatin accessibility landscape of primary human cancers”

Science (2018)

  • Generated ATAC-seq data in 410 tumor samples from TCGA across 23 cancer types.
  • Predicted interactions between DNA regulatory elements and gene promoters by genome-wide correlation of gene expression and chromatin accessibility, integrated with other omics data available for the tumor samples.
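The core idea of linking regulatory elements to genes by correlation can be sketched as below: for each peak and gene, correlate accessibility with expression across the same samples and keep strongly correlated pairs. This is an illustration only and not the procedure used in the paper; the data layout, the function names, and the `min_r` threshold are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long vectors (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def link_peaks_to_genes(accessibility: Dict[str, Sequence[float]],
                        expression: Dict[str, Sequence[float]],
                        min_r: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (peak, gene, r) for pairs whose correlation across samples is >= min_r."""
    links = []
    for peak, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((peak, gene, r))
    return links
```

A real analysis would typically also restrict pairs by genomic distance and assess significance, which this sketch leaves out.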

52 of 55

Applications of ATAC-seq

Infer footprints of DNA-protein binding (genome-wide factor occupancy)

53 of 55

High-throughput single-cell ATAC-seq

Toward Single-cell “Regulomics”?

Single-cell chromatin accessibility can reveal cell-type-specific epigenomic variability

Experimental design for using ATAC-seq to identify epigenomic states of multiple cell types from human donors:

Schematic of the ATAC-seq workflow on the SMARTer ICELL8 system:

54 of 55

ATAC-seq lecture summary

  • Using NGS to profile genome-wide epigenetic modifications
  • ATAC-seq captures open and accessible regions of chromatin
  • The Tn5 transposase cuts exposed DNA and simultaneously ligates sequencing adapters
  • A bioinformatic workflow based on nucleosome-free fragments is available
  • ATAC-seq enables profiling promoters, enhancers, or other regulatory elements accessible to transcription machinery
  • Application in high-throughput single-cell ATAC-seq

55 of 55

Further reading

  • Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol. 2015 Jan 5;109:21.29.1-21.29.9. doi: 10.1002/0471142727.mb2129s109. PMID: 25559105; PMCID: PMC4374986.
  • Gao, V.R., Yang, R., Das, A. et al. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. Nat Commun 15, 9432 (2024). https://doi.org/10.1038/s41467-024-53628-0
  • Gur, E.R., Hughes, J.R. scATAC-seq generates more accurate and complete regulatory maps than bulk ATAC-seq. Sci Rep 15, 3665 (2025). https://doi.org/10.1038/s41598-025-87351-7
  • Siyuan Cheng, Benpeng Miao, Tiandao Li, Guoyan Zhao, Bo Zhang, Review and Evaluate the Bioinformatics Analysis Strategies of ATAC-seq and CUT&Tag Data, Genomics, Proteomics & Bioinformatics, Volume 22, Issue 3, June 2024, qzae054, https://doi.org/10.1093/gpbjnl/qzae054
  • Minnoye, L., Marinov, G.K., Krausgruber, T. et al. Chromatin accessibility profiling methods. Nat Rev Methods Primers 1, 10 (2021). https://doi.org/10.1038/s43586-020-00008-9
  • Zhaonan Zou, Tazro Ohta, Shinya Oki, ChIP-Atlas 3.0: a data-mining suite to explore chromosome architecture together with large-scale regulome data, Nucleic Acids Research, Volume 52, Issue W1, 5 July 2024, Pages W45–W53, https://doi.org/10.1093/nar/gkae358
  • Yongjing Liu, Liangyu Fu, Kerstin Kaufmann, Dijun Chen, Ming Chen, A practical guide for DNase-seq data analysis: from data management to common applications, Briefings in Bioinformatics, Volume 20, Issue 5, September 2019, Pages 1865–1877, https://doi.org/10.1093/bib/bby057
  • Adams, J. (2008) The complexity of gene expression, protein interaction, and cell differentiation. Nature Education 1(1):110
  • Bhatt B, García-Díaz P, Foight GW. Synthetic transcription factor engineering for cell and gene therapy. Trends Biotechnol. 2024 Apr;42(4):449-463. doi: 10.1016/j.tibtech.2023.09.010. Epub 2023 Oct 19. PMID: 37865540.