Transcriptomics and epigenomics
Spring 2025
Part 1. Biological background
Regulatory regions
In the previous session we studied ChIP-seq.
Question: when can a transcription factor bind to DNA?
Goals
Simple eukaryotic transcriptional unit (a) vs complex metazoan transcriptional control module (b)
Regulatory elements
Transcription factor
- A protein that binds to specific DNA sequences
- Usually in a complex
- Promote (as an activator) or block (as a repressor) the recruitment of RNA polymerase
- More than 1,500 TFs in humans
Regulatory elements: promoter
A regulatory region of DNA, usually located upstream of (or around) the TSS, that provides a control point for regulated initiation of gene transcription
Promoter elements
Not all of these elements are present in all promoters and many of these elements are present in lineage-specific genes.
Genes & Dev. 2002. 16: 2583-2592
Artificial promoters in transgenes
Enhancer
A relatively distant regulatory element: can be proximal (~300 bp) or distal (up to several Mb away)
Promoter vs enhancer
Insulator
CTCF
CTCF function
Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers (2021)
Part 2. Next-Generation Sequencing
DNA accessibility analysis
(DNase-seq and ATAC-seq)
Regulation of transcription
ENCODE project
Chromatin accessibility changes in cancer
DNase-seq
Nature Methods 11, 39–40 (2014)
Identification of regulatory regions by genome-wide sequencing of regions sensitive to DNase I cleavage.
DNase protocol
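The same software as for ChIP-seq can be used for the analysis.
The only difference is that we are looking not for the center of the fragment but for its edge (the DNase I cut site).
Solution: shift reads upstream by ½ of the fragment length, so that the cut site ends up in the center of the extended fragment.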
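A minimal Python sketch of this shift, assuming 0-based half-open read coordinates and an illustrative extension size of 200 bp. Centering a 200 bp window on the read's 5' end corresponds to calling MACS2 with `--nomodel --shift -100 --extsize 200`; the function name `cut_site_window` is hypothetical.

```python
def cut_site_window(read_start, read_end, strand, extsize=200):
    """Return a half-open [start, end) window of length extsize centered on
    the cut site, i.e. the read's 5' end (hypothetical helper)."""
    # On the + strand the 5' end is the leftmost base; on the - strand it is
    # the rightmost base (read_end - 1 in half-open coordinates).
    if strand == "+":
        cut = read_start
    elif strand == "-":
        cut = read_end - 1
    else:
        raise ValueError("strand must be '+' or '-'")
    half = extsize // 2
    # Shift upstream by half the extension, then extend by extsize:
    # the cut site lands in the middle of the window.
    start = max(0, cut - half)  # clip at the chromosome start
    end = cut - half + extsize
    return start, end


print(cut_site_window(1000, 1050, "+"))  # window around position 1000
print(cut_site_window(1000, 1050, "-"))  # window around position 1049
```

Both strands end up centered on their own cut site, which is why a negative shift is used instead of the positive, fragment-center shift typical of ChIP-seq.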
Types of DNase-seq
Footprinting
DNase cannot cut where a protein is bound.
When the fragments are separated by gel electrophoresis, the protected region shows up as missing bands.
Footprinting
(Neph 2012)
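The footprinting idea above can be sketched as a score that compares cut density inside a candidate protected core with the density in its flanks. This is a simplified score modeled on the footprint occupancy score of Neph et al. (2012), (C+1)/L + (C+1)/R with C, L, R the mean cut counts in the core, left flank and right flank; the per-base cut-count input, the 10 bp flank size and the function name are assumptions for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def footprint_score(cuts, core_start, core_end, flank=10):
    """cuts: per-base DNase I cut counts; core: candidate protected region
    [core_start, core_end). Lower scores indicate a deeper footprint."""
    if core_end <= core_start or core_start < flank or core_end + flank > len(cuts):
        raise ValueError("core region plus flanks must lie inside the profile")
    core = mean(cuts[core_start:core_end])
    left = mean(cuts[core_start - flank:core_start])
    right = mean(cuts[core_end:core_end + flank])
    if left == 0 or right == 0:
        return math.inf  # no flanking cuts: no evidence of a footprint
    # Few cuts in the core and many in the flanks give a small score.
    return (core + 1) / left + (core + 1) / right


protected = [10] * 10 + [0] * 6 + [10] * 10  # bound factor blocks cutting
unprotected = [5] * 26                       # uniform accessibility
print(footprint_score(protected, 10, 16), footprint_score(unprotected, 10, 16))
```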
DNase-specific protocol: ENCODE
DNase-specific software: Hotspot2
Hotspots (height = score): enrichment of tags relative to a local background model based on the number of tags in the surrounding 50 kb window
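A sketch of such a local-background enrichment score, in the spirit of the original Hotspot algorithm: if the tags in the 50 kb background window were spread uniformly, a small window would receive a binomially distributed share of them, and the observed count is converted into a z-score. The 250 bp small window and the function name are assumptions, not Hotspot2's actual implementation.

```python
import math


def local_zscore(window_tags, background_tags, window_bp=250, background_bp=50_000):
    """z-score of the tag count in a small window against a binomial model
    built from all tags in the surrounding background window."""
    if background_tags <= 0:
        raise ValueError("background window must contain tags")
    p = window_bp / background_bp          # chance a background tag falls in the window
    expected = background_tags * p
    sd = math.sqrt(background_tags * p * (1 - p))
    return (window_tags - expected) / sd


# 2,000 tags in 50 kb -> ~10 expected in 250 bp; 40 observed is strongly enriched.
print(round(local_zscore(40, 2000), 2))
```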
ATAC-seq
ATAC-seq stands for Assay for Transposase-Accessible Chromatin with high-throughput sequencing. ATAC-seq employs a mutated, hyperactive Tn5 transposase. Its high activity allows highly efficient cutting of exposed DNA and simultaneous ligation of sequencing adapters ("tagmentation").
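Because Tn5 inserts adapters as a dimer, each insertion creates a 9 bp duplication, and the read 5' ends on the two strands are offset from the true insertion center. A common correction, following Buenrostro et al. (2013), shifts + strand reads by +4 bp and - strand reads by -5 bp. The sketch below applies it to 0-based half-open coordinates; off-by-one conventions differ between tools, so treat the exact offsets as an assumption.

```python
def tn5_insertion_site(read_start, read_end, strand):
    """Return the Tn5-adjusted 5' end of a read (0-based, half-open input)."""
    if strand == "+":
        return read_start + 4  # + strand: move the 5' end 4 bp to the right
    if strand == "-":
        return read_end - 5    # - strand: move the 5' end 5 bp to the left
    raise ValueError("strand must be '+' or '-'")


print(tn5_insertion_site(100, 150, "+"), tn5_insertion_site(100, 150, "-"))
```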
Transposase and transposons
Cut & Paste transposition
Nobel Prize for the discovery of transposons (Barbara McClintock, 1983)
Nobel prize lecture:
https://www.nobelprize.org/prizes/medicine/1983/mcclintock/lecture/
DNase-seq vs ATAC-seq
FAIRE-seq: Formaldehyde-Assisted Isolation of Regulatory Elements
MNase-seq: micrococcal nuclease
The steps in CUT&Tag.
Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG, Ahmad K, Henikoff S. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019 Apr 29;10(1):1930. doi: 10.1038/s41467-019-09982-5. PMID: 31036827; PMCID: PMC6488672.
CUT&Tag
ChIP-seq alternative
Comparison of methods
| | ATAC-seq | DNase-seq | MNase-seq | CUT&Tag or related ChIC techniques |
|---|---|---|---|---|
| Enzyme type | Tn5 | Endonuclease | Endo- and exonuclease | Tn5 conjugated to an antibody via protein A |
| Sequence bias? | Yes; complex: Tn5 insertion bias, with preference for A/T at the insertion site and C/G in the flanks | Yes; complex, partially dependent on enzyme concentration and on the methylation status of CpGs | Yes; preferential cutting upstream of A/T compared with G/C | Yes; dictated by the antibody used to guide Tn5 and by Tn5 bias |
| No. of input cells or nuclei (bulk) | 500–50,000 | 1–10 million | 10,000–100,000 | 100,000–500,000 |
| Low-input/single-cell | Yes; commercial solutions available | Yes | Yes | Yes |
| Sample type | Fresh or cryopreserved cells or nuclei; fresh or frozen tissues | Fresh or cryopreserved cells or nuclei; fresh or frozen tissues; formaldehyde cross-linked or FFPE samples | Fresh or cryopreserved cells or nuclei; fresh or frozen tissues; formaldehyde cross-linked samples | Fresh or cryopreserved cells or nuclei; fresh or frozen tissues |
| Library preparation time | ~10 h for 12 samples (this protocol) | 1–3 d | ~2 d | 1–2 d |
| Technical considerations | Library quality is highly dependent on cell viability. Protocol alterations are required for use on fixed cells, and data quality is often reduced for those samples | Enzyme concentration and digestion duration may need to be optimized for the sample type. The size of the fragments selected affects downstream analysis | Enzyme concentration and digestion duration may need to be optimized for the sample type. Apparent nucleosome occupancy is a function of MNase concentration | The amount of antibody must be titrated for the cell type or sample; it depends on the strength of the antibody and the abundance of the target protein. The assay is only as specific as the primary antibody used. Because this is a targeted technique, a separate library must be made for each modification or protein tested |
| Sequencing type | Paired-end | Single-end | Single-end | Single-end or paired-end |
| Sequencing depth | Low; 10 million read pairs per sample with Omni-ATAC | Medium/high; 20–50 million uniquely mapping reads per sample; 200 million for TF footprinting | High; 150–200 million reads per sample (human) | Very low; 3 million read pairs per sample |
| Data produced | Tn5-accessible chromatin | DNase-accessible chromatin; TF footprinting | Nucleosome positioning, inaccessible chromatin | Location of the target on DNA |
| Major advantage | Combines labeling of accessible regions with next-generation sequencing library preparation, making library preparation straightforward | Footprinting analysis | Method of choice for nucleosome positioning and quantitative nucleosome dynamics | Enables mapping of a specific TF or histone modification in low cell numbers. Some histone modifications, like H3K27ac, can be used to look for active enhancers |
Why use ATAC-seq?
ATAC-seq provides genome-wide information on open chromatin regions at nucleotide resolution
What information? Which regions are accessible to the transcription machinery
Why use ATAC-seq?
Is chromatin accessibility indicative of active/functional regulatory regions?
“Patterns of reads in open chromatin regions result from a complex interplay of experimental effects with TF binding and nucleosome occupancy, among other biological factors”
He, H. H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nature Methods (2014).
ATAC-seq compared with ChIP-seq
https://www.sciencedirect.com/science/article/pii/S1350946216300301#fig6
Specifics of peak calling
ATAC-seq analysis workflow
- Remove adaptors & quality trimming: cutadapt
- Quality control: FastQC
- Map reads to the genome: bowtie2
- Remove mitochondrial reads, reduce PCR duplicates: grep, samtools, Picard tools
- Select nucleosome-free fragments: awk, samtools
Why?
Why should we remove reads that map to mitochondrial DNA?
ATAC-seq samples may contain ~20–80% mitochondrial reads, depending on the cell type: mitochondrial DNA is not packaged into nucleosomes, so Tn5 cuts it readily
The open chromatin regions of interest are usually located in the nuclear genome
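A minimal sketch of the mitochondrial filtering step: drop alignments on the mitochondrial contig and report what fraction they made up, which is a useful QC number in itself. Alignments are represented as hypothetical `(chrom, pos)` tuples, and the contig names `chrM`/`MT` (UCSC and Ensembl conventions) are assumptions that must match the reference used for mapping.

```python
MITO_CONTIGS = {"chrM", "MT"}  # assumption: UCSC-style and Ensembl-style names


def split_mito(alignments):
    """alignments: list of (chrom, pos) tuples.
    Returns (nuclear alignments, fraction of mitochondrial alignments)."""
    if not alignments:
        return [], 0.0
    nuclear = [aln for aln in alignments if aln[0] not in MITO_CONTIGS]
    mito_fraction = 1 - len(nuclear) / len(alignments)
    return nuclear, mito_fraction


reads = [("chr1", 10), ("chrM", 5), ("chr2", 7), ("MT", 1)]
nuclear, frac = split_mito(reads)
print(nuclear, frac)
```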
Reduce PCR duplicates
From: softgenetics.com
ATAC-seq analysis workflow
- Remove adaptors & quality trimming, quality control: FastQC, cutadapt
- Map reads to the genome (paired-end): bowtie2
- Remove mitochondrial reads, reduce PCR duplicates: grep, samtools, Picard tools
- Select nucleosome-free fragments: awk, samtools
How?
ATAC-seq provides genome-wide information on chromatin compaction
Buenrostro (2013) Nature Methods
Nucleosome-free fragments
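Fragment length in paired-end data is what separates nucleosome-free fragments from nucleosome-bound ones, and it can be read from the TLEN field (column 9) of SAM records, which is what the awk step in the workflow filters on. A minimal sketch follows; the size cutoffs (<100 bp nucleosome-free, 180–247 bp mono-, 315–473 bp di-nucleosome) are commonly cited values treated here as assumptions, and both function names are hypothetical.

```python
def classify_fragment(tlen):
    """Classify a paired-end fragment by |TLEN| (assumed cutoffs)."""
    size = abs(tlen)  # TLEN is negative for the rightmost mate
    if 0 < size < 100:
        return "nucleosome-free"
    if 180 <= size <= 247:
        return "mono-nucleosome"
    if 315 <= size <= 473:
        return "di-nucleosome"
    return "other"


def nucleosome_free(sam_lines, max_size=100):
    """Keep SAM header lines and records whose fragment is shorter than max_size."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # keep the header intact
            continue
        tlen = int(line.rstrip("\n").split("\t")[8])  # column 9 = TLEN
        if tlen != 0 and abs(tlen) < max_size:  # TLEN 0: mate unmapped/other chrom
            kept.append(line)
    return kept


sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t42\t50M\t=\t130\t80\t*\t*",
    "r2\t99\tchr1\t500\t42\t50M\t=\t650\t200\t*\t*",
]
print(nucleosome_free(sam))
```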
ATAC-seq analysis workflow (2)
- Peak calling: MACS2
- Peak visualization: NGSplot, IGV
- Peak annotation, motif discovery, TF footprinting: Homer, GREAT, CEAS
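The core statistical idea behind MACS-style peak calling can be sketched as a Poisson test of the read count in a window against a local background rate, where the rate is the maximum of a genome-wide background and rates estimated from larger surrounding windows (1 kb, 5 kb, 10 kb). This is a simplified illustration of the "dynamic local lambda" idea, not MACS2's implementation; the function names and the precomputed lambdas are assumptions.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def window_pvalue(count, lambdas):
    """lambdas: expected counts per window from the genome-wide background and
    from 1 kb, 5 kb and 10 kb surrounding windows (assumed precomputed)."""
    local_lambda = max(lambdas)  # the most conservative background wins
    return poisson_sf(count, local_lambda)


print(window_pvalue(20, (2.0, 3.0, 4.0, 5.0)))   # strongly enriched window
print(window_pvalue(5, (1.0, 1.0, 1.0, 10.0)))   # high local background
```

Taking the maximum protects against calling peaks in regions that are locally noisy or broadly open.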
Peak calling and visualization in a genome browser
Identify ATAC-seq bias
Figure panels: treatment vs control
Chung, H.-R. et al. The effect of micrococcal nuclease digestion on nucleosome positioning data. PLoS ONE (2010).
Filter out duplicate reads (reads at exactly the same genomic position and on the same strand, when their number exceeds the expected redundancy) to avoid calling false peaks.
High levels of coverage are needed for an informative experiment
ENCODE consortium’s Standards, Guidelines and Best Practices: https://www.encodeproject.org/atac-seq/
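The duplicate-filtering rule above can be sketched as follows: reads sharing chromosome, 5' position and strand are treated as copies, and at most `max_per_position` of them are kept, a hypothetical parameter standing in for the "expected redundancy". This single-end view is a simplification; for paired-end data, Picard MarkDuplicates considers the positions of both mates.

```python
from collections import defaultdict


def remove_duplicates(reads, max_per_position=1):
    """reads: iterable of (chrom, five_prime_pos, strand) tuples.
    Keeps at most max_per_position reads per (chrom, pos, strand) key."""
    seen = defaultdict(int)
    kept = []
    for read in reads:
        # Chromosome, 5' position and strand together define a duplicate.
        if seen[read] < max_per_position:
            kept.append(read)
        seen[read] += 1
    return kept


reads = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-"), ("chr1", 101, "+")]
print(remove_duplicates(reads))
```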
Identify ATAC-seq bias
“Better methods are also needed to compare chromatin profiles between treatment groups and to account for variability in sample quality, enrichment level, batch effect and read depth”
Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Clifford A. Meyer & X. Shirley Liu, Nature Reviews Genetics 2014
http://www.nature.com/nrg/journal/v15/n11/full/nrg3788.html#t1
Omni-ATAC, an improved ATAC-seq protocol, increases the signal-to-background ratio and identifies more accessible regions while using fewer cells and less transposase
(Corces et al., Nature Methods 2017)
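One common way to quantify signal-to-background is the fraction of reads in peaks (FRiP), a quality metric used in ENCODE guidelines. A minimal sketch, assuming reads are reduced to single positions (for example, Tn5-adjusted insertion sites) and peaks are half-open `(chrom, start, end)` intervals; the linear scan over peaks is chosen for clarity, not speed.

```python
from collections import defaultdict


def frip(reads, peaks):
    """reads: list of (chrom, pos); peaks: list of (chrom, start, end), half-open.
    Returns the fraction of reads that fall inside any peak."""
    if not reads:
        return 0.0
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    in_peaks = sum(
        1
        for chrom, pos in reads
        if any(start <= pos < end for start, end in peaks_by_chrom.get(chrom, ()))
    )
    return in_peaks / len(reads)


reads = [("chr1", 5), ("chr1", 15), ("chr2", 5), ("chr1", 25)]
peaks = [("chr1", 0, 10), ("chr1", 20, 30)]
print(frip(reads, peaks))
```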
Additional applications of ATAC-seq
Genome-wide information on nucleosome positioning in regulatory regions
Integrating ATAC-seq data with other epigenetic information to study the relationship between genome structure and global changes in gene expression
Further reading:
Integration of ATAC-seq and RNA-seq to generate dynamic gene regulatory networks:
A Transcriptional Time Course of Myeloid Differentiation
(Ramirez et al., 2017, Cell Systems)
Combining ATAC-seq with RNA-seq:
https://link.springer.com/protocol/10.1007%2F978-1-4939-8618-7_15
“The chromatin accessibility landscape of primary human cancers”, Science (2018)
Applications of ATAC-seq
Infer footprints of DNA-protein binding (genome-wide factor occupancy)
High-throughput single-cell ATAC-seq
Toward Single-cell “Regulomics”?
Single-cell chromatin accessibility can reveal cell-type-specific epigenomic variability
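Single-cell ATAC-seq data are usually summarized as a sparse cell-by-peak count matrix, in which each fragment is assigned to its cell barcode and counted in every peak it overlaps; cell-type-specific accessibility is then analyzed on this matrix. A minimal sketch, assuming hypothetical `(barcode, chrom, start, end)` fragment tuples and half-open peak intervals, with a dictionary standing in for a sparse matrix.

```python
from collections import defaultdict


def cell_by_peak(fragments, peaks):
    """fragments: (barcode, chrom, start, end); peaks: (chrom, start, end), half-open.
    Returns {barcode: {peak_index: fragment_count}}; cells with no peak
    overlaps are omitted, as in a sparse matrix."""
    matrix = defaultdict(lambda: defaultdict(int))
    for barcode, chrom, fstart, fend in fragments:
        for i, (pchrom, pstart, pend) in enumerate(peaks):
            # Half-open intervals overlap if each starts before the other ends.
            if chrom == pchrom and fstart < pend and pstart < fend:
                matrix[barcode][i] += 1
    return {barcode: dict(counts) for barcode, counts in matrix.items()}


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
fragments = [
    ("AAA", "chr1", 150, 250),
    ("AAA", "chr1", 550, 560),
    ("CCC", "chr1", 190, 510),  # spans both peaks
    ("GGG", "chr1", 300, 400),  # between peaks: not counted
]
print(cell_by_peak(fragments, peaks))
```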
Experimental design for using ATAC-seq to identify epigenomic states of multiple cell types from human donors:
Schematic of the ATAC-seq workflow on the SMARTer ICELL8 system:
ATAC-seq lecture summary
Additional reading