Transcriptomics and epigenomics
Spring 2024
(Based on slides made by Ivan Antonov and Bareket Dassa)
Part 1. Biological background: Regulatory regions
In the previous class we studied ChIP-seq.
Question: when can a transcription factor bind to DNA?
Learning objectives
Simple eukaryotic transcriptional unit (a) vs complex metazoan transcriptional control module (b)
Regulatory elements
Transcription factor
- A protein that binds to specific DNA sequences
- Usually in a complex
- Promotes (as an activator) or blocks (as a repressor) the recruitment of RNA polymerase
- More than 1500 TFs in humans
Regulatory elements: promoter
A regulatory region of DNA, usually located upstream of (or around) the transcription start site (TSS), that provides a control point for regulated initiation of gene transcription
Promoter elements
Not all of these elements are present in all promoters and many of these elements are present in lineage-specific genes.
Genes & Dev. 2002. 16: 2583-2592
Enhancer
Relatively distant regulatory element: can be proximal (~300 bp) or distal (up to megabases away)
Promoter vs enhancer
Insulator
CTCF
CTCF function
Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers (2021)
Part 2. Next Generation Sequencing: DNA accessibility analysis (DNase-seq and ATAC-seq)
Regulation of transcription
ENCODE project
DNase-seq
Nature Methods 11, 39–40 (2014)
Identification of the regulatory regions, based on the genome-wide sequencing of regions sensitive to cleavage by DNase I.
DNase protocol
The same software used for ChIP-seq analysis can be used here
The only difference is that we are looking for the edge of the fragment (the cut site) rather than its center
Solution: shift reads upstream by ½ of the fragment length
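The shift described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: the function name `cut_site_window` and the default `extsize=200` are assumptions chosen for the example. The idea matches the commonly cited MACS2 settings for DNase-seq (`--nomodel --shift -100 --extsize 200`): the 5' end of each read is moved upstream by half of the extension size and then extended, so the resulting window is centered on the cut site.

```python
def cut_site_window(five_prime, strand, extsize=200):
    """Return a (start, end) window centered on the 5' end of a read.

    five_prime: genomic coordinate of the read's 5' end (the cut site).
    strand:     "+" or "-".
    extsize:    hypothetical window length; 200 bp is only an example value.
    """
    half = extsize // 2
    if strand == "+":
        # Upstream of a plus-strand read means lower coordinates.
        start = five_prime - half
        end = start + extsize
    elif strand == "-":
        # Upstream of a minus-strand read means higher coordinates.
        end = five_prime + half
        start = end - extsize
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the chromosome start.
    return max(start, 0), end
```

For either strand the window ends up symmetric around the cut site, which is the reason for shifting by half of the length.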
Footprinting
(Neph 2012)
DNase-specific protocol: ENCODE
DNase-specific software: HotSpots
Hotspots (height = score): enrichment of tags relative to a local background model based on the number of tags in a surrounding 50 kb window
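One way to picture this kind of local enrichment score is a z-score under a binomial background. The sketch below is a simplified illustration and not the HotSpots implementation: the function name `local_enrichment_z` and the 250 bp small-window size are assumptions; only the 50 kb background window comes from the slide.

```python
import math


def local_enrichment_z(tags_small, tags_background,
                       small_bp=250, background_bp=50_000):
    """Z-score of the tag count in a small window vs. its local background.

    Assumes each of the `tags_background` tags in the surrounding window
    falls into the small window with probability small_bp / background_bp.
    """
    p = small_bp / background_bp
    expected = tags_background * p
    sd = math.sqrt(tags_background * p * (1 - p))
    if sd == 0:
        # No background tags: no basis for calling enrichment.
        return 0.0
    return (tags_small - expected) / sd
```

A window holding many more tags than its 50 kb neighborhood predicts gets a high score, while a uniformly covered region scores near zero.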
Different Types of DNase-Seq
ATAC-seq
ATAC-seq stands for Assay for Transposase-Accessible Chromatin with high-throughput sequencing. ATAC-seq employs a mutated, hyperactive Tn5 transposase. Its high activity allows highly efficient cutting of exposed DNA and simultaneous ligation of sequencing adapters.
Transposase and transposons
Nobel prize for the discovery of transposons
Nobel prize lecture:
https://www.nobelprize.org/prizes/medicine/1983/mcclintock/lecture/
DNase-seq vs ATAC-seq
Why use ATAC-seq?
ATAC-seq provides genome-wide information on open chromatin regions at nucleotide resolution
What information?
accessible to transcription machinery
Why use ATAC-seq?
Is chromatin accessibility indicative of active/functional regulatory regions?
“Patterns of reads in open chromatin regions result from a complex interplay of experimental effects with TF binding and nucleosome occupancy, among other biological factors”
He, H. H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nature Methods (2014).
ATAC-seq compared with ChIP-seq
https://www.sciencedirect.com/science/article/pii/S1350946216300301#fig6
Specifics of peak calling
ATAC-seq analysis workflow
Remove adaptors & quality trimming, quality control: FastQC, cutadapt
Reads mapping to the genome: bowtie2
Remove mitochondrial reads, reduce PCR duplicates: grep, samtools, Picard tools
Select nucleosome-free fragments: awk, samtools
Why?
Why should we remove reads that map to mitochondrial DNA?
ATAC-seq samples may contain ~20–80% of mitochondrial sequencing reads, depending on the cell type
Regions of open chromatin of interest are usually located in the nuclear genome
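The filtering step can be sketched directly on SAM text lines. This is a minimal illustration under stated assumptions: the function name `drop_mito_reads` is hypothetical, and the mitochondrial reference names `chrM` and `MT` depend on the genome build used for mapping. In practice the workflow above does this with grep or samtools.

```python
def drop_mito_reads(sam_lines, mito_names=("chrM", "MT")):
    """Yield SAM lines, skipping alignments to the mitochondrial genome.

    Header lines (starting with '@') are passed through unchanged.
    The reference name (RNAME) is the third tab-separated SAM field.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] in mito_names:
            continue  # mitochondrial alignment: drop it
        yield line
```

Comparing the number of records before and after filtering also gives the mitochondrial fraction of the sample, which is a useful quality-control number.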
Reduce PCR duplicates
From: softgenetics.com
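Duplicate removal can be illustrated by keeping a single copy of every fragment that has identical coordinates and strand. This is a simplified sketch, not how Picard tools work internally: the function name `remove_duplicate_fragments` and the tuple layout `(chrom, start, end, strand)` are assumptions made for the example.

```python
def remove_duplicate_fragments(fragments):
    """Keep the first occurrence of each (chrom, start, end, strand) tuple.

    Fragments with identical coordinates and strand are treated as
    PCR duplicates; input order is preserved.
    """
    seen = set()
    unique = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            unique.append(frag)
    return unique
```

Real tools usually also take read quality into account when choosing which copy to keep; here the first copy simply wins.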
ATAC-seq analysis workflow
Remove adaptors & quality trimming, quality control: FastQC, cutadapt
Reads mapping to the genome (paired-end): bowtie2
Remove mitochondrial reads, reduce PCR duplicates: grep, samtools, Picard tools
Select nucleosome-free fragments: awk, samtools
How?
ATAC-seq provides genome-wide information on chromatin compaction
Buenrostro (2013) Nature Methods
nucleosome-free fragments
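Selecting nucleosome-free fragments from paired-end data comes down to filtering on fragment length, which SAM stores as the template length (TLEN, the ninth field). The sketch below is a minimal illustration: the function name `select_nucleosome_free` is hypothetical, and the 100 bp cutoff is an assumed example value that should be checked against the fragment-size distribution of the actual library.

```python
def select_nucleosome_free(sam_lines, max_len=100):
    """Yield header lines and alignments from short (sub-nucleosomal) fragments.

    TLEN is signed (positive for one mate, negative for the other), so the
    absolute value is used. Records with TLEN == 0 (no fragment length
    available) are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete SAM record
        tlen = abs(int(fields[8]))
        if 0 < tlen < max_len:
            yield line
```

Because both mates of a pair carry the same absolute TLEN, a pair is either kept or dropped as a whole.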
ATAC-seq analysis workflow (2)
Peak calling: macs2
Peak visualization: NGSplot, IGV
Peak annotation, motif discovery, TF footprinting: Homer, GREAT, CEAS
Peak calling and visualization in a genome browser
Identify ATAC-seq bias
treatment
Chung, H.-R. et al. The effect of micrococcal nuclease digestion on nucleosome positioning data. PLoS ONE (2010).
Filter out duplicate reads to avoid calling false peaks (i.e. reads at the exact same genome location and the same strand if their number exceeds the expected redundancy).
High levels of coverage are needed for an informative experiment
control
ENCODE consortium’s Standards, Guidelines and Best Practices: https://www.encodeproject.org/atac-seq/
Identify ATAC-seq bias
“Better methods are also needed to compare chromatin profiles between treatment groups and to account for variability in sample quality, enrichment level, batch effect and read depth”
Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Clifford A. Meyer & X. Shirley Liu, Nature Reviews Genetics 2014
http://www.nature.com/nrg/journal/v15/n11/full/nrg3788.html#t1
Omni-ATAC, an improved ATAC-seq protocol, raises the signal-to-background ratio and identifies more accessible regions while using fewer cells and less transposase
(Corces et al, Nature Methods 2017)
Additional applications of ATAC-seq
Genome-wide information on nucleosome positioning in regulatory regions
Integrating ATAC-seq data and other epigenetic information
For studying the relationship between genome structure and changes in global patterns of gene expression
For more reading:
Integration of ATAC-seq and RNA-seq to generate dynamic gene regulatory networks:
A Transcriptional Time Course of Myeloid Differentiation
(Ramirez et al., 2017, Cell Systems)
Combining ATAC-Seq with RNA-Seq:
https://link.springer.com/protocol/10.1007%2F978-1-4939-8618-7_15
“The chromatin accessibility landscape of primary human cancers”
Science (2018)
Applications of ATAC-seq
Infer footprints of DNA-protein binding
(genome-wide factor occupancy)
High-throughput single-cell ATAC-seq
Toward Single-cell “Regulomics”?
Single-cell chromatin accessibility can reveal cell-type-specific epigenomic variability
Experimental design for using ATAC-seq to identify epigenomic states of multiple cell types from human donors:
Schematic of the ATAC-seq workflow on the SMARTer ICELL8 system:
ATAC-seq lecture summary