1 of 32

ChromBPNet: Deep learning models of base-resolution chromatin profiles reveal cis-regulatory syntax and regulatory variation

AGCCAAGCAGCAAAGTTTTGCTGCTGTTTATTTTTGTAGCTCTTACTATATTCTACTTTTACCATTGAAAATATTGAGGAAGTTATTTATATTTCTATTTTTTATATATTATATATTTTATGTATTTTAATATTACTATTACACATAATTATTTTTTATATATATGAAGTACCAATGACTTCCTTTTCCAGAGCAATAATGAAATTTCACAGTATGAAAATGGAAGAAATCAATAAAATTATACGTGACCTGTGGCGAAGTACCTATCGTGGACAAGGTGAGTACCATGGTGTATCACAAATGCTCTTTCCAAAGCCCTCTCCGCAGCTCTTCCCCTTATGACCTCTCATCATGCCAGCATTACCTCCCTGGACCCCTTTCTAAGCATGTCTTTGAGATTTTCTAAGAATTCTTATCTTGGCAACATCTTGTAGCAAGAAAATGTAAAGTTTTCTGTTCCAGAGCCTAACAGGACTTACATATTTGACTGCAGTAGGCATTATATTTAGCTGATGACATAATAGGTTCTGTCATAGTGTAGATAGGGATAAGCCAAAATGCAATAAGAAAAACCATCCAGAGGAAACTCTTTTTTTTTTCTTTTTCTTTTTTTTTTTTCCAGATGGAGTCTCGCACTTCTCTGTCACCCGGGCTGGAGCGCAGTGGTGCAATCTTGGCTCACTGCAACCTCCACCTCCTGGGTTCAGGTGATTCTCCCACCTCAGCCTCCCGAGTAGTAGCTGGAATTACAGGTGCGCGCTCCCACACCTGGCTAATTTTTTGTATTCTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCAAACTCCTGCCCTCAGGTGATCTGCCCACCTTGGCCTCCCAGTGTTGGGTTTACAGGCGTGAGCCACCGCGCCTGGCCTGGAGGAAACTCTTAACAGGGAAACTAAGAAAGAGTTGAGGCTGAGGAACTGGGGCATCTGGGTTGCTTCTGGCCAGACCACCAGGCTCTTGAATCCTCCCAGCCAGAGAAAGAGTTTCCACACCAGCCATTGTTTTCCTCTGGTAATGTCAGCCTCATCTGTTGTTCCTAGGCTTACTTGATATGTTTGTAAATGACAAAAGGCTACAGAGCATAGA

Anna Shcherbina

Anusri Pampari

Anshul Kundaje

2 of 32

Regulatory DNA

Adapted from Shlyueva et al. (2014) Nature Reviews Genetics.

transcription factors

nucleosomes

sequence motifs

histone modifications

accessible chromatin

3 of 32

Profiling regulatory DNA

ATAC-seq/DNase-seq

4 of 32

Predictive sequence models of chromatin profiles

…GACAGATAATGCATTGA…

…GACTTGAAACGGCATTG…

No Peak (0) (0.3)

Peak (+1) (20.2)

 

 

Class = +1 (20.2)

Class = +1 (10.6)

Class = +1 (15.8)

Class = 0 (0.3)

Class = 0 (1.2)

Class = 0 (3.5)

 

Peak

No peak

…GACAGATAATGCATTGA…

…ACTGTCATGGATATTCT…

…GACTTGAAACGGCATTG…

…CAGTATGCATACGTGAA…

…CAACCTTGAACGGCATTG…

GATATTCTACTGTAAG…

Arvey et al. 2012

Ghandi et al. 2014

Setty et al. 2015

Alipanahi et al. 2015

Zhou et al. 2015

Kelley et al. 2016, 2018

Avsec et al. 2021

DNase-seq / ATAC-seq data

5 of 32

Complex, multi-resolution ‘shapes’, ‘spans’ and footprints

DNase-seq / ATAC-seq

6 of 32

Consider an enzyme that only cuts at C

An extreme case

ACGAAACAATTGAGATACCAAAGTAAGTAT

True accessibility

Enzyme bias

Observed accessibility

We are interested in true accessibility, but the observed signal can be distorted by enzyme bias; the stronger the bias, the larger the distortion.

Enzyme preference bias
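
The extreme case above can be written out as a toy calculation. This is a minimal sketch, assuming a flat true accessibility over the example sequence (an assumption made only for illustration) and treating the observed signal as the position-wise product of accessibility and enzyme preference.

```python
# Toy illustration: observed signal = true accessibility x enzyme preference.
seq = "ACGAAACAATTGAGATACCAAAGTAAGTAT"  # example sequence from the slide

# Assumed for illustration only: the whole region is equally accessible.
true_accessibility = [1.0] * len(seq)

# Extreme bias from the slide: the enzyme cuts only at C.
enzyme_bias = [1.0 if base == "C" else 0.0 for base in seq]

# The observed profile is the position-wise product of the two.
observed = [a * b for a, b in zip(true_accessibility, enzyme_bias)]

# Positions where any signal survives the enzyme preference.
cut_positions = [i for i, value in enumerate(observed) if value > 0]
print(cut_positions)
```

Even though every base is equally accessible in this toy setup, signal appears only at the C positions, so the shape of the observed profile reflects the enzyme rather than the chromatin.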

7 of 32

Tn5 and DNase-I enzyme sequence bias (PWM and k-mer models)

Tn5 cleavage logo

DNase-I cleavage logo

HINT-ATAC (Li et al. 2019)

8 of 32

ChromBPNet: Sequence to base-pair chromatin accessibility profiles

CGATAACCGATAT

1 Kb sequence

Based on Avsec et al. Nature Genetics 2021

NN enzyme bias predictor

total Tn5/DNase insertions (1 kb)

base-resolution probability profile (1 kb)
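
The two outputs listed above (a total count and a base-resolution probability profile) can be combined into expected per-base counts. The sketch below is illustrative only: the function names are hypothetical, the 5 bp toy vectors stand in for 1 kb outputs, and adding the bias predictor's logits to the sequence model's logits is an assumption of this sketch, not something stated on the slide.

```python
import math

def softmax(logits):
    """Convert profile logits into a probability profile that sums to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_counts(log_total_counts, profile_logits):
    """Distribute the predicted total count along the probability profile."""
    total = math.exp(log_total_counts)
    return [total * p for p in softmax(profile_logits)]

# Hypothetical toy outputs for a 5 bp window (real outputs span 1 kb).
seq_logits = [0.0, 1.0, 2.0, 1.0, 0.0]     # sequence model
bias_logits = [0.5, -0.5, 0.0, 0.5, -0.5]  # enzyme bias predictor

# Assumption of this sketch: bias enters additively in logit space.
combined_logits = [s + b for s, b in zip(seq_logits, bias_logits)]

with_bias = expected_counts(math.log(100.0), combined_logits)
corrected = expected_counts(math.log(100.0), seq_logits)
print([round(v, 1) for v in with_bias])
print([round(v, 1) for v in corrected])
```

Under that assumption, dropping the bias term at prediction time leaves a profile driven only by the sequence model, which is one way to read "bias correction" here.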

9 of 32

How to estimate Tn5 / DNase bias?

Read distribution in background regions is a function of enzyme bias

Learns multiple Tn5 bias motifs
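
To make the idea concrete, here is a much simpler stand-in for the neural bias predictor: a k-mer enrichment table estimated from cut sites in a background region, in the spirit of the k-mer bias models mentioned earlier. The function name, the choice of k = 3, and the toy cut sites are all hypothetical.

```python
from collections import Counter

def kmer_bias(sequence, cut_positions, k=3):
    """Enrichment of each k-mer at cut sites relative to the whole sequence.

    A ratio above 1 means the enzyme cuts that k-mer more often than
    expected from its frequency in the background sequence.
    """
    half = k // 2
    centers = range(half, len(sequence) - half)
    all_kmers = Counter(sequence[i - half:i + half + 1] for i in centers)
    cut_kmers = Counter(
        sequence[i - half:i + half + 1]
        for i in cut_positions
        if half <= i < len(sequence) - half
    )
    n_all = sum(all_kmers.values())
    n_cut = sum(cut_kmers.values())
    return {
        kmer: (count / n_cut) / (all_kmers[kmer] / n_all)
        for kmer, count in cut_kmers.items()
    }

# Toy background region with cuts placed only at C, as in the extreme case.
background = "ACGAAACAATTGAGATACCAAAGTAAGTAT"
cuts = [i for i, base in enumerate(background) if base == "C"]
bias = kmer_bias(background, cuts)
print(bias)
```

Because background regions carry little regulatory signal, enrichment of this kind can be attributed mostly to the enzyme. A neural bias model plays the same role but can capture several bias motifs at once.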

10 of 32

K562 DNase-seq prediction performance (held-out chromosomes)

Log(observed counts)

Spearman correlation = 0.7

Log(predicted total counts)

Total counts prediction performance

Jensen-Shannon Distance

Worst limit

Best limit

Observed vs. predicted profile

Profile prediction performance
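
For reference, the Jensen-Shannon distance between an observed and a predicted profile can be computed as below. This is a minimal sketch with a hypothetical function name. It uses log base 2, so the distance lies between 0 (identical profiles) and 1 (non-overlapping profiles); these are the mathematical bounds and not necessarily the "best" and "worst" limits drawn on the slide.

```python
import math

def js_distance(observed, predicted):
    """Jensen-Shannon distance (base 2) between two non-negative profiles."""
    p_total, q_total = sum(observed), sum(predicted)
    p = [v / p_total for v in observed]
    q = [v / q_total for v in predicted]
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))

print(js_distance([1, 2, 3], [1, 2, 3]))  # identical profiles
print(js_distance([1, 0], [0, 1]))        # non-overlapping profiles
print(js_distance([5, 1, 1], [1, 1, 5]))  # partially overlapping profiles
```

The inputs can be raw counts or probabilities because both are normalized before comparison.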

11 of 32

ChromBPNet provides denoising and imputation of footprints at individual loci

      Tn5 

denoising!

Using existing work (HINT-ATAC and TOBIAS) 

Observed track

ChromBPNet without bias correction

ChromBPNet bias model

ChromBPNet with HINT-ATAC correction

ChromBPNet with TOBIAS correction

ChromBPNet with bias correction

CTCF

Tn5

CTCF Footprint

Sequence contribution scores are obtained using DeepLIFT (Shrikumar et al. 2017).

12 of 32

High concordance of footprints and contribution scores after correction in ATAC-seq and DNase-seq

ATAC-seq

Observed track

Predictions without bias correction

Bias-corrected predictions

Bias-corrected contribution scores

DNase-seq

Observed track

Predictions without bias correction

Bias-corrected predictions

Bias-corrected contribution scores

K562 Models

NRF1

NRF1

KLF12

13 of 32

Deciphering cell-type specific motifs and footprints

14 of 32

TF-MoDISCO: Cluster and consolidate predictive subsequences into contribution weight matrix (CWM) motifs

Insight: conv. filter contributions are integrated at the nucleotide level

Sequence 1

Sequence 2

Sequence 3

DeepLIFT

DeepLIFT

DeepLIFT

Shrikumar et al. 2018, arXiv

Avanti Shrikumar

Alex Tseng

15 of 32

GM12878

IRF

SP/KLF

RUNX

SPI1

ELK/ETS

NFKB

AP1

ATF

NFY

K562

GATA+TAL

SP/KLF

CTCF

ELK/ETS

AP1

NRF1

NFY

ATF

ETV

HepG2

KLF

FOXA

HNF4G

CTCF

GABPA

CEBPB

AP1

NFY

TCF7L2

H1-hESC

KLF

OCT-SOX

CTCF

ZIC3

SOX2

SP

TEAD

NRF1

NFY

Soumya Kundu

16 of 32

NFKB

GATA

HNF4A

ChromBPNet can predict marginal footprints of cell-type specific TFs

SP1

Uncorrected

Profile probability prediction

GM12878 ATAC-seq

K562 ATAC-seq

HepG2 ATAC-seq

200bp surrounding the motif insertion site
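
A marginal footprint of this kind can be sketched as follows: insert the motif at the center of many random background sequences, predict a profile for each, and average. The sketch is illustrative only. `toy_predict_profile` is a hypothetical stand-in for a trained model, and the number of backgrounds, the uniform random backgrounds, and the example motif string are assumptions.

```python
import random

def insert_motif(background, motif):
    """Replace the central bases of the background with the motif."""
    start = (len(background) - len(motif)) // 2
    return background[:start] + motif + background[start + len(motif):]

def marginal_footprint(predict_profile, motif, n_backgrounds=50, length=200, seed=0):
    """Average predicted profile over random backgrounds carrying the motif."""
    rng = random.Random(seed)
    totals = [0.0] * length
    for _ in range(n_backgrounds):
        background = "".join(rng.choice("ACGT") for _ in range(length))
        profile = predict_profile(insert_motif(background, motif))
        for i, value in enumerate(profile):
            totals[i] += value
    return [t / n_backgrounds for t in totals]

def toy_predict_profile(seq, motif="GATAA"):
    """Hypothetical stand-in model: lower cut probability over motif matches."""
    weights = [1.0] * len(seq)
    start = seq.find(motif)
    while start != -1:
        for i in range(start, start + len(motif)):
            weights[i] = 0.2  # protected bases are cut less often
        start = seq.find(motif, start + 1)
    total = sum(weights)
    return [w / total for w in weights]

footprint = marginal_footprint(toy_predict_profile, "GATAA")
print(round(footprint[99], 5), round(footprint[10], 5))
```

Averaging over backgrounds removes the influence of any one flanking sequence, so what remains is the effect attributable to the motif itself.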

17 of 32

NFKB

GATA

HNF4A

ChromBPNet can predict marginal footprints of cell-type specific TFs

SP1

Uncorrected

Profile probability prediction

200bp surrounding the motif insertion site

Corrected

GM12878 ATAC-seq

K562 ATAC-seq

HepG2 ATAC-seq

18 of 32

GATA+TAL

ChromBPNet allows systematic comparison of DNase-seq and ATAC-seq footprints

Uncorrected

Profile probability prediction

200bp surrounding the motif insertion site

K562 ATAC-seq

K562 DNase-seq

Corrected

NFYB

GABPA

BACH

NRF1

HNF4A (control)

K562 ATAC-seq

K562 DNase-seq

200bp surrounding the motif insertion site

19 of 32

Cooperative effects on footprints

AP1 + TEAD in fibroblasts

AP1

TEAD

AP1+TEAD

Surag Nair

Profile probability prediction

200bp surrounding the motif insertion site

20 of 32

Regulatory variant effect prediction

21 of 32

ChromBPNet can predict TF binding QTLs in LCLs for multiple chromatin readouts

SPI1 TF ChIP-seq

DNase-seq

H3K27ac ChIP-seq

1 Kb

5 Kb

1 Kb

rs5764238 (ref=C)

SPI1 bQTLs: Tehranchi et al. 2016

22 of 32

Example: SPI1 bQTL in LCLs

rs5764238 (alt=G)

rs5764238 (ref=C)

SPI1 TF ChIP-seq

DNase-seq

H3K27ac ChIP-seq

1 Kb

5 Kb

1 Kb

23 of 32

ChromBPNet reveals “blast radius” of regulatory variants on different chromatin readouts

“Blast radius” of a variant

1 Kb

5 Kb

1 Kb

SPI1 TF ChIP-seq

DNase-seq

H3K27ac ChIP-seq

24 of 32

DeepLIFT provides insights into motifs disrupted by

ref=C

ref=C

ref=C

alt=G

alt=G

alt=G

200 bp

200 bp

200 bp

1 Kb

5 Kb

1 Kb

SPI1 motifs

SPI1 TF ChIP-seq

DNase-seq

H3K27ac ChIP-seq

25 of 32

ChromBPNet accurately predicts effect sizes and directions of LCL SPI1 bQTLs

SPI1 bQTLs: Tehranchi et al. 2016

Showing the 100 most significant variants and 100 non-significant variants (chosen randomly from the 200 least significant) from the SPI1 bQTL set

GM12878 ATAC-seq model
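
One common way to turn model predictions into a variant effect score is the log fold change in predicted total counts between the two alleles. The sketch below is illustrative only: `toy_predict_total_counts` is a hypothetical stand-in for a trained model, the 11 bp sequence is a made-up example and not the real rs5764238 locus, and the scoring choice is an assumption, since the slide does not state how effect sizes were computed.

```python
import math

def substitute(sequence, position, allele):
    """Return the sequence with a single base replaced at the given position."""
    return sequence[:position] + allele + sequence[position + 1:]

def variant_log2fc(predict_total_counts, sequence, position, ref, alt):
    """log2(alt / ref) of predicted total counts for a single-base variant."""
    if sequence[position] != ref:
        raise ValueError("reference allele does not match the sequence")
    ref_counts = predict_total_counts(sequence)
    alt_counts = predict_total_counts(substitute(sequence, position, alt))
    return math.log2(alt_counts / ref_counts)

def toy_predict_total_counts(sequence):
    """Hypothetical stand-in model: counts rise with each GGAA (ETS-like core)."""
    return 10.0 + 20.0 * sequence.count("GGAA")

# Made-up example: the alt allele creates a GGAA core, the ref allele does not.
toy_sequence = "AAACGAAGTTT"
score = variant_log2fc(toy_predict_total_counts, toy_sequence, 3, "C", "G")
print(round(score, 3))
```

A positive score means the alternate allele is predicted to increase the signal. Comparing the sign and magnitude of such scores against measured bQTL effects gives a check on direction and effect size.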

26 of 32

ChromBPNet outperforms deltaSVM for predicting dsQTLs in LCLs

dsQTLs: Degner et al. 2012

deltaSVM: Lee et al. 2015

Predictions on 579 significant dsQTL SNPs and 30K non-dsQTL SNPs (50 times more than the significant set) used in deltaSVM

27 of 32

Summary

  • ChromBPNet: Accurate base-resolution profile model for ATAC-seq and DNase-seq
  • Enzyme bias correction superior to previous methods
  • Reveals predictive motifs, motif instances & denoised footprints at bp resolution at individual enhancers
  • Can infer marginal synthetic footprints of motifs & cooperative effects of syntax
  • Can reveal high precision effects of genetic variants on footprints and accessibility

28 of 32

Acknowledgements

R01ES02500902

1U24HG009446

1U01HG009431

Funding

1DP2OD022870

1R01HG009674

Surag Nair

Aman Patel

Soumya Kundu

Anshul Kundaje

Jacob Schreiber

Austin Wang

Avanti Shrikumar

29 of 32

Additional Slides

30 of 32

31 of 32

32 of 32