1 of 32

Evaluating the effects of Hi-C and TF on cis-regulation of gene expression

Team leads: Alireza Karbalayghareh (MSKCC), Rui Yang (MSKCC)

Team members: Iryna Irkliyenko (UCSF), Bharath Saravanan (UCSD), Joel Pepper (Drexel)

3 of 32

Model Introduction

GraphReg - predicts CAGE-seq expression levels from epigenomic signals + significant Hi-C loops

Epiphany - predicts Hi-C contact maps from epigenomic signals

[Figure labels: 1D TF ChIP-seq (x38); Hi-C; true vs. predicted]

4 of 32

Project Overview

Use pre-trained deep learning models to study the roles of Hi-C and TFs in gene expression (CAGE-seq) prediction

Question 1. How well does the model perform using experimental Hi-C vs. predicted Hi-C?

How well can predicted Hi-C reconstruct enhancer-promoter (E-P) interactions?
For predicted Hi-C, vary the input tracks to disentangle 3D signals with and without enhancer/promoter information

Question 2. Does the model need 3D information to correctly predict gene expression?

Question 3. What roles do transcription factors (TFs) play in gene expression prediction?

In-silico TF knockout (KO) vs. experimental CRISPR TF knockout
Which genes change expression after TF KO?
Interpret feature attributions

Scope: a single cell line, K562

5 of 32

Preliminary Results

7 of 32

Prediction from the MSE-loss model recalls ~50% of the significant loops in real Hi-C

Recall:

At each genomic distance, the proportion of significant loops in real Hi-C that are also called in the Hi-C prediction
Significant loops from real Hi-C: p-value < 0.1 from HiC-DC+
Significant loops from predicted Hi-C: z-value > 1 (a loose criterion)
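
The recall definition above can be sketched in a few lines. This is a minimal illustration, not the team's evaluation code: it assumes the significance calls are already available as boolean bin-by-bin matrices (for example `pvals < 0.1` for real Hi-C and `zvals > 1` for the prediction), and the function name `recall_by_distance` is hypothetical.

```python
import numpy as np

def recall_by_distance(true_sig, pred_sig, max_dist):
    """Per-distance recall of true significant loops.

    true_sig, pred_sig: square boolean matrices (bins x bins), True where an
    interaction is called significant (assumed inputs, e.g. p < 0.1 and z > 1).
    Returns an array of length max_dist + 1; entry d is NaN when real Hi-C
    has no significant loop at bin distance d.
    """
    true_sig = np.asarray(true_sig, dtype=bool)
    pred_sig = np.asarray(pred_sig, dtype=bool)
    recall = np.full(max_dist + 1, np.nan)
    for d in range(max_dist + 1):
        # The d-th upper diagonal holds all bin pairs separated by d bins.
        t = np.diagonal(true_sig, offset=d)
        p = np.diagonal(pred_sig, offset=d)
        n_true = t.sum()
        if n_true > 0:
            recall[d] = (t & p).sum() / n_true
    return recall

# Toy example: two true loops at distance 2, one of them recovered,
# plus one false positive at distance 1.
true_sig = np.zeros((5, 5), dtype=bool)
pred_sig = np.zeros((5, 5), dtype=bool)
true_sig[0, 2] = true_sig[1, 3] = True
pred_sig[0, 2] = True
pred_sig[2, 3] = True
rec = recall_by_distance(true_sig, pred_sig, max_dist=3)
```

In the toy example the recall at distance 2 is 0.5; the false positive at distance 1 does not affect recall, which is why a loose threshold on the prediction can raise recall while adding false positives.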

⇒ MSE loss gives “blobby” predictions

We have more false positives than false negatives

Epiphany: trained with DNaseI + CTCF + H3K27ac + H3K4me3

8 of 32

Prediction from the MSE-loss model recalls ~90% of enhancer-promoter (E-P) interactions

Although Epiphany's prediction captures only ~50% of the significant interactions in real Hi-C, it overlaps well with the true E-P interactions.

[Venn diagram: Epiphany predicted interactions vs. experimental interactions. Overlap: high proportion of E-P interactions. False predictions: structural interactions?]

9 of 32

Predicted Hi-C skews the enhancer-promoter distribution towards more interactions.

Prediction using model trained with MSE loss: “blobby predictions”

10 of 32

Adversarial loss improves the prediction of significant loops on Hi-C

Predictive modeling - choice of loss function

Top 1% interactions in:

Actual Hi-C
Prediction from model trained with MSE
Prediction from model trained with MSE + adversarial loss

⇒ MSE loss over-smooths the signal during prediction

⇒ Adding an adversarial component may help the predictions follow a distribution consistent with real Hi-C
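
To make the loss comparison concrete, here is a small NumPy sketch of an MSE term combined with a generator-side adversarial term. It is an illustration under assumptions, not Epiphany's exact objective: the function names, the weight `adv_weight`, and the discriminator probabilities in the toy example are all hypothetical.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predicted and real contact values."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def adversarial_loss(disc_prob_on_pred, eps=1e-7):
    """Generator-side adversarial term, -mean(log D(pred)).

    disc_prob_on_pred: probability (0..1) that a discriminator assigns to
    the prediction being real Hi-C (assumed to be supplied by the caller).
    """
    p = np.clip(disc_prob_on_pred, eps, 1.0)
    return float(-np.mean(np.log(p)))

def combined_loss(pred, target, disc_prob_on_pred, adv_weight=0.1):
    """MSE plus a weighted adversarial term (adv_weight is a made-up default)."""
    return mse_loss(pred, target) + adv_weight * adversarial_loss(disc_prob_on_pred)

# Toy 1D "contact profile": the true peak sits at position 1.
target = np.array([0.0, 1.0, 0.0, 0.0])
sharp = np.array([0.0, 0.0, 1.0, 0.0])   # sharp peak, off by one bin
blurry = np.array([0.0, 0.5, 0.5, 0.0])  # hedged, smoothed prediction

mse_sharp = mse_loss(sharp, target)
mse_blurry = mse_loss(blurry, target)
```

Under MSE alone the smoothed profile scores better than the sharp but slightly misplaced one, which is one way to see why an MSE-trained model drifts toward "blobby" output. A discriminator that assigns low probability to smoothed maps raises their adversarial term and pushes back against that tendency.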

All current results are based on the MSE-loss-trained model.

12 of 32

GraphReg predictions using predicted Hi-C are more accurate than CNN predictions without any Hi-C information!

Predicted Hi-C: from the MSE-loss model

⇒ Even with non-ideal Hi-C predictions, expression prediction is still better than without 3D information

14 of 32

In-silico TF KO in GraphReg models predicts the effects on target genes better than in the CNN, as evaluated against TF CRISPR KO experiments.

ATF3 / GraphReg - real Hi-C

ATF3 / GraphReg - predicted Hi-C

ATF3 / CNN
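
An in-silico TF knockout of the kind compared above can be sketched as follows. This is a simplified illustration with stated assumptions: the knockout is modeled by zeroing the TF's ChIP-seq track in the input, `predict_fn` stands in for a trained model (GraphReg or the CNN), and the names `in_silico_knockout`, `toy_predict`, and `pseudocount` are hypothetical.

```python
import numpy as np

def in_silico_knockout(predict_fn, features, tf_index, pseudocount=1.0):
    """Predicted log2 fold change (KO vs. wild type) after removing one TF track.

    predict_fn: callable mapping a (bins, tracks) array to per-bin expression.
    features:   (bins, tracks) input signals; column tf_index is the TF track.
    """
    wild_type = predict_fn(features)
    ko_features = features.copy()      # leave the caller's array untouched
    ko_features[:, tf_index] = 0.0     # assumption: KO = zeroed ChIP-seq signal
    knocked_out = predict_fn(ko_features)
    return np.log2((knocked_out + pseudocount) / (wild_type + pseudocount))

# Toy stand-in model: expression is a weighted sum of the input tracks.
weights = np.array([2.0, 0.0, 1.0])

def toy_predict(x):
    return x @ weights

features = np.ones((4, 3))
lfc_tf0 = in_silico_knockout(toy_predict, features, tf_index=0)
lfc_tf1 = in_silico_knockout(toy_predict, features, tf_index=1)
```

In the toy model, removing track 0 lowers predicted expression (negative log2 fold change), while removing track 1, which has zero weight, changes nothing. The predicted fold changes can then be compared gene by gene with the CRISPR KO measurements.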

15 of 32

Which TFs up/down regulate which genes?

16 of 32

Which are the top 20 TF target genes that GraphReg predicts correctly?

[Figure labels: log2FoldChange; True Hi-C]

17 of 32

[Figure labels: log2(N+1); True Hi-C]

18 of 32

Activator/repressor prediction from TF KO / in-silico KO

Each point corresponds to the KO or in-silico mutation of a particular TF
All points below the diagonal correspond to transcriptional activators
GraphReg predicts a larger number of significantly regulated genes than the CNN model
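
The activator/repressor call can be sketched as a simple count comparison. The axis assignment here is an assumption, since the slide text does not state it: the x-axis is taken to be the number of significantly down-regulated genes after KO and the y-axis the number of up-regulated genes, so a point below the diagonal has more down-regulated targets. The function names and the `alpha` cutoff are hypothetical.

```python
import numpy as np

def count_regulated(log2fc, padj, alpha=0.05):
    """Count significantly down- and up-regulated genes after a TF KO."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    sig = padj < alpha
    n_down = int(np.sum(sig & (log2fc < 0)))
    n_up = int(np.sum(sig & (log2fc > 0)))
    return n_down, n_up

def classify_tf(n_down, n_up):
    """Removing an activator lowers its targets; removing a repressor raises them."""
    if n_down > n_up:
        return "activator"
    if n_up > n_down:
        return "repressor"
    return "ambiguous"

# Toy KO result for one TF: two significant decreases, one significant increase.
log2fc = [-2.0, -1.5, 0.8, -0.3]
padj = [0.01, 0.03, 0.04, 0.50]
n_down, n_up = count_regulated(log2fc, padj)
label = classify_tf(n_down, n_up)
```

The same two functions apply to measured KO fold changes and to in-silico fold changes, which allows a per-TF comparison of the two calls.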

19 of 32

SHAP values for MYC in enhancers match the TF KO experiment results.

20 of 32

SHAP values for MYC in enhancers match the TF KO experiment results.

These TFs significantly up-regulate MYC based on the CRISPR KO experiments.

23 of 32

Accuracy of SHAP values in predicting gene expression changes from TF-KO experiments

Based on the promoter/enhancer SHAP values, TFs can be separated into promoter-acting and enhancer-acting groups.
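
One simple way to make such a separation, given precomputed attributions, is to compare the total attribution magnitude in promoter bins with that in enhancer bins. This is a sketch under assumptions rather than the analysis used for the slide: the SHAP values are taken as a ready-made array, and the function name, the masks, and the tie-breaking rule are hypothetical.

```python
import numpy as np

def promoter_vs_enhancer(shap_values, promoter_mask, enhancer_mask):
    """Label each TF by where its attribution mass lies.

    shap_values:   (n_tfs, n_bins) attributions for one target gene.
    promoter_mask: boolean (n_bins,), True for promoter bins.
    enhancer_mask: boolean (n_bins,), True for enhancer bins.
    """
    abs_shap = np.abs(np.asarray(shap_values, dtype=float))
    promoter_mask = np.asarray(promoter_mask, dtype=bool)
    enhancer_mask = np.asarray(enhancer_mask, dtype=bool)
    promoter_score = abs_shap[:, promoter_mask].sum(axis=1)
    enhancer_score = abs_shap[:, enhancer_mask].sum(axis=1)
    # Ties go to "promoter-acting"; this is an arbitrary choice.
    labels = np.where(promoter_score >= enhancer_score,
                      "promoter-acting", "enhancer-acting")
    return promoter_score, enhancer_score, labels

# Toy attributions: 2 TFs, 3 bins (bin 0 is the promoter, bins 1-2 are enhancers).
shap_values = np.array([[0.9, 0.1, -0.05],
                        [0.1, -0.6, 0.5]])
promoter_mask = np.array([True, False, False])
enhancer_mask = np.array([False, True, True])
p_score, e_score, labels = promoter_vs_enhancer(shap_values, promoter_mask, enhancer_mask)
```

Absolute values are used so that positive and negative attributions in the same region do not cancel; the sign of the attribution would still be needed to predict the direction of the expression change.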

24 of 32

Acknowledgement

Ira Irkliyenko - SHAP value interpretation

Bharath Saravanan - ISM (in-silico mutation) analysis

Joel Pepper - Hi-C prediction evaluation

Alireza Karbalayghareh, Rui Yang - Project Leads

26 of 32

Additional Plots

27 of 32

Examples of GraphReg In-silico mutation effects prediction

[Panels: GraphReg and CNN, shown for NRF1 and ATF3]

28 of 32

GraphReg performs better at higher expression predictions

Each point is a particular TF

Number of genes dysregulated by in-silico mutation that match the true KO genes

29 of 32

GraphReg ISM Predicted Genes

CNN ISM Predicted Genes

GraphReg and CNN ISM Predicted Genes
