Evaluating the effects of Hi-C and TFs on cis-regulation of gene expression
Team leads: Alireza Karbalayghareh (MSKCC), Rui Yang (MSKCC)
Team members: Iryna Irkliyenko (UCSF), Bharath Saravanan (UCSD), Joel Pepper (Drexel)
Team 5
Model Introduction
GraphReg - predict CAGE-seq expression level using epigenomic signals + Hi-C significant loops
Epiphany - predict Hi-C contact maps using epigenomic signals
[Figure: 1D TF ChIP-seq (x38) and Hi-C inputs]
Project Overview
Use pre-trained deep learning models to study the roles of Hi-C and TFs in gene expression (CAGE-seq) prediction
Question 1. How well does the model perform using experimental Hi-C vs. predicted Hi-C?
Question 2. Does the model need 3D information to correctly predict gene expression?
Question 3. What roles do transcription factors play in gene expression prediction?
Scope: only one cell line, K562
Preliminary Results
Question 1. How well does the model perform using experimental Hi-C vs. predicted Hi-C?
Predictions from the MSE-loss model recall ~50% of the significant loops in real Hi-C
Recall = (# significant loops found in both predicted and real Hi-C) / (# significant loops in real Hi-C)
⇒ MSE loss gives “blobby” predictions
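The recall above can be sketched in a few lines, assuming each significant loop is stored as a pair of genomic bin indices. The bin representation, function names, and toy loops are hypothetical, not the team's data or pipeline:

```python
def normalize_loops(loops):
    """Store each loop as a sorted (anchor_bin_1, anchor_bin_2) tuple so (5, 2) == (2, 5)."""
    return {tuple(sorted(pair)) for pair in loops}


def loop_recall(predicted_loops, real_loops):
    """Fraction of real-Hi-C significant loops that are also called in predicted Hi-C."""
    pred, real = normalize_loops(predicted_loops), normalize_loops(real_loops)
    if not real:
        raise ValueError("real Hi-C has no significant loops")
    return len(pred & real) / len(real)


# Hypothetical toy loops: 2 of the 4 real loops are recovered.
real = [(10, 40), (12, 55), (30, 90), (41, 60)]
predicted = [(40, 10), (30, 90), (33, 70)]
print(loop_recall(predicted, real))  # 0.5
```

Exact bin matching is a simplification; loop comparisons often allow a tolerance of a few bins around each anchor.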
Epiphany: trained with DNaseI + CTCF + H3K27ac + H3K4me3
Predictions from the MSE-loss model recall ~90% of enhancer-promoter (E-P) interactions
⇒ Although Epiphany predictions capture only ~50% of the significant interactions in real Hi-C, they overlap well with the true E-P interactions
[Figure: overlap of Epiphany-predicted and experimental interactions; the overlap contains a high proportion of E-P interactions]
False predictions: structural interactions?
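The ~90% E-P figure can be read as recall restricted to enhancer-promoter pairs. A minimal sketch, assuming promoter and enhancer annotations are available as sets of bin indices (all names and values here are hypothetical):

```python
def classify_pair(pair, promoter_bins, enhancer_bins):
    """Label a bin pair 'E-P' if one anchor is a promoter and the other an enhancer."""
    a, b = pair
    if (a in promoter_bins and b in enhancer_bins) or (b in promoter_bins and a in enhancer_bins):
        return "E-P"
    return "other"


def ep_recall(predicted, real, promoter_bins, enhancer_bins):
    """Fraction of real E-P interactions that also appear among the predicted interactions."""
    pred = {tuple(sorted(p)) for p in predicted}
    real_ep = {
        tuple(sorted(p)) for p in real
        if classify_pair(p, promoter_bins, enhancer_bins) == "E-P"
    }
    if not real_ep:
        raise ValueError("no E-P interactions in real Hi-C")
    return len(pred & real_ep) / len(real_ep)


promoters, enhancers = {10, 30}, {40, 90}
real = [(10, 40), (30, 90), (12, 55), (41, 60)]  # two E-P pairs, two other pairs
predicted = [(10, 40), (30, 90), (33, 70)]
print(ep_recall(predicted, real, promoters, enhancers))  # 1.0
```

In this toy example overall recall is 0.5 while E-P recall is 1.0, the same qualitative pattern as on the slide (the numbers are illustrative only).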
Predicted Hi-C skews the enhancer-promoter distribution towards more interactions.
Prediction using model trained with MSE loss: “blobby predictions”
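One way to see why MSE training produces blobby maps: when the inputs are compatible with several sharp contact patterns, the prediction that minimizes expected squared error is their average. A toy 1D illustration with hypothetical profiles:

```python
import numpy as np

# Two equally likely "true" contact profiles, each with a sharp peak at a different bin.
target_a = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
target_b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])


def expected_mse(pred):
    """Mean squared error averaged over the two equally likely targets."""
    return 0.5 * np.mean((pred - target_a) ** 2) + 0.5 * np.mean((pred - target_b) ** 2)


sharp = target_a.copy()               # commits to one sharp peak
blurry = 0.5 * (target_a + target_b)  # spreads half the signal over each peak

# The blurry average has half the expected error of the sharp guess,
# so an MSE-trained model is rewarded for hedging.
print(expected_mse(sharp), expected_mse(blurry))
```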
Adversarial loss improves the prediction of significant loops on Hi-C
Predictive modeling: choice of loss function
Top 1% interactions in the
⇒ MSE loss over-smooths the signal during prediction
⇒ Adding an adversarial component may help predict a contact distribution consistent with real Hi-C
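A minimal sketch of adding an adversarial component to the MSE objective, in the style of a conditional-GAN generator loss. The weight `lam`, the function names, and the use of discriminator probabilities are assumptions for illustration, not Epiphany's actual training code:

```python
import numpy as np


def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))


def generator_loss(pred, target, disc_prob_real, lam=0.1, eps=1e-8):
    """MSE term plus a non-saturating adversarial term.

    disc_prob_real: discriminator's probability that each predicted map is real.
    The term -log(D(pred)) is small when the discriminator is fooled, which pushes
    the generator toward realistic-looking maps rather than the blurred MSE optimum.
    """
    adversarial = float(np.mean(-np.log(np.clip(disc_prob_real, eps, 1.0))))
    return mse_loss(pred, target) + lam * adversarial


pred = np.full((4, 4), 0.5)  # blurry toy prediction
target = np.eye(4)           # toy "true" map
fooled = generator_loss(pred, target, np.array([0.9]))
caught = generator_loss(pred, target, np.array([0.2]))
print(fooled < caught)  # True
```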
All current results are based on the MSE-loss-trained model.
Question 2. Does the model need 3D information to correctly predict gene expression?
GraphReg predictions using predicted Hi-C are more accurate than CNN predictions without any Hi-C information!
Predicted Hi-C: from the MSE-loss Epiphany model
⇒ Even with imperfect predicted Hi-C, the expression prediction is still better than without 3D information
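The GraphReg vs. CNN comparison can be sketched as a correlation between predicted and observed CAGE on a log scale. Pearson correlation of log2(x+1) values is an assumed metric here, and the arrays are toy numbers, not the team's results:

```python
import numpy as np


def log_pearson(pred, obs):
    """Pearson r between log2(x + 1)-transformed predicted and observed CAGE values."""
    return float(np.corrcoef(np.log2(pred + 1.0), np.log2(obs + 1.0))[0, 1])


def compare_models(obs, preds_by_model):
    """Return {model_name: r}, ordered from best to worst."""
    scores = {name: log_pearson(p, obs) for name, p in preds_by_model.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


obs = np.array([0.0, 3.0, 15.0, 120.0, 40.0, 7.0])
preds = {
    "GraphReg (predicted Hi-C)": np.array([0.5, 2.5, 12.0, 100.0, 45.0, 6.0]),
    "CNN (no Hi-C)": np.array([5.0, 1.0, 30.0, 60.0, 20.0, 10.0]),
}
print(compare_models(obs, preds))
```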
Question 3. What roles do transcription factors play in gene expression prediction?
In-silico TF KO in GraphReg models predicts the effects on target genes better than in CNN models, as evaluated by TF CRISPR KO experiments.
[Figure panels for ATF3: GraphReg with real Hi-C, GraphReg with predicted Hi-C, CNN]
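In-silico TF KO can be sketched as: zero out the TF's ChIP-seq track in the model input, re-run the model, and compute the per-gene log2 fold change of predicted expression. `toy_predict` below is a hypothetical stand-in (a fixed non-negative linear map) for a trained GraphReg or CNN model, and the (tracks x bins) input layout is assumed:

```python
import numpy as np


def in_silico_ko_log2fc(predict, inputs, tf_index, pseudocount=1.0):
    """log2 fold change of predicted expression after zeroing one TF track.

    predict:  callable mapping an input array (n_tracks, n_bins) to per-gene expression.
    inputs:   input tracks, shape (n_tracks, n_bins).
    tf_index: row of the TF to knock out.
    """
    ko_inputs = inputs.copy()
    ko_inputs[tf_index, :] = 0.0  # remove the TF's binding signal
    wt = predict(inputs)
    ko = predict(ko_inputs)
    return np.log2((ko + pseudocount) / (wt + pseudocount))


# Hypothetical toy model: 2 tracks x 4 bins -> 2 genes via fixed non-negative weights.
rng = np.random.default_rng(0)
weights = rng.uniform(0.0, 1.0, size=(2, 2 * 4))


def toy_predict(x):
    return weights @ x.ravel()


inputs = rng.uniform(0.0, 5.0, size=(2, 4))
print(in_silico_ko_log2fc(toy_predict, inputs, tf_index=0))
```

Because the toy weights are all non-negative, every TF acts as an activator here and all fold changes are at most 0; a trained model can produce either sign.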
Which TFs up/down regulate which genes?
What are the best (top 20) TF target genes, as correctly predicted by GraphReg?
[Figure: log2FoldChange and log2(N+1) panels, GraphReg with true Hi-C]
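One way to check which top target genes are predicted correctly: rank genes by the magnitude of the in-silico KO log2 fold change and keep those among the top k whose CRISPR KO change has the same sign and passes a threshold. The `min_abs_lfc` cutoff stands in for a proper differential-expression test, and the gene names are placeholders:

```python
def top_k_hits(pred_lfc, true_lfc, k=20, min_abs_lfc=1.0):
    """Among the k genes with the largest |predicted log2FC|, return those whose
    true KO log2FC has the same sign and |log2FC| >= min_abs_lfc."""
    ranked = sorted(pred_lfc, key=lambda g: abs(pred_lfc[g]), reverse=True)[:k]
    hits = []
    for gene in ranked:
        t = true_lfc.get(gene)
        if t is not None and abs(t) >= min_abs_lfc and (t > 0) == (pred_lfc[gene] > 0):
            hits.append(gene)
    return hits


pred = {"GENE_A": -2.1, "GENE_B": -1.5, "GENE_C": 0.8, "GENE_D": -0.1}
true = {"GENE_A": -1.8, "GENE_B": 0.4, "GENE_C": 1.2, "GENE_D": -2.0}
print(top_k_hits(pred, true, k=3))  # ['GENE_A', 'GENE_C']
```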
Activator/Repressor prediction from TF-KO/In-Silico KO
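A simple rule for calling a TF an activator or a repressor from KO data (experimental or in silico): if knocking it out mostly lowers target-gene expression, it acts as an activator; if it mostly raises it, a repressor. The median-based rule and the `margin` threshold are assumptions for illustration:

```python
import numpy as np


def tf_role(target_log2fc, margin=0.1):
    """Classify a TF from the log2FC of its target genes after KO.

    KO lowers expression (median log2FC < -margin) -> 'activator'
    KO raises expression (median log2FC > margin)  -> 'repressor'
    otherwise                                      -> 'unclear'
    """
    median = float(np.median(target_log2fc))
    if median < -margin:
        return "activator"
    if median > margin:
        return "repressor"
    return "unclear"


print(tf_role([-1.2, -0.8, -0.3, 0.1]))  # activator
print(tf_role([0.9, 0.5, 0.05]))         # repressor
```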
SHAP values for MYC in enhancers match the TF KO experiment results.
Accuracy of SHAP values in predicting gene expression changes in TF-KO experiments
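One way to score SHAP values against TF-KO data: a TF with a positive SHAP contribution to a gene should, when knocked out, lower that gene's expression, so the predicted direction is the opposite sign of the SHAP value. Sign agreement as the accuracy metric is an assumption, and the arrays are toy values:

```python
import numpy as np


def shap_direction_accuracy(shap_values, ko_log2fc, min_abs_lfc=0.0):
    """Fraction of genes where -sign(SHAP) matches sign(KO log2FC).

    Genes with |KO log2FC| <= min_abs_lfc or with zero SHAP are skipped.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    ko_log2fc = np.asarray(ko_log2fc, dtype=float)
    keep = (np.abs(ko_log2fc) > min_abs_lfc) & (shap_values != 0)
    if not keep.any():
        raise ValueError("no genes pass the filters")
    predicted_sign = -np.sign(shap_values[keep])
    return float(np.mean(predicted_sign == np.sign(ko_log2fc[keep])))


print(shap_direction_accuracy([0.4, 0.2, -0.3, 0.1], [-1.0, -0.5, 0.7, 0.3]))  # 0.75
```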
Based on their promoter and enhancer SHAP values, TFs can be separated into promoter-acting and enhancer-acting TFs.
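The promoter/enhancer split can be sketched by summing a TF's absolute SHAP values over promoter bins and over enhancer bins and comparing the totals. The bin masks, the `ratio` threshold, and the "mixed" category are assumptions for illustration:

```python
import numpy as np


def tf_acting_region(shap_by_bin, promoter_mask, enhancer_mask, ratio=2.0):
    """Label a TF 'promoter-acting', 'enhancer-acting', 'mixed', or 'no signal'.

    shap_by_bin: SHAP values of the TF's input track, one per genomic bin.
    promoter_mask / enhancer_mask: boolean arrays marking promoter and enhancer bins.
    """
    shap_abs = np.abs(np.asarray(shap_by_bin, dtype=float))
    promoter = shap_abs[np.asarray(promoter_mask, dtype=bool)].sum()
    enhancer = shap_abs[np.asarray(enhancer_mask, dtype=bool)].sum()
    if promoter + enhancer == 0:
        return "no signal"
    if promoter >= ratio * enhancer:
        return "promoter-acting"
    if enhancer >= ratio * promoter:
        return "enhancer-acting"
    return "mixed"


shap = [0.05, 0.1, 0.8, -0.6, 0.02]
promoter_bins = [False, True, False, False, False]
enhancer_bins = [False, False, True, True, False]
print(tf_acting_region(shap, promoter_bins, enhancer_bins))  # enhancer-acting
```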
Acknowledgement
Ira Irkliyenko - SHAP value interpretation
Bharath Saravanan - ISM (in-silico mutation) analysis
Joel Pepper - Hi-C prediction evaluation
Alireza Karbalayghareh, Rui Yang - Project Leads
Thank you!
Additional Plots
Examples of GraphReg in-silico mutation (ISM) effect predictions
[Figure panels: GraphReg vs. CNN for NRF1 and ATF3]
GraphReg performs better at higher expression predictions
Each point is a particular TF
Number of in-silico mutation (ISM) dysregulated genes that match the genes dysregulated in the true KO
Legend: GraphReg ISM predicted genes; CNN ISM predicted genes; genes predicted by both GraphReg and CNN ISM
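The gene-matching counts can be sketched with set operations: intersect each model's ISM-dysregulated genes with the genes dysregulated in the real KO, then split the matches into GraphReg-only, CNN-only, and shared. Gene names are placeholders:

```python
def match_counts(graphreg_genes, cnn_genes, true_ko_genes):
    """Count ISM-predicted genes confirmed by the real KO, per model and jointly."""
    g = set(graphreg_genes) & set(true_ko_genes)
    c = set(cnn_genes) & set(true_ko_genes)
    return {
        "GraphReg only": len(g - c),
        "CNN only": len(c - g),
        "both": len(g & c),
    }


true_ko = {"G1", "G2", "G3", "G4", "G5"}
print(match_counts({"G1", "G2", "G3", "G9"}, {"G3", "G4", "G8"}, true_ko))
# {'GraphReg only': 2, 'CNN only': 1, 'both': 1}
```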