1 of 17

Unveiling the role of epigenomic features in RNA splicing throughout neural development using machine learning

4DN Hackathon: Team 7

2 of 17

DNA transcription and RNA splicing were thought to be two independent processes

3 of 17

The DNA/chromatin state can affect RNA splicing

Nucleic Acids Res. 2022 Nov 11; 50(20): 11563–11579.

4 of 17

Can we predict RNA splicing events using genomic and epigenomic features?

5 of 17

Excitatory neurons and interneurons were selected for training

PLAC-seq (H3K4me3)

RNA-seq

6 of 17

The percent spliced in (PSI) score was used to quantify RNA splicing events

  • High PSI → intron retained
  • Low PSI → intron excluded

Ref: Methods in Molecular Biology (MIMB, volume 2117)
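The slide does not give the PSI formula. One common definition for intron retention, used here only as an assumption, is the fraction of reads supporting the retained intron among all reads informative for that intron (retained plus spliced). A minimal sketch with a hypothetical helper name:

```python
from typing import Optional


def intron_psi(intron_reads: float, spliced_reads: float) -> Optional[float]:
    """Fraction of informative reads that support retention of an intron.

    Assumes PSI = retained / (retained + spliced); returns None when the
    intron has no read support at all, since PSI is undefined there.
    """
    total = intron_reads + spliced_reads
    if total == 0:
        return None
    return intron_reads / total


print(intron_psi(12, 88))  # 0.12 -> mostly spliced out
print(intron_psi(1, 99))   # 0.01 -> intron almost always excluded
```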

7 of 17

Splicing changes between excitatory neurons and interneurons

Retained (PSI ≥ 0.1):

  • Newborn: 5,964
  • Adolescent: 5,500

Non-retained (PSI < 0.02):

  • Newborn: 25,057
  • Adolescent: 19,927

Retained (PSI ≥ 0.1):

  • Newborn: 7,569
  • Adolescent: 4,186

Non-retained (PSI < 0.02):

  • Newborn: 12,704
  • Adolescent: 17,841

Interneuron

Excitatory neuron
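The two PSI thresholds above turn a continuous score into class labels. A small sketch of that labeling step; treating the middle band (0.02 ≤ PSI < 0.1) as ambiguous and leaving it out is an assumption, since the slide only defines the two outer groups:

```python
from typing import Optional

RETAINED_MIN = 0.1        # PSI >= 0.1  -> retained (threshold from the slide)
NON_RETAINED_MAX = 0.02   # PSI < 0.02  -> non-retained (threshold from the slide)


def label_intron(psi: float) -> Optional[int]:
    """Return 1 for retained, 0 for non-retained, None for the ambiguous middle band."""
    if psi >= RETAINED_MIN:
        return 1
    if psi < NON_RETAINED_MAX:
        return 0
    return None  # 0.02 <= PSI < 0.1: assumed to be excluded from training


psis = [0.35, 0.01, 0.05, 0.1, 0.019]
labels = [label_intron(p) for p in psis]
print(labels)  # [1, 0, None, 1, 0]
```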

8 of 17

Splicing prediction setup

Input modalities:

  1. ATAC-seq features:
    1. Average signal at 5’ exon
    2. Average signal at 3’ exon
    3. Average signal at intron
  2. PLAC-seq:
    • Number of interactions
    • Interaction “strength”: -log10(p-value)
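A minimal sketch of how one intron's feature vector could be assembled from the inputs listed above. The coordinate layout, how PLAC-seq interactions are assigned to an intron, and summarizing interaction strength as the maximum -log10(p) are all assumptions; the slide does not specify them. Function names are hypothetical.

```python
import numpy as np


def mean_signal(track: np.ndarray, start: int, end: int) -> float:
    """Average per-base ATAC-seq signal over the half-open interval [start, end)."""
    return float(track[start:end].mean())


def intron_features(track, exon5, intron, exon3, plac_pvalues):
    """Build one feature vector for an intron.

    exon5 / intron / exon3: (start, end) coordinates into `track` (hypothetical layout).
    plac_pvalues: p-values of PLAC-seq interactions assigned to this intron
    (the assignment rule is an assumption, e.g. anchor overlapping the intron).
    """
    strengths = -np.log10(np.asarray(plac_pvalues, dtype=float))
    return np.array([
        mean_signal(track, *exon5),   # ATAC: 5' exon
        mean_signal(track, *exon3),   # ATAC: 3' exon
        mean_signal(track, *intron),  # ATAC: intron
        len(plac_pvalues),            # PLAC-seq: number of interactions
        strengths.max() if len(strengths) else 0.0,  # PLAC-seq: strongest interaction
    ])


track = np.array([2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 3.0, 3.0])  # toy per-base signal
print(intron_features(track, (0, 2), (2, 6), (6, 8), [1e-3, 1e-5]))
```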

9 of 17

Cell-type-specific epigenetic signatures predict splicing levels

Model: XGBoost (regression)
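XGBoost is a gradient-boosted tree library. As a rough illustration of the underlying idea only (not XGBoost itself, and not the team's model), the sketch below boosts depth-1 trees on squared error with NumPy, then scores held-out predictions with Pearson's R, the correlation metric quoted in the conclusions. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Find the single-feature split that best fits residuals r (squared error)."""
    best_sse, best = np.inf, (0, np.inf, r.mean(), r.mean())
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:  # candidate thresholds (all but the max)
            left = x[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best


def stump_predict(stump, x):
    j, t, lv, rv = stump
    return np.where(x[:, j] <= t, lv, rv)


def fit_boosted(x, y, n_rounds=100, lr=0.1):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)  # residual = negative gradient of squared loss
        pred += lr * stump_predict(stump, x)
        stumps.append(stump)
    return base, lr, stumps


def predict_boosted(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


# Synthetic stand-in for (epigenomic features, PSI); not the hackathon data.
rng = np.random.default_rng(0)
x = rng.random((150, 3))
y = 0.3 * x[:, 0] + 0.1 * (x[:, 2] > 0.5) + rng.normal(0, 0.02, 150)
model = fit_boosted(x[:100], y[:100])
r = np.corrcoef(predict_boosted(model, x[100:]), y[100:])[0, 1]
print(f"held-out Pearson R = {r:.2f}")
```

XGBoost differs from this toy in using second-order gradient information, regularized leaf weights and deeper trees.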

10 of 17

Diverse machine learning models can be employed to predict splicing levels

Models compared: Random Forest Regressor, Ridge, XGB-BCE

Gradient-boosted models outperform the other models in predicting splicing levels
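The slide does not define XGB-BCE. Assuming it means gradient boosting with a binary cross-entropy (BCE) objective, with PSI treated as a soft label in [0, 1], a boosting library needs the per-sample gradient and Hessian of the loss with respect to the raw margin; XGBoost custom objectives return exactly such a (grad, hess) pair. A NumPy sketch with hypothetical names:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(margin, psi):
    """Mean binary cross-entropy between soft targets psi in [0, 1] and sigmoid(margin)."""
    p = np.clip(sigmoid(margin), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.mean(psi * np.log(p) + (1 - psi) * np.log(1 - p)))


def bce_grad_hess(margin, psi):
    """Per-sample first and second derivatives of BCE with respect to the raw margin."""
    p = sigmoid(margin)
    return p - psi, p * (1 - p)


psi = np.array([0.2, 0.7])   # soft targets (made-up PSI values)
margin = np.zeros(2)         # initial raw scores; sigmoid(0) = 0.5
g, h = bce_grad_hess(margin, psi)
# Per-sample Newton step; XGBoost leaf weights use -sum(g) / (sum(h) + lambda).
margin = margin - g / h
print(bce_loss(np.zeros(2), psi), bce_loss(margin, psi))
```

Because PSI is bounded in [0, 1], a logistic link keeps predictions in the valid range, which a plain squared-error regressor does not guarantee.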

11 of 17

Sequence-only and multi-modal SpliceNet
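The slides do not describe SpliceNet's architecture. Assuming a SpliceAI-style input, the sequence-only model would take one-hot encoded DNA, and the multi-modal variant could append per-base epigenomic tracks (e.g. ATAC-seq signal) as extra channels. A NumPy sketch of that input encoding; the channel layout and function names are hypothetical:

```python
from typing import List

import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (len(seq), 4) array in A, C, G, T order; N and other symbols become all-zero rows."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            out[pos, BASE_INDEX[base]] = 1.0
    return out


def multimodal_input(seq: str, tracks: List[np.ndarray]) -> np.ndarray:
    """Hypothetical multi-modal input: one-hot sequence plus one channel per epigenomic track."""
    channels = [one_hot(seq)]
    for track in tracks:
        track = np.asarray(track, dtype=np.float32)
        if track.shape != (len(seq),):
            raise ValueError("each track needs one value per base")
        channels.append(track[:, None])
    return np.concatenate(channels, axis=1)


x = multimodal_input("ACGTN", [np.arange(5), np.ones(5)])
print(x.shape)  # (5, 6)
```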

12 of 17

Model Training & Metrics

Sequence-only SpliceNet (EN, excitatory neuron cell type):

  • 900/17,000 positive/negative samples were resampled to a balanced 2,000/2,000.
  • Trained for 100 epochs.
  • Validation: AUROC 0.8264, AP 0.8.
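The slide does not say how 900/17,000 samples became 2,000/2,000. One plausible scheme, assumed here, is to oversample the minority class with replacement and subsample the majority class without replacement. A NumPy sketch with a hypothetical helper name:

```python
import numpy as np


def balance(pos_idx, neg_idx, n_per_class, seed=0):
    """Resample both classes to n_per_class samples each.

    A class smaller than n_per_class is oversampled with replacement;
    a larger class is subsampled without replacement.
    """
    rng = np.random.default_rng(seed)

    def draw(idx):
        idx = np.asarray(idx)
        replace = len(idx) < n_per_class
        return rng.choice(idx, size=n_per_class, replace=replace)

    return draw(pos_idx), draw(neg_idx)


# Indices 0..899 are positives, 900..17899 negatives (toy layout).
pos, neg = balance(np.arange(900), np.arange(900, 17900), n_per_class=2000)
print(len(pos), len(neg), len(np.unique(neg)))  # 2000 2000 2000
```

Resampling only the training split keeps duplicated positives out of the validation set, where they would inflate AUROC and AP.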

Multi-modal SpliceNet (EN and IN, interneuron, cell types):

  • 500/11,000 positive/negative samples were resampled to a balanced 1,000/1,000 (fewer introns overlap between EN and IN).
  • Trained for 100 epochs with augmentations; overfitting observed.
  • Validation: AUROC 0.8316, AP 0.8053.
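For reference, the two validation metrics can be computed from binary labels and model scores as below. This is a NumPy sketch, not the evaluation code behind the reported numbers: AUROC uses the pairwise (Mann-Whitney) formulation, which is quadratic in class sizes, and the AP function assumes there are no tied scores.

```python
import numpy as np


def auroc(y, score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    y = np.asarray(y, dtype=bool)
    s = np.asarray(score, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def average_precision(y, score):
    """Mean of the precision values at the rank of each positive (scores sorted descending)."""
    y = np.asarray(y, dtype=bool)
    order = np.argsort(-np.asarray(score, dtype=float), kind="mergesort")
    y_sorted = y[order]
    precision = np.cumsum(y_sorted) / np.arange(1, len(y) + 1)
    return float((precision * y_sorted).sum() / y.sum())


y = [1, 0, 1, 1, 0]
score = [0.9, 0.8, 0.7, 0.3, 0.2]
print(auroc(y, score), average_precision(y, score))
```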

13 of 17

In silico saturation mutagenesis using SpliceNet

Windows of ±150 bp around the junction center at both the 5’ and 3’ ends, grouped into high-intron-retaining (HR) and low-intron-retaining (LR) introns. Effects were aggregated over 300 samples from the validation set.

Figure panels: 5’ junction (HR, LR); 3’ junction (HR, LR)
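In silico saturation mutagenesis substitutes every position in a window with each alternative base and records the change in the model's prediction. The sketch below uses a toy GC-fraction scorer as a hypothetical stand-in for a trained SpliceNet and assumes uppercase A/C/G/T input. Averaging the per-sample effect matrices corresponds to aggregating over validation samples within one group (HR or LR).

```python
from typing import Callable

import numpy as np

BASES = "ACGT"


def saturation_mutagenesis(seq: str, score: Callable[[str], float]) -> np.ndarray:
    """Score change for every single-base substitution.

    Returns shape (len(seq), 4): entry [i, b] is
    score(seq with BASES[b] at position i) - score(seq); reference bases stay 0.
    """
    ref = score(seq)
    effects = np.zeros((len(seq), 4))
    for i, ref_base in enumerate(seq):
        for b, alt in enumerate(BASES):
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[i, b] = score(mutant) - ref
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained SpliceNet: GC fraction of the window."""
    return (seq.count("G") + seq.count("C")) / len(seq)


eff = saturation_mutagenesis("ACGT", toy_score)
# Equal-length windows from one group (e.g. HR) averaged into a single map.
seqs = ["ACGT", "GGTA"]
aggregate = np.mean([saturation_mutagenesis(s, toy_score) for s in seqs], axis=0)
print(eff)
print(aggregate.shape)
```

With a 300 bp window this means 900 model evaluations per sample, so batching the mutants is worthwhile for a neural network.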

14 of 17

Intron retention is associated with lower DNA methylation levels near splice junctions and within retained introns.

Justin J.-L. Wong. Nature Communications, volume 8, 15134 (2017)

15 of 17

Multi-modal SpliceNet captures cell-type-specific events

16 of 17

Conclusions and next steps

Conclusions

  1. Epigenomic features can predict intron retention across loci (AUROC 0.78) and across cell types (Pearson’s R 0.43), with chromatin accessibility as the major contributor.
  2. Sequence context in the intronic region (GC content) influences intron retention.

Next steps

  • SpliceNet
    • Genome-wide perturbations.
    • Locating common motifs.
    • Identification of SNP-induced splicing changes (SpliceAI, MMSplice).
  • Multi-modal SpliceNet
    • Reduce overfitting.
    • Train across cell types and stages.
    • Perturbation of epigenomic features.
  • New prediction targets
    • Exon skipping and other alternative splicing events.
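The observation that intronic GC content influences retention relies on a per-intron GC feature. A minimal way to compute it, assuming ambiguous bases such as N are excluded from the denominator (the function name is hypothetical):

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # no informative bases
    return (s.count("G") + s.count("C")) / acgt


print(gc_content("GGCCAATTNN"))  # 0.5
```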

17 of 17

Thank you!

Team 7:

  • Zong Ming Chua (SBP)
  • Ian Jones (UCSF)
  • Charlene Miciano (UCSD)
  • Jimin Tan (NYU)
  • Jing Wang (UCSF)

Trainee Organizers