1 of 17

4D Nucleome Hackathon

Project 3: In silico variant prioritization using sequence-based predictive models

March 21st, 2024

Seattle, WA

Shu, Katie, Julia, Yang, Leo

2 of 17

SuPreMo generates mutated sequences for ISM

Gjoni and Pollard, bioRxiv (2023)

3 of 17

Akita predicts 3D genome folding from DNA sequence alone

Fudenberg et al, Nature Methods (2020)

4 of 17

Our goals for the Hackathon

  1. Integrate SuPreMo with additional models (Leo, Yang)
    1. Enformer
    2. ExPecto

  • Extend SuPreMo with new functionality (Julia, Katie, Shu)
    • Custom reference fasta
    • P-values for disruption scores
    • Additional cell types
    • Weighted disruption scores
    • Sequence mutagenesis

5 of 17

Implementing SuPreMo-Enformer

6 of 17

Testing SuPreMo-Enformer

Data: tumor SV called by Manta

Method: Enformer was applied to predict the effect of SVs on the CTCF ChIP signal in HFF cells

*BND variants are assigned a fixed length of 30,000 bp for visualization purposes

7 of 17

A closer look at a representative case

González-Rico, Francisco J., et al. "Alu retrotransposons modulate Nanog expression through dynamic changes in regional chromatin conformation via aryl hydrocarbon receptor." Epigenetics & Chromatin 13 (2020): 1-13.

8 of 17

Implementing SuPreMo-ExPecto

  • ExPecto (Zhou et al 2018)
    • DNA sequence -> Gene expression
  • Input
    • Original paper: SNPs, small indels
    • SuPreMo-ExPecto: SNPs, indels, SVs (e.g., BND)
    • Variable shift size
    • A JSON for customized parameters
  • Output
    • Predicted variant scores for all genes within ±20 kb of each variant
    • 20 min for 115,452 predictions using 16 CPU threads

{
  "beluga_model_file": "/home/yang/Project/SuPreMo/ExPecto_model/resources/deepsea.beluga.pth",
  "is_cuda": "False",
  "model_list_file": "/home/yang/Project/SuPreMo/ExPecto_model/resources/modellist",
  "gene_tss_file": "/home/yang/Project/SuPreMo/ExPecto_model/resources/geneanno.hg38.sorted.bed.gz",
  "model_dir": "/home/yang/Project/SuPreMo/ExPecto_model",
  "model_name": ["Fibroblast of Lung", "Small Intestine", "Small Intestine Terminal Ileum"],
  "threads": 16,
  "n_features": 2002,
  "fixed_dist": 100,
  "maxshift": 800,
  "split_flag": 1,
  "split_index": 0,
  "split_fold": 10,
  "verbose_level": 0
}
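A minimal sketch of how a parameter file like the one above could be read and sanity-checked in Python. The function name `load_expecto_config` and the set of required keys are assumptions made for illustration; this is not SuPreMo's actual loader. Because the example stores `is_cuda` as the string "False", the sketch parses it explicitly instead of relying on Python truthiness (a non-empty string such as "False" is truthy).

```python
import json

# Hypothetical list of mandatory keys, taken from the example JSON above.
REQUIRED_KEYS = {
    "beluga_model_file", "model_list_file", "gene_tss_file",
    "model_dir", "model_name", "threads",
}


def parse_bool(value):
    """Accept real booleans or the strings "True"/"False" (any case)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def load_expecto_config(text):
    """Parse a JSON parameter string and check the fields used here."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    # Default to CPU when the key is absent (an assumption of this sketch).
    config["is_cuda"] = parse_bool(config.get("is_cuda", False))
    if not isinstance(config["threads"], int) or config["threads"] < 1:
        raise ValueError("threads must be a positive integer")
    return config
```

Reading from disk would amount to passing the file contents, for example `load_expecto_config(open(path).read())`.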

9 of 17

Implementing SuPreMo-ExPecto - Cont’d

  • ExPecto (Zhou et al 2018)
    • DNA sequence -> Gene expression
  • Input
    • Original paper: SNPs, small indels
    • SuPreMo-ExPecto: SNPs, indels, SVs
    • Variable shift size
    • A JSON for customized parameters
  • Output
    • Predicted variant scores for all genes within ±20 kb of each variant
    • 20 min for 115,452 predictions using 16 CPU threads

IRGM1 gene

10 of 17

Incorporating personalized genome into SuPreMo

  • Provide a VCF with patient-specific SNVs
  • Incorporate patient SNVs into a personalized reference genome
  • Use personalized reference genome when scoring individual SVs
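The steps above can be sketched as follows, assuming the SNVs have already been read from the VCF into `(position, ref, alt)` tuples with 1-based positions. The function `apply_snvs` is hypothetical and only illustrates the idea; it is not the SuPreMo implementation. Because SNVs do not change sequence length, SV coordinates defined on the original reference remain valid on the personalized sequence.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with single-nucleotide variants applied.

    reference: chromosome (or region) sequence as a string.
    snvs: iterable of (pos, ref, alt) with 1-based positions, as in a VCF.
    """
    seq = list(reference)
    for pos, ref, alt in snvs:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not an SNV at position {pos}: {ref}>{alt}")
        # Guard against a VCF called on a different reference build.
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Toy example: two patient SNVs applied to an 8 bp reference.
personalized = apply_snvs("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])
```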

11 of 17

Adding P-values for disruption scores

  • Take a set of control scores generated from control variants
  • Compare the scores of target variants to the control scores to determine significance
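One common way to make this comparison is an empirical p-value, sketched below. This is an assumption about the approach, not a description of the test SuPreMo uses: the sketch is one-sided, treats larger scores as more disruptive, and adds a pseudocount of 1 so that the p-value is never exactly zero. The name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(target_score, control_scores):
    """Fraction of control scores at least as large as the target score.

    Uses the (k + 1) / (n + 1) estimator, where k is the number of
    control scores >= target_score and n is the number of controls.
    """
    n = len(control_scores)
    if n == 0:
        raise ValueError("at least one control score is required")
    n_extreme = sum(1 for score in control_scores if score >= target_score)
    return (n_extreme + 1) / (n + 1)


# Toy example with four control variants.
controls = [0.1, 0.2, 0.3, 0.4]
p_moderate = empirical_p_value(0.35, controls)  # one control is >= 0.35
p_extreme = empirical_p_value(1.0, controls)    # no control is >= 1.0
```

With this estimator the smallest attainable p-value is 1 / (n + 1), so the number of control variants sets the resolution of the test.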

12 of 17

Calculating SuPreMo scores with functional weights

Example weights

  • Gene promoters
  • ATAC-seq peaks
  • H3K4me3 HiChIP

SuPreMo implementation

Regions of interest (--roi):

  • ‘genes’
  • Path to a file listing the ROIs

Weights (--roi_weights):

  • Values 0-1
  • 0: no weights applied
  • 0.5: half of the score is driven by roi
  • 1: only roi are scored
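One way to read the three weight settings above is as a linear blend between the mean disruption over all bins and the mean over ROI bins. The sketch below implements that reading; the formula, the function name `weighted_disruption_score`, and the per-bin inputs are assumptions for illustration, and the actual SuPreMo calculation may differ.

```python
def weighted_disruption_score(track, roi_mask, roi_weight):
    """Blend the overall mean and the ROI-only mean of a disruption track.

    track: per-bin disruption values.
    roi_mask: per-bin booleans, True where the bin overlaps an ROI.
    roi_weight: 0 (no weighting) to 1 (only ROI bins are scored).
    """
    if not 0 <= roi_weight <= 1:
        raise ValueError("roi_weight must be between 0 and 1")
    if len(track) != len(roi_mask) or not track:
        raise ValueError("track and roi_mask must be non-empty and equal in length")
    overall_mean = sum(track) / len(track)
    roi_values = [value for value, in_roi in zip(track, roi_mask) if in_roi]
    # With no ROI bins in the window, the ROI term contributes nothing.
    roi_mean = sum(roi_values) / len(roi_values) if roi_values else 0.0
    return (1 - roi_weight) * overall_mean + roi_weight * roi_mean


# Toy track: only the last bin overlaps a region of interest.
track = [0.0, 2.0, 4.0, 10.0]
mask = [False, False, False, True]
scores = [weighted_disruption_score(track, mask, w) for w in (0, 0.5, 1)]
```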

13 of 17

Weighted scores highlight variants that disrupt relevant regions

Figure: disruption track with ATAC-seq weights. Values annotated on the plot: 1.283e-6, 1.282e-6, 0.052, 0.087, 0.065, 0.049.

14 of 17

Sequence mutagenesis options with SuPreMo

Parameters: (1) % GC deletion (2) mutate flanking regions of the variant

Future parameters: (1) Apply to ALT/ALT-mutated sequences

SuPreMo-Akita scores

Variant (CTCF site) GC and nucleotide shuffling mutagenesis
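A minimal sketch of the two perturbations named on this slide, applied to a half-open interval `[start, end)` of a sequence. The function names, the seeded random generator, and the interpretation of "% GC deletion" as removing a given fraction of the G/C bases in the interval are all assumptions; the SuPreMo options may be defined differently.

```python
import random


def shuffle_region(seq, start, end, seed=0):
    """Shuffle the bases in seq[start:end], preserving base composition."""
    rng = random.Random(seed)
    region = list(seq[start:end])
    rng.shuffle(region)
    return seq[:start] + "".join(region) + seq[end:]


def delete_gc_fraction(seq, start, end, fraction, seed=0):
    """Delete a randomly chosen fraction of the G/C bases in seq[start:end]."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    gc_positions = [i for i in range(start, end) if seq[i] in "GCgc"]
    n_delete = round(fraction * len(gc_positions))
    to_delete = set(rng.sample(gc_positions, n_delete))
    return "".join(base for i, base in enumerate(seq) if i not in to_delete)


# Toy example: a GC-rich site flanked by A and T runs.
site = "AAAAGCGCTTTT"
shuffled = shuffle_region(site, 4, 8)
half_gc_removed = delete_gc_fraction(site, 4, 8, 0.5)
all_gc_removed = delete_gc_fraction(site, 4, 8, 1.0)
```

Deleting bases shortens the sequence, so a model with a fixed input length would need the window padded or re-centered afterwards; that step is not shown here.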

15 of 17

TFBS mutagenesis options with SuPreMo

Parameters: (1) INV/shuffling of TFBS (2) mutate flanking regions of the variant

Future parameters: (1) DEL/DUP of TFBS (2) Apply to ALT/ALT-mutated sequences

SuPreMo-Akita scores

TFBS INV and shuffling mutagenesis
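A sketch of the inversion case, assuming the TFBS is given as a half-open interval `[start, end)` on the sequence and that inversion means replacing the site with its reverse complement in place. The function name `invert_site` is hypothetical.

```python
# Translation table for complementing DNA, including N and lowercase bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def invert_site(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    site = seq[start:end]
    inverted = site.translate(_COMPLEMENT)[::-1]
    return seq[:start] + inverted + seq[end:]


# Toy example: invert the 3 bp site "CCG" inside a 9 bp sequence.
inverted_seq = invert_site("AAACCGTTT", 3, 6)
```

Inversion preserves sequence length, so the flanking context and model input window stay aligned with the reference.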

16 of 17

Conclusions

  • SuPreMo generates mutated sequences for ISM, which can be directly input into the following models:
    • Previously: Akita, DeepSEA
    • New: Enformer, ExPecto
  • Disruption scores quantify the effect of individual variants on genome folding, chromatin states, gene expression, etc.

  • SuPreMo is highly customizable, with new options to:
    • Select cell types for predictions
    • Apply additional sequence perturbations
    • Use a custom reference sequence
    • Weight disruption scores based on regions of interest
    • Generate p-values for disruption scores

17 of 17

THANK YOU!

Acknowledgements

Katie Pollard (UCSF)

Peter J. Park (HMS)

Alexey I. Nesvizhskii (UMich)

Jian Ma (CMU)