1 of 21

SmellEnhancer: Integrating Machine Learning for Enhancer-based Olfactory Receptor Gene Regulation

Yilong (Anthony) Qu, Willard Ford, Ming-Ching Crystal Wen, Katie Alltop, Isabella Pirozzolo, Miao Wang

2 of 21

Eggan K, et al. Nature (2004)

Spehr M, et al. J Neurochem (2009)

~1000 olfactory receptor (OR) genes → one receptor per neuron!

Olfactory Neurons

→ 1 OR per cell

→ choosing an OR

3 of 21

Pourmorady A et al. Curr Opin Genet Dev. 2022

Pourmorady A et al. Nature. 2023

Greek islands (GIs): 63 known enhancers

  • Required for OR choice
  • Scattered across chromosomes

Transcription factors: Lhx2, Ebf1

Coactivator: Ldb1

  • Help form hubs
  • Required for OR choice
  • Binding sites = regulatory elements (RE)

Olfactory receptor (OR) protein

  • Reinforces mature cell identity

How does each neuron pick 1 allele out of ~2000 choices (~1000 OR genes × 2 alleles)?

4 of 21

Data set: 10x Single Cell Multiome (ATAC + nuclear RNA-seq)

Unify the transcriptome and epigenome in every cell

ATAC and RNA reads from the same cell share the same 10x barcode
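Because both modalities carry the same barcode, per-cell ATAC and RNA profiles can be paired by barcode before any modeling. The sketch below is a minimal illustration, assuming each modality has already been loaded into a dictionary mapping barcode to a feature-to-count dictionary; the helper name `pair_by_barcode`, the barcodes and the feature names are hypothetical.

```python
from typing import Dict, Tuple

Counts = Dict[str, int]  # feature name -> count for one cell


def pair_by_barcode(
    atac: Dict[str, Counts], rna: Dict[str, Counts]
) -> Dict[str, Tuple[Counts, Counts]]:
    """Keep only cells whose barcode appears in both modalities."""
    shared = sorted(atac.keys() & rna.keys())
    return {bc: (atac[bc], rna[bc]) for bc in shared}


# Hypothetical toy data: barcode -> {feature: count}
atac = {
    "AAACAGCCAAGGAATC-1": {"GI_peak_1": 1},
    "AAACAGCCAATCCCTT-1": {"GI_peak_2": 2},
}
rna = {
    "AAACAGCCAAGGAATC-1": {"Olfr17": 5},
    "AAACATGCAAGGTCCT-1": {"Olfr728": 3},
}

paired = pair_by_barcode(atac, rna)
print(list(paired))  # ['AAACAGCCAAGGAATC-1']
```

Cells captured in only one modality are dropped, so every downstream model sees matched accessibility and expression for the same nucleus.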

5 of 21

Key hackathon efforts:

  • Predict olfactory receptor gene expression level
    • From accessible enhancers, from accessible TF binding sites
  • Predict olfactory receptor gene choice
    • From accessible enhancers, from accessible TF binding sites
  • Predict olfactory sensory neuron developmental stage
    • From accessible TF binding sites, from all ATAC peaks

6 of 21

Linear models can’t predict Olfr gene expression from Greek island accessibility at the single-cell level

Random baseline: 1/655 ≈ 0.0015

7 of 21

Mature Olfactory Neurons have few accessible GI Peaks

Not only are single-cell RNA-seq and ATAC-seq data incredibly sparse, but most olfactory neurons also have only a few Greek island peaks accessible.

8 of 21

Expression does not correlate with total accessibility

Olfactory receptor gene expression does not correlate with total Greek island enhancer accessibility in olfactory neurons.
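The comparison on this slide can be expressed as a Pearson correlation between each cell's total Greek island accessibility and its OR expression. This is a minimal sketch, assuming a dense cells × GI count matrix and one OR expression value per cell; the function names and toy numbers are hypothetical, and the slide does not state which correlation measure was used.

```python
import math
from typing import List, Sequence


def total_accessibility(gi_counts: List[Sequence[int]]) -> List[int]:
    """Sum Greek island peak counts per cell (one row per cell)."""
    return [sum(row) for row in gi_counts]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: 4 cells x 3 Greek island peaks, and OR counts per cell
gi_counts = [[1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
or_expression = [6, 6, 5, 5]

totals = total_accessibility(gi_counts)  # [1, 2, 3, 0]
print(round(pearson_r(totals, or_expression), 3))  # 0.0
```

In this toy example more accessible GIs do not mean more OR expression, which is the pattern the slide reports for real cells.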

9 of 21

Model for OR gene choice prediction

Figure: stacked one-vs.-rest model. Input: chromosome accessibility profile (scATAC-seq). One binary classifier per OR gene outputs a probability (chosen / not chosen): Olfr17 0.8 / 0.2, Olfr536 0.45 / 0.55, Olfr1033 0.5 / 0.5, Olfr1320 0.5 / 0.5, Olfr728 0.3 / 0.7. The OR with the highest probability is predicted: predicted OR chosen = Olfr17.
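The final stacking step reduces to an argmax over the per-OR classifiers' "chosen" probabilities. A minimal sketch, assuming the first number shown beside each classifier in the figure is P(chosen); `predict_chosen_or` is a hypothetical name.

```python
from typing import Dict


def predict_chosen_or(prob_chosen: Dict[str, float]) -> str:
    """Return the OR whose one-vs.-rest classifier gives the highest P(chosen)."""
    return max(prob_chosen, key=prob_chosen.get)


# P(chosen) for each per-OR classifier, as in the example figure
example = {
    "Olfr17": 0.8,
    "Olfr536": 0.45,
    "Olfr1033": 0.5,
    "Olfr1320": 0.5,
    "Olfr728": 0.3,
}
print(predict_chosen_or(example))  # Olfr17
```

`max` returns the first key it meets on a tie (here Olfr1033 would win over Olfr1320), so a real pipeline would need an explicit tie rule.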

10 of 21

Model training

  1. Each classifier performs binary classification (chosen vs. not chosen for one OR).
  2. Logistic regression with LASSO (L1) feature selection was chosen because its fitted coefficients are readily available for interpretation.

Figure: X = scATAC-seq accessibility matrix; y = choice vector from scRNA-seq, e.g. [Olfr17, Olfr536,...]; y_binary = y binarized for each one-vs.-rest classifier.
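The per-OR training setup can be sketched with the standard library alone. This is an illustration under stated assumptions, not the team's code: `binarize_choice` turns the choice vector into y_binary for one OR, and `fit_l1_logistic` fits an L1-penalized (LASSO) logistic regression by proximal gradient descent. All function names, the penalty `lam`, the step size and the toy matrix are hypothetical; in scikit-learn a comparable model would be `LogisticRegression(penalty="l1", solver="liblinear")`.

```python
import math
from typing import List, Sequence, Tuple


def binarize_choice(choice: Sequence[str], target_or: str) -> List[int]:
    """y_binary for one classifier: 1 if the cell chose target_or, else 0."""
    return [1 if c == target_or else 0 for c in choice]


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t * |w|: the LASSO shrinkage step."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X: List[List[float]], y: List[int], lam: float = 0.05,
                    lr: float = 0.5, n_iter: int = 500) -> Tuple[List[float], float]:
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - y
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on w, then shrink each weight toward zero
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
    return w, b


# Hypothetical toy data: 4 cells x 3 peaks (binary accessibility)
choice = ["Olfr17", "Olfr17", "Olfr536", "Olfr728"]  # chosen OR per cell (from RNA)
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
y_binary = binarize_choice(choice, "Olfr17")  # [1, 1, 0, 0]
w, b = fit_l1_logistic(X, y_binary)
print([round(v, 3) for v in w], round(b, 3))
```

Weights pushed exactly to zero by the soft-thresholding step are the features LASSO discards; here the never-accessible third peak always keeps a zero weight, while the first peak, which separates Olfr17 cells perfectly, gets a positive one.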

11 of 21

One-vs.-rest model breakdown

12 of 21

Random forest prediction on expression of target genes

Method breakdown:

~ Challenges: 1) Too many genes and peaks; 2) Sparse data matrices

~ Workaround:

  1. Select the top 10 most highly expressed OR genes and extract their gene count matrix
  2. Bin peaks into 21 features and sum the peak counts within each bin
  3. Predict the expression level of each target gene from the 21 bin features with a random forest
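The two preprocessing steps can be sketched as follows. The slide does not say how the 21 bins were defined; this sketch assumes, purely as a labelled assumption, one bin per mouse chromosome (chr1 to chr19, chrX, chrY), which gives exactly 21. The helper names and toy inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: one bin per mouse chromosome (chr1-chr19, chrX, chrY) = 21 bins.
BINS: List[str] = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY"]


def top_expressed(gene_totals: Dict[str, int], k: int = 10) -> List[str]:
    """Names of the k genes with the largest total counts across cells."""
    return [gene for gene, _ in Counter(gene_totals).most_common(k)]


def bin_peaks(cell_peaks: Sequence[Tuple[str, int]]) -> List[int]:
    """Sum one cell's peak counts per bin; peaks are (chromosome, count) pairs."""
    index = {name: i for i, name in enumerate(BINS)}
    features = [0] * len(BINS)
    for chrom, count in cell_peaks:
        if chrom in index:  # peaks on contigs outside the 21 bins are ignored
            features[index[chrom]] += count
    return features


# Hypothetical toy inputs
gene_totals = {"Olfr17": 120, "Olfr728": 95, "Olfr536": 40}
print(top_expressed(gene_totals, k=2))  # ['Olfr17', 'Olfr728']

cell = [("chr2", 1), ("chr2", 2), ("chrX", 1), ("chrUn_hypothetical", 1)]
feats = bin_peaks(cell)
print(len(feats), feats[1], feats[19])  # 21 3 1
```

Summing into a fixed number of bins turns a very wide, sparse peak matrix into a short dense feature vector, which is the workaround the slide describes for the sparsity challenge.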

13 of 21

Random forest prediction on expression of target genes (cont.)

Random forest modeling steps:

  1. Split the data 0.75/0.25 into training/test sets
  2. n_estimators=12, random_state=0, oob_score=True, max_depth=30
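The listed settings match keyword arguments of scikit-learn's `RandomForestRegressor`. The stdlib sketch below records them and performs a reproducible 0.75/0.25 split of row indices; `train_test_split_indices` is a hypothetical helper, and its shuffle will not select the same rows as scikit-learn's `train_test_split(..., test_size=0.25, random_state=0)`.

```python
import random
from typing import List, Tuple

# Hyperparameters from the slide, named as RandomForestRegressor keyword arguments
RF_PARAMS = {"n_estimators": 12, "random_state": 0, "oob_score": True, "max_depth": 30}


def train_test_split_indices(n: int, test_frac: float = 0.25,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices reproducibly and split them into train/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = train_test_split_indices(100)
print(len(train_idx), len(test_idx))  # 75 25
```

With scikit-learn installed the dictionary could be passed as `RandomForestRegressor(**RF_PARAMS)`; `oob_score=True` relies on bootstrapping, which is that estimator's default.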

14 of 21

Random forest prediction on expression of target genes (cont.)

  • Accuracy: 0.766
  • Out-of-Bag Score: -0.257
  • Mean Squared Error: 3.701
  • R-squared: 0.728
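Mean squared error and R-squared follow directly from their definitions, sketched below on hypothetical values. An R-squared below zero means the model does worse than always predicting the mean; scikit-learn reports `oob_score_` for `RandomForestRegressor` as an R-squared by default, so a negative out-of-bag score is possible.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [0.0, 2.0, 4.0, 6.0]  # hypothetical observed OR expression
y_pred = [1.0, 2.0, 3.0, 6.0]  # hypothetical model predictions
print(mean_squared_error(y_true, y_pred), r_squared(y_true, y_pred))  # 0.5 0.9
```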

15 of 21

Prediction of olfactory sensory neuron cell type

A decision tree model to predict neuron identity

Predictor: Regulatory elements (accessible transcription factor binding peaks)

Response: mOSN (mature olfactory sensory neurons) and iOSN (immature olfactory sensory neurons)
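The core operation of a decision tree is choosing the split that most reduces class impurity. The sketch below shows a depth-1 tree (a decision stump) using Gini impurity on binarized RE peaks. This is an assumption-laden illustration, not the team's model: a full tree, such as scikit-learn's `DecisionTreeClassifier` (Gini by default), repeats this split recursively. The function names and toy matrix are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, str, str]  # (peak index, label if accessible, label if not)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str], default: str) -> str:
    """Most common label, or default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X: List[List[int]], y: List[str]) -> Stump:
    """Depth-1 decision tree: split on whether a single RE peak is accessible (> 0)."""
    overall = majority(y, y[0])
    best = None
    for j in range(len(X[0])):
        opened = [label for row, label in zip(X, y) if row[j] > 0]
        closed = [label for row, label in zip(X, y) if row[j] == 0]
        # Weighted Gini impurity of the two child nodes; lower is better
        score = (len(opened) * gini(opened) + len(closed) * gini(closed)) / len(y)
        if best is None or score < best[0]:
            best = (score, j, opened, closed)
    _, j, opened, closed = best
    return j, majority(opened, overall), majority(closed, overall)


def predict_stump(stump: Stump, x: Sequence[int]) -> str:
    j, if_open, if_closed = stump
    return if_open if x[j] > 0 else if_closed


# Hypothetical toy data: 4 cells x 2 RE peaks, labelled by cell type
X = [[1, 1], [0, 1], [1, 0], [0, 0]]
y = ["mOSN", "mOSN", "iOSN", "iOSN"]
stump = fit_stump(X, y)
print(stump)  # (1, 'mOSN', 'iOSN')
```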

16 of 21

The distinct role of Greek islands on OR choice in OSNs

The most frequently detected GIs probably regulate every OR in OSNs

The least frequently detected GIs may regulate specific OR genes

63 Greek islands
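"Top" and "lowest" detected GIs can be ranked by detection rate, the fraction of cells in which a GI peak has any counts. A minimal sketch, assuming sparse per-cell count dictionaries and this particular definition of "detected"; the GI names and counts are hypothetical.

```python
from typing import Dict, List, Tuple


def detection_rates(cells: List[Dict[str, int]],
                    gi_names: List[str]) -> List[Tuple[str, float]]:
    """Fraction of cells in which each Greek island peak has > 0 counts, highest first."""
    n = len(cells)
    rates = [(gi, sum(1 for c in cells if c.get(gi, 0) > 0) / n) for gi in gi_names]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: sparse per-cell GI peak counts (absent key = 0)
gi_names = ["GI_A", "GI_B", "GI_C"]
cells = [{"GI_A": 2}, {"GI_A": 1, "GI_C": 1}, {"GI_A": 1}, {}]
print(detection_rates(cells, gi_names))
# [('GI_A', 0.75), ('GI_C', 0.25), ('GI_B', 0.0)]
```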

17 of 21

Decision Tree Model for Neuron Identity using RE peaks

Test accuracy = 0.76

Cross-validation score = 0.73
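A cross-validation score is the mean held-out accuracy across folds. The sketch below uses plain contiguous folds and accepts any `fit`/`predict` pair; it is a hypothetical helper, not the team's setup (scikit-learn's `cross_val_score` defaults to 5 stratified folds for classifiers). If cells are ordered by type, shuffle them first so each fold contains both mOSNs and iOSNs.

```python
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous, nearly equal folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X: Sequence, y: Sequence[str], fit: Callable,
                       predict: Callable, k: int = 5) -> float:
    """Train on k-1 folds, score accuracy on the held-out fold, average over folds."""
    scores = []
    for test in k_fold_indices(len(y), k):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical usage: a fixed rule "peak 0 accessible -> mOSN" needs no training
def fit_rule(X_train, y_train):
    return None


def predict_rule(model, x):
    return "mOSN" if x[0] > 0 else "iOSN"


X = [[1], [0], [1], [0], [1], [0]]
y = ["mOSN", "iOSN", "mOSN", "iOSN", "mOSN", "iOSN"]
print(cross_val_accuracy(X, y, fit_rule, predict_rule, k=3))  # 1.0
```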

18 of 21

Summary

  1. The number of accessible enhancers does not correlate with OR gene expression levels at the single-cell level.

  2. Regulatory element peaks successfully predict the choice of a single olfactory receptor gene. This suggests that chromosomal structure plays a crucial role in determining neuron identity.

  3. Regulatory element peaks accurately distinguish immature from mature neurons. Chromosomal architecture is highly specific to cell type, even within closely related lineages.

19 of 21

Limitations of project

  1. scATAC data are sparse.

  2. ATAC peaks indicate that genomic regions are accessible, but they do not provide direct information about their interactions with genes.

20 of 21

Future Directions

  1. Identify the important features of regulatory elements that determine cell identity.

  2. Improve model performance by incorporating supporting data from other omics datasets.

  3. Elucidate the molecular mechanism of singular olfactory receptor choice in neurons.

21 of 21

Acknowledgements

4D Nucleome

4DN Hackathon Planning Team

Lomvardas Lab (Columbia University)

PI: Stavros Lomvardas

Lab member: Ariel Pourmorady

Team Members:

Yilong (Anthony) Qu (Duke University)

Willard Ford (UCSD)

Ming-Ching Crystal Wen (University of Michigan)

Katie Alltop (University of Michigan)

Isabella Pirozzolo (Columbia University)

Miao Wang (Columbia University)