SmellEnhancer: Integrating Machine Learning for Enhancer-based Olfactory Receptor Gene Regulation
Yilong (Anthony) Qu, Willard Ford, Ming-Ching Crystal Wen, Katie Alltop, Isabella Pirozzolo, Miao Wang
Eggan K, et al. Nature (2004)
Spehr M, et al. J Neurochem (2009)
~1,000 olfactory receptor (OR) genes → one receptor per neuron!
Olfactory Neurons
→ 1 OR per cell
→ OR choice
Pourmorady A et al. Curr Opin Genet Dev. 2022
Pourmorady A et al. Nature. 2023
Greek islands (GIs): 63 known enhancers
Transcription factors: Lhx2, Ebf1
Coactivators: Ldb1
Olfactory receptor (OR) protein
How does each neuron pick 1 allele from ~2000 choices?
Dataset: single-cell Multiome (ATAC + nuclear RNA-seq)
Unifies the transcriptome and epigenome of every cell
The ATAC and RNA profiles of each cell share the same 10x barcode
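Because both modalities carry the same 10x barcode, per-cell ATAC and RNA profiles can be joined on that barcode. A minimal pure-Python sketch, assuming each modality has already been loaded into a dictionary keyed by barcode; the barcodes, peak names (GI_1, GI_2) and counts are hypothetical toy values.

```python
# Hypothetical toy data: barcode -> per-cell profile for each modality.
atac_by_barcode = {
    "AAACAGCC-1": {"GI_1": 1, "GI_2": 0},
    "AAACATGC-1": {"GI_1": 0, "GI_2": 1},
    "AAACCAAC-1": {"GI_1": 1, "GI_2": 1},  # no matching RNA profile in this toy example
}
rna_by_barcode = {
    "AAACAGCC-1": {"Olfr17": 12, "Olfr536": 0},
    "AAACATGC-1": {"Olfr17": 0, "Olfr536": 7},
}


def pair_by_barcode(atac, rna):
    """Return {barcode: (atac_profile, rna_profile)} for cells present in both modalities."""
    shared = sorted(atac.keys() & rna.keys())
    return {barcode: (atac[barcode], rna[barcode]) for barcode in shared}


paired = pair_by_barcode(atac_by_barcode, rna_by_barcode)
for barcode, (peaks, expression) in paired.items():
    print(barcode, peaks, expression)
```

Only barcodes found in both modalities are kept, so every paired cell has an accessibility profile and an expression profile.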
Key hackathon efforts:
Linear models could not predict Olfr gene expression from Greek Island accessibility at the single-cell level
Random baseline: 1/655 ≈ 0.0015
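The random baseline is the expected accuracy of guessing one of 655 ORs uniformly at random, i.e. 1/655. A small sketch that computes this value and checks it with a Monte Carlo simulation; only the class count comes from the slide, and the helper names are hypothetical.

```python
import random

N_ORS = 655  # number of candidate OR classes in the baseline


def chance_accuracy(n_classes):
    """Expected accuracy of guessing one class uniformly at random."""
    return 1 / n_classes


def simulated_accuracy(n_classes, n_cells=200_000, seed=0):
    """Monte Carlo check: random true labels compared with random guesses."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(n_classes) == rng.randrange(n_classes) for _ in range(n_cells))
    return hits / n_cells


print(f"analytic:  {chance_accuracy(N_ORS):.4f}")
print(f"simulated: {simulated_accuracy(N_ORS):.4f}")
```

A model is only informative if its accuracy clearly exceeds this value.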
Mature Olfactory Neurons have few accessible GI Peaks
Not only are single-cell RNA-seq and ATAC-seq data extremely sparse, but most olfactory neurons also have only a few Greek Island peaks accessible.
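A sketch of how matrix sparsity and per-cell GI accessibility can be quantified, assuming a binarized cell × GI peak matrix stored as nested lists; the matrix is a hypothetical toy example, not project data.

```python
# Hypothetical toy matrix: rows = cells, columns = Greek Island (GI) peaks,
# 1 = accessible, 0 = not accessible.
gi_matrix = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
]


def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(value == 0 for row in matrix for value in row)
    return zeros / total


def accessible_per_cell(matrix):
    """Number of accessible GI peaks in each cell."""
    return [sum(1 for value in row if value > 0) for row in matrix]


print(f"sparsity: {sparsity(gi_matrix):.2f}")                # sparsity: 0.79
print("accessible GIs per cell:", accessible_per_cell(gi_matrix))  # [1, 0, 3, 1]
```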
Expression does not correlate with total accessibility
Olfactory Gene Expression does not correlate with Greek Island enhancer accessibility in Olfactory Neurons.
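One way to test whether expression tracks total accessibility is to correlate, across cells, the number of accessible GI peaks with the expression of the chosen OR. A pure-Python Pearson correlation sketch; the slide does not state which correlation measure was used, and the toy vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy values, one entry per cell.
accessible_gi_count = [1, 2, 3, 4, 2, 1]
chosen_or_expression = [8, 3, 5, 6, 9, 2]
print(f"r = {pearson(accessible_gi_count, chosen_or_expression):.2f}")
```

With zero-inflated counts, a rank-based measure such as Spearman correlation may be a more robust alternative.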
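Model for OR gene choice prediction
Stacked one-vs-rest model (figure): the input is a cell's scATAC-seq chromosome accessibility profile; one classifier per OR outputs the probability that its OR is chosen vs. not chosen, and the OR with the highest probability is the prediction.
Example (chosen / not chosen):
Olfr17 classifier: 0.8 / 0.2
Olfr536 classifier: 0.45 / 0.55
Olfr1033 classifier: 0.5 / 0.5
Olfr1320 classifier: 0.5 / 0.5
Olfr728 classifier: 0.3 / 0.7
→ Predicted OR chosen = Olfr17 (highest probability)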
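The final step of the stacked one-vs-rest model picks the OR whose classifier reports the highest probability of being chosen. A minimal sketch using the example probabilities for Olfr17, Olfr536, Olfr1033, Olfr1320 and Olfr728, assuming the first number of each pair is the probability that the OR is chosen (consistent with Olfr17 being the prediction); the function name is hypothetical.

```python
# Probability that each OR is chosen, as reported by its one-vs-rest classifier.
ovr_probabilities = {
    "Olfr17": 0.8,
    "Olfr536": 0.45,
    "Olfr1033": 0.5,
    "Olfr1320": 0.5,
    "Olfr728": 0.3,
}


def predict_chosen_or(probabilities):
    """Return the OR whose one-vs-rest classifier gives the highest probability."""
    return max(probabilities, key=probabilities.get)


print(predict_chosen_or(ovr_probabilities))  # Olfr17
```

Because each classifier is trained independently, the probabilities need not sum to 1; the argmax only compares them.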
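Model training
X: scATAC-seq accessibility profiles
y: choice vector from scRNA-seq, the OR chosen by each cell: [Olfr17, Olfr536, ...]
y_binary: per-OR labels (chosen vs. not chosen) used to train each one-vs-rest classifier
One-vs.-rest model breakdown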
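The training labels come from scRNA-seq: the choice vector y records the OR chosen by each cell, and each one-vs-rest classifier is trained on y_binary, which marks the cells that chose its OR. A sketch assuming that a cell's chosen OR is its most highly expressed OR; this rule, the helper names and the toy counts are assumptions, not the team's documented criterion.

```python
# Hypothetical toy scRNA-seq counts: one dict of OR counts per cell.
or_counts_per_cell = [
    {"Olfr17": 15, "Olfr536": 1, "Olfr728": 0},
    {"Olfr17": 0, "Olfr536": 9, "Olfr728": 2},
    {"Olfr17": 22, "Olfr536": 0, "Olfr728": 0},
]


def choice_vector(counts_per_cell):
    """Assumed rule: the chosen OR of a cell is its most highly expressed OR."""
    return [max(counts, key=counts.get) for counts in counts_per_cell]


def binarize(y, target_or):
    """One-vs-rest labels: 1 if the cell chose target_or, else 0."""
    return [1 if label == target_or else 0 for label in y]


y = choice_vector(or_counts_per_cell)
print(y)                      # ['Olfr17', 'Olfr536', 'Olfr17']
print(binarize(y, "Olfr17"))  # [1, 0, 1]
```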
Random forest prediction on expression of target genes
Method breakdown:
Challenges: (1) too many genes and peaks; (2) sparse data matrices
Workaround:
Random forest modeling steps:
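For readers unfamiliar with the method, a random forest predicts a target gene's expression by averaging many regression trees, each grown on a bootstrap sample of cells and allowed to consider only a random subset of peaks at every split. The pure-Python sketch below illustrates that idea on hypothetical toy data; it is not the team's pipeline, it does not implement their workaround for sparsity, and all names and hyperparameters are assumptions. In practice a library estimator such as scikit-learn's `RandomForestRegressor`, which accepts sparse input matrices, would be used instead.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, n_features, rng, min_leaf=2):
    """Grow a regression tree. A leaf is a float (mean target); an internal
    node is a tuple (feature, threshold, left_subtree, right_subtree)."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (total SSE, feature, threshold)
    # Random-forest twist: only a random subset of peaks is considered per split.
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng, min_leaf)

    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, n_features=None, seed=0):
    """Bagging plus random feature subsets; n_features defaults to p // 3 (at least 1)."""
    rng = random.Random(seed)
    n_features = n_features or max(1, len(X[0]) // 3)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_features, rng))
    return trees


def predict_forest(trees, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, x) for tree in trees) / len(trees)


# Hypothetical toy data: 3 binary peaks per cell; expression depends on peak 0 only.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [10.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 0, 0]), predict_forest(forest, [0, 0, 0]))
```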
Prediction of olfactory sensory neuron cell type
A decision tree model to predict neuron identity
Predictor: regulatory elements (accessible transcription factor binding peaks)
Response: mOSN (mature olfactory sensory neurons) vs. iOSN (immature olfactory sensory neurons)
The distinct roles of Greek Islands in OR choice in OSNs
The most frequently detected GIs probably regulate all ORs in OSNs
The least frequently detected GIs may regulate specific OR genes
63 Greek islands
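Ranking GIs from most to least frequently detected amounts to computing, for each GI, the fraction of cells in which it is accessible. A sketch with hypothetical GI names (GI_A to GI_D) and toy per-cell accessible sets; whether high- and low-rate GIs act as general or OR-specific regulators is the slide's hypothesis, not something this code tests.

```python
# Hypothetical toy data: for each cell, the set of GIs detected as accessible.
accessible_gis_per_cell = [
    {"GI_A", "GI_B"},
    {"GI_A"},
    {"GI_A", "GI_C"},
    {"GI_B"},
]
all_gis = ["GI_A", "GI_B", "GI_C", "GI_D"]


def detection_rates(cells, gis):
    """Fraction of cells in which each GI is accessible, highest first."""
    n_cells = len(cells)
    rates = {gi: sum(gi in cell for cell in cells) / n_cells for gi in gis}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


for gi, rate in detection_rates(accessible_gis_per_cell, all_gis):
    print(f"{gi}: {rate:.2f}")
```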
Decision Tree Model for Neuron Identity using RE peaks
Test accuracy = 0.76
Cross-validation score = 0.73
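A cross-validation score is the mean held-out accuracy across k folds. The sketch below substitutes a decision stump (a depth-1 decision tree) for the full decision tree to keep it short, and assumes binary regulatory-element peaks; the data, helper names and k = 5 are hypothetical and do not reproduce the reported 0.76 / 0.73.

```python
import random


def majority(labels):
    """Most common label; ties broken alphabetically so results are reproducible."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(X, y):
    """Depth-1 decision tree on binary peaks: choose the single peak whose
    accessible (1) / inaccessible (0) split makes the fewest training errors."""
    best = None  # (errors, feature, {peak_value: predicted_label})
    for f in range(len(X[0])):
        rule = {}
        for value in (0, 1):
            side = [yi for row, yi in zip(X, y) if row[f] == value]
            rule[value] = majority(side) if side else majority(y)
        errors = sum(rule[row[f]] != yi for row, yi in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    return best[1], best[2]


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds (the cross-validation score)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        f, rule = fit_stump([X[i] for i in train], [y[i] for i in train])
        correct = sum(rule[X[i][f]] == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy data: 3 binary regulatory-element peaks per cell.
X = [[1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 1, 0], [1, 1, 0],
     [1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
y = ["mOSN"] * 5 + ["iOSN"] * 5
print(fit_stump(X, y))        # (1, {0: 'iOSN', 1: 'mOSN'})
print(k_fold_accuracy(X, y))  # 1.0
```

On real data the cross-validation score is usually a more reliable estimate than a single train/test split, which is one reason to report both.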
Summary
Limitations of the project
Accessible peaks (scATAC-seq) do not provide direct information about which genes they interact with.
Future Directions
Acknowledgments
4D Nucleome
4DN Hackathon Planning Team
Lomvardas Lab (Columbia University)
PI: Stavros Lomvardas
Lab member: Ariel Pourmorady
Team Members:
Yilong (Anthony) Qu (Duke University)
Willard Ford (UCSD)
Ming-Ching Crystal Wen (University of Michigan)
Katie Alltop (University of Michigan)
Isabella Pirozzolo (Columbia University)
Miao Wang (Columbia University)