VIRTUAL COURSE
Single-cell RNA-seq analysis using Python
Anna Vathrakokoili Pournara
February 2025
Feature Selection, Dimensionality reduction,
Clustering and Annotation
A bit about myself
gene expression
Previously…
From raw sequencing files... to count matrix
From QC of count matrix… to Normalization
Low-quality cells (QC)
Ambient RNA (SoupX)
Doublet detection
Normalization
Coming up…
Today's Lecture Outline
- Feature Selection
- Dimensionality Reduction
- Clustering
- Cell-type annotation
Feature selection in single-cell analysis
PCA
PCs
clustering
~30,000 genes
~500-2,000 selected genes
Feature selection methods implemented in scanpy
scanpy.pp.highly_variable_genes()
1. Dispersion-based: flavor = "seurat"
[Figure: genes grouped into bins by mean expression; example bins at mean expression = 0.5, 10, 100, 200]
a) Calculate the dispersion of each gene in each bin
b) Calculate the mean and the standard deviation of the dispersions in each bin
c) Normalise the dispersion of each gene using the mean and the standard deviation from b)
d) Genes within each bin are ranked based on their normalised dispersion values --> highly variable genes
2. Variance-based: Seurat v3 [Stuart et al.], flavor = "seurat_v3"
a) Expects raw counts (not normalised or log-transformed)
b) A variance-stabilising transformation is applied to the raw data
c) Highly variable genes are selected based on the variance of the standardised values (the mean-variance relationship is taken into account)
d) 2,000 highly variable genes are selected
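The two sets of steps above (binned dispersion, and variance of standardised raw counts) can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not scanpy's implementation: the bins are equal-width on log1p(mean), the ranking is done across all genes, a polynomial fit stands in for the loess trend used by the Seurat v3 method, and the toy count matrix and function names (`dispersion_hvg`, `variance_hvg`) are made up for this example. In practice one would call `scanpy.pp.highly_variable_genes` with `flavor="seurat"` or `flavor="seurat_v3"`.

```python
import numpy as np


def dispersion_hvg(expr, n_bins=5, n_top=20):
    """Dispersion-based selection on a cells x genes matrix of normalised values."""
    mean = expr.mean(axis=0)
    var = expr.var(axis=0, ddof=1)
    dispersion = var / np.maximum(mean, 1e-12)        # a) dispersion = variance / mean

    # Assumption: equal-width bins on log1p(mean) group genes of similar expression.
    log_mean = np.log1p(mean)
    edges = np.linspace(log_mean.min(), log_mean.max(), n_bins + 1)
    bin_id = np.digitize(log_mean, edges[1:-1])       # values in 0 .. n_bins - 1

    norm_disp = np.zeros_like(dispersion)
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.sum() < 2:
            continue                                  # too few genes to estimate a spread
        mu = dispersion[in_bin].mean()                # b) mean of the dispersions in the bin
        sd = dispersion[in_bin].std(ddof=1)           # b) standard deviation in the bin
        if sd > 0:
            norm_disp[in_bin] = (dispersion[in_bin] - mu) / sd   # c) normalise within the bin

    top = np.argsort(norm_disp)[::-1][:n_top]         # d) rank by normalised dispersion
    return norm_disp, top


def variance_hvg(counts, n_top=20, degree=2):
    """Variance-based selection on raw counts (polynomial trend as a stand-in for loess)."""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    ok = var > 0                                      # constant genes cannot be fitted

    # Fit the mean-variance trend on the log10 scale.
    coef = np.polyfit(np.log10(mean[ok]), np.log10(var[ok]), degree)
    expected_sd = np.ones_like(mean)
    expected_sd[ok] = np.sqrt(10 ** np.polyval(coef, np.log10(mean[ok])))

    # Standardise with the expected spread and clip large values at sqrt(n_cells).
    z = np.minimum((counts - mean) / expected_sd, np.sqrt(n_cells))
    std_var = (z ** 2).sum(axis=0) / (n_cells - 1)    # variance of the standardised values
    std_var[~ok] = 0.0
    top = np.argsort(std_var)[::-1][:n_top]
    return std_var, top


# Toy data (made up): 300 cells x 200 genes of Poisson counts.
rng = np.random.default_rng(0)
n_cells, n_genes = 300, 200
gene_means = rng.uniform(0.5, 20.0, size=n_genes)
counts = rng.poisson(gene_means, size=(n_cells, n_genes))
counts[: n_cells // 2, :10] *= 5    # the first 10 genes are 5x higher in half of the cells

# Simple library-size normalisation for the dispersion-based variant.
lib_size = counts.sum(axis=1, keepdims=True)
norm = counts / lib_size * np.median(lib_size)

disp_scores, disp_top = dispersion_hvg(norm, n_bins=5, n_top=20)
var_scores, var_top = variance_hvg(counts, n_top=20)
print("dispersion-based top genes:", sorted(disp_top.tolist()))
print("variance-based top genes:  ", sorted(var_top.tolist()))
```

Note that the dispersion-based variant takes normalised values while the variance-based variant takes the raw counts, matching the input expectations listed above.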
Dimensionality reduction
Curse of Dimensionality:
PCA
Each cell in a single-cell dataset is represented as a point in a high-dimensional space with many features (genes).
[Figure: cells plotted along PC1 and PC2]
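As a rough illustration of how PCA turns a cells x genes matrix into a few coordinates per cell (PC1, PC2, ...), here is a minimal NumPy sketch based on the singular value decomposition. The toy matrix and the function name `pca` are assumptions made for this example; in a scanpy workflow the corresponding step is `scanpy.pp.pca`.

```python
import numpy as np


def pca(expr, n_components=2):
    """Project a cells x genes matrix onto its first principal components."""
    centred = expr - expr.mean(axis=0)                 # centre each gene at zero
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]    # cell coordinates (PC1, PC2, ...)
    loadings = vt[:n_components]                       # gene weights that define each PC
    explained = (s ** 2) / (s ** 2).sum()              # fraction of variance per PC
    return scores, loadings, explained[:n_components]


# Toy matrix (made up): 100 cells x 50 genes, two groups that differ in 10 genes.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 1.0, size=(100, 50))
expr[:50, :10] += 4.0

scores, loadings, explained = pca(expr, n_components=2)
print(scores.shape)        # (100, 2)
print(explained.round(3))  # variance fraction captured by PC1 and PC2
```

Each cell is now described by two numbers instead of 50, and the downstream steps (neighbour search, clustering, t-SNE/UMAP) typically operate on such a reduced representation.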
t-SNE (t-Distributed Stochastic Neighbor Embedding)
UMAP (Uniform Manifold Approximation and Projection)
https://blog.bioturing.com/2022/01/14/umap-vs-t-sne-single-cell-rna-seq-data-visualization/
Clustering
The goal in single-cell RNA sequencing (scRNA-seq) analysis is to uncover cellular structures and identify cell identities within the dataset.
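To make the idea of grouping cells by similarity concrete, here is a small sketch that clusters cells in a reduced (PC-like) space. It uses plain k-means as a simple stand-in; this is an assumption of the example, not the method prescribed by the lecture. In scanpy, clustering is commonly done on a neighbourhood graph (`scanpy.pp.neighbors` followed by `scanpy.tl.leiden`). The toy coordinates and the function name `kmeans` are made up.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centres = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its closest already-chosen centre.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres, dtype=float)

    for _ in range(n_iter):
        # Assign each cell to the nearest centre.
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its assigned cells.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


# Toy data (made up): 100 cells in a 5-dimensional PC space, two separated groups.
rng = np.random.default_rng(0)
pcs = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 5)),
    rng.normal(10.0, 1.0, size=(50, 5)),
])

labels, centres = kmeans(pcs, k=2)
print(np.bincount(labels))   # number of cells per cluster
```

The resulting cluster labels are just integers; giving them biological meaning is the job of the annotation step.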
Cell type annotation
- Cell types are robust cellular phenotypes identifiable based on the expression of specific markers (e.g., proteins or gene transcripts).
- They are often linked to specific functions and remain consistent across datasets.
- Cell categorisation is subjective and may change over time due to technological advancements or discoveries of sub-phenotypes.
- Cell types can be further classified into subtypes or cell states, and the term "cell identity" is sometimes used to avoid arbitrary distinctions.
- Cell types may exist along a continuum, where cells transition or differentiate into one another.
- Differentiation coordinates can provide a more accurate description of cell states, especially in processes like haematopoiesis.
Cell-type annotation methods
Rely on transcriptomic similarity between cells.
Types of cell-type annotation:
Manual annotation
From known markers to cluster annotation
Literature (known markers) --> annotate clusters
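A minimal sketch of marker-based manual annotation: for each cluster, average the expression of each cell type's known markers and assign the cell type with the highest score. The marker dictionary, the tiny expression matrix and the function name `annotate_by_markers` are illustrative assumptions; real marker lists come from the literature and usually contain several genes per cell type.

```python
import numpy as np

genes = ["CD3E", "CD79A", "LYZ", "NKG7"]

# Illustrative marker lists (assumption: one marker per cell type for brevity).
known_markers = {
    "T cell": ["CD3E"],
    "B cell": ["CD79A"],
    "Monocyte": ["LYZ"],
}


def annotate_by_markers(expr, clusters, genes, known_markers):
    """Label each cluster with the cell type whose markers are most highly expressed."""
    gene_index = {g: i for i, g in enumerate(genes)}
    annotation = {}
    for c in np.unique(clusters):
        cluster_mean = expr[clusters == c].mean(axis=0)   # mean expression per gene
        scores = {
            cell_type: np.mean([cluster_mean[gene_index[g]] for g in markers if g in gene_index])
            for cell_type, markers in known_markers.items()
        }
        annotation[int(c)] = max(scores, key=scores.get)
    return annotation


# Toy log-normalised expression (made up): 6 cells x 4 genes, 3 clusters.
expr = np.array([
    [3.0, 0.1, 0.2, 0.0],
    [2.5, 0.0, 0.1, 0.3],
    [0.1, 2.8, 0.0, 0.1],
    [0.0, 3.1, 0.2, 0.0],
    [0.2, 0.1, 3.5, 0.1],
    [0.1, 0.0, 2.9, 0.2],
])
clusters = np.array([0, 0, 1, 1, 2, 2])

annotation = annotate_by_markers(expr, clusters, genes, known_markers)
print(annotation)   # {0: 'T cell', 1: 'B cell', 2: 'Monocyte'}
```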
Manual annotation
From differentially expressed (DE) genes to cluster annotation
Differential expression (DE) analysis --> find marker genes per cluster --> annotate clusters
Literature / available datasets + studies
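The "find marker genes per cluster" step can be sketched as a one-vs-rest comparison: for each cluster, every gene is tested against all remaining cells and the genes are ranked by the test statistic. The sketch below uses a Welch t-statistic on toy data; the gene names, the data and the function name `rank_marker_genes` are hypothetical. In scanpy, this step is provided by `scanpy.tl.rank_genes_groups`.

```python
import numpy as np


def rank_marker_genes(expr, clusters, genes, n_top=2):
    """For each cluster, rank genes by a one-vs-rest Welch t-statistic."""
    markers = {}
    for c in np.unique(clusters):
        inside = expr[clusters == c]       # cells in the cluster
        outside = expr[clusters != c]      # all other cells
        diff = inside.mean(axis=0) - outside.mean(axis=0)
        se = np.sqrt(
            inside.var(axis=0, ddof=1) / len(inside)
            + outside.var(axis=0, ddof=1) / len(outside)
        )
        t_stat = diff / np.maximum(se, 1e-12)
        order = np.argsort(t_stat)[::-1][:n_top]   # most up-regulated genes first
        markers[int(c)] = [genes[i] for i in order]
    return markers


# Toy data (made up): 60 cells x 6 genes, 3 clusters of 20 cells each.
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(6)]
clusters = np.repeat([0, 1, 2], 20)
expr = rng.normal(1.0, 0.3, size=(60, 6))
for c in range(3):
    expr[clusters == c, c] += 3.0          # gene_c is elevated in cluster c

markers = rank_marker_genes(expr, clusters, genes, n_top=2)
for c, top in markers.items():
    print(c, top)
```

The top-ranked genes per cluster are then compared with the literature or with available datasets and studies to decide on a label.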
Automated annotation
Marker-gene Database-based:
Correlation-based (query-reference):
Supervised classification-based:
Others: scANVI
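As an illustration of the correlation-based (query-reference) idea, the sketch below correlates each query cluster's mean expression profile with labelled reference profiles and transfers the label of the best match. The profiles, the labels and the function name `annotate_by_correlation` are made up for this example, and it does not reproduce any specific published tool.

```python
import numpy as np


def annotate_by_correlation(query_profiles, reference_profiles, reference_labels):
    """Give each query profile the label of its most correlated reference profile."""
    n_query = query_profiles.shape[0]
    # np.corrcoef on the stacked rows yields all pairwise Pearson correlations;
    # the upper-right block holds query-versus-reference values.
    stacked = np.vstack([query_profiles, reference_profiles])
    corr = np.corrcoef(stacked)[:n_query, n_query:]
    best = corr.argmax(axis=1)
    return [reference_labels[i] for i in best], corr


# Hypothetical reference: mean expression of 6 genes in 3 annotated cell types.
reference_labels = ["T cell", "B cell", "Monocyte"]
reference_profiles = np.array([
    [5.0, 4.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 5.0, 4.0],
])

# Hypothetical query: mean expression of the same genes in 3 unlabelled clusters.
query_profiles = np.array([
    [0.2, 0.8, 4.5, 4.2, 0.1, 0.3],
    [4.8, 3.9, 0.3, 0.1, 0.7, 0.2],
    [0.1, 0.2, 0.9, 0.3, 4.6, 4.4],
])

predicted, corr = annotate_by_correlation(query_profiles, reference_profiles, reference_labels)
print(predicted)   # ['B cell', 'T cell', 'Monocyte']
```

A low maximum correlation for a cluster is a hint that the reference may not contain a matching cell type, so inspecting `corr` alongside the predicted labels is worthwhile.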
Take home message
Feature Selection
Dimensionality Reduction
Clustering
Cell Annotation
Useful links
https://www.youtube.com/watch?v=FgakZw6K1QQ&ab_channel=StatQuestwithJoshStarmer (UMAP video)
https://www.youtube.com/watch?v=eN0wFzBA4Sc&ab_channel=StatQuestwithJoshStarmer (PCA video)
https://www.youtube.com/watch?v=NEaUSP4YerM&ab_channel=StatQuestwithJoshStarmer (t-SNE video)
https://www.sc-best-practices.org/preamble.html (single-cell practices tutorial)
https://scanpy.readthedocs.io/en/stable/tutorials.html (scanpy docs)
https://anndata.readthedocs.io/en/latest/ (anndata docs)
https://scverse.org/packages/#core-packages (scverse community)
https://www.nature.com/articles/s41576-018-0088-9 (Challenges in unsupervised clustering of single-cell RNA-seq data)
https://www.nature.com/articles/s41576-023-00586-w (Best practices for single-cell analysis across modalities)
https://www.sciencedirect.com/science/article/pii/S2001037021000192 (Automated methods in cell-type annotation)
https://github.com/seandavi/awesome-single-cell (awesome-single-cell curated list)