scRNA-seq: visualization
École de bioinformatique AVIESAN-IFB-INSERM 2022
scRNA-Seq pipeline overview
[Figure: pipeline diagram. biological sample → sequencer output → unfiltered count matrix (genes × cells) → filtered count matrix (genes × cells) → normalized matrix (genes × cells) → HVG selection + scaling → reduced space (dimensions × cells) → cells visualization (Dim. 1 vs. Dim. 2)]
We want a visual summary of gene expression across thousands of cells.
[Figure: normalized matrix (genes × cells) → reduced space (dimensions × cells) → cells visualization (Dim. 1 vs. Dim. 2), with the questions "How?" and "Why not?" on the arrows]
Why is an intermediate step needed?
We summarize gene expression in a few dimensions before building the 2D projection.
Challenges
scRNA-Seq data are sparse
More than 70% of the expression matrix is 0, which is not very informative: prop(expr_mat == 0)
[Image: sparse matrix illustration, http://cmdlinetips.com/wp-content/uploads/2018/03/Sparse_Matrix.png]
Data are noisy
Some genes are more informative than others.
There is biological and technical noise in gene expression.
Computational time
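The sparsity figure quoted above can be checked directly on a count matrix. The sketch below is a minimal NumPy version of the `prop(expr_mat == 0)` idea; the toy matrix, its Poisson parameters, and the helper name `zero_fraction` are assumptions made for illustration and do not come from the course material.

```python
import numpy as np

def zero_fraction(mat):
    """Return the proportion of entries equal to zero."""
    return float(np.mean(mat == 0))

# Hypothetical toy data: a genes x cells count matrix drawn from a
# low-mean Poisson distribution, so that most entries are zero.
rng = np.random.default_rng(0)
expr_mat = rng.poisson(lam=0.3, size=(200, 100))

print(f"{zero_fraction(expr_mat):.1%} of the entries are zero")
```

On a real dataset, `expr_mat` would be the filtered count matrix; a sparse matrix type would need its own zero count rather than a dense comparison.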
Dimensionality reduction
Overview
Commonly used dimensionality reduction methods
Important parameters
[Figure: normalized matrix (≈15,000 genes × cells, made of HVG and "constant" genes) → HVG selection → HVG matrix (≈3,000 HVG × cells) → scaling → scaled matrix (≈3,000 HVG × cells) → reduced space (≈50 dimensions × cells)]
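The HVG selection and scaling steps can be sketched as follows. This is a deliberately simplified illustration: genes are ranked by raw variance across cells, whereas analysis toolkits typically account for the relationship between mean expression and variance. The function names `select_hvg` and `scale_genes`, and the toy data, are hypothetical.

```python
import numpy as np

def select_hvg(norm_mat, n_hvg):
    """Return row indices of the n_hvg genes with the highest variance across cells.

    Simplification: ranks genes by raw variance only.
    """
    gene_var = norm_mat.var(axis=1)
    top = np.argsort(gene_var)[::-1][:n_hvg]
    return np.sort(top)  # keep the original gene order

def scale_genes(mat):
    """Center each gene (row) to mean 0 and scale it to unit variance across cells."""
    mean = mat.mean(axis=1, keepdims=True)
    std = mat.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes stay at 0 instead of dividing by zero
    return (mat - mean) / std

# Hypothetical toy data: 1000 genes x 200 cells, each gene with its own spread.
rng = np.random.default_rng(0)
spread = rng.uniform(0.1, 2.0, size=(1000, 1))
norm_mat = rng.normal(size=(1000, 200)) * spread

hvg_idx = select_hvg(norm_mat, n_hvg=100)
scaled = scale_genes(norm_mat[hvg_idx])
print(scaled.shape)  # (100, 200)
```

Scaling gives every selected gene the same weight in the next step, so that a few highly expressed genes do not dominate the reduced space.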
Dimensionality reduction
Principal Component Analysis - principle
[Figure: PCA principle. Six genes (gene 1 to gene 6) are summarized into three dimensions (dim 1 to dim 3)]
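As an illustration of the principle, here is a minimal PCA sketch based on a singular value decomposition, assuming a genes × cells input and returning a dimensions × cells matrix to match the layout used in these slides. The function name `pca_reduce` and the toy data (six genes driven by two hidden factors) are assumptions made for the example; real analyses would rely on the PCA routine of their toolkit.

```python
import numpy as np

def pca_reduce(scaled_mat, n_dims):
    """Project cells onto the first n_dims principal components.

    scaled_mat: genes x cells matrix.
    Returns (coords, explained): coords is n_dims x cells, explained is the
    fraction of total variance carried by each kept dimension.
    """
    # PCA requires each gene to be centered across cells.
    centered = scaled_mat - scaled_mat.mean(axis=1, keepdims=True)
    # centered = U * diag(s) * Vt; columns of U are gene loadings.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = s[:n_dims, None] * vt[:n_dims]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_dims]

# Hypothetical toy data: 6 genes x 300 cells, driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(2, 300))
weights = rng.normal(size=(6, 2))
toy_mat = weights @ factors + 0.1 * rng.normal(size=(6, 300))

coords, explained = pca_reduce(toy_mat, n_dims=3)
print(coords.shape)  # (3, 300)
print(np.round(explained, 3))
```

Because the toy genes depend on only two hidden factors, the first two dimensions carry almost all of the variance, which is the situation where a reduced space loses little information.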
Dimensionality reduction
Principal Component Analysis - visualization
Now, we will use the reduced space to make a 2D representation.
2D space for cells visualization
Commonly used 2D spaces
The same cells can be represented using different 2D spaces.
Do not over-interpret the 2D space: it is an over-simplified representation of the cells.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417818/
Important parameters
Clustering
Clustering is performed on the expression matrix or on the reduced space, not on the 2D projection.
The 2D projection is not a clustering. A clustering is an annotation.
Commonly used methods
Important parameters
[Figure: k-nearest neighbors (kNN) graph, shown for k = 3 and k = 6 → shared nearest neighbors (SNN) graph → clustering (from the SNN graph)]
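The kNN → SNN → clustering chain can be sketched on a toy reduced space. The code below builds a kNN graph, weights cell pairs by the Jaccard overlap of their neighborhoods (one common way to define an SNN graph), and then labels connected components. That last step is only a stand-in, chosen to keep the example self-contained: it is not the community detection used by analysis toolkits and has no resolution parameter. All function names, the `min_jaccard` threshold and the toy data are hypothetical.

```python
import numpy as np

def knn_indices(coords, k):
    """coords: cells x dims. Return a cells x k array of nearest-neighbor indices."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def snn_graph(knn, min_jaccard):
    """Weight each cell pair by the Jaccard overlap of their neighborhoods.

    A neighborhood is the cell itself plus its k nearest neighbors.
    Weights below min_jaccard are pruned to 0.
    """
    n = knn.shape[0]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1
    np.fill_diagonal(member, 1)
    shared = member @ member.T
    size = member.sum(axis=1)
    union = size[:, None] + size[None, :] - shared
    jaccard = shared / union
    jaccard[jaccard < min_jaccard] = 0.0
    np.fill_diagonal(jaccard, 0.0)
    return jaccard

def connected_components(weights):
    """Label cells by connected component of the graph (stand-in for clustering)."""
    n = weights.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(weights[node] > 0):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

# Hypothetical reduced space: 60 cells x 5 dimensions, two well-separated groups.
rng = np.random.default_rng(0)
reduced = np.vstack([rng.normal(0, 1, size=(30, 5)), rng.normal(20, 1, size=(30, 5))])

knn = knn_indices(reduced, k=6)
snn = snn_graph(knn, min_jaccard=0.05)
labels = connected_components(snn)
print(len(set(labels.tolist())), "clusters found")
```

Note that the input is the reduced space, not a 2D projection, and that the output is one label per cell, in line with the statements above.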
Summary
[Figure: normalized matrix (≈15,000 genes × cells) → HVG selection → HVG matrix (≈3,000 HVG × cells) → scaled matrix (≈3,000 HVG × cells) → reduced space (≈50 dimensions × cells) → 2D projection of cells (Axis 1 vs. Axis 2: UMAP, tSNE, others…), with a Cluster annotation on cells]
Take Home Messages
Advice:
The goal is to generate a quick representation of your cells. Run your favorite analyses and display their results on this representation. Do not over-interpret the 2D representation itself.
Let’s practice
[Figure: normalized matrix (genes × cells) → PCA → reduced space (dimensions × cells) → UMAP → cells visualization (Dim. 1 vs. Dim. 2)]
Rows: number of PCs (out of 50) used to make the UMAP. Columns: number of variable features.
   | 500 | 2000 | 5000 |
5  |     |      |      |
15 |     |      |      |
50 |     |      |      |
Resolution | 0.1 | 0.5 | 1 | 5 |
           |     |     |   |   |