Single-cell RNA-seq Data in R:
Import, QC, Normalize, & Visualize
The Data Lab
Before we begin, an RStudio primer/review
New R features that you will see: new pipe |>
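As a quick illustration of the native pipe, here is a minimal sketch assuming R 4.1 or later. The vector `expression_values` is made up for the example. The pipe passes the value on its left as the first argument of the call on its right, so the nested and piped forms compute the same thing.

```r
# Made-up values, for illustration only
expression_values <- c(0, 3, 12, 7, 0, 25)

# Nested calls are read from the inside out
nested_result <- round(mean(log1p(expression_values)), 2)

# The same steps with the native pipe are read left to right
piped_result <- expression_values |>
  log1p() |>
  mean() |>
  round(2)

# Both forms give the same value
identical(nested_result, piped_result)
```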
New R features that you will see: function shortcut \(x)
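A small sketch of the function shortcut, again assuming R 4.1 or later. `\(x)` is shorthand for `function(x)`, which is handy for short anonymous functions. The list `counts_per_cell` is invented for the example.

```r
# Made-up counts for three cells, for illustration only
counts_per_cell <- list(
  cell_a = c(0, 4, 9),
  cell_b = c(2, 0, 0),
  cell_c = c(7, 1, 3)
)

# Long form: count the genes with nonzero counts in each cell
detected_long <- sapply(counts_per_cell, function(x) sum(x > 0))

# Shortcut form: \(x) means the same as function(x)
detected_short <- sapply(counts_per_cell, \(x) sum(x > 0))

identical(detected_long, detected_short)
```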
Single sample scRNA-seq overview
Preprocess & Import: Quantify expression for each gene and droplet
QC, Filter, & Normalize: Select droplets containing intact cells; correct for differences in sequencing depth among cells
Dimension reduction: Focus on major axes of variation to reduce noise & simplify visualization
Cluster: Group cells with similar expression patterns
Find markers: Identify genes that distinguish clusters
Gene set analysis: Look for enrichment of known markers or pathways
Cell type: Assign labels to each cell using known references or manual annotation
Preprocess & Import
Mapping & Quantification
Cell Ranger
salmon/alevin-fry
Import to R
read10xCounts()
tximeta()
The SingleCellExperiment class
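To make the class more concrete, here is a minimal sketch that builds a tiny SingleCellExperiment by hand, assuming the Bioconductor SingleCellExperiment package is installed. The count values, the `sample_id` column, and the `gene_symbol` column are all invented for illustration. Real objects normally come from an import function rather than being built this way.

```r
library(SingleCellExperiment)

set.seed(1)

# Toy count matrix: genes in rows, cells in columns (made-up values)
toy_counts <- matrix(
  rpois(60, lambda = 5),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:10))
)

sce <- SingleCellExperiment(
  assays = list(counts = toy_counts),
  # One row of metadata per cell (hypothetical column)
  colData = DataFrame(sample_id = rep("toy_sample", 10)),
  # One row of metadata per gene (hypothetical column)
  rowData = DataFrame(gene_symbol = paste0("SYM", 1:6))
)

# The main slots: expression matrix, cell metadata, gene metadata
dim(counts(sce))
colData(sce)
rowData(sce)
```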
Importing Data
* we are not covering preprocessing here, but ask us about it!
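The import step for Cell Ranger style output could look roughly like the sketch below, assuming the DropletUtils and Matrix packages are installed. No real dataset is assumed here, so the sketch first writes a small made-up matrix in 10x format to a temporary directory and then reads it back. With real data, only the `read10xCounts()` call would be needed, pointed at the Cell Ranger output directory. Import of salmon/alevin-fry output with `tximeta()` is not shown.

```r
library(DropletUtils)

set.seed(1)

# Made-up sparse count matrix standing in for real Cell Ranger output
mock_counts <- Matrix::Matrix(
  matrix(rpois(1000, lambda = 5), nrow = 100, ncol = 10),
  sparse = TRUE
)
rownames(mock_counts) <- paste0("ENSG", seq_len(nrow(mock_counts)))
colnames(mock_counts) <- paste0("BARCODE-", seq_len(ncol(mock_counts)))

# Write the mock data in 10x format to a temporary location
mock_dir <- tempfile()
write10xCounts(mock_dir, mock_counts)

# Read the directory into a SingleCellExperiment,
# using the barcodes as column names
sce <- read10xCounts(mock_dir, col.names = TRUE)
sce
```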
QC, Filter, & Normalize
Cell-level statistics
addPerCellQC()
Filter disrupted cells
miQC
Remove empty droplets
emptyDropsCellRanger()
Normalize counts
logNormCounts()
Initial Quality Control
Filtering damaged/disrupted/dying cells
Normalization
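The QC and normalization steps can be sketched as follows, assuming the scuttle package is installed. This is an illustration on simulated data from `mockSCE()`, not a recommended workflow. The "mitochondrial" gene set is hypothetical (the first 50 simulated genes), and a fixed cutoff stands in for the model-based miQC filter, with thresholds chosen arbitrarily. Empty droplet removal is not shown.

```r
library(SingleCellExperiment)
library(scuttle)

set.seed(2024)

# Simulated data standing in for an imported sample
sce <- mockSCE(ncells = 200, ngenes = 2000)

# Hypothetical mitochondrial gene set: the simulation has no real
# mitochondrial genes, so the first 50 genes are used as a stand-in
mito_genes <- rownames(sce)[1:50]

# Cell-level statistics: total counts, genes detected, percent mito
sce <- addPerCellQC(sce, subsets = list(mito = mito_genes))

# Simple fixed cutoffs as a stand-in for miQC (arbitrary thresholds)
keep_cells <- sce$subsets_mito_percent < 20 & sce$detected >= 200
filtered_sce <- sce[, keep_cells]

# Normalize for sequencing depth and log-transform
filtered_sce <- logNormCounts(filtered_sce)

assayNames(filtered_sce)
```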
Dimension reduction
PCA
runPCA()
UMAP
runUMAP()
Identify variable genes
modelGeneVar()
Dimensionality Reduction Methods
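A rough sketch of the dimension reduction steps, assuming the scuttle, scran, and scater packages are installed (`runUMAP()` also needs the uwot package). The data are simulated with `mockSCE()`. The number of variable genes and the number of principal components are arbitrary choices for illustration.

```r
library(scuttle)
library(scran)
library(scater)

set.seed(2024)

# Simulated, normalized data standing in for a filtered sample
sce <- mockSCE(ncells = 200, ngenes = 1000) |>
  logNormCounts()

# Identify variable genes: model per-gene variance, keep the top genes
gene_variance <- modelGeneVar(sce)
variable_genes <- getTopHVGs(gene_variance, n = 200)

# PCA on the variable genes, then UMAP computed from the PCA results
sce <- runPCA(sce, subset_row = variable_genes, ncomponents = 20)
sce <- runUMAP(sce, dimred = "PCA")

reducedDimNames(sce)
```

Both results are stored in the object and can be retrieved with `reducedDim(sce, "PCA")` or `reducedDim(sce, "UMAP")`.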
Clustering Cells
Dimensionality reduction often results in visible “clusters”, but how do we define those?
Many methods!
Graph-based Clustering
Step 1: Calculate similarity matrix among points
Step 2: Build a weighted network graph connecting points to their neighbors
Step 3: Divide network graph into “neighborhoods” based on connection patterns
Many options at each step! The algorithms can determine how many clusters to assign.
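The three steps above can be sketched as follows, assuming the scuttle, scran, scater, and igraph packages are installed. This is one of many possible combinations: the neighbor count `k = 10`, the Jaccard weighting, and Louvain community detection are arbitrary choices for illustration, and the data are simulated, so the resulting clusters have no biological meaning.

```r
library(scuttle)
library(scran)
library(scater)

set.seed(2024)

# Simulated data with PCA results to cluster on
sce <- mockSCE(ncells = 200, ngenes = 1000) |>
  logNormCounts() |>
  runPCA(ncomponents = 10)

# Steps 1 & 2: find each cell's nearest neighbors in PCA space and
# build a graph weighted by how many neighbors two cells share
snn_graph <- buildSNNGraph(
  sce,
  k = 10,
  use.dimred = "PCA",
  type = "jaccard"
)

# Step 3: divide the graph into communities; the number of clusters
# is chosen by the algorithm rather than set in advance
communities <- igraph::cluster_louvain(snn_graph)
sce$cluster <- factor(communities$membership)

table(sce$cluster)
```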
What do the clusters represent?