Single-cell RNA-seq analysis using R
Raw reads to expression matrix
Oct-04-2022
Iguaracy Pinheiro-de-Sousa
(iguaracy@ebi.ac.uk)
Daniel O’Hanlon
(dohanlon@ebi.ac.uk)
Single cell RNA-seq overview
Frontiers in Cardiovascular Medicine. 7. 10.3389/fcvm.2020.00042.
Single cell RNA-seq overview
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq
Single cell RNA-seq fastq content (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
MDPI. 7. 56. 10.3390/info7040056.
Single cell RNA-seq - cell isolation (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://www.10xgenomics.com/instruments/chromium-controller
Hwang et al. Experimental & Molecular Medicine (2018) 50:96
Single cell RNA-seq – barcodes and UMIs (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: http://data-science-sequencing.github.io/Win2018/lectures/lecture16/
Front. Genet., 28 May 2021
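To illustrate how the cell barcode and the UMI sit at the start of read 1, here is a minimal Python sketch that slices them out of a read sequence. The lengths (16 bp barcode, 12 bp UMI) are an assumption matching 10x 3' v3 chemistry; v2 uses a 10 bp UMI. The function name and the example read are made up for illustration, and real pipelines also correct barcodes against a whitelist, which is not shown here.

```python
def split_read1(seq, barcode_len=16, umi_len=12):
    """Return (cell_barcode, umi) from a 10x read 1 sequence.

    Assumed layout: barcode first, UMI immediately after it.
    The lengths are chemistry-dependent; the defaults assume 3' v3.
    """
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(f"read 1 has {len(seq)} bases, need at least {needed}")
    return seq[:barcode_len], seq[barcode_len:needed]


# Made-up read: 16 bp barcode + 12 bp UMI + start of the poly(T) tail
barcode, umi = split_read1("AAACCCAAGAAACACT" + "GCTTACGGATCA" + "TTTT")
print(barcode, umi)
```

For v2 chemistry the same call would be made with `umi_len=10`.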
Single cell RNA-seq - fastqc
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
Single cell RNA-seq alignment (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
Genome Biology volume 22, Article number: 339 (2021)
Single cell RNA-seq alignment (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://slideplayer.com/slide/17832415/
Two popular sources of reference genomes use different assembly names:
GRC-style names (GRCh37, GRCh38, GRCm38)
UCSC-style names (hg19, hg38, mm10)
Single cell RNA-seq alignment (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://slideplayer.com/slide/17832415/
From: https://www.singlecellcourse.org/processing-raw-scrna-seq-sequencing-data-from-reads-to-a-count-matrix.html#reference-genome-and-its-annotation
Single cell RNA-seq alignment (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://slideplayer.com/slide/17832415/
STAR
Single cell RNA-seq alignment (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
From: https://slideplayer.com/slide/17832415/
Single cell RNA-seq - deduplication (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
limitations
Single cell RNA-seq - deduplication (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
Front. Genet., 28 May 2021
Single cell RNA-seq - deduplication (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
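The idea behind UMI deduplication can be sketched in a few lines of Python: reads that share the same cell barcode, UMI and gene are treated as PCR copies of one molecule and counted once. This is a simplified, exact-match illustration only. It does not correct sequencing errors in barcodes or UMIs, which real pipelines do, and the function name and toy reads are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Count unique molecules per (cell barcode, gene).

    `reads` is an iterable of (barcode, umi, gene) tuples, one per aligned read.
    Reads with an identical (barcode, umi, gene) triple are counted once.
    """
    seen = set()
    counts = defaultdict(int)
    for barcode, umi, gene in reads:
        key = (barcode, umi, gene)
        if key in seen:
            continue  # PCR duplicate of a molecule already counted
        seen.add(key)
        counts[(barcode, gene)] += 1
    return dict(counts)


# Toy input: the first two reads are duplicates of the same molecule
toy_reads = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "CCA", "GeneA"),
    ("GGGT", "TTG", "GeneA"),
]
print(count_umis(toy_reads))
```

The resulting dictionary is a sparse form of the gene-by-cell count matrix: four reads collapse to two molecules in one cell and one molecule in the other.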
Single cell RNA-seq – cell filtering (10x)
From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html
Lun et al., 2018
Single cell RNA-seq – web summary (10x)
From: https://slideplayer.com/slide/17832415/
Single cell RNA-seq – cellranger count output
Single cell RNA-seq – cellranger aggr
Single cell RNA-seq – gene matrix
Hands-on with real data
Downloading raw data from NCBI-SRA
fasterq-dump {SRR ID} --split-files
--split-files: for paired-end reads, writes each mate to its own file
Running FastQC
HTML file
Running FastQC
Good
Bad
Running FastQC
Should have little position-specific base bias
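To make "position-specific base bias" concrete, the sketch below computes the fraction of each base at every read position from a list of sequences. In an unbiased library the fractions stay roughly flat along the read. This illustrates the concept only and is not how FastQC itself is implemented; the function name and toy reads are made up.

```python
from collections import Counter


def per_position_base_fractions(seqs):
    """Return one dict per read position mapping A/C/G/T to its fraction.

    Reads shorter than a given position are skipped at that position.
    Other symbols (such as N) count towards the total but are not reported.
    """
    if not seqs:
        return []
    fractions = []
    for i in range(max(len(s) for s in seqs)):
        column = Counter(s[i] for s in seqs if len(s) > i)
        total = sum(column.values())
        fractions.append({base: column[base] / total for base in "ACGT"})
    return fractions


# Toy reads: positions 1-3 are fully biased, position 4 is balanced
toy = ["ACGT", "ACGA", "ACGC", "ACGG"]
for position, fracs in enumerate(per_position_base_fractions(toy), start=1):
    print(position, fracs)
```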
Running FastQC
Overrepresented sequences are matched (> 20 bp) against a database of known sequences
Trimming the adaptor region
trim_galore --paired [files]
[files]: both the *_1 and *_2 FASTQ files
Performing the alignment
cellranger count --transcriptome [path to GRCh38] \
--id SRR9130254 --sample SRR9130254 \
--localcores=8 --localmem=32 \
--fastqs [path to FASTQ directory]
--sample corresponds to the sample-name prefix of the FASTQ file names
--localmem is given in GB
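When several runs need the same alignment step, the command can be assembled programmatically. The Python sketch below only builds the argument list, using the flags shown on this slide, and prints it; nothing is executed. The function name and the placeholder paths are hypothetical, and it assumes `--fastqs` points at the directory holding the FASTQ files.

```python
import shlex


def cellranger_count_cmd(sample, transcriptome, fastq_dir, cores=8, mem_gb=32):
    """Build a `cellranger count` argument list for one sample.

    Uses the sample name for both --id (output folder) and --sample
    (FASTQ file-name prefix), as on the slide. mem_gb is in gigabytes.
    """
    return [
        "cellranger", "count",
        f"--id={sample}",
        f"--sample={sample}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastq_dir}",
        f"--localcores={cores}",
        f"--localmem={mem_gb}",
    ]


# Placeholder paths; replace with the real reference and FASTQ locations
for accession in ["SRR9130254"]:
    cmd = cellranger_count_cmd(accession, "path/to/GRCh38", "path/to/fastqs")
    print(shlex.join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.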
Merging the HDF5 output files
cellranger aggr --id=mtx_h5 \
--csv=cellranger-aggr-files.csv \
--localcores=8 --localmem=32
--localmem is given in GB
The CSV can also include additional metadata columns, which are carried over into the aggregated file
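The aggregation CSV lists one row per run. As a hedged sketch, the Python below writes such a file with an ID column, the path to each run's `molecule_info.h5`, and any extra metadata columns. The column header `sample_id` is an assumption (older Cell Ranger releases use `library_id`), so check the documentation for the installed version. The function name, the `condition` column and its value are made up for illustration.

```python
import csv
import io


def write_aggr_csv(handle, samples, id_column="sample_id"):
    """Write a cellranger aggr CSV to an open text handle.

    `samples` is a list of dicts with keys 'id' and 'molecule_h5';
    any other keys become extra metadata columns (sorted by name).
    """
    extra = sorted({key for s in samples for key in s} - {"id", "molecule_h5"})
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([id_column, "molecule_h5", *extra])
    for s in samples:
        writer.writerow([s["id"], s["molecule_h5"], *(s.get(k, "") for k in extra)])


# Example with one run and one hypothetical metadata column
buffer = io.StringIO()
write_aggr_csv(buffer, [
    {"id": "SRR9130254",
     "molecule_h5": "SRR9130254/outs/molecule_info.h5",
     "condition": "control"},
])
print(buffer.getvalue())
```

To write to disk, open `cellranger-aggr-files.csv` with `open(..., "w", newline="")` and pass the handle instead of the `StringIO` buffer.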
Cellranger output report
Pitfalls
Rename .fq files to .fastq:
rename .fq .fastq *.fq
Cell Ranger expects FASTQ file names of the form:
[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq
python renameFASTQ.py --inputDir [dir] --outputDir [dir]
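The renaming step can be sketched in Python as follows. This is not the `renameFASTQ.py` script referenced above, whose contents are not shown, but a minimal illustration of the same idea under stated assumptions: input files are named like `SRR9130254_1.fastq` or `SRR9130254_2.fq`, suffix `_1` maps to `R1` and `_2` to `R2`, and everything is lane 1. The mapping should be checked against the actual read lengths, since SRA downloads do not always follow it. All function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed mapping from SRA-style mate suffix to 10x read type
READ_TYPES = {"1": "R1", "2": "R2"}
PATTERN = re.compile(r"(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)(?P<gz>\.gz)?")


def tenx_name(filename, lane=1):
    """Map e.g. 'SRR9130254_1.fq' to 'SRR9130254_S1_L001_R1_001.fastq'.

    Returns None for file names that do not match the assumed pattern.
    """
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    read_type = READ_TYPES[match.group("mate")]
    gz = match.group("gz") or ""
    return f"{match.group('sample')}_S1_L{lane:03d}_{read_type}_001.fastq{gz}"


def rename_fastqs(input_dir, output_dir):
    """Move matching FASTQ files from input_dir to output_dir under 10x-style names."""
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    renamed = []
    for path in sorted(Path(input_dir).iterdir()):
        new_name = tenx_name(path.name)
        if new_name is not None:
            path.rename(output / new_name)
            renamed.append(new_name)
    return renamed


print(tenx_name("SRR9130254_1.fastq"))
```

Because `rename_fastqs` moves files rather than copying them, it is worth trying it on a copy of the data first.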
Useful links
https://www.singlecellcourse.org/processing-raw-scrna-seq-sequencing-data-from-reads-to-a-count-matrix.html#reference-genome-and-its-annotation
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_in
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md#paired-end-specific-options
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#alignment