1 of 38

Single-cell RNA-seq analysis using R

Raw reads to expression matrix

Oct-04-2022

Iguaracy Pinheiro-de-Sousa

(iguaracy@ebi.ac.uk)

Daniel O’Hanlon

(dohanlon@ebi.ac.uk)

2 of 38

Single cell RNA-seq overview

Frontiers in Cardiovascular Medicine. 7. 10.3389/fcvm.2020.00042.

3 of 38

Single cell RNA-seq overview

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq

4 of 38

Single cell RNA-seq fastq content (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

MDPI. 7. 56. 10.3390/info7040056.

  • Sample barcode – same for all cells and RNAs in a library

  • Cell barcode (16bp) – same for all RNAs in a cell

  • Unique molecular identifier (UMI 10-12bp) – unique for one RNA in one cell
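
The barcode and UMI layout listed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that read 1 begins with the 16 bp cell barcode and is followed directly by a 12 bp UMI; the function name and the example sequence are made up, and the UMI length should be adjusted to the chemistry in use.

```python
def split_read1(seq, barcode_len=16, umi_len=12):
    """Split a read-1 sequence into (cell barcode, UMI).

    Assumes the barcode comes first and the UMI follows directly.
    Both lengths are parameters because the UMI is 10-12 bp depending on chemistry.
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError("read 1 is shorter than barcode + UMI")
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:barcode_len + umi_len]
    return barcode, umi


# Made-up read 1: a 16 bp barcode followed by a 12 bp UMI
example_read1 = "AAACCCAAGAAACACT" + "GGTTAACCGGTT"
barcode, umi = split_read1(example_read1)
```

All reads sharing a barcode are assigned to the same cell; reads sharing both barcode and UMI are treated as copies of one original RNA molecule.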

5 of 38

Single cell RNA-seq - cell isolation (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://www.10xgenomics.com/instruments/chromium-controller

Hwang et al. Experimental & Molecular Medicine (2018) 50:96

6 of 38

Single cell RNA-seq – barcodes and UMIs (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: http://data-science-sequencing.github.io/Win2018/lectures/lecture16/

Front. Genet., 28 May 2021

7 of 38

Single cell RNA-seq - fastqc

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

8 of 38

Single cell RNA-seq alignment (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

Genome Biology volume 22, Article number: 339 (2021)

9 of 38

Single cell RNA-seq alignment (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://slideplayer.com/slide/17832415/

Two popular sources of reference, which use different names for the same assemblies:

GRC names (GRCh37, GRCh38, GRCm38)

UCSC names (hg19, hg38, mm10)

10 of 38

Single cell RNA-seq alignment (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://slideplayer.com/slide/17832415/

From: https://www.singlecellcourse.org/processing-raw-scrna-seq-sequencing-data-from-reads-to-a-count-matrix.html#reference-genome-and-its-annotation

11 of 38

Single cell RNA-seq alignment (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://slideplayer.com/slide/17832415/

STAR

12 of 38

Single cell RNA-seq alignment (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

From: https://slideplayer.com/slide/17832415/

13 of 38

Single cell RNA-seq - deduplication (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

limitations

  1. Some transcripts can become overrepresented due to PCR amplification

  • A low amount of starting material, as in scRNA-seq, can increase the number of PCR cycles needed
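
How UMIs counter this amplification bias can be sketched as follows. This is a simplified illustration, not the method of any particular pipeline: it collapses reads that share the same cell barcode, gene, and UMI by exact match only, whereas real tools may also correct sequencing errors in barcodes and UMIs. All record values are made up.

```python
from collections import defaultdict


def count_umis(records):
    """Collapse PCR duplicates: count distinct UMIs per (cell barcode, gene)."""
    seen = defaultdict(set)
    for barcode, umi, gene in records:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical aligned reads as (cell barcode, UMI, gene)
records = [
    ("CELL_A", "UMI_1", "GeneX"),
    ("CELL_A", "UMI_1", "GeneX"),  # PCR duplicate of the read above
    ("CELL_A", "UMI_2", "GeneX"),
    ("CELL_B", "UMI_1", "GeneX"),  # same UMI but a different cell, so a different molecule
]
counts = count_umis(records)
```

Three reads for GeneX in CELL_A become two counted molecules, because two of them carry the same UMI.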

14 of 38

Single cell RNA-seq - deduplication (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

Front. Genet., 28 May 2021

15 of 38

Single cell RNA-seq - deduplication (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

16 of 38

Single cell RNA-seq – cell filtering (10x)

From: https://broadinstitute.github.io/2019_scWorkshop/data-preprocessing.html

Lun et al., 2018

17 of 38

Single cell RNA-seq – web summary (10x)

From: https://slideplayer.com/slide/17832415/

18 of 38

Single cell RNA-seq – web summary (10x)

19 of 38

Single cell RNA-seq – web summary (10x)

20 of 38

Single cell RNA-seq – web summary (10x)

21 of 38

Single cell RNA-seq – cellranger count output

22 of 38

Single cell RNA-seq – cellranger aggr

23 of 38

Single cell RNA-seq – gene matrix

24 of 38

Hands-on with real data

25 of 38

Downloading raw data from NCBI-SRA

  • SRA tools: https://github.com/ncbi/sra-tools
  • fasterq-dump: SRR ID to FASTQ files

fasterq-dump {SRR ID} --split-files

For paired-end reads
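
When many runs need downloading, the call can be scripted. The sketch below only builds the argument list and the file names that a paired-end run is typically expected to produce with --split-files; the helper names are hypothetical, and the actual download is left commented out because it needs SRA tools installed.

```python
def fasterq_dump_command(srr_id):
    """Build the argument list for the fasterq-dump call shown on this slide."""
    return ["fasterq-dump", srr_id, "--split-files"]


def expected_fastq_names(srr_id):
    """File names typically written for a paired-end run with --split-files."""
    return [f"{srr_id}_1.fastq", f"{srr_id}_2.fastq"]


cmd = fasterq_dump_command("SRR9130254")
outputs = expected_fastq_names("SRR9130254")

# To actually run the download (requires SRA tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```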

26 of 38

Running FastQC

HTML file

27 of 38

Running FastQC

Good

Bad

28 of 38

Running FastQC

Should have little position-specific base bias
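
To make "position-specific base bias" concrete, the sketch below computes the fraction of each base at every read position for a handful of made-up reads. It is an illustration of the kind of summary behind this check, not FastQC's own code; the function name is hypothetical.

```python
from collections import Counter


def per_position_base_fractions(reads):
    """Return one dict per read position mapping base -> fraction of reads."""
    if not reads:
        return []
    # Only compare positions that are present in every read
    length = min(len(read) for read in reads)
    fractions = []
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        fractions.append({base: n / total for base, n in counts.items()})
    return fractions


# Made-up reads: position 1 is always C, a strong position-specific bias
reads = ["ACGT", "ACGA", "TCGT", "GCGC"]
profile = per_position_base_fractions(reads)
```

In an unbiased library the fractions stay roughly constant along the read; a position dominated by one base, as at index 1 here, stands out.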

29 of 38

Running FastQC

Matches overrepresented sequences (> 20 bp) against a database of known sequences

30 of 38

Trimming the adaptor region

  • Trims adaptor sequences from paired-end read files (amongst other things)

trim_galore --paired [files]

[files]: both the *_1 and *_2 FASTQ files
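
What adaptor trimming does to a single read can be sketched in a few lines. This is a deliberate simplification: it uses exact matching only, whereas Trim Galore wraps Cutadapt, which tolerates mismatches and can also trim low-quality bases. The adaptor sequence is an assumption (the start of the common Illumina adaptor) and the function name is hypothetical.

```python
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # assumed adaptor; check which one your library uses


def trim_adapter(read, adapter=ILLUMINA_ADAPTER, min_overlap=3):
    """Remove the adaptor and everything after it from the 3' end of a read."""
    # Case 1: the full adaptor occurs somewhere inside the read
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the start of the adaptor is present at the very end of the read
    for overlap in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adaptor found: return the read unchanged
    return read


full = trim_adapter("ACGTACGT" + ILLUMINA_ADAPTER + "TTTT")
partial = trim_adapter("ACGTACGT" + "AGATC")
clean = trim_adapter("ACGTACGT")
```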

31 of 38

Performing the alignment

cellranger count --transcriptome [path to GRCh38] \
--id SRR9130254 --sample SRR9130254 \
--localcores=8 --localmem=32 \
--fastqs [path to FASTQ directory]

--sample corresponds to the [Sample Name] prefix of the FASTQ file names

--localmem is given in GB

32 of 38

Merging the HDF5 output files

cellranger aggr --id=mtx_h5 \
--csv=cellranger-aggr-files.csv \
--localcores=8 --localmem=32

--localmem is given in GB

The CSV can also include additional metadata columns, which are carried into the aggregated file
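
The aggregation CSV can be written by hand or generated. The sketch below assumes a header of sample_id and molecule_h5 followed by any metadata columns; the name of the ID column has differed between Cell Ranger versions, so check the documentation for the installed version. The second sample name and the condition column are made up for illustration.

```python
import csv
import io


def build_aggr_csv(samples, id_column="sample_id"):
    """Return CSV text with one row per cellranger count run.

    samples: list of dicts holding the ID column, a 'molecule_h5' path,
    and optionally extra metadata keys (written as additional columns).
    """
    extra = sorted({key for s in samples for key in s} - {id_column, "molecule_h5"})
    fieldnames = [id_column, "molecule_h5"] + extra
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        writer.writerow(sample)
    return buffer.getvalue()


samples = [
    {"sample_id": "SRR9130254",
     "molecule_h5": "SRR9130254/outs/molecule_info.h5",
     "condition": "control"},   # hypothetical metadata column
    {"sample_id": "SAMPLE_2",   # hypothetical second run
     "molecule_h5": "SAMPLE_2/outs/molecule_info.h5",
     "condition": "treated"},
]
csv_text = build_aggr_csv(samples)

# To save it under the name used in the command above:
# with open("cellranger-aggr-files.csv", "w") as handle:
#     handle.write(csv_text)
```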

33 of 38

Cellranger output report

34 of 38

Cellranger output report

35 of 38

Cellranger output report

36 of 38

Cellranger output report

37 of 38

Pitfalls

  • By default, TrimGalore outputs .fq files, but Cellranger expects .fastq files
  • Rename these!

rename .fq .fastq *.fq

  • Cellranger also requires a specific naming format for the rest of the file name:

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq

python renameFASTQ.py --inputDir [dir] --outputDir [dir]
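
The contents of renameFASTQ.py are not shown here, so the following is only a sketch of what such a renaming step might look like, not the script itself. It assumes input names of the form [sample]_1.fq or [sample]_1_val_1.fq (the pattern Trim Galore writes for paired files), and that mate 1 maps to R1 and mate 2 to R2; verify that mapping for your own data.

```python
import re

# Matches e.g. SRR9130254_1.fq or SRR9130254_1_val_1.fq
TRIMMED_NAME = re.compile(
    r"^(?P<sample>.+?)_(?P<mate>[12])(?:_val_[12])?\.(?:fq|fastq)$"
)


def cellranger_name(filename, lane=1):
    """Map a trimmed FASTQ name to [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq."""
    match = TRIMMED_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised FASTQ name: {filename}")
    read_type = f"R{match.group('mate')}"  # assumption: mate 1 -> R1, mate 2 -> R2
    return f"{match.group('sample')}_S1_L{lane:03d}_{read_type}_001.fastq"


new_r1 = cellranger_name("SRR9130254_1_val_1.fq")
new_r2 = cellranger_name("SRR9130254_2_val_2.fq")

# To rename files on disk, pair this with os.rename(old_path, new_path).
```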

38 of 38

Useful links

https://www.singlecellcourse.org/processing-raw-scrna-seq-sequencing-data-from-reads-to-a-count-matrix.html#reference-genome-and-its-annotation

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_in

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md#paired-end-specific-options

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#alignment