Single-cell RNA-seq analysis using R
Getting set up: infrastructure terms
Oct-04-2022
Iguaracy Pinheiro-de-Sousa
(iguaracy@ebi.ac.uk)
Daniel O’Hanlon
(dohanlon@ebi.ac.uk)
Single cell RNA-seq big data
Current Opinion in Systems Biology 2017, 4:85–91
Databases
Single cell RNA-seq – Data generation – cell capture
Levitin HM, et al. Trends Cancer. 2018 Apr;4(4):264-268.
Single cell RNA-seq – Data generation – transcript quantification
Hwang et al. Experimental & Molecular Medicine (2018) 50:96
tag-based (10x Chromium)
full-length (SMART-seq2)
From: https://www.10xgenomics.com/instruments/chromium-controller
MATER METHODS 2013;3:203
Single cell RNA-seq – Data processing
A classical scRNA-seq workflow contains four main steps:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#alignment
Hardware requirements
You won’t be running this on your laptop!
(probably…)
support.10xgenomics.com/single-cell-gene-expression/software/overview/system-requirements
NB: The minimum requirement of 64 GB RAM allows
`cellranger aggr` to aggregate up to 250k cells;
more memory is required beyond that.
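A quick way to see whether a machine comes close to that minimum is sketched below. This is only a rough check under stated assumptions: the helper names `total_ram_gb` and `meets_minimum` are made up for illustration, and the `os.sysconf` keys used are available on Linux-like systems but not everywhere.

```python
import os

# Minimum RAM quoted above for `cellranger aggr` with up to 250k cells.
MIN_RAM_GB = 64


def total_ram_gb():
    """Return physical RAM in GB, or None if the platform does not expose it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        # sysconf (or these keys) is missing on some platforms, e.g. Windows.
        return None
    return pages * page_size / 1024**3


def meets_minimum(ram_gb, minimum_gb=MIN_RAM_GB):
    """True only when the RAM figure is known and at least the minimum."""
    return ram_gb is not None and ram_gb >= minimum_gb


ram = total_ram_gb()
print("RAM (GB):", ram, "| meets minimum:", meets_minimum(ram))
```

The operating system usually reports a little less than the nominal installed memory, so treat a near-miss as a prompt to check the machine specification rather than a hard failure.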
Thanks for the memories
[Figure: three panels, Cellranger ‘multi’ (60k cells), Cellranger ‘count’ (20k cells), and Cellranger ‘aggr’ (250k cells); axes: RAM (GB), wallclock time (h), CPUs (Amazon EC2)]
support.10xgenomics.com/single-cell-gene-expression/software/overview/system-requirements
Batch computing
Compute nodes (typically at least 32 cores, 128 GB of RAM)
Shared storage accessible to all nodes (typically many TB)
further (although not often used in bioinformatics)
A batch scheduling system running on a ‘login’ node,
which lets users submit jobs and shares
resources fairly amongst them
Job submission
Number of CPUs
Amount of RAM (MB)
Queue type
Submission command
Output log
Executable
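The labelled parts of a submission can be put together as in the sketch below. The slide does not name the scheduler, so this assumes IBM LSF and its `bsub` command, where `-n` sets the number of CPUs, `-M` the memory limit, `-q` the queue, and `-o` the output log. The function name `build_bsub_command`, the queue name `standard`, and the script `analysis.R` are hypothetical. Whether `-M` is read as MB depends on how the site has configured LSF.

```python
import shlex


def build_bsub_command(executable, cpus, ram_mb, queue, log_path):
    """Assemble an LSF `bsub` call as an argument list (assumes LSF)."""
    return [
        "bsub",                      # submission command
        "-n", str(cpus),             # number of CPUs
        "-M", str(ram_mb),           # amount of RAM (unit is site-dependent)
        "-q", queue,                 # queue type
        "-o", log_path,              # output log
        *shlex.split(executable),    # executable and its arguments
    ]


# Hypothetical job: queue name and script are placeholders.
cmd = build_bsub_command(
    "Rscript analysis.R", cpus=8, ram_mb=65536,
    queue="standard", log_path="analysis.log",
)
print(shlex.join(cmd))
# bsub -n 8 -M 65536 -q standard -o analysis.log Rscript analysis.R
```

Building the command as a list keeps each option separate, so it can be passed to `subprocess.run` on a login node without shell quoting problems. Other schedulers take the same information through different flags; Slurm's `sbatch`, for example, uses `--cpus-per-task`, `--mem`, `--partition`, and `--output`.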
Interfacing with clusters
configurations differ from institute to institute
and cores, but require some initial configuration
www.nextflow.io
snakemake.readthedocs.io
www.nextflow.io/docs/latest/executor.htm
github.com/Snakemake-Profiles/
Graphics processing units (GPUs)
‘Warp’ of threads - executes the same
code on different processors in parallel
2 + 7 - 9²
4 + 8 - 2²
-4 + 1 - 3²
3 + 2 - 18²
…
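The idea of one piece of code applied to different data can be imitated in plain Python. The sketch below assumes that the final term of each row is a square (9², 2², 3², 18²), so every row is an instance of a + b - c². The names `kernel` and `lanes` are illustrative, and the loop runs sequentially here, whereas the threads of a warp run in lockstep on the GPU.

```python
def kernel(a, b, c):
    """The single piece of code that every thread in the warp executes."""
    return a + b - c ** 2


# One tuple of operands per thread ("lane"); values assume the squared reading.
lanes = [(2, 7, 9), (4, 8, 2), (-4, 1, 3), (3, 2, 18)]

# Sequential stand-in for what the hardware does in parallel:
# same instructions, different data in each lane.
results = [kernel(a, b, c) for a, b, c in lanes]
print(results)  # [-72, 8, -12, -319]
```

No lane needs the result of another, which is what allows the hardware to run them all at once.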
Graphics processing units (GPUs)
GeForce RTX 3090
Ampere A100
*More are dedicated to FP64
Graphics processing units (GPUs)
parallelism on GPUs
ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html
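Forward propagation suits GPUs because each output of a layer is an independent weighted sum. The sketch below is a minimal pure-Python illustration, not taken from the linked page: the function name `dense_forward`, the ReLU activation, and all numbers are assumptions chosen for the example.

```python
def dense_forward(weights, bias, inputs):
    """Forward pass of one dense layer with a ReLU activation.

    Each iteration computes one output neuron and uses only its own
    weight row and bias, so the iterations could run on separate threads.
    """
    outputs = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, inputs)) + b  # weighted sum
        outputs.append(max(0.0, z))                      # ReLU
    return outputs


# Illustrative layer: 2 inputs, 3 output neurons.
weights = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]
bias = [0.0, 1.0, -0.5]
inputs = [2.0, 1.0]

activations = dense_forward(weights, bias, inputs)
print(activations)  # [0.0, 2.5, 0.0]
```

On a GPU the outer loop disappears: every output neuron, and every term inside its sum, can be assigned to its own thread.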