1 of 15

Single-cell RNA-seq analysis using R

Getting set up: infrastructure terms

Oct-04-2022

Iguaracy Pinheiro-de-Sousa

(iguaracy@ebi.ac.uk)

Daniel O’Hanlon

(dohanlon@ebi.ac.uk)

2 of 15

Single cell RNA-seq big data

3 of 15

Single cell RNA-seq big data

Current Opinion in Systems Biology 2017, 4:85–91

Databases

4 of 15

Single cell RNA-seq – Data generation – cell capture

Levitin HM, et al. Trends Cancer. 2018 Apr;4(4):264-268.

5 of 15

Single cell RNA-seq – Data generation – transcript quantification

Hwang et al. Experimental & Molecular Medicine (2018) 50:96

tag-based (10x Chromium)

full-length (SMART-seq2)

From: https://www.10xgenomics.com/instruments/chromium-controller

MATER METHODS 2013;3:203

6 of 15

Single cell RNA-seq – Data processing

A classical scRNA-seq workflow contains four main steps:

  • Mapping the cDNA fragments to a reference;

  • Assigning reads to genes;

  • Assigning reads to cells (cell barcode demultiplexing);

  • Counting the number of unique RNA molecules (UMI deduplication).

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#alignment
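As an illustration of the last two steps, here is a minimal Python sketch of counting unique molecules per (cell barcode, gene) pair; the function name and example tuples are hypothetical, not part of any real pipeline:

```python
from collections import defaultdict

def count_umis(reads):
    """Count unique RNA molecules per (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples for
    reads that have already been mapped and assigned to a gene.
    Reads sharing the same cell barcode, gene, and UMI are PCR
    duplicates of one molecule, so we count distinct UMIs.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(umi_set) for key, umi_set in umis.items()}

reads = [
    ("AAAC", "GeneA", "TTG"),
    ("AAAC", "GeneA", "TTG"),   # PCR duplicate: same cell, gene, UMI
    ("AAAC", "GeneA", "CCA"),   # a second molecule of GeneA in cell AAAC
    ("GGGT", "GeneA", "TTG"),   # same UMI, but in a different cell
]
counts = count_umis(reads)
# counts[("AAAC", "GeneA")] == 2, counts[("GGGT", "GeneA")] == 1
```

Real pipelines such as Cell Ranger additionally correct sequencing errors in barcodes and UMIs before collapsing duplicates.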

7 of 15

Hardware requirements

  • 8-core Intel or AMD processor (16 cores recommended)
  • 64GB RAM (128GB recommended)
  • 1TB free disk space
  • 64-bit CentOS/RedHat 7.0 or Ubuntu 14.04

You won’t be running this on your laptop!

(probably….)

support.10xgenomics.com/single-cell-gene-expression/software/overview/system-requirements

NB: The minimum requirement of 64GB RAM will allow `cellranger aggr` to aggregate up to 250k cells; more memory will be required beyond that.

8 of 15

Thanks for the memories

[Chart: peak RAM (GB), wallclock time (h), and CPU count (Amazon EC2) for Cell Ranger 'multi' on 60k cells, 'count' on 20k cells, and 'aggr' on 250k cells]

  • RAM amount matters for speed, but only up to ~ 256GB

support.10xgenomics.com/single-cell-gene-expression/software/overview/system-requirements

9 of 15

Batch computing

10 of 15

Batch computing

[Diagram: a cluster of many compute nodes]

  • Multi-core CPUs with lots of RAM (typically at least 32 cores and 128 GB of RAM)
  • Connected to one or more storage servers holding files accessible to all nodes (typically many TB)
  • Also connected to each other to pool resources even further (although this is not often used in bioinformatics)
  • Access to these 'compute' nodes is commonly through a batch scheduling system running on a 'login' node, which implements a system for submitting jobs and sharing resources fairly amongst users

11 of 15

Job submission

  • For example, for LSF, a submission might look like:

      bsub -q normal -n 8 -M 16000 -o job.log ./pipeline.sh

    i.e. the submission command (bsub), queue type (-q), number of CPUs (-n), amount of RAM in MB (-M), output log (-o), and the executable

  • You might also specify other resources (GPUs, etc)
  • You might submit many of these jobs over subsets of data
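To illustrate the last bullet, a small Python sketch that composes one such LSF command per subset of data; the script name and sample names are placeholders:

```python
def bsub_command(sample, cpus=8, ram_mb=16000, queue="normal"):
    """Compose an LSF submission command for one sample.

    -n: number of CPUs, -M: RAM in MB, -q: queue, -o: output log.
    (The units of -M depend on your site's LSF configuration.)
    """
    return (
        f"bsub -q {queue} -n {cpus} -M {ram_mb} "
        f"-o {sample}.log ./process_sample.sh {sample}"
    )

samples = ["sample1", "sample2", "sample3"]
commands = [bsub_command(s) for s in samples]
print(commands[0])
# bsub -q normal -n 8 -M 16000 -o sample1.log ./process_sample.sh sample1
```

In practice you would pass each command to the shell (or write them into a script), and the scheduler queues and runs them as nodes become free.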

12 of 15

Interfacing with clusters

  • Impossible to make many general statements - hardware and software configurations differ institute to institute
  • There are workflow configuration packages that abstract this to some degree: specify your input files, scripts, etc., and these will split and run jobs across nodes and cores, but they require some initial configuration

www.nextflow.io

snakemake.readthedocs.io

www.nextflow.io/docs/latest/executor.htm

github.com/Snakemake-Profiles/
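As a sketch of what that initial configuration looks like, a minimal `nextflow.config` along these lines (queue name and resource numbers are placeholders) tells Nextflow to hand each process to an LSF scheduler rather than running it locally:

```groovy
// nextflow.config - submit every process as an LSF job
process {
    executor = 'lsf'      // submit via bsub instead of running locally
    queue    = 'normal'   // cluster queue name (site-specific)
    cpus     = 8
    memory   = '16 GB'
}
```

Snakemake achieves the same separation of workflow from cluster details through its profiles (second link above).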

13 of 15

Graphics processing units (GPUs)

  • Originally invented for rendering and video games
  • Essentially big ‘vector’ processors

A ‘warp’ of threads executes the same code on different processors in parallel:

[Diagram: four threads each running the same add-then-subtract instruction sequence on different operands, e.g. 2 + 7 − 9, 4 + 8 − 2, −4 + 1 − 3, 3 + 2 − 18]

  • Have to be specifically programmed for each task - not all tasks are suitable!
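The warp idea can be mimicked on a CPU with NumPy, where one vectorized expression applies the same add-then-subtract to every element. This is an analogy for the SIMD pattern, not actual GPU execution:

```python
import numpy as np

# Each "lane" holds different operands, but the same instruction
# sequence (add, then subtract) is applied to all lanes at once -
# the pattern a GPU warp executes across its threads.
a = np.array([2, 4, -4, 3])
b = np.array([7, 8, 1, 2])
c = np.array([9, 2, 3, 18])

result = a + b - c          # one vectorized add and subtract
# result == [0, 10, -6, -13]
```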

14 of 15

Graphics processing units (GPUs)

  • One dominant manufacturer - Nvidia
  • Drivers and platform (CUDA) specific to manufacturer
  • Consumer and enterprise grade hardware:

GeForce RTX 3090

  • 24GB GDDR6X, 936 GB/s
  • 10496 CUDA cores (FP32)
  • 30 TFLOPS (FP32)

Ampere A100

  • 80GB HBM2e, 1935 GB/s
  • 6912 CUDA cores* (FP32)
  • 19.5 TFLOPS (FP32)

*More are dedicated to FP64

  • More RAM = Bigger models, less copying data from system memory (faster)

15 of 15

Graphics processing units (GPUs)

  • Anything that does matrix/vector calculations can be sped up with data parallelism on GPUs

  • Neural networks in particular can take advantage of this!

ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html
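As a sketch of why, the forward pass described at that link reduces to a few matrix products; a toy NumPy version (layer sizes are arbitrary) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network: every step is a matrix product,
# which is exactly the data-parallel work GPUs accelerate.
X  = rng.standard_normal((64, 10))   # 64 samples, 10 features
W1 = rng.standard_normal((10, 32))   # input -> hidden weights
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 1))    # hidden -> output weights
b2 = np.zeros(1)

hidden = np.maximum(0, X @ W1 + b1)  # ReLU(X W1 + b1)
output = hidden @ W2 + b2            # predictions, shape (64, 1)
```

On a GPU the same computation would run with the matrices resident in device memory (hence the earlier point: more GPU RAM means bigger models and less copying).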