1 of 16

Single-cell RNA-seq analysis using CyVerse

Reetu Tuteja and Ruchika Bhat

NCIBC workshop, May 2021

2 of 16

What we’ll cover:

  • What single-cell RNA-seq is and why it is useful
  • Basic steps for the analysis
  • How to run single-cell analysis apps in CyVerse
  • Demonstration using the Seurat VICE app
  • Q&A

https://cyverse-scrna-seq.readthedocs-hosted.com

https://de.cyverse.org/

3 of 16

Average population data or data for an individual?

Single-cell analyses to tailor treatments. Shalek and Benson, 2017

[Figure: bulk versus single-cell measurement. Panel labels: Bulk, Single-cell, Cell Type-A, Cell Type-B]
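A small worked example may help make the question on this slide concrete. The sketch below uses invented expression values for one gene in six cells (the numbers are hypothetical, not taken from Shalek and Benson) and compares the single average a bulk measurement would report with the per-cell-type averages that single-cell data can recover.

```python
from statistics import mean

# Hypothetical expression of one gene in six cells (arbitrary units).
# Cell Type-A expresses the gene highly; Cell Type-B barely expresses it.
cells = [
    ("Cell Type-A", 10.0), ("Cell Type-A", 12.0), ("Cell Type-A", 11.0),
    ("Cell Type-B", 0.0), ("Cell Type-B", 1.0), ("Cell Type-B", 2.0),
]

# A bulk measurement reports one average over every cell in the sample.
bulk_average = mean(value for _, value in cells)

# A single-cell measurement keeps each cell, so per-type averages survive.
per_type = {}
for cell_type, value in cells:
    per_type.setdefault(cell_type, []).append(value)
per_type_average = {t: mean(v) for t, v in per_type.items()}

print(bulk_average)      # 6.0
print(per_type_average)  # {'Cell Type-A': 11.0, 'Cell Type-B': 1.0}
```

The bulk value of 6.0 describes neither population, which is the point the slide is making.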

4 of 16

Bulk vs scRNA-Seq in cancer medicine

Single-cell analyses to tailor treatments. Shalek and Benson, 2017

5 of 16

Exponential scaling of scRNA-Seq

Exponential scaling of single-cell RNA-seq in the past decade. Svensson et al., 2018

6 of 16

Types of single-cell RNA-Seq methods

  • Quantification - determines type of analysis
      • Full-length protocols
      • Tag-based protocols
  • Capture - determines throughput
      • Microwell-based
      • Microfluidic-based
      • Droplet-based

7 of 16

Typical steps to analyze single-cell data

  • Quality Control
  • Normalization
  • Dimensionality Reduction
  • Clustering
  • Differential Expression Analysis

Data structure: raw counts matrix (N cells × J genes)
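As a rough illustration of the data structure and the normalization step, the sketch below builds a tiny count matrix and applies library-size scaling followed by log(1 + x). The gene names, the counts, and the scale factor of 10,000 are assumptions chosen for the example; the slides do not prescribe a specific normalization method.

```python
import math

# Hypothetical raw count matrix: rows are N = 3 cells, columns are J = 4 genes.
genes = ["geneA", "geneB", "geneC", "geneD"]
raw_counts = [
    [10, 0, 5, 5],      # cell 1, library size 20
    [100, 0, 50, 50],   # cell 2, library size 200 (same proportions as cell 1)
    [0, 8, 2, 0],       # cell 3, library size 10
]

def normalize(counts, scale=10_000):
    """Scale each cell to `scale` total counts, then apply log(1 + x).

    Assumes every cell has a non-zero library size, i.e. empty cells
    were already removed during quality control.
    """
    normalized = []
    for cell in counts:
        library_size = sum(cell)
        normalized.append([math.log1p(c / library_size * scale) for c in cell])
    return normalized

norm = normalize(raw_counts)
n_cells, n_genes = len(norm), len(norm[0])
print(n_cells, n_genes)    # 3 4
print(norm[0] == norm[1])  # True
```

Cells 1 and 2 differ tenfold in sequencing depth but have identical relative expression, so they become identical after normalization.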

8 of 16

Quality control

  • Filter out low-quality cells (or empty droplets) based on:
      • Library size
      • Cell coverage
  • Filter out low-abundance genes
  • Cell doublets or multiplets may exhibit an aberrantly high gene count

Adapted from Wolock et al., 2018
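The filters listed on this slide can be sketched in a few lines. This is a minimal illustration, not the procedure used in the demonstration: the counts and all four thresholds are hypothetical, and "cell coverage" is interpreted here as the number of genes detected per cell, which is an assumption.

```python
# Hypothetical raw counts: rows are cells, columns are genes.
raw_counts = [
    [5, 3, 0, 2, 0],       # ordinary cell
    [4, 0, 6, 1, 0],       # ordinary cell
    [0, 1, 0, 0, 0],       # near-empty droplet: tiny library
    [40, 35, 50, 30, 1],   # possible doublet: aberrantly high gene count
    [3, 2, 4, 0, 0],       # ordinary cell
]

# Illustrative thresholds only; real cut-offs are chosen per dataset.
MIN_LIBRARY_SIZE = 5     # minimum total counts per cell
MIN_GENES_DETECTED = 2   # minimum genes with a non-zero count per cell
MAX_GENES_DETECTED = 4   # above this, treat the cell as a possible doublet
MIN_CELLS_PER_GENE = 2   # a gene must be detected in this many kept cells

def library_size(cell):
    return sum(cell)

def genes_detected(cell):
    return sum(1 for count in cell if count > 0)

# Step 1: keep cells that pass the library-size and gene-count filters.
kept_cells = [
    i for i, cell in enumerate(raw_counts)
    if library_size(cell) >= MIN_LIBRARY_SIZE
    and MIN_GENES_DETECTED <= genes_detected(cell) <= MAX_GENES_DETECTED
]

# Step 2: among the kept cells, drop genes detected in too few cells.
n_genes = len(raw_counts[0])
kept_genes = [
    j for j in range(n_genes)
    if sum(1 for i in kept_cells if raw_counts[i][j] > 0) >= MIN_CELLS_PER_GENE
]

filtered = [[raw_counts[i][j] for j in kept_genes] for i in kept_cells]
print(kept_cells)  # [0, 1, 4]
print(kept_genes)  # [0, 1, 2, 3]
```

Filtering cells before genes is a design choice in this sketch: gene abundance is then judged only on cells that survived quality control.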

9 of 16

Workflow for analyzing scRNA-seq datasets

Bioconductor workflow for single-cell RNA sequencing. Perraudeau et al., 2017

Raw counts (N cells × J genes)

  1. Normalization → normalized counts (N cells × J genes)
  2. Dim. reduction → reduced matrix (N cells × K, with K < J)
  3. Clustering → cluster labels
  4. DE analysis → DE genes
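The four numbered steps of this workflow can be traced end to end on a toy matrix. The sketch below is deliberately simplified and every choice in it is an assumption made for illustration: the data and gene names are invented, selecting the K most variable genes stands in for a real dimensionality reduction such as PCA, a two-centroid k-means stands in for the clustering method, and a difference in mean normalized expression stands in for a statistical DE test.

```python
import math
from statistics import mean, pvariance

# Hypothetical raw counts: N = 6 cells (rows) by J = 5 genes (columns).
genes = ["markerA1", "markerA2", "markerB1", "markerB2", "housekeeping"]
raw = [
    [20, 18, 1, 0, 10],
    [22, 20, 0, 1, 10],
    [19, 21, 1, 1, 10],
    [1, 0, 20, 19, 10],
    [0, 1, 21, 22, 10],
    [1, 1, 18, 20, 10],
]

# 1. Normalization: library-size scaling followed by log(1 + x).
def normalize(counts, scale=100):
    return [[math.log1p(c / sum(cell) * scale) for c in cell] for cell in counts]

norm = normalize(raw)

# 2. Dimensionality reduction (stand-in): keep the K most variable genes.
K = 2
variances = [pvariance([cell[j] for cell in norm]) for j in range(len(genes))]
top_genes = sorted(range(len(genes)), key=lambda j: variances[j], reverse=True)[:K]
reduced = [[cell[j] for j in top_genes] for cell in norm]  # N cells by K

# 3. Clustering: k-means with two centroids and a deterministic start.
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, n_iter=10):
    # Start from the first point and the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min((0, 1), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels

labels = two_means(reduced)

# 4. DE analysis (stand-in): rank genes by difference in cluster means.
def mean_in_cluster(j, cluster):
    return mean(cell[j] for cell, lab in zip(norm, labels) if lab == cluster)

de_ranking = sorted(
    ((genes[j], mean_in_cluster(j, 0) - mean_in_cluster(j, 1)) for j in range(len(genes))),
    key=lambda item: abs(item[1]),
    reverse=True,
)

print(labels)             # [0, 0, 0, 1, 1, 1]
print(de_ranking[-1][0])  # housekeeping
```

The first three cells and the last three cells end up in separate clusters, and the gene with constant raw counts is ranked last, as expected for a gene that does not distinguish the two groups.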

10 of 16

Computational challenges

  • Multiple large-scale data processing steps
  • Compute and memory-intensive
  • Parameter tuning at multiple steps
  • Reproducibility of analysis, data and computational environment

11 of 16

What is CyVerse?

An open science workspace for collaborative, data-driven discovery

Vision: Transforming science through data-driven discovery

Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use

Community: 90K+ users, ~5 PB of data, hundreds of scientific applications, learning resources, and training through workshops and courses

12 of 16

CyVerse resources

CyVerse helps researchers do their science by providing them with powerful computational infrastructure.

  • Analysis platforms
  • Compute resources
  • Secure data storage
  • User support
  • Documentation and training

VICE

13 of 16

Visual and Interactive Computing Environment

  • Zero-install usage
  • Manage large-scale data from complex analysis workflows by keeping it on active storage
  • Readily make your tools available to your users and community
  • VICE can address limitations of hosted web services: it simplifies reproducibility and supports large-scale analysis

14 of 16

Who is VICE for?

Researchers: who want to easily interact with computational environments that others have created.

Authors: who want to create apps that allow users to immediately interact with a computational environment that they specify.

Developers: who want to create their own VICE apps to run on the Discovery Environment or on their own hardware.

Students and classroom instructors: who want to use already available apps in a classroom environment.

15 of 16

Demonstration

16 of 16

DE Team

Sriram Srinivasan

John Wregglesworth

Sarah Roberts

Paul Sarando

Tina Lee

Anastasia Kousa

Janko Nikolich-Zugich

Darren Cusanovich