Persist-seq: a set of reproducible single cell RNA-seq analysis pipelines for understanding early persister cells in cancer.
Pablo Moreno, Anil S Thanki, Anca Farcas, Martin Miller, Ultan McDermott
PERSIST-SEQ consortium: understanding persister cells with an end-to-end Single Cell RNA-seq pipeline.
“Persisters”
Resistance
Industry partners
* Consortium co-leads
Academic partners
SME partners
* Unified sequencing
* Standardised analysis
scRNA-seq pipeline, based on the battle-tested EBI scRNA-seq pipeline
14 pilots and 16 real experiments processed so far
Persist-seq End-to-End scRNA-seq data analysis operational and automated
SC Discoveries
Count matrices
Count matrix available
Interactive view of results
Download and decrypt
Main results object available in Data Library
Run standard pipeline
- From count matrix to clusters, dimensionality reductions and packaged results.
- Battle-tested (>350 datasets).
- galaxy-workflow-executor package
- Using Shared Data Library in Galaxy
What are persister cells and what is the context for this work
Single-cell RNA sequencing has become the best way to characterise the mechanism(s) that allow drug-tolerant persisters (DTPs) to survive
Persister datasets sequenced/in the pipeline
Cancer cell lines
3D organoids
Mouse xenografts
Patients
Co-culture
CAFs
GEMM
EGFRm/osimertinib
Model
Clinically equivalent dose (2 weeks of treatment, dosed twice a week)
DTPs
washout
Persist-seq main pipeline includes best practices and produces an object that can be explored for detailed analysis
Ingest count matrix
Filtering & QC
Normalization, scaling, HVGs
PCA, k-nn graph for UMAP
tSNE & clustering
Marker genes for clusters and metadata fields
Merge objects into the final AnnData
Optional batch correction
Mark cell cycle phase
Pseudo-bulk pipeline
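The core numerical steps listed above (filtering, normalization, HVG selection, scaling, PCA and a k-NN graph) can be sketched in a few lines of NumPy. This is an illustration only, not the consortium's Galaxy workflow: the function name `preprocess`, its thresholds and the variance-based HVG ranking are assumptions made for the example.

```python
import numpy as np

def preprocess(counts, min_genes=2, target_sum=1e4, n_hvgs=5, n_pcs=2, k=3):
    """Toy version of the main steps on a cells x genes raw count matrix.

    All defaults are illustrative assumptions, not pipeline settings.
    """
    counts = np.asarray(counts, dtype=float)

    # Filtering & QC: keep cells that express at least `min_genes` genes
    # (min_genes >= 1 also guarantees a non-zero library size below).
    keep = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[keep]

    # Normalization: scale each cell to `target_sum` counts, then log1p.
    library_size = counts.sum(axis=1, keepdims=True)
    log_norm = np.log1p(counts / library_size * target_sum)

    # HVGs: here simply the genes with the largest variance.
    hvg_idx = np.argsort(log_norm.var(axis=0))[::-1][:n_hvgs]
    hv = log_norm[:, hvg_idx]

    # Scaling: zero mean and unit variance per gene.
    std = hv.std(axis=0)
    std[std == 0] = 1.0
    scaled = (hv - hv.mean(axis=0)) / std

    # PCA through a singular value decomposition of the scaled matrix.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # k-NN graph in PCA space (the input that UMAP would consume).
    dist = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    knn = np.argsort(dist, axis=1)[:, :k]

    return keep, hvg_idx, pcs, knn
```

The sketch stops at the neighbour graph; UMAP, tSNE, clustering and marker detection would build on `pcs` and `knn`.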
Persist-seq pseudo-bulk RNA-seq for DE calling counters p-value inflation and enables complex modelling
Ingest main pipeline results
Aggregate cells per group definition (Decoupler)
edgeR filtering + DE
Volcano for each contrast
GSEA with multiple collections, per contrast (fgsea)
OncoEnrichR for Top DE, per contrast
Sanitisation for DE
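The aggregation step is what addresses p-value inflation: cells from the same sample are not independent replicates, so raw counts are summed per group and the differential expression test then runs on one profile per group. Below is a minimal sketch of that idea in plain Python. It is not the Decoupler interface; the function name `pseudo_bulk`, the `(sample, condition)` keys and the `min_cells` cut-off are assumptions for illustration.

```python
from collections import defaultdict

def pseudo_bulk(counts, cell_groups, min_cells=2):
    """Sum raw counts over all cells that share a group key.

    counts:      list of per-cell count lists (cells x genes)
    cell_groups: one hashable key per cell, e.g. (sample, condition)
    min_cells:   groups with fewer cells are dropped (illustrative cut-off)
    """
    if len(counts) != len(cell_groups):
        raise ValueError("need exactly one group key per cell")

    sums = {}
    n_cells = defaultdict(int)
    for row, key in zip(counts, cell_groups):
        if key not in sums:
            sums[key] = [0] * len(row)
        # Element-wise addition of this cell's counts to the group total.
        sums[key] = [a + b for a, b in zip(sums[key], row)]
        n_cells[key] += 1

    return {key: total for key, total in sums.items()
            if n_cells[key] >= min_cells}


# Toy data: four cells, three genes.
toy_counts = [[1, 0, 2], [3, 1, 0], [0, 5, 1], [2, 2, 2]]
toy_groups = [("s1", "DTP"), ("s1", "DTP"), ("s1", "DMSO"), ("s2", "DMSO")]
bulk = pseudo_bulk(toy_counts, toy_groups, min_cells=1)
```

The resulting groups x genes matrix is the kind of input a bulk method such as edgeR expects.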
Automatic execution
parameters
workflow
inputs.yaml
allowed_errors.yaml
Credentials
Galaxy-workflow-executor (CLI)
bioblend
Kubernetes on OpenStack
Kubernetes on other clouds
Dedicated instance on AZ Slurm cluster
results
https://github.com/ebi-gene-expression-group/galaxy-workflow-executor
pip install galaxy-workflow-executor
conda install galaxy-workflow-executor
(Bioconda channels set beforehand)
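For orientation, this is roughly what a programmatic workflow run looks like one layer down, using bioblend directly. It is a sketch, not the galaxy-workflow-executor command-line interface. The helper names, the input labels and the use of `"ld"` (library dataset) as the source type are assumptions, and the right source type depends on where the dataset IDs come from.

```python
def build_inputs(library_dataset_ids):
    """Map workflow input labels to datasets in a Galaxy Data Library.

    library_dataset_ids: dict of {workflow input label: dataset id}
    (labels and ids here are hypothetical placeholders).
    """
    return {label: {"id": dataset_id, "src": "ld"}
            for label, dataset_id in library_dataset_ids.items()}


def run_workflow(url, api_key, workflow_path, library_dataset_ids, history_name):
    """Import a workflow file and invoke it on Data Library inputs."""
    # Imported lazily so the helper above works without bioblend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=url, key=api_key)
    workflow = gi.workflows.import_workflow_from_local_path(workflow_path)
    history = gi.histories.create_history(name=history_name)
    return gi.workflows.invoke_workflow(
        workflow["id"],
        inputs=build_inputs(library_dataset_ids),
        history_id=history["id"],
        inputs_by="name",  # match inputs by their label in the workflow
    )
```

Keeping credentials, inputs and parameters in separate files, as in the diagram above, lets the same workflow definition run unchanged against different Galaxy instances.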
Osimertinib-generated persisters are transcriptionally distinct from untreated cells
Figure: UMAP embeddings (axes UMAP1 and UMAP2, n=25 neighbours) in three panels.
Cell lines: HCC827 DMSO (2,411 cells), HCC827 DTP (2,014 cells), II18 DMSO (4,787 cells), II18 DTP (3,771 cells), NCIH3255 DMSO (3,470 cells), NCIH3255 DTP (2,196 cells), PC9 DMSO (12,241 cells), PC9 DTP (10,181 cells)
Organoids: HUB 07-B2-051 DMSO (5,584 cells), HUB 07-B2-051 OSI 160nM (6,631 cells), HUB 07-B2-051 WASHOUT (7,240 cells), TEMPUS AZ-574812 DMSO (5,794 cells), TEMPUS AZ-574812 OSI 160nM (3,801 cells), TEMPUS AZ-574812 WASHOUT (5,182 cells)
In vivo: PC9-CDX: Vehicle (7,271 cells), 2 days treatment (2,840 cells), 7 days treatment (2,552 cells), 10 days treatment (1,335 cells), 14 days treatment (8,597 cells)
DUSP6
Models activate different biological pathways: are all persisters created equal?
Metabolic and signaling pathways
Metabolic and signaling pathways relevant for cell growth are downregulated.
Some pathways change direction between models: for instance, epithelial-mesenchymal transition is activated in vitro in some cell lines, but partly deactivated in organoids and in vivo.
Inflammation-related pathways (interferons) are upregulated in many models.
Persister cells consistently reduce cell cycling (G1 arrest); a cycling population remains
S
G2M
G1
Score every cell for relevant G2M and S genes
Cell population proportions plot
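The scoring and proportion steps can be illustrated with a deliberately simplified scheme. A cell's score for a gene set is its mean expression over that set minus its mean over all other genes, and the phase is G1 when neither score is positive. Production tools use matched control gene sets instead of this crude background, and the function names and inputs below are assumptions for the example.

```python
from collections import Counter

def _mean(values):
    return sum(values) / len(values)

def score_cell(expr, gene_set):
    """expr: dict of gene -> normalised expression for one cell."""
    in_set = [expr[g] for g in gene_set if g in expr]
    background = [v for g, v in expr.items() if g not in gene_set]
    if not in_set or not background:
        return 0.0
    # Simplified score: set mean minus the mean of every other gene.
    return _mean(in_set) - _mean(background)

def assign_phase(expr, s_genes, g2m_genes):
    s_score = score_cell(expr, s_genes)
    g2m_score = score_cell(expr, g2m_genes)
    if s_score <= 0 and g2m_score <= 0:
        return "G1"  # neither programme is active
    return "S" if s_score > g2m_score else "G2M"

def phase_proportions(cells, s_genes, g2m_genes):
    """Fraction of cells per phase, the quantity shown in a proportions plot."""
    counts = Counter(assign_phase(c, s_genes, g2m_genes) for c in cells)
    total = sum(counts.values())
    if total == 0:
        return {"G1": 0.0, "S": 0.0, "G2M": 0.0}
    return {phase: counts.get(phase, 0) / total for phase in ("G1", "S", "G2M")}


# Toy gene sets and cells, for illustration only.
S_GENES = {"MCM5", "PCNA"}
G2M_GENES = {"CDK1", "TOP2A"}
cells = [
    {"MCM5": 5, "PCNA": 5, "CDK1": 0, "TOP2A": 0, "ACTB": 2, "GAPDH": 2},
    {"MCM5": 0, "PCNA": 0, "CDK1": 6, "TOP2A": 6, "ACTB": 2, "GAPDH": 2},
    {"MCM5": 0, "PCNA": 0, "CDK1": 0, "TOP2A": 0, "ACTB": 3, "GAPDH": 3},
    {"MCM5": 0, "PCNA": 0, "CDK1": 0, "TOP2A": 0, "ACTB": 3, "GAPDH": 3},
]
proportions = phase_proportions(cells, S_GENES, G2M_GENES)
```

Comparing these proportions between treated and untreated samples is what reveals a shift towards G1 alongside a remaining cycling population.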
Summary
scRNA-seq pipelines operational, all datasets analyzed and shared (~1 million cells so far)
Persister cells are arrested in G1, with high heterogeneity.
Programmatic access and UIs (Data Libraries) help to cater for consortium needs.
Infrastructure is modular and re-deployable.
Two large workflows, new Galaxy tools and tool updates contributed.
Acknowledgement