Increasing data analysis skills in the pediatric cancer community with the Childhood Cancer Data Lab training workshops
Candace L. Savonen, Deepashree Venkatesh Prasad, Casey S. Greene, and Jaclyn N. Taroni
The Childhood Cancer Data Lab: an initiative of Alex’s Lemonade Stand Foundation
Break down barriers, in access and in knowledge, and put data and powerful methods in the hands of pediatric cancer experts poised for the next big discovery.
ccdatalab.org
The bioinformatics knowledge gap
Bioinformatics Core
Researcher
147628264_sample1A.sam
472781188_sample1A.bam
147628264_sample1B.bam.bai
123567678_sample1B.fastq
id | id2 | p.val | p.val2 |
1234 | ?? | 1525 | 0.095 |
131 | ?? | 333 | 0.02 |
... | ... | ... | ... |
How were these data processed?
How do I use these files?
How do I analyze this to answer my question?
CCDL training workshops goal:
Equip childhood cancer researchers to use data
Workshop schedule
Day 1: Environment set up; Intro to Docker; Intro to R and R Notebooks; Intro to Tidyverse; Bulk RNA-seq (Pre-processing, Differential expression)
Day 2: Single-cell RNA-seq (Normalization, Pre-processing droplet-based, Dimension reduction); Machine Learning (Data prep, Clustering, PLIER, Plot preparation)
Day 3: Own data; Talk by CCDL staff; Participant data show and tell
Schedule Example on GitHub: https://github.com/AlexsLemonade/RNA-Seq-Exercises/blob/master/schedule.md
Day 1
Encourage Reproducible Analyses - with Docker
Workshop participant’s computer
Some Different Operating System
A Computer
An Operating System
Docker Container
Encourage Reproducible Analyses - with R Notebooks
Output from above code chunk
Executable code chunk
Can click here to run code chunk
From: https://alexslemonade.github.io/training-modules/intro_to_R_tidyverse/01-intro_to_r.nb.html
From: https://github.com/AlexsLemonade/training-modules/blob/master/RNA-seq/02-reproducibility_cmdline.md
Day 2
Focus on Researchers’ Most Pressing Issues: surveys identified RNA-seq
From: https://github.com/AlexsLemonade/training-modules/blob/master/scRNA-seq/01-normalizing_scRNA-seq.nb.html
From: https://github.com/AlexsLemonade/training-modules/blob/master/machine-learning/02-medulloblastoma_clustering.nb.html
From: https://github.com/AlexsLemonade/training-modules/blob/master/machine-learning/04-medulloblastoma_LV_differences.nb.html
Day 3
Focus on Researchers’ Most Pressing Issues:
Own data
Talk by CCDL staff
Participant data show and tell
Create curriculum that is easily accessible and updatable
Train others to host workshops at their own institutions
Post-training survey results summary
Participant Quotes:
“This workshop was the perfect environment to help me get over the initial hurdle of basic R programming and data processing concepts.”
“I have gained the ability to engage with my bioinformatic colleagues on a new level. Like learning any new language, I feel more comfortable reading than writing for the moment but feel this introduction has given me an opportunity to develop this skill on my own.”
Net Promoter Score = %Active Promoters - %Detractors
Rating scale: 1 to 10, grouped into Detractors, Passive, and Active Promoters
Overall Range: -100 to 100
> 0 Good
> +50 Very Good
> +70 Excellent
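The formula and rating bands above can be sketched in a few lines of Python. This is a minimal illustration, not the survey tooling used for the workshops. It assumes the conventional Net Promoter cutoffs (ratings of 9 to 10 count as active promoters, ratings of 6 or below as detractors), which the slide does not state explicitly. The function names, the "Not rated" label, and the example ratings are hypothetical and are not workshop survey data.

```python
def net_promoter_score(ratings):
    """Return %promoters minus %detractors for a list of 1-10 ratings.

    Assumption: 9-10 are active promoters and 1-6 are detractors
    (the conventional cutoffs; the slide does not give them).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passive ratings (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


def interpret(score):
    """Map a score in [-100, 100] to the bands listed on the slide."""
    if score > 70:
        return "Excellent"
    if score > 50:
        return "Very Good"
    if score > 0:
        return "Good"
    # The slide gives no label for scores of 0 or below; this one is hypothetical.
    return "Not rated"


# Made-up ratings for illustration only.
example_ratings = [10, 9, 9, 8, 10, 6, 9, 10, 7, 9]
example_score = net_promoter_score(example_ratings)
print(example_score, interpret(example_score))
```

Because passive responses still count in the denominator, a large passive group pulls the score toward zero even when there are no detractors.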
Our Net Promoter Score: 89
Our goal: equip childhood cancer researchers to use data
Data Analysis Toolbox
Bioinformatics Core
Childhood Cancer Researcher
CCDL Workshops Next Steps
We encourage other fields to start similar workshops to spread bioinformatics skills.
Casey Greene, PhD
Director
Deepa Prasad
UX Designer
Dongbo Hu
Software Engineer
Rich Jones
Software Engineer
Ariel Rodriguez
Software Engineer
Kurt Wheeler
Software Engineer
Jaclyn Taroni, PhD
Data Scientist
Candace Savonen
Biological Data Analyst
Come see us at Table 1!
Website: ccdatalab.org
github.com/AlexsLemonade
@CancerDataLab