1 of 43

Increasing data analysis skills in the pediatric cancer community with the Childhood Cancer Data Lab training workshops

Candace L. Savonen, Deepashree Venkatesh Prasad, Casey S. Greene, and Jaclyn N. Taroni

2 of 43

The Childhood Cancer Data Lab - An initiative of Alex’s Lemonade Stand Foundation

Break down barriers, in access and in knowledge, and put data and powerful methods in the hands of pediatric cancer experts poised for the next big discovery.

  1. Software Tools
  2. Scientific Workflows
  3. Training Workshops

ccdatalab.org

4 of 43

The bioinformatics knowledge gap

  • Big data can hold valuable information about the underlying biology of complex diseases

  • Training opportunities for using these data are sparse in many fields

6 of 43

The bioinformatics knowledge gap

Bioinformatics Core

Researcher

8 of 43

The bioinformatics knowledge gap

147628264_sample1A.sam

472781188_sample1A.bam

147628264_sample1B.bam.bai

123567678_sample1B.fastq

Bioinformatics Core

Researcher

11 of 43

The bioinformatics knowledge gap

id      id2     p.val   p.val2
1234    ??      1525    0.095
131     ??      333     0.02
...     ...     ...     ...

How were these data processed?

How do I use these files?

How do I analyze this to answer my question?

Bioinformatics Core

Researcher

15 of 43

CCDL training workshops goal:

Equip childhood cancer researchers to use data

16 of 43

Goal: equip childhood cancer researchers to use data

  1. Focus on researchers’ most pressing data issues
  2. Encourage reproducible analyses
  3. Create curriculum that is easily accessible and updatable
  4. Train others to host workshops at their own institutions

17 of 43

Day 1

  • Environment set up
  • Intro to Docker
  • Intro to R and R Notebooks
  • Intro to Tidyverse
  • Bulk RNA-seq: Pre-processing, Differential expression

Day 2

  • Machine Learning: Data prep, Clustering, PLIER, Plot preparation
  • Single-cell RNA-seq: Normalization, Pre-processing droplet-based, Dimension reduction

Day 3

  • Own data
  • Talk by CCDL staff
  • Participant data show and tell

18 of 43

Goal: equip childhood cancer researchers to use data

  • Encourage reproducible analyses
  • Focus on researchers’ most pressing data issues
  • Create curriculum that is easily accessible and updatable
  • Train others to host workshops at their own institutions

19 of 43

Day 1 modules: Environment set up; Intro to Docker; Intro to R and R Notebooks; Intro to Tidyverse; Bulk RNA-seq (Pre-processing, Differential expression)

Encourage Reproducible Analyses - with Docker

Workshop participant’s computer

Some Different Operating System

A Computer

An Operating System

20 of 43

Day 1 modules: Environment set up; Intro to Docker; Intro to R and R Notebooks; Intro to Tidyverse; Bulk RNA-seq (Pre-processing, Differential expression)

Encourage Reproducible Analyses - with Docker

Workshop participant’s computer

Some Different Operating System

A Computer

An Operating System

Docker Container

22 of 43

Day 1 modules: Environment set up; Intro to Docker; Intro to R and R Notebooks; Intro to Tidyverse; Bulk RNA-seq (Pre-processing, Differential expression)

Encourage Reproducible Analyses - with R Notebooks

Output from above code chunk

Executable code chunk

Can click here to run code chunk

24 of 43

Day 1 modules: Environment set up; Intro to Docker; Intro to R and R Notebooks; Intro to Tidyverse; Bulk RNA-seq (Pre-processing, Differential expression)

Encourage Reproducible Analyses - with R Notebooks

From: https://alexslemonade.github.io/training-modules/intro_to_R_tidyverse/01-intro_to_r.nb.html

27 of 43

Day 2 modules: Machine Learning (Data prep, Clustering, PLIER, Plot preparation); Single-cell RNA-seq (Normalization, Pre-processing droplet-based, Dimension reduction)

Focus on Researchers’ Most Pressing Issues: Surveys said RNA-seq

28 of 43

Day 2 modules: Machine Learning (Data prep, Clustering, PLIER, Plot preparation); Single-cell RNA-seq (Normalization, Pre-processing droplet-based, Dimension reduction)

Focus on Researchers’ Most Pressing Issues

From: https://github.com/AlexsLemonade/training-modules/blob/master/machine-learning/02-medulloblastoma_clustering.nb.html

29 of 43

Day 2 modules: Machine Learning (Data prep, Clustering, PLIER, Plot preparation); Single-cell RNA-seq (Normalization, Pre-processing droplet-based, Dimension reduction)

Focus on Researchers’ Most Pressing Issues

From: https://github.com/AlexsLemonade/training-modules/blob/master/machine-learning/04-medulloblastoma_LV_differences.nb.html

30 of 43

Day 3 modules: Own data; Talk by CCDL staff; Participant data show and tell

Focus on Researchers’ Most Pressing Issues:

  • Designate time for participants to work on their own data

32 of 43

Create curriculum that is easily accessible and updatable

  • Modularized for flexibility in meeting researchers’ needs

Day 1 modules: Set up; Intro to Docker; Intro to R; Intro to Tidyverse; Bulk RNA-seq
Day 2 modules: Machine Learning; Single-cell RNA-seq
Day 3 modules: Own data; Talk by CCDL staff; Participant data show and tell

34 of 43

Train others to host workshops at their own institutions

  • Scalable solution for the further spread of bioinformatics training

  • Alex’s Lemonade Stand Foundation encourages this by providing financial and administrative support.

36 of 43

Post-training survey results summary

Participant Quotes:

“This workshop was the perfect environment to help me get over the initial hurdle of basic R programming and data processing concepts.”

“I have gained the ability to engage with my bioinformatic colleagues on a new level. Like learning any new language, I feel more comfortable reading than writing for the moment but feel this introduction has given me an opportunity to develop this skill on my own.”

39 of 43

Post-training survey results summary

Net Promoter Score = %Active Promoters - %Detractors

Rating scale: 1 2 3 4 5 6 7 8 9 10, grouped as Detractors, Passive, and Active

Our Net Promoter Score: 89

Overall Range: -100 to 100

  • > 0 Good
  • > +50 Very Good
  • > +70 Excellent
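
The Net Promoter Score formula and thresholds on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the CCDL's survey code. The cutoffs used here (9 to 10 counts as an active promoter, 6 or below as a detractor) are the conventional NPS cutoffs and are an assumption, since the slide does not state where the groups begin and end. The function names, the label for scores at or below 0, and the example ratings are hypothetical; the ratings are made up and are not workshop survey data.

```python
from typing import Iterable


def net_promoter_score(ratings: Iterable[int]) -> float:
    """Return %active promoters minus %detractors, on a -100 to 100 scale."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    # Assumed cutoffs (conventional NPS, not stated on the slide):
    # promoters rate 9 or 10, detractors rate 6 or lower, the rest are passive.
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def interpret(score: float) -> str:
    """Map a score to the labels shown on the slide."""
    if score > 70:
        return "Excellent"
    if score > 50:
        return "Very Good"
    if score > 0:
        return "Good"
    # The slide gives no label for scores at or below 0; this one is hypothetical.
    return "Not yet Good"


# Made-up ratings for illustration only.
example_ratings = [10, 9, 9, 8, 10, 6, 9, 10, 7, 10]
score = net_promoter_score(example_ratings)
print(score, interpret(score))
```

With these example ratings, 7 of 10 respondents are promoters and 1 is a detractor, so the score is 70% - 10% = 60, which falls in the "Very Good" band. Passive respondents count toward the total but toward neither percentage.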

41 of 43

Our goal: equip childhood cancer researchers to use data

Data Analysis Toolbox

Bioinformatics Core

Childhood Cancer Researcher

42 of 43

CCDL Workshops Next Steps

We encourage other fields to start similar workshops to spread bioinformatics skills.

43 of 43

  • Casey Greene, PhD, Director
  • Deepa Prasad, UX Designer
  • Dongbo Hu, Software Engineer
  • Rich Jones, Software Engineer
  • Ariel Rodriguez, Software Engineer
  • Kurt Wheeler, Software Engineer
  • Jaclyn Taroni, PhD, Data Scientist
  • Candace Savonen, Biological Data Analyst

Come see us at Table 1!

Website: ccdatalab.org

github.com/AlexsLemonade

@CancerDataLab