1 of 34

Reproducibility - Intro to R

Candace Savonen

2 of 34

Patil, Peng, Leek (2016) https://www.biorxiv.org/content/10.1101/066803v1

Reproducibility: a different analyst re-performs the analysis with the same code and the same data and obtains the same result.

3 of 34

Image created by Candace Savonen using Avataars.

My data analysis is showing a pattern that is very informative for the ongoing research in my field.

Variable A

Variable B

Ruby the Researcher

4 of 34

Image created by Candace Savonen using Avataars.

Data

Code

Variable A

Variable B

Results

Ruby the Researcher

Repeatable: keeping everything the same but repeating the analysis - do we get the same results?

5 of 34

Image created by Candace Savonen using Avataars.

Data

Code

Variable A

Variable B

Results

Ruby the Researcher

Reproducible: using the same data and analysis but in the hands of another researcher - do we get the same results?

Avi the Associate

Data

Code

6 of 34

Image created by Candace Savonen using Avataars.

New Data

Same Code

Ruby the Researcher

Replicable: with new data do we obtain the same inferences?

Avi the Associate

Variable A and B are positively correlated

Code

7 of 34

Based on a figure from Essawy et al., 2020 https://doi.org/10.1016/j.envsoft.2020.104753

Axes: Effort, Time

Replicability: new researcher, new data

Reproducibility: new researcher, same data

Repeatability: same researcher, same data

8 of 34

Image created by Candace Savonen using Avataars.

Ruby’s findings are super relevant to my work and I’m interested in using her methods!

Variable A

Variable B

Ruby the Researcher

Avi the Associate

9 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Avi the Associate

Variable A

Variable B

Ruby’s computer

Avi’s computer

R = 0.893

Here, Avi, this code runs well on my computer, let me email it to you!

So exciting!

10 of 34

Image created by Candace Savonen using Avataars.

Avi the Associate

ERROR ERROR ERROR ERROR ERROR ERROR ERROR ERROR

ERROR ERROR

Avi’s computer

Ruby’s code and data

11 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Avi the Associate

Error: file path “Ruby’s computer/Ruby’s file/final_version10.R” not found

Avi’s computer

Re:Re:Re: Data

Hi Ruby, I don’t understand what this code is supposed to be doing...

Re:Re:Re: Data

Hi Avi, It works for me?
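A "file not found" error like the one above often comes from a path that only exists on one computer. The following is a minimal sketch, assuming a hypothetical project layout with a `data` folder and a file named `samples.csv` (neither name comes from Ruby's actual project). It builds the path relative to the project folder with `file.path()` so the same code can run wherever the project is copied.

```r
# An absolute path like this only exists on one machine (hypothetical example):
# samples <- read.csv("C:/Users/ruby/Documents/project/data/samples.csv")

# A relative path built with file.path() does not depend on who owns the computer.
data_dir <- "data"                               # hypothetical folder name
data_file <- file.path(data_dir, "samples.csv")  # hypothetical file name

# Create a tiny example file so this sketch runs on its own.
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(variable_a = 1:3, variable_b = c(2, 4, 6)),
          data_file, row.names = FALSE)

# Fail early with a readable message if the file is missing.
if (!file.exists(data_file)) {
  stop("Could not find ", data_file, "; run this from the project folder.")
}

samples <- read.csv(data_file)
```

A relative path still assumes the code is run from the project folder, so it helps to say so in a comment or README.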

12 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Avi the Associate

Variable A

Variable B

Variable A

Variable B

Avi’s computer

R = 0.893

R = 0.891

Ruby’s computer

Ruby’s code and data

13 of 34

Reproducibility is a tortoise’s game - it’s an incremental and slow process but it has high payoffs!

14 of 34

Reproducible analyses save everyone time and effort!

15 of 34

Image created by Candace Savonen using Avataars.

Ruby’s code

ERROR

Ruby’s code

Now Ruby

Future Ruby

16 of 34

Image created by Candace Savonen using Avataars.

Ruby’s code - not as reproducible

ERROR

ERROR

ERROR

ERROR

ERROR

ERROR

ERROR

17 of 34

Image created by Candace Savonen using Avataars.

Ruby’s code - made reproducibly

ERROR

18 of 34

Patil, Peng, Leek (2016) https://www.biorxiv.org/content/10.1101/066803v1

If your results are not repeatable they will NOT be reproducible.

In other words, if you can’t get the same answer twice, other researchers won’t be able to get your answer reliably either.
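One common reason an analysis is not repeatable is that it uses random numbers. The sketch below is an illustration only, with simulated data and a hypothetical function name `run_analysis()`. It assumes that fixing the random seed with `set.seed()` is enough to make this particular analysis give the same answer twice.

```r
# Hypothetical analysis: simulate two variables and report their correlation.
run_analysis <- function(seed) {
  set.seed(seed)  # fix the random number generator so the draws repeat
  variable_a <- rnorm(50)
  variable_b <- variable_a + rnorm(50, sd = 0.5)
  cor(variable_a, variable_b)
}

first_run <- run_analysis(seed = 2016)
second_run <- run_analysis(seed = 2016)

# Check repeatability: both runs should give exactly the same value.
identical(first_run, second_run)
```

Without the `set.seed()` call, the two runs would almost certainly report slightly different correlations.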

19 of 34

Step 1) Get your code to work once

Step 2) Get your code to work reliably for you

Step 3) Get your code to work for someone else

20 of 34

Image created by Candace Savonen.

Not repeatable: every analysis starts here

Ran once

Re-runs sometimes

Re-runs reliably in most contexts

Re-runs in every situation and gets the same result every time

Perfectly reproducible: no analysis reaches here

21 of 34

R Markdown notebooks are a handy tool for reproducibility!

22 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Working from this notebook allows me to interactively develop my data analysis and write down my thoughts about the process all in one place!

RMarkdown is conducive to interactive development!

23 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Avi the Associate

R = 0.893

Avi, here’s some output from the scientific notebook I’ve been developing!

This is so easy to follow and read, even though I didn’t write the code. Thanks for sharing your exciting results!

RMarkdown creates easily shareable output!

24 of 34

Image created by Candace Savonen using Avataars.

Ruby the Researcher

Yay! I just got the data for 5 more samples. Because of my handy notebook setup, I can call one command to re-run the analysis so that it includes the new samples!

RMarkdown is handy for creating updateable reports!
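The "one command" Ruby mentions could look something like the sketch below. It assumes the `rmarkdown` package and pandoc are installed, and it uses a hypothetical notebook named `analysis.Rmd` that is written to a temporary folder so the example is self-contained. The built-in `cars` dataset stands in for Ruby's data.

```r
# Hypothetical notebook file; in practice this would be your own .Rmd file.
notebook <- file.path(tempdir(), "analysis.Rmd")

# Build the code chunk delimiter without typing it literally.
fence <- strrep("`", 3)

# Write a tiny R Markdown notebook so this sketch runs on its own.
writeLines(c(
  "---",
  "title: \"Example analysis\"",
  "output: html_document",
  "---",
  "",
  "Notes about the analysis go here, next to the code.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), notebook)

# Re-run the whole notebook and rebuild the report with one command.
# Guarded so the sketch does not fail if rmarkdown or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  report <- rmarkdown::render(notebook, quiet = TRUE)
}
```

When new samples arrive, the data file changes but the `render()` call stays the same.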

25 of 34

Screenshot by Candace Savonen

26 of 34

Package versions affect reproducibility!

27 of 34

Created by Candace Savonen

Ruby’s local computing environment: versions 4.1.1, 4.2.1, 2.1.0, 5.1.0, 2.0.1, 3.9.0

Avi’s local computing environment: versions 3.1.0, 2.1.0, 1.0.1, 3.0.1

R = 0.893

R = 0.891

28 of 34

Image by Candace Savonen

Ruby’s session info print out

Avi’s session info print out

rmarkdown 2.4 vs 2.10

R version 4.0.2 vs 4.0.5

If Avi and Ruby see discrepancies in their results, the session info printout provides a record that may hold clues as to why!

Different operating systems!
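A session info printout like the ones compared above can be produced with base R's `sessionInfo()`. This is a minimal sketch; the output file name `session_info.txt` is a hypothetical choice, and the file is written to a temporary folder here.

```r
# Record the R version, operating system, and loaded packages.
info <- sessionInfo()
print(info)

# Individual pieces can be pulled out for a quick comparison between computers.
info$R.version$version.string  # the R version
info$running                   # the operating system

# Save the printout next to the results so collaborators can compare it
# with their own (hypothetical file name).
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(print(info)), session_file)
```

Putting `sessionInfo()` at the end of a notebook means every rendered report carries its own record of the environment it was run in.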

29 of 34

DRY code: Don’t repeat yourself!

30 of 34

DRY code is easier on readers because they don't have to review the same thing twice, but also because they don't have to review the same thing twice.
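As an illustration of the DRY idea, here is a small sketch with made-up data and a hypothetical helper named `standardize()`. The repeated, copy-pasted version is shown in comments, followed by a version that writes the step once as a function and applies it to every column.

```r
# Made-up example data.
measurements <- data.frame(
  variable_a = c(1, 2, 3, 4),
  variable_b = c(10, 20, 30, 40)
)

# Repetitive version: the same calculation is copy-pasted for each variable.
# scaled_a <- (measurements$variable_a - mean(measurements$variable_a)) /
#   sd(measurements$variable_a)
# scaled_b <- (measurements$variable_b - mean(measurements$variable_b)) /
#   sd(measurements$variable_b)

# DRY version: write the calculation once as a function...
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# ...and apply it to every column.
scaled <- as.data.frame(lapply(measurements, standardize))
```

If the calculation needs to change, it now only has to be changed, and reviewed, in one place.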

32 of 34

After you get your code working, the next step is to make it:

  1. Readable - Can others understand what you are doing?
  2. Efficient - Is this the best way to do this?

33 of 34

For more on these topics:

34 of 34

Introduction to Reproducibility in Cancer Informatics