Reproducibility - Intro to R
Candace Savonen
Patil, Peng, Leek (2016) https://www.biorxiv.org/content/10.1101/066803v1
Reproducibility: a different analyst re-performs the analysis with the same code and the same data and obtains the same result.
Image created by Candace Savonen using Avataars.
My data analysis is showing a pattern that is very informative for the ongoing research in my field.
Variable A
Variable B
Ruby the Researcher
Data
Code
Variable A
Variable B
Results
Ruby the Researcher
Repeatable: keeping everything the same but repeating the analysis - do we get the same results?
Data
Code
Variable A
Variable B
Results
Ruby the Researcher
Reproducible: using the same data and analysis but in the hands of another researcher - do we get the same results?
Avi the Associate
Data
Code
New Data
Same Code
Ruby the Researcher
Replicable: with new data do we obtain the same inferences?
Avi the Associate
Variable A and B are positively correlated
Code
Based off of a figure from Essawy et al, 2020 https://doi.org/10.1016/j.envsoft.2020.104753
Effort
Time
Replicability: new researcher, new data
Reproducibility: new researcher, same data
Repeatability: same researcher, same data
Ruby’s findings are super relevant to my work and I’m interested in using her methods!
Variable A
Variable B
Ruby the Researcher
Avi the Associate
Ruby the Researcher
Avi the Associate
Variable A
Variable B
Ruby’s computer
Avi’s computer
R = 0.893
Here, Avi, this code runs well on my computer, let me email it to you!
So exciting!
Avi the Associate
ERROR ERROR ERROR ERROR ERROR ERROR ERROR ERROR ERROR ERROR
Avi’s computer
Ruby’s code and data
Ruby the Researcher
Avi the Associate
Error: file path “Ruby’s computer/Ruby’s file/final_version10.R” not found
Avi’s computer
Re:Re:Re: Data
Hi Ruby, I don’t understand what this code is supposed to be doing...
Re:Re:Re: Data
Hi Avi, It works for me?
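A path that only exists on one person's computer is a common cause of the "file not found" error above. The following is a minimal sketch, not Ruby's actual code: it builds paths with `file.path()` relative to a project folder and stops with a readable message when the file is missing. The names `project_dir`, `data`, `measurements.csv`, and `read_project_data()` are hypothetical, and a temporary folder stands in for a shared project folder so the example runs on its own.

```r
# Hypothetical project folder; a temporary directory stands in for a shared project.
project_dir <- file.path(tempdir(), "ruby_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Write a small example data set so this sketch is self-contained.
example_data <- data.frame(variable_a = c(1, 2, 3), variable_b = c(2, 4, 7))
write.csv(
  example_data,
  file.path(project_dir, "data", "measurements.csv"),
  row.names = FALSE
)

# Build the path from the project folder instead of hard-coding one machine's layout.
read_project_data <- function(project_dir) {
  path <- file.path(project_dir, "data", "measurements.csv")
  if (!file.exists(path)) {
    stop("Could not find ", path, "; check that the project folder is complete.")
  }
  read.csv(path)
}

measurements <- read_project_data(project_dir)
```

Because every path is derived from one project folder, a collaborator only needs to receive that folder and point `project_dir` at it.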
Ruby the Researcher
Avi the Associate
Variable A
Variable B
Variable A
Variable B
Avi’s computer
R = 0.893
R = 0.891
Ruby’s computer
Ruby’s code and data
Reproducibility is a tortoise’s game - it’s an incremental and slow process but it has high payoffs!
Reproducible analyses save everyone time and effort!
Ruby’s code
ERROR
Ruby’s code
Now Ruby
Future Ruby
Ruby’s code - not as reproducible
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR
Ruby’s code - made reproducible
ERROR
Patil, Peng, Leek (2016) https://www.biorxiv.org/content/10.1101/066803v1
If your results are not repeatable they will NOT be reproducible.
In other words, if you can’t get the same answer twice, other researchers won’t be able to get your answer reliably either.
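One way to check repeatability is to run the same analysis twice and compare the answers. The sketch below assumes an analysis that involves random numbers, which is a frequent reason results change between runs. The function `run_analysis()`, the simulated variables, and the seed value are hypothetical stand-ins, not part of the original analysis.

```r
# Hypothetical analysis: simulate two related variables and measure their correlation.
run_analysis <- function(seed) {
  set.seed(seed)  # fix the random number generator so each run draws the same values
  variable_a <- rnorm(100)
  variable_b <- variable_a + rnorm(100, sd = 0.5)
  cor(variable_a, variable_b)
}

first_run <- run_analysis(seed = 2016)
second_run <- run_analysis(seed = 2016)

# TRUE when both runs give exactly the same result
is_repeatable <- identical(first_run, second_run)
```

Without the `set.seed()` call, the two runs would draw different random values and the correlations would differ slightly from run to run.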
Step 1) Get your code to work once
Step 2) Get your code to work reliably for you
Step 3) Get your code to work for someone else
Image created by Candace Savonen.
Not repeatable: every analysis starts here
Ran once
Re-runs sometimes
Re-runs reliably in most contexts
Re-runs in every situation and gets the same result every time
Perfectly reproducible: no analysis reaches here
R Markdown notebooks are a handy tool for reproducibility!
Ruby the Researcher
Working from this notebook lets me develop my data analysis interactively and write down my thoughts about the process, all in one place!
RMarkdown is conducive to interactive development!
Ruby the Researcher
Avi the Associate
R = 0.893
Avi, here’s some output from the scientific notebook I’ve been working from!
This is so easy to follow and read, even though I didn’t write the code. Thanks for sharing your exciting results!
RMarkdown creates easily shareable output!
Ruby the Researcher
Yay! I just got the data for 5 more samples. Because of my handy notebook setup, I can run one command to re-run the analysis so it is updated with the new samples included!
RMarkdown is handy for creating updateable reports!
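The "one command" for re-running a notebook could be `rmarkdown::render()`, which executes every chunk and rebuilds the report. The sketch below is an assumption about how that might look, not Ruby's actual setup: it writes a tiny, hypothetical notebook (`analysis_notebook.Rmd` with made-up sample values) to a temporary folder and renders it only if the rmarkdown package and pandoc are available.

```r
# Hypothetical notebook written to a temporary folder so this sketch is self-contained.
fence <- strrep("`", 3)  # three backticks that open and close the notebook's code chunk
notebook_file <- file.path(tempdir(), "analysis_notebook.Rmd")
writeLines(
  c(
    "---",
    "title: \"Variable A vs Variable B\"",
    "output: html_document",
    "---",
    "",
    "The correlation between the two variables:",
    "",
    paste0(fence, "{r}"),
    "samples <- data.frame(variable_a = c(1, 2, 3, 4), variable_b = c(2, 3, 5, 8))",
    "cor(samples$variable_a, samples$variable_b)",
    fence
  ),
  notebook_file
)

# Rendering needs the rmarkdown package and pandoc, so check for both first.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

# One command re-runs every chunk and rebuilds the report.
if (can_render) {
  report_file <- rmarkdown::render(notebook_file, quiet = TRUE)
}
```

When new samples arrive, updating the data the notebook reads and calling `rmarkdown::render()` again regenerates the whole report from scratch.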
Screenshot by Candace Savonen
Package versions affect reproducibility!
Created by Candace Savonen
Ruby’s local computing environment
4.1.1
4.2.1
2.1.0
5.1.0
2.0.1
3.9.0
Avi’s local computing environment
3.1.0
2.1.0
1.0.1
3.0.1
R = 0.893
R = 0.891
Image by Candace Savonen
Ruby’s session info print out
Avi’s session info print out
rmarkdown 2.4 vs 2.10
R version 4.0.2 vs 4.0.5
If Avi and Ruby see discrepancies in their results, the session info printout provides a record that may hold clues as to why!
Different operating systems!
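A session info printout like the ones compared above can be produced with `sessionInfo()`, which reports the R version, the operating system, and the versions of loaded packages. The sketch below also saves the printout next to the results; the file name `session_info.txt` and the use of a temporary folder are assumptions for illustration.

```r
# Record the R version, operating system, and loaded package versions.
info <- sessionInfo()

# Individual details can be pulled out for a quick comparison between computers.
r_version <- info$R.version$version.string
stats_version <- as.character(packageVersion("stats"))

# Save the full printout alongside the analysis output (hypothetical file name).
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(print(info)), session_file)
```

Placing a `sessionInfo()` call at the end of a notebook means every rendered report carries its own record of the environment it was run in.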
DRY code: Don’t repeat yourself!
DRY code is easier on readers because they don't have to review the same thing twice, but also because they don't have to review the same thing twice.
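As a small illustration of DRY code, the sketch below replaces copy-pasted steps with one function that is applied to every column. The function `standardize()` and the example values are hypothetical and chosen only to show the pattern.

```r
# Example data (made-up values for illustration)
measurements <- data.frame(
  variable_a = c(1, 2, 3, 4),
  variable_b = c(10, 20, 30, 40)
)

# Repetitive version: the same formula is typed once per variable, so a fix
# to the formula has to be made in several places.
# scaled_a <- (measurements$variable_a - mean(measurements$variable_a)) / sd(measurements$variable_a)
# scaled_b <- (measurements$variable_b - mean(measurements$variable_b)) / sd(measurements$variable_b)

# DRY version: write the step once as a function...
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# ...then apply it to every column.
standardized <- as.data.frame(lapply(measurements, standardize))
```

If the standardization ever needs to change, only `standardize()` has to be edited and reviewed.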
After you get your code working, the next step is to make it:
Introduction to Reproducibility in Cancer Informatics