Reproducibility for everyone
https://tinyurl.com/plantbio-repo
CC BY 4.0
Why does reproducibility matter to you?
Goals and objectives
Introduction |
What does reproducibility mean?
Schloss, 2018
10.1128/mBio.00525-18
Reproducible research: Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results.
Barba, 2018
https://arxiv.org/abs/1802.03311
Replication: A study that arrives at the same scientific findings as another study, collecting new data and completing new analyses.
Methods | Same experimental system | Different experimental system |
Same methods | Reproducibility | Replicability |
Different methods | Robustness | Generalizability |
Schloss, 2018
What are the different modes of ‘reproducibility’?
Reproducibility is the minimum standard for science.
Is reproducibility all that matters?
Casadevall and Fang, 2016
10.1128/mBio.01902-16
Every little helps!
No one is perfect!
Everyone starts somewhere!
Transparent and open science!
Casadevall and Fang, 2016
Factors decreasing reproducibility
There is good news!
“So what’s a careful scientist to do? First and foremost, be aware of the conditions around you that may increase the risk of irreproducible results, whether they are bad ingredients, dubious statistical traditions, or outside pressures that can shape behavior. Also take heart. This reproducibility “crisis” isn’t really a crisis at all. These are not new problems. Rather, I think of this moment as an awakening. And that’s a good thing, because we need to recognize that a problem exists before we can seek solutions.”
Richard Harris, NPR science journalist, author of “Rigor Mortis” (a book on biomedical reproducibility)
What can we do by the end of the century?
Cori Bargmann, HHMI investigator, President CZI Science
"82 years ago, there were no antibiotics & we didn't know that smoking causes lung cancer... We can expect a lot from the next 82 years."
"Where can we be in 82 years if we accelerate science?"
Where is your greatest potential for growth?
More detailed methods, analysis and record keeping
More publicly available data including meta-data
Better reagent sharing e.g. plasmids, antibodies ...
Fewer incentives to be first rather than right
Adopting some of these best practices isn’t just good for other scientists; it’s good for you and will save you time in the long term.
Data management
organization | documentation | analysis | dissemination |
http://kbroman.org/dataorg/
https://dmptool.org/
Have a plan! Be happy!
I cannot find this file!
Where is my file?
What version was it?
What did I call that file again?
Was this the wild type picture or the mutant one?
Where is my RAW!!! data?
Think about…
https://www.dataone.org/best-practices
http://guides.lib.purdue.edu/c.php?g=353013&p=2378292
Project directory structure
Always keep raw data!
Project_1
    methods
    raw_data
        readme
    analysis
        analysis_method_1
            2017
            2018
        analysis_method_2
    scripts
    manuscript
        text
            version_1
    readme and/or ELN link
Always back up your data! x3
Inspired by ‘Bioinformatics Data Skills’, Vincent Buffalo
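A project skeleton like the one above can be created with a short script, so every new project starts with the same layout. The following is a minimal sketch, not a prescribed tool: the folder names come from the slide, while the exact nesting, the function name create_project and the README.txt file name are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

# Folder names follow the slide; the nesting is an assumption.
PROJECT_LAYOUT = [
    "methods",
    "raw_data",
    "analysis/analysis_method_1/2017",
    "analysis/analysis_method_1/2018",
    "analysis/analysis_method_2",
    "scripts",
    "manuscript/text/version_1",
]


def create_project(root, name="Project_1"):
    """Create the project folders and placeholder readme files (hypothetical helper)."""
    project = Path(root) / name
    for sub in PROJECT_LAYOUT:
        # exist_ok=True makes the script safe to re-run on an existing project
        (project / sub).mkdir(parents=True, exist_ok=True)
    # One readme for the whole project and one describing the raw data
    for readme in (project / "README.txt", project / "raw_data" / "README.txt"):
        if not readme.exists():
            readme.write_text("Describe the contents of this folder here.\n")
    return project


if __name__ == "__main__":
    # Demonstration in a throwaway directory; use a real path for real projects
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(tmp)
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

Existing readme files are never overwritten, so the script can be re-run to add missing folders without touching notes already written.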
http://guides.lib.purdue.edu/c.php?g=353013&p=2378292
http://kbroman.org/dataorg/
What did you name the last file you generated?
Did you have a plan?
File naming convention (FNC)
Example: 20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
Date: 20130825
Project: DOEProject
Experiment: Ex1Test1
Type: Data
ID: Gonzalez
Version: v3-03
Specificity: fields run from general (left) to specific (right)
http://guides.lib.purdue.edu/c.php?g=353013&p=2378292
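A naming convention is easier to keep when file names are generated and checked by code rather than typed by hand. The sketch below assumes the six-field pattern of the example above (date, project, experiment, type, ID, version, joined by underscores); the function names build_filename and parse_filename are hypothetical and not part of any library.

```python
import re
from datetime import date

# Pattern for: YYYYMMDD_Project_Experiment_Type_ID_vVERSION.ext
FILENAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_(?P<project>[^_]+)_(?P<experiment>[^_]+)_"
    r"(?P<type>[^_]+)_(?P<id>[^_]+)_v(?P<version>[^_.]+)\.(?P<extension>\w+)"
)


def build_filename(day, project, experiment, file_type, owner, version, extension):
    """Join the fields from general to specific, separated by underscores."""
    fields = [day.strftime("%Y%m%d"), project, experiment, file_type, owner, f"v{version}"]
    for field in fields:
        # Underscores separate fields, so they cannot appear inside one
        if "_" in field or " " in field:
            raise ValueError(f"field {field!r} must not contain '_' or spaces")
    return "_".join(fields) + "." + extension


def parse_filename(name):
    """Split a file name back into its fields, or raise if it breaks the convention."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return match.groupdict()


if __name__ == "__main__":
    name = build_filename(date(2013, 8, 25), "DOEProject", "Ex1Test1",
                          "Data", "Gonzalez", "3-03", "xlsx")
    print(name)
    print(parse_filename(name))
```

Putting the date first in YYYYMMDD form means that an alphabetical sort of the folder is also a chronological sort.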
Get organized! Be happy!
Findable
Accessible
Interoperable
Reusable
https://tinyurl.com/plantae-FAIR
Electronic Notebooks
Paper Lab-notebooks - in use since the 15th Century!
Leonardo da Vinci’s notebook, Codex Arundel, c. 1478-1518, British Library
Good record keeping is important for:
• Dissemination of ideas and findings
• A legally binding record that protects intellectual property
Not searchable!
Hard to share with collaborators.
Can be easily damaged or misplaced.
Not easy to back up.
Why should you use an Electronic Lab Notebook?
Store text electronically, Attach Images,
15 GB data limit, Multiple authors possible
Good for sharing data, 2 GB cloud storage limit (free version)
Use the mobile App to quickly upload images
Easily accessible world over
Many more Features!
+
Embed high res images, protocols etc
Searchable
Easily shareable
Export data as PDF (must back up data regularly)
Cost considerations - software available
Paid: Bio-ITech, LabArchives, Labguru
Paid (with free version): SciNote, Benchling
Open source: OpenWetWare, ELOG
Free: OSF (Open Science Framework), LocalWiki
Kanza et al., 2017, 10.1186/s13321-017-0221-3
One size does not fit all!
https://datamanagement.hms.harvard.edu/electronic-lab-notebooks
Parameters to consider
Available lab notebooks
ELN Features Matrix
Basic features of an Electronic Lab notebook
General tips on electronic record keeping
‘Wet lab’ protocol sharing
Description Unavailable
Description Ambiguous
How to overcome this problem? - Don’t contribute to it!
Description Insufficient
Write Detailed Protocols
https://www.protocols.io/view/how-to-make-your-protocol-more-reproducible-discov-g7vbzn6
https://www.aje.com/en/arc/how-to-write-an-easily-reproducible-protocol/
Share protocols on the right platforms
What should Ben do?
Ben is really excited to join a new team that is performing a chemical screen of plant growth regulators on root architecture. However,
Identify the problem(s).
Suggest a solution.
Materials and Methods
Plants were grown on appropriate media and roots photographed. Images were analyzed using WinRhizo (Arsenault, J-L., et al. 1995) and data presented as graphs.
‘Wet lab’ reagent sharing
Problems with wet-lab reagent availability
Scientist creates & publishes on a reagent
Scientist leaves the lab and stores reagent in freezer
???
???
Other scientists request the reagents, but no one remaining remembers where they are
Functions of reagent repositories
They:
Process is easier if you:
(All of these are facilitated by other tools discussed in this workshop)
Examples of Reagent repositories
Incentivizing reagent sharing
Direct
Indirect
Addgene: The nonprofit plasmid repository
Goal: To accelerate science by improving access to research materials and information
Issues Addressed: Difficulties in obtaining, verifying, and using plasmids from other labs
Audience: Academic and nonprofit institutions doing biology research and using plasmids
Services:
Bioinformatic tools
Dependency hell
What version of the program, data etc… did I use?
Why did I do this?
https://software-carpentry.org/
‘Bioinformatics Data Skills’, Vincent Buffalo
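One low-effort answer to "what version did I use?" is to save a snapshot of the software environment next to each analysis. The sketch below is one possible approach using only the Python standard library; the function names and the output file name are assumptions for illustration, and it covers only Python packages, not external programs or data.

```python
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def snapshot_environment():
    """Return one line per installed Python package, plus interpreter details."""
    lines = [f"# python {platform.python_version()} on {platform.platform()}"]
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages.add(f"{name}=={dist.version}")
    lines.extend(sorted(packages, key=str.lower))
    return lines


def write_snapshot(path):
    """Write the snapshot to a text file (hypothetical helper); return the line count."""
    lines = snapshot_environment()
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # Demonstration in a throwaway directory; in practice, save next to the results
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "environment_snapshot.txt"
        count = write_snapshot(target)
        print(f"Recorded {count} lines in {target.name}")
```

Keeping such a file under version control alongside the analysis scripts ties each result to the software versions that produced it.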
Notebooks
https://jupyter.org/documentation
https://www.rstudio.com/
Version control
http://smutch.github.io/VersionControlTutorial//
https://git-scm.com/doc
Google Docs keeps a version history.
Dependency hell
Version conflict
How do I install all these different software packages?
Package, dependency, and environment manager
https://bioconda.github.io/
https://conda.io/docs/
Containers
https://docs.docker.com/
Docker runs images as containers that are
Biocontainers
Data sharing
What to share?
Why share?
How to share?
Jonathan D. Wren; URL decay in MEDLINE—a 4-year follow-up study, 2008 Bioinformatics
Specify a data licence:
Specify a code licence:
Identify mandated or disciplinary repository:
In addition to a specified data repository, you can make a deposit to a general purpose repository:
Image handling and analysis
Data Analysis and Visualization
Data presentation is the foundation of our collective scientific knowledge…
Figures are especially important. They often show data for key findings.
What is good DataViz?
Effective figures should:
The usual way and its flaws
Issues:
Why does DataViz matter for reproducibility?
Bar Charts Don’t Allow You to Critically Evaluate Continuous Data
Weissgerber et al., 2017 JBC, http://www.jbc.org/content/292/50/20592.full
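The point about bar charts can be checked with a few lines of code. The sketch below uses two made-up datasets (invented for illustration, not taken from Weissgerber et al.) that have the same mean and standard deviation, so a bar chart with error bars would draw them identically even though the underlying values look nothing alike.

```python
import math
import statistics

# Two invented datasets: one bimodal, one tightly clustered with two extreme values
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]
with_outliers = [1, 5, 5, 5, 5, 5, 5, 9]


def bar_chart_summary(values):
    """Return the numbers a typical bar chart shows: mean, SD and SEM."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem


if __name__ == "__main__":
    for label, values in [("bimodal", bimodal), ("with outliers", with_outliers)]:
        mean, sd, sem = bar_chart_summary(values)
        print(f"{label:>14}: mean={mean:.2f} sd={sd:.2f} sem={sem:.2f}")
        # Printing the sorted raw values is a crude text version of a dot plot
        print(f"{'':>14}  values={sorted(values)}")
```

Both summaries come out the same, which is why plots that show every data point (dot plots, box plots with points overlaid) let readers evaluate continuous data in a way a bar chart cannot.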
How to Choose the Right Plot
One Step Further
Interactive Dot Plot - http://statistika.mfub.bg.ac.rs/interactive-dotplot/
Interactive Line Graph - http://statistika.mfub.bg.ac.rs/interactive-linegraph/
Some Intermediate Options
https://plot.ly/create/#/
https://www.datawrapper.de/
Which programming language should I use?
Programming Languages
Dealing with Data
Summary
Reproducible research practices enable you to:
The tools discussed here give you a framework for making your research more reproducible, and they will save you time and resources in the long term.
Contributors
Sonali Roy
Lenny Teytelman
Sarah Robinson
Benjamin Schwessinger
April Clyburne-Sherin
Nicolas Schmelling
Tracey Weissgerber
Tyler Ford
Joanne Kamens
Steven Burgess
What is one thing that you can do today to start making your research more reproducible?
@eLife
@Addgene
@CodeOceanHQ
@protocols.io
@ASPB
https://tinyurl.com/plantbio-repo
CC BY 4.0