1 of 95

Reproducibility for everyone


2 of 95

Why does reproducibility matter to you?


3 of 95


4 of 95

Goals and objectives


  • ‘Reproducibility’ framework

  • ‘Reproducibility’ tools

  • Starting point of a ‘lifelong’ journey

Introduction

5 of 95


  • What does reproducibility mean?
  • What are the different modes of reproducibility?
  • Is reproducibility all that matters?

  • ‘Reproducibility’ tool shed:
    • organization
    • documentation
    • analysis
    • dissemination


6 of 95

What does reproducibility mean?


7 of 95

What does reproducibility mean?


Schloss, 2018

10.1128/mBio.00525-18

Reproducible research: Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results.

Barba, 2018

https://arxiv.org/abs/1802.03311

Replication: A study that arrives at the same scientific findings as another study, collecting new data and completing new analyses.


8 of 95

What are the different modes of ‘reproducibility’?

Methods               Same experimental system    Different experimental system
Same methods          Reproducibility             Replicability
Different methods     Robustness                  Generalizability


9 of 95


Reproducibility is the minimum standard for science.

10 of 95

Is reproducibility all that matters?


11 of 95


Casadevall and Fang, 2016

10.1128/mBio.01902-16


12 of 95

Every little helps!

No one is perfect!

Everyone starts somewhere!

Transparent and open science!


Casadevall and Fang, 2016

10.1128/mBio.01902-16


13 of 95

Factors decreasing reproducibility


14 of 95

Factors decreasing reproducibility


15 of 95

Factors decreasing reproducibility


16 of 95

Factors decreasing reproducibility


17 of 95

Factors decreasing reproducibility


18 of 95

Factors decreasing reproducibility


19 of 95

Factors decreasing reproducibility


20 of 95

There is good news!


“So what’s a careful scientist to do? First and foremost, be aware of the conditions around you that may increase the risk of irreproducible results, whether they are bad ingredients, dubious statistical traditions, or outside pressures that can shape behavior. Also take heart. This reproducibility ‘crisis’ isn’t really a crisis at all. These are not new problems. Rather, I think of this moment as an awakening. And that’s a good thing, because we need to recognize that a problem exists before we can seek solutions.”

Richard Harris, NPR science journalist, author of “Rigor Mortis” (a book on biomedical reproducibility)

21 of 95


22 of 95

What can we do by the end of the century?

Cori Bargmann, HHMI investigator, President CZI Science

"82 years ago, there were no antibiotics & we didn't know that smoking causes lung cancer... We can expect a lot from the next 82 years."

23 of 95

"Where can we be in 82 years if we accelerate science?"

24 of 95

Where is your greatest potential for growth?


25 of 95

Where is your greatest potential for growth?


  • More detailed methods, analysis and record keeping
  • More publicly available data, including metadata
  • Better reagent sharing, e.g. plasmids, antibodies ...
  • Fewer incentives to be first rather than right

26 of 95

Where is your greatest potential for growth?


Adopting some of these best practices isn’t just good for other scientists; it’s good for you and will save you time in the long term.

27 of 95

Data management

organization

documentation

analysis

dissemination

http://kbroman.org/dataorg/

https://dmptool.org/

28 of 95


Have a plan! Be happy!

I cannot find this file!

Where is my file?

What version was it?

What did I call that file again?

Was this the wild type picture or the mutant one?

Where is my RAW!!! data?

29 of 95


Think about:

  • What data will be produced as a part of the project
  • How each type of data will be organized, documented, standardized, stored, protected, shared and archived
  • Who will take responsibility for carrying out the activities listed above, and
  • When these activities will take place over the course of the project (and beyond)
  • Metadata

https://www.dataone.org/best-practices

http://guides.lib.purdue.edu/c.php?g=353013&p=2378292

30 of 95


Project directory structure

Project_1
  • methods
  • raw_data
  • analysis
  • scripts
  • manuscript
  • readme and/or ELN link

Inspired by ‘Bioinformatic data skills’ Vincent Buffalo
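Creating the same skeleton by hand for every project is easy to get wrong. Below is a minimal Python sketch, assuming Python 3, that creates the top-level folders listed above. The `create_project` function and the `README.md` placeholder are illustrative choices, not part of the original material:

```python
from pathlib import Path

# Folder names taken from the directory structure above; adapt to your lab's conventions.
SUBFOLDERS = ["methods", "raw_data", "analysis", "scripts", "manuscript"]


def create_project(root: Path, name: str) -> Path:
    """Create a project folder with the standard subfolders and a README stub."""
    project = root / name
    for sub in SUBFOLDERS:
        # parents=True also creates the project folder; exist_ok=True makes re-runs safe
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"  # hypothetical name for the "readme and/or ELN link" file
    if not readme.exists():
        # Never overwrite an existing README; only create a placeholder
        readme.write_text(
            f"# {name}\n\n"
            "Describe the project here and link to the relevant ELN entries.\n"
        )
    return project
```

For example, `create_project(Path("."), "Project_1")` creates the folders in the current directory. Re-running it is harmless: existing folders and an existing README are left untouched.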

31 of 95


Project directory structure

Project_1
  • methods
  • raw_data
    • readme
  • analysis
    • analysis_method_1
      • 2017
      • 2018
    • analysis_method_2
  • scripts
  • manuscript
    • text
      • version_1
  • readme and/or ELN link

Inspired by ‘Bioinformatic data skills’ Vincent Buffalo

32 of 95


Project directory structure

Always keep raw data!


Always back up your data, and keep three copies!

Inspired by ‘Bioinformatic data skills’ Vincent Buffalo
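A backup only helps if the copies really match the original. The sketch below, written for this guide, compares every file in a folder (for example `raw_data`) byte by byte with the same file in each backup location. It assumes "X 3" means keeping three copies and that each backup mirrors the original folder layout; the `verify_backups` name is hypothetical:

```python
import filecmp
from pathlib import Path


def verify_backups(original: Path, backups: list[Path]) -> dict[str, list[str]]:
    """Compare every file under `original` with the same relative path in each backup.

    Returns a mapping from backup folder to the files that are missing or differ.
    """
    problems: dict[str, list[str]] = {}
    for backup in backups:
        bad = []
        for src in sorted(original.rglob("*")):
            if not src.is_file():
                continue
            rel = src.relative_to(original)
            backup_file = backup / rel
            # shallow=False compares file contents, not just size and modification time
            if not backup_file.is_file() or not filecmp.cmp(src, backup_file, shallow=False):
                bad.append(rel.as_posix())
        problems[str(backup)] = bad
    return problems
```

An empty list for a backup means every file was found and is identical; any listed path is missing or has changed in that copy.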

33 of 95


34 of 95


http://guides.lib.purdue.edu/c.php?g=353013&p=2378292

http://kbroman.org/dataorg/

What did you name the last file you generated?

Did you have a plan?

35 of 95


  • Test_data_2013
  • Project_Data
  • Design for project.doc
  • Lab_work_Eric
  • Second_test
  • Meeting Notes Oct 23

http://guides.lib.purdue.edu/c.php?g=353013&p=2378292

http://kbroman.org/dataorg/

File naming convention (FNC)

36 of 95


File naming convention (FNC)

  • Include date in yyyy-mm-dd format
  • Use meaningful abbreviations
  • Have group identifiers
  • Document your decisions
  • Be consistent

http://guides.lib.purdue.edu/c.php?g=353013&p=2378292

http://kbroman.org/dataorg/

37 of 95


20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx

File naming convention (FNC)

Date_Project_Experiment_Type_ID_Version

Specificity: general → specific

  • Include date in yyyy-mm-dd format
  • Use meaningful abbreviations
  • Have group identifiers
  • Document your decisions
  • Be consistent
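A naming convention is easier to follow consistently when a script builds the names. This hypothetical `make_filename` helper follows the field order of the example above (date, project, experiment, type, ID, version) and the yyyy-mm-dd rule, so it writes `2013-08-25` where the slide example uses the compact `20130825`. The set of allowed characters is an assumption:

```python
import re
from datetime import date

# Letters, digits and hyphens inside a field; underscores separate fields.
_FIELD = re.compile(r"^[A-Za-z0-9-]+$")


def make_filename(day: date, project: str, experiment: str, data_type: str,
                  owner: str, version: str, extension: str) -> str:
    """Build a file name ordered from general (date, project) to specific (version)."""
    fields = [day.isoformat(), project, experiment, data_type, owner, version]
    for field in fields[1:]:
        # Reject spaces, underscores and other characters that would break the pattern
        if not _FIELD.match(field):
            raise ValueError(f"invalid characters in field {field!r}")
    return "_".join(fields) + "." + extension.lstrip(".")
```

For example, `make_filename(date(2013, 8, 25), "DOEProject", "Ex1Test1", "Data", "Gonzalez", "v3-03", "xlsx")` returns `2013-08-25_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx`, while a field containing a space, such as `"Meeting Notes"`, raises a `ValueError`.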

http://guides.lib.purdue.edu/c.php?g=353013&p=2378292

38 of 95


Get organized! Be happy!

FAIR: Findable, Accessible, Interoperable, Reusable

https://tinyurl.com/plantae-FAIR

39 of 95

Electronic Notebooks

40 of 95


Paper lab notebooks: in use since the 15th century!

Leonardo da Vinci’s notebook, Codex Arundel, c. 1478-1518, British Library

Good record keeping is important for:

  • Dissemination of ideas and findings
  • A legally binding record that protects intellectual property

Not searchable!

Hard to share with collaborators.

Can be easily damaged or misplaced.

Not easy to back up.

41 of 95


Why should you use an Electronic Lab Notebook?

Store text electronically and attach images

15 GB data limit; multiple authors possible

Good for sharing data; 2 GB cloud storage limit (free version)

Use the mobile app to quickly upload images

Accessible from anywhere in the world

Many more features!

+

Embed high-resolution images, protocols, etc.

Searchable

Easily shareable

Export data as PDF (you must back up data regularly)

42 of 95


Cost considerations: available software

Paid: Bio-Itech, LabArchives, LabGuru

Paid (with free version): SciNote, Benchling

Open source: OpenWetWare, ELOG

Free: OSF (Open Science Framework), LocalWiki

Kanza, 10.1186/s13321-017-0221-3

43 of 95


One size does not fit all!

https://datamanagement.hms.harvard.edu/electronic-lab-notebooks

Parameters to consider

Available lab notebooks

ELN Features Matrix

44 of 95


Basic features of an Electronic Lab notebook

45 of 95


General tips on electronic record keeping

  • Back up data regularly

  • Maintain a physical notebook in parallel

  • Mobile apps provide added portability

  • If using free ELNs, check privacy policies

46 of 95

‘Wet lab’ protocol sharing

47 of 95


Description Unavailable

48 of 95


49 of 95


Description Ambiguous

50 of 95

How to overcome this problem? Don’t contribute to it!


Description Insufficient

51 of 95

Write Detailed Protocols


  • Think of a protocol as a brief, modular, and self-contained scientific publication.
  • Include a 3-4 sentence abstract that puts the methodology in context.
  • Include as much detail as possible: duration/time per step, reagent amount, vendor name, catalog number, expected result, safety information, software package.
  • List the steps in chronological order.
  • Add notes, recipes, tips, and tricks.
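One way to make the recommended details hard to forget is to give each step explicit fields. A minimal sketch using Python dataclasses; the field names mirror the list above, and `ProtocolStep` and `missing_details` are hypothetical names, not the format of any protocol platform:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProtocolStep:
    """One step of a wet-lab protocol, with the details worth recording."""
    number: int
    instruction: str
    duration_min: Optional[float] = None
    reagent: Optional[str] = None
    amount: Optional[str] = None          # keep units with the value, e.g. "50 µL"
    vendor: Optional[str] = None
    catalog_number: Optional[str] = None
    expected_result: Optional[str] = None
    safety_notes: Optional[str] = None
    software: Optional[str] = None


def missing_details(step: ProtocolStep) -> list[str]:
    """List the fields that were left empty, as a prompt to fill them in."""
    return [f.name for f in fields(step) if getattr(step, f.name) in (None, "")]
```

Not every step needs a reagent or a catalog number, so the output is a checklist to review rather than an error.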

52 of 95

Share protocols on the right platforms


53 of 95

Share protocols on the right platforms

  • Bio-protocol (free to read and publish, but requires an invitation or a pre-submission inquiry)
  • JoVE, Journal of Visualized Experiments (nice videos, but costly and not open access)
  • protocols.io (free to read and publish, but not peer-reviewed)

54 of 95

What should Ben do?


Ben is really excited to join a new team that is performing a chemical screen of plant growth regulators on root architecture. However,

  • The previous Postdoc started a new job and refuses to respond to his emails.
  • The technician on the project was only involved in the data acquisition steps.
  • Unfortunately, the lab notebook went missing in a recent move to a new floor.
  • The methods section in a previous paper reads like this:

Identify the problem(s).

Suggest a solution.

Materials and Methods

Plants were grown on appropriate media and roots photographed. Images were analyzed using WinRhizo (Arsenault, J-L., et al. 1995) and data presented as graphs.

55 of 95

‘Wet lab’ reagent sharing

56 of 95


Problems with wet-lab reagent availability

Scientist creates & publishes on a reagent

Scientist leaves the lab and stores reagent in freezer

???

???

Other scientists request the reagents, but no one remaining remembers where they are


57 of 95


Problems with wet-lab reagent availability

  • Wasted time, money, and resources when reagents are recreated
  • Mistakes in recreation can lead to spurious results
  • Individual labs don’t usually have the resources to:
    • Keep track of all reagents created in lab
    • Consistently validate all reagents in the lab
    • Properly label and store all reagents
    • (Legally) distribute all reagents to interested researchers
  • Reagent repositories are part of the solution!


58 of 95


Functions of reagent repositories

They:

  • Verify reagents
  • Curate reagents
  • Facilitate and track shipping
  • Protect IP


Process is easier if you:

  • Record how a reagent was created
  • Provide associated publications
  • Provide associated protocols

(All of these are facilitated by other tools discussed in this workshop)

59 of 95


Examples of Reagent repositories

  • Addgene
  • DNASU
  • ATCC
  • NCI Mouse Repository
  • Coriell Institute
  • ABRC
  • The Bloomington Drosophila Stock Center
  • Developmental Studies Hybridoma Bank


60 of 95


Incentivizing reagent sharing

Direct

  • Archiving
  • Reducing time spent sending out reagents
  • Occasional monetary benefits

Indirect

  • Creation of educational content
  • Direct promotion
  • Analysis of reagent distribution

61 of 95


Addgene: The nonprofit plasmid repository

Goal: To accelerate science by improving access to research materials and information

Issues Addressed: Difficulties in obtaining, verifying, and using plasmids from other labs

Audience: Academic and nonprofit institutions doing biology research and using plasmids

Services:

  • Stores and distributes plasmids and viral vectors
  • Verifies plasmids and viral vectors through DNA sequencing with some functional testing
  • Collates/curates information about plasmids and viral vectors
  • Produces and freely distributes educational content to make it easier for scientists to learn about and use new technologies


62 of 95

Bioinformatic tools

63 of 95


Dependency hell

What version of the program, data etc… did I use?

Why did I do this?

https://software-carpentry.org/
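Answering "what version did I use?" months later is much easier if the versions are written down when the analysis runs. A small sketch using only the Python standard library (`importlib.metadata`; Python 3.9+ assumed for the type hints). The package list, the output file, and the function names are placeholders to adapt:

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_environment(packages: list[str]) -> dict:
    """Record the interpreter, operating system and package versions used for an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


def save_environment(path: str, packages: list[str]) -> None:
    """Write the environment record as JSON next to the analysis outputs."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(capture_environment(packages), fh, indent=2)
```

For example, calling `save_environment("environment.json", ["numpy", "pandas"])` at the end of a notebook leaves a record alongside the results. Package managers such as conda can export a complete environment specification; this sketch records only the packages you list.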

64 of 95

Notebooks

https://jupyter.org/documentation

https://www.rstudio.com/

  • Keep track of analysis
  • Interactive coding
  • Interactive data exploration
  • Embedded visualization
  • Easy access to docstrings
  • Mix of code and documentation

65 of 95

Notebooks

  • Over 40 programming languages
  • Easily shared
  • Widgets
  • Interactive plots
  • Run remotely on server

66 of 95


https://jupyter.org/documentation

https://www.rstudio.com/

67 of 95

Version control

http://smutch.github.io/VersionControlTutorial//

https://git-scm.com/doc

  • Records changes
  • Keeps track of change history
  • Illustrates changes between versions
  • Lets you share your code easily
  • Lets you collaborate on your code more easily

68 of 95

Version control

http://smutch.github.io/VersionControlTutorial//

https://git-scm.com/doc

69 of 95

Version control

http://smutch.github.io/VersionControlTutorial//

https://git-scm.com/doc

70 of 95

Version control

http://smutch.github.io/VersionControlTutorial//

https://git-scm.com/doc

71 of 95

Version control

http://smutch.github.io/VersionControlTutorial//

https://git-scm.com/doc

Google Docs also tracks version history.

72 of 95


Dependency hell

Version conflict

How do I install all these different software packages???

73 of 95

Package, dependency, and environment manager

https://bioconda.github.io/

https://conda.io/docs/

  • Handles installs and dependencies
  • Allows for multiple independent environments
  • Easily configurable
  • Allows for manual installs as well
  • Runs on all three major operating systems (Linux, macOS, Windows)
  • Open source

  • You can package your own work and contribute

74 of 95

Containers

https://docs.docker.com/

Docker runs images as containers that are

  • Self-contained: all code, programs, and libraries are included, so no further installation is required
  • Isolated
  • Portable, which also makes them easy to disseminate
  • Lightweight

Biocontainers

75 of 95


76 of 95

Data sharing

77 of 95

Data sharing


What to share?

  • Share research data and code that is necessary to validate findings & reproduce results of research outputs
  • Share data and code that might be valuable to other researchers or policy-makers
  • Share data and code which cannot be (easily) re-generated

Why share?

  • Funder or publisher mandates
  • Citation benefits (Piwowar 2013, https://doi.org/10.7717/peerj.175)
  • Preserve long-term access to data


How to share?

  • Choose open, persistent, and non-proprietary file formats
  • Create and share documentation to enable reuse
  • Include data citations of source data
  • Create rich metadata
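Checksums let anyone who downloads your data confirm that the files are complete and unchanged, and a machine-readable manifest is a simple place to keep dataset-level metadata. A sketch, assuming the dataset lives in a single folder; the `manifest.json` name, its layout, and the helper names are our own choices, not a requirement of any repository:

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, metadata: dict, out_name: str = "manifest.json") -> Path:
    """List every file with its size and checksum, plus free-form dataset metadata."""
    files = [
        {
            "path": p.relative_to(data_dir).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256sum(p),
        }
        for p in sorted(data_dir.rglob("*"))
        if p.is_file() and p.name != out_name  # don't list the manifest itself
    ]
    out = data_dir / out_name
    out.write_text(json.dumps({"metadata": metadata, "files": files}, indent=2))
    return out
```

JSON is an open, non-proprietary format, so the manifest can travel with the data to whichever repository you choose. The keys you put in `metadata` (for example a title and a license identifier) are up to you.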

78 of 95

Data sharing


Jonathan D. Wren, “URL decay in MEDLINE: a 4-year follow-up study,” Bioinformatics, 2008

https://doi.org/10.1093/bioinformatics/btn127

79 of 95

Data sharing

  • Use a data repository, not your website!
    • Repositories provide
      • Persistent identifiers for your data like a DOI
        • Unique and citable
        • Prevents “link rot”
      • Persistent access
      • Preservation
      • Backup
      • Management of access
      • Versioning
      • Licensing

Specify a data licence:

Specify a code licence:


80 of 95

Data sharing

Identify mandated or disciplinary repository:

  • Funder specified repository
  • Institutionally specified data repository
  • Domain or discipline-specific data repository
    • Find and compare disciplinary repositories using the Repository of Research Data Repositories https://www.re3data.org/

In addition to a specified data repository, you can make a deposit to a general purpose repository:

  • DataDryad http://datadryad.org/ (curated digital repository; free to access, $120 to publish a dataset of up to 20 GB)
  • Figshare https://figshare.com/ (free digital repository; 5 GB per file limit)
  • Zenodo https://zenodo.org/ (free digital repository; 50 GB per dataset limit)


81 of 95

Image handling and analysis

82 of 95

Data Analysis and Visualization

83 of 95

Data presentation is the foundation of our collective scientific knowledge…

Figures are especially important. They often show data for key findings.


84 of 95

What is good DataViz?

Effective figures should:

  • Immediately convey information about the study design
  • Illustrate important findings
  • Allow the reader to critically evaluate the data


85 of 95

The usual way and its flaws

Issues:

  • Reproducible Workflows?
    • Problems can be avoided by using macros or dashboards
    • However, who uses these?
  • Excel Renames Genes
    • Ziemann et al., 2016 - https://doi.org/10.1186/s13059-016-1044-7
    • About 20% of papers with supplementary Excel gene lists in leading genomics journals contain gene name errors
  • Default Plots are often Bar Charts and Line Plots
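Ziemann et al. report that Excel converts gene symbols such as SEPT2 and MARCH1 into dates ("2-Sep", "1-Mar") and RIKEN identifiers such as 2310009E13 into numbers ("2.31E+13"). A rough check for gene lists that have already been through a spreadsheet might look like the sketch below. The two patterns cover only these reported cases and are our assumption, not an exhaustive test:

```python
import re

# Date-like values, e.g. "2-Sep" (from SEPT2) or "1-Mar" (from MARCH1)
_DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$", re.IGNORECASE
)
# Scientific-notation values, e.g. "2.31E+13" (from the RIKEN ID 2310009E13)
_FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]\d+$", re.IGNORECASE)


def find_mangled_genes(symbols: list[str]) -> list[str]:
    """Return entries that look like Excel auto-converted them to a date or a number."""
    return [s for s in symbols
            if _DATE_LIKE.match(s.strip()) or _FLOAT_LIKE.match(s.strip())]
```

Detection is only a fallback: the original identifier cannot always be recovered (digits are lost in "2.31E+13"), so it is better to keep gene lists out of automatic conversion in the first place, for example by importing those columns as text.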


86 of 95

Why does DataViz matter for reproducibility?


87 of 95

Why does DataViz matter for reproducibility?

Bar Charts Don’t Allow You to Critically Evaluate Continuous Data


Weissgerber et al., 2017 JBC, http://www.jbc.org/content/292/50/20592.full
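To see why a mean-only bar hides information, consider three small, made-up groups that share the same mean. The sketch prints the summaries that a bar chart leaves out; plotting the individual points (for example as a dot plot) would show the same differences visually:

```python
import statistics

# Three hypothetical groups with identical means but very different distributions.
groups = {
    "tight":   [4, 5, 5, 5, 6],
    "bimodal": [1, 1, 5, 9, 9],
    "outlier": [3, 3, 3, 3, 13],
}

for name, values in groups.items():
    # A bar chart would show only the mean (perhaps with an error bar);
    # the other columns are what a dot plot or box plot would reveal.
    print(f"{name:8s} mean={statistics.mean(values)} "
          f"median={statistics.median(values)} "
          f"sd={statistics.stdev(values):.2f} "
          f"min={min(values)} max={max(values)}")
```

All three groups have a mean of 5, yet one is tightly clustered, one splits into two clusters, and one is driven by a single outlier (its median is 3).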


88 of 95

How to Choose the Right Plot


89 of 95

One Step Further


90 of 95

Some Intermediate Options


https://plot.ly/create/#/

https://www.datawrapper.de/

91 of 95

Which programming language should I use?

  • Select a language that is used in your lab or community
  • If you don’t have a specific problem in mind, start with a general-purpose language such as Python. You will learn basic programming skills that make it easier to switch to other languages and to tackle different problems, and you will usually end up learning several languages anyway.


92 of 95

Programming Languages

  • Python
    • Anaconda (Distribution)
    • NumPy & pandas (Data Wrangling)
    • SciPy (Higher Math)
    • Matplotlib (Basic Graphs)
    • Seaborn, Bokeh, Altair, Plotly (Advanced Statistical & Interactive Graphs)
    • Jupyter Notebook / JupyterLab (Interactive Notebook)
  • R
    • tidyverse (Distribution)
    • dplyr & tidyr (Data Wrangling)
    • ggplot / ggplot2 (Basic Graphs)
    • shiny / RMarkdown (Interactive Notebook)
    • RStudio (Interactive Notebook)


93 of 95

Dealing with Data

  • Provide Open-Source Data (Rule 2 of Ten Simple Rules for Enabling Multi-site Collaborations through Data Sharing)
  • Keep Raw Data Raw (Rule 3 of Ten Simple Rules for Digital Data Storage)
  • Store Data in Open Formats (Rule 4 of Ten Simple Rules for Digital Data Storage)
  • Data Should Be Structured for Analysis (Rule 5 of Ten Simple Rules for Digital Data Storage)
  • Data Should Be Uniquely Identifiable (Rule 6 of Ten Simple Rules for Digital Data Storage)
  • Link Relevant Metadata (Rule 7 of Ten Simple Rules for Digital Data Storage)
  • Have a Systematic Backup Scheme (Rule 9 of Ten Simple Rules for Digital Data Storage)
  • Archive the Data Appropriately
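"Structured for analysis" is often read as one observation per row and one column per variable. A standard-library sketch that turns a spreadsheet-style wide table into that long layout; the table, its column names, and the `wide_to_long` helper are hypothetical:

```python
import csv
import io


def wide_to_long(rows, id_column, var_name, value_name):
    """Reshape one-row-per-sample 'wide' records into one row per observation.

    Every column other than `id_column` is treated as a measurement column.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append({id_column: row[id_column], var_name: column, value_name: value})
    return long_rows


# Hypothetical wide table: one column per day, as it often ends up in a spreadsheet.
wide_csv = """plant,day_1,day_2
wt_1,2.1,3.4
mut_1,1.8,2.2
"""

tidy = wide_to_long(csv.DictReader(io.StringIO(wide_csv)), "plant", "day", "root_length_cm")
```

`tidy` now holds four rows, one per plant and day, a layout that most analysis and plotting tools can use directly. Values stay as text here; convert them to numbers at analysis time so the raw file is never modified.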


94 of 95

Summary

Reproducible research practices enable you to:

  • Organize experiments productively
  • Accurately analyze results
  • Share results with future researchers
  • Share techniques
  • Share reagents with future researchers
  • Accelerate science!

The tools discussed here should give you a framework for making your research more reproducible, and they will save you time and resources in the long term.


95 of 95


Contributors

Sonali Roy

Lenny Teytelman

Sarah Robinson

Benjamin Schwessinger

April Clyburne-Sherin

Nicolas Schmelling

Tracey Weissgerber

Tyler Ford

Joanne Kamens

Steven Burgess

What is one thing that you can do today to start making your research more reproducible?

@eLife

@Addgene

@CodeOceanHQ

@protocols.io

@ASPB