1 of 9

Computing Workflows for Biologists

2 of 9

Consider the Overarching Goals of the Analysis

  • Working to address a given hypothesis will motivate different analysis strategies than conducting data exploration

3 of 9

Reproducibility Checkpoints

Reproducibility checkpoints are places in a workflow devoted to scrutinizing its integrity. At each checkpoint, confirm that:

  • the workflow (or step in the workflow) runs seamlessly (it doesn’t crash halfway or return error messages)
  • the outcomes are consistent and validated across multiple, identical iterations
  • the results make biological sense
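The consistency check above could be automated along the following lines. This is only a minimal sketch: `run_step` is a hypothetical stand-in for a real workflow step (here, a seeded subsample of 5 items out of 100), and `fingerprint` and `checkpoint` are illustrative helper names, not part of any published tool. It covers only the "consistent across identical iterations" bullet; whether results make biological sense still needs a human.

```python
import hashlib
import json
import random


def run_step(seed):
    """Hypothetical workflow step: subsample 5 items out of 100 with a fixed seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(100), 5))


def fingerprint(result):
    """Hash a JSON-serializable result so that runs can be compared cheaply."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def checkpoint(step, n_runs=3, **kwargs):
    """Run `step` n_runs times with identical inputs; True if all outputs match."""
    digests = {fingerprint(step(**kwargs)) for _ in range(n_runs)}
    return len(digests) == 1


consistent = checkpoint(run_step, n_runs=3, seed=42)
print("consistent across runs:", consistent)
```

If a step fails this check, a common cause is an unseeded random number generator or an input that changes between runs.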

4 of 9

A Roadmap for the Computing Biologist

  • Consider the overarching goals of the analysis
  • Adopt an Iterative, Branching Pattern to Systematically Explore Options
  • Reproducibility Checkpoints
  • Taking Notes for Computational Analysis
  • Shared Responsibility: The Team Approach to Reproducibility and Data Management

Shade and Teal, Computing Workflows for Biologists: A Roadmap

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002303

5 of 9

Adopt an Iterative, Branching Pattern to Systematically Explore Options

6 of 9

Taking Notes for Computational Analysis

  • Take notes like you would for experimental work
  • Comment code
  • Use version control (GitHub/GitLab)

7 of 9

What needs to go in notes:

  • Software versions used
  • Description of what the software is doing/goal of that step
  • Brief notes on deviations from default options
  • Workflows can include different software (e.g., PANDAseq to QIIME to R), and notes should also cover all “formatting steps” needed to move between tools
  • Hopefully you don’t need to format much by hand; avoid manual formatting if possible
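One possible way to capture these items in a machine-readable note is sketched below. The helper names (`package_version`, `make_note`), the step name, and the `min_quality` option are hypothetical and chosen only for illustration. The sketch records versions of Python packages only; command-line tools such as PANDAseq or QIIME would need their versions noted separately.

```python
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a Python package, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def make_note(step, goal, packages, non_default_options):
    """Collect what a note should record for one workflow step."""
    return {
        "step": step,
        "goal": goal,
        "python": platform.python_version(),
        "versions": {name: package_version(name) for name in packages},
        "non_default_options": non_default_options,
    }


# Hypothetical step and option names, for illustration only.
note = make_note(
    step="quality_filter",
    goal="Remove low-quality reads before merging",
    packages=["numpy", "pandas"],
    non_default_options={"min_quality": 25},
)
print(json.dumps(note, indent=2))
```

Saving one such note next to each output file, and committing it with the code, keeps the record tied to the results it describes.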

8 of 9

Shared Responsibility: The Team Approach to Reproducibility and Data Management

We posit that integrity in computational analysis of biological data is enhanced if there is a sense of shared responsibility for ensuring reproducible workflows.

Research teams that work together to develop and debug code, perform internal reproducibility checkpoints for each other, and generally hold one another accountable for high-quality results likely will enjoy a low manuscript retraction rate, high level of confidence in their results, and strong sense of collaboration.

You, your lab mates, and your PI need to value the time it takes to do analyses reproducibly and correctly.

9 of 9

Shared responsibility

  • Shared storage and workspace can facilitate access to all group data
  • Version control repositories and shared folders can provide access to code and documentation (GitHub, Dropbox)
  • Setting expectations for ‘reproducibility checkpoints’ (team “hackathons”: open-computer group meetings dedicated to analysis)
  • Paper reviews
  • Looking for help/support outside the lab (bioinformatics or user groups, office hours, StackOverflow)