Software preservation is necessary for reproducibility
Vicky Rampin
@VickyRampin | vicky.rampin@nyu.edu
New York University
What is reproducibility?
Reproducibility on a spectrum
Reviewable: Sufficient detail for peer review (default)
Replicable: Original results can be reached with original materials
Confirmable: Main conclusion can be reached without original materials
Auditable: Process & tools archived and re-usable
Open/Reproducible: Auditable research made openly available
COMPUTATIONAL
ENVIRONMENT
DOCUMENTATION
CODE & DATA
ARTICLE
REVIEWABLE RESEARCH
REPLICABLE RESEARCH
AUDITABLE RESEARCH
CONFIRMABLE RESEARCH
Entry points for sustainability efforts
Exact computational environments matter!
“The scripts [...] were found to return correct results on macOS Mavericks and Windows 10. But on macOS Mojave and Ubuntu, the results were off by nearly a full percent.”
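One low-cost way to make an environment visible is to record it next to the results. The following is a minimal sketch, assuming a Python-based analysis and using only the standard library. The function name `environment_fingerprint` is illustrative and is not part of ReproZip, and the sketch only covers the Python layer, not system libraries.

```python
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint():
    """Collect basic facts about the machine and the Python stack in use."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no package name
            packages.add(f"{name}=={dist.version}")
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "packages": sorted(packages),
    }


if __name__ == "__main__":
    # Save this output alongside the results so environments can be compared later.
    print(json.dumps(environment_fingerprint(), indent=2))
```

A record like this helps explain a discrepancy after the fact, but it does not let anyone re-run the work; that is the gap that packing the environment itself is meant to close.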
Another example of dependency hell...
ReproZip - the reproducibility packer!
How ReproZip helps with sustainability
Well-bundled:
Generalizable:
Future-proofed:
Some reasons to use ReproZip!
ReproZip Ecosystem
High-level overview of ReproZip in workflow
1. Work normally: once you’re done, use ReproZip!
2. ReproZip Trace: use ReproZip trace to find all dependencies
3. ReproZip Pack: make RPZ bundle with everything
4. Publish RPZ: in an institutional, disciplinary, or general repository!
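The trace and pack steps can be sketched as two command lines. This is a minimal sketch that only builds the commands, assuming the `reprozip` command-line tool is installed on a Linux machine. The helper `packing_commands`, the script `make_plots.py`, the input `data.csv`, and the bundle name `experiment.rpz` are hypothetical placeholders.

```python
import shlex


def packing_commands(experiment, bundle="experiment.rpz"):
    """Return the ReproZip command lines for tracing, then packing.

    `experiment` is the command you would normally run, given as a string.
    """
    # Step 2: run the experiment under the tracer so dependencies are recorded.
    trace = ["reprozip", "trace"] + shlex.split(experiment)
    # Step 3: bundle everything the trace found into a single .rpz file.
    pack = ["reprozip", "pack", bundle]
    return [trace, pack]


# Hypothetical experiment command, shown only to illustrate the shape.
for argv in packing_commands("python make_plots.py data.csv"):
    print(shlex.join(argv))
```

To actually execute the steps, each list could be passed to `subprocess.run(argv, check=True)` on the machine where the experiment normally runs.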
What can ReproZip pack?
… and more! If you can run it, ReproZip can probably pack it
Packing
Research Process (e.g. a website with DB)
Computational Environment E (Linux)
reprozip
Executing
Tracing
Creating Configuration
Configuration File
Reproducible Bundle (.rpz file)
Configuring
Packing
What ReproZip tracks & keeps:
Data: input files, output files, parameters
Workflow: executable programs and steps
Environment: environment variables, software used, dependencies, …
Original
Author(s)
Unpacking
reprounzip
Unpacking
Computational Environment E’ (potentially different than E)
directory
Linux
chroot
vagrant
Linux, macOS, Windows
docker
Provenance Graph
VisTrails
Linux
Linux, macOS, Windows
Reproducible Bundle (.rpz file)
Singularity (upcoming)
ReproServer
Secondary User(s)
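On the secondary user's side, unpacking follows a setup-then-run pattern for whichever unpacker is chosen. This is a minimal sketch that only builds the commands, assuming `reprounzip` and the matching unpacker plugin are installed. The helper `unpacking_commands`, the bundle name `experiment.rpz`, and the target directory `unpacked` are hypothetical placeholders.

```python
import shlex

# Unpackers named on the slide that follow the setup/run pattern.
UNPACKERS = ("directory", "chroot", "vagrant", "docker")


def unpacking_commands(bundle, target, unpacker="docker"):
    """Return the reprounzip command lines to set up and re-run a bundle."""
    if unpacker not in UNPACKERS:
        raise ValueError(f"unknown unpacker: {unpacker!r}")
    return [
        # Recreate the packed environment in `target` using the chosen unpacker.
        ["reprounzip", unpacker, "setup", bundle, target],
        # Re-execute the original experiment inside that environment.
        ["reprounzip", unpacker, "run", target],
    ]


for argv in unpacking_commands("experiment.rpz", "unpacked"):
    print(shlex.join(argv))
```

The choice of unpacker depends on the host: `directory` and `chroot` need Linux, while `vagrant` and `docker` also work from macOS and Windows.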
Some current uses of ReproZip & ReproServer
Facilitating peer review | |
Sharing reproducible research | |
Backend reproducibility | |
Computational science tools | NeuroDocker to minify Docker containers; Spot to reconstruct provenance graphs |
Metadata capture & query | |
Digital Preservation | |
Example: packing digital humanities plots
Example: unpacking digital humanities plots
BONUS: unpacking digital humanities plots in-browser
Overall summary
Thank you!
Happy to take questions!