Reproducible research, algorithms, and data
June 25, 2016
Benjamin Haibe-Kains
Princess Margaret Cancer Centre
University Health Network
University of Toronto
Ontario Institute for Cancer Research
Bioinformatics and Computational Genomics Laboratory
Replicability, reproducibility and reusability
→ Replicability
→ Reproducibility
→ Reusability
Building upon previous work
If you can do it with your own functions, you can do it with published algorithms
→ genefu R package reproducing published molecular subtyping classifiers and gene “signatures” with a common interface
+ my own models
This holds true for datasets too
→ MetaGxData data packages for breast (n=10,004) and ovarian (n=3,752) cancers
Hard to fully replicate results!
Same algorithm, different implementations, different results
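A minimal sketch of how this can happen, assuming a toy classifier that z-scores a signature score and calls patients high risk above a cutoff. The function names, the cutoff, and the scores are hypothetical and are not taken from any published classifier. The two "implementations" differ only in whether the standard deviation uses the sample (n-1) or the population (n) denominator.

```python
import statistics

def zscores(values, sample=True):
    """Standardize values; `sample` selects the n-1 or the n denominator."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def call_high_risk(values, cutoff=1.0, sample=True):
    """Hypothetical rule: a patient is high risk if the z-score exceeds cutoff."""
    return [z > cutoff for z in zscores(values, sample=sample)]

# Made-up signature scores for five patients (illustration only).
scores = [1.8, 2.0, 3.0, 4.0, 4.2]

calls_a = call_high_risk(scores, sample=True)   # implementation A: n-1
calls_b = call_high_risk(scores, sample=False)  # implementation B: n

print(calls_a)
print(calls_b)
```

With these scores the fourth patient is called high risk by one implementation and not by the other, even though both follow the same written description of the algorithm.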
Meta-analysis and comparative studies
With functions and data in hand, it is hard to resist the temptation to further challenge your model:
→ survcomp R package to compare the prognostic value of published and new gene signatures
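As an illustration of what comparing prognostic value can look like, here is a minimal sketch that assumes the concordance index (the fraction of usable patient pairs in which the higher risk score belongs to the patient with the earlier event) as the measure. It is plain Python, not the survcomp API. The function name, the toy survival data, and both signatures' risk scores are hypothetical, and pairs with tied survival times are simply skipped.

```python
from itertools import combinations

def concordance_index(risk, time, event):
    """Concordance index for right-censored data (simplified sketch).

    A pair is usable when the patient with the shorter follow-up time
    had an event. Pairs with tied times are skipped; tied risk scores
    count as half concordant.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if not event[a]:
            continue  # censored before the other patient's time: not usable
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Made-up follow-up data: time in years, event 1 = observed, 0 = censored.
time = [2, 4, 6, 8, 10]
event = [1, 1, 0, 1, 0]

# Hypothetical risk scores from a published and a new gene signature.
risk_published = [0.9, 0.7, 0.5, 0.3, 0.1]
risk_new = [0.7, 0.9, 0.5, 0.1, 0.3]

c_published = concordance_index(risk_published, time, event)
c_new = concordance_index(risk_new, time, event)
print(c_published, c_new)  # 1.0 0.75
```

Scoring both signatures on the same patients with the same function is what makes the two numbers comparable. A real comparison would also need confidence intervals and a statistical test for the difference.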
Conclusion
Prototyping, implementing, documenting, testing, sharing, fixing, testing, extending, sharing, …
This cycle is vital in my lab, where code is scrutinized and tested by multiple members before public release
This helped me improve my science and truly value the benefits of data and code sharing
Acknowledgements
BHK lab
Princess Margaret Cancer Centre
Collaborators
Thank you for your attention!
Questions?