1 of 9

Reproducible research, algorithms, and data

June 25, 2016

Benjamin Haibe-Kains

Princess Margaret Cancer Centre

University Health Network

University of Toronto

Ontario Institute for Cancer Research

Bioinformatics and Computational Genomics Laboratory

2 of 9

Replicability, reproducibility and reusability

  • Implement and document your functions

→ Replicability

  • Adapt your functions to similar datasets

→ Reproducibility

  • Extend your functions to datasets generated in different settings (samples, platforms, normalization, ...)

→ Reusability

3 of 9

Building upon previous work

If you can do it with your own functions, you can do it with published algorithms

The genefu R package reproduces published molecular subtyping classifiers and gene “signatures” behind a common interface

+ my own models
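To make the idea of a "common interface" concrete, here is a minimal Python sketch. It is not genefu's API: the registry, the function names, the weighted-average scoring rule, and the gene names and weights are all hypothetical and chosen for illustration only. The point is that published and in-house signatures are called the same way.

```python
from typing import Callable, Dict, Mapping

# Hypothetical types: a profile maps gene symbols to expression values,
# and a signature maps gene symbols to signed weights.
Profile = Mapping[str, float]
Scorer = Callable[[Profile], float]

REGISTRY: Dict[str, Scorer] = {}


def register(name: str, weights: Mapping[str, float]) -> None:
    """Register a weighted-average signature under one shared call signature."""
    def score(profile: Profile) -> float:
        # Use only the genes present in the profile; platforms differ in coverage.
        shared = [g for g in weights if g in profile]
        if not shared:
            raise ValueError(f"no genes of signature {name!r} found in profile")
        total = sum(weights[g] * profile[g] for g in shared)
        return total / sum(abs(weights[g]) for g in shared)
    REGISTRY[name] = score


def score_all(profile: Profile) -> Dict[str, float]:
    """Apply every registered signature to one profile."""
    return {name: scorer(profile) for name, scorer in REGISTRY.items()}


# Toy weights for illustration only; not taken from any published signature.
register("toy_proliferation", {"GENE_A": 1.0, "GENE_B": 1.0})
register("toy_immune", {"GENE_C": 1.0, "GENE_D": -1.0})

sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0}
scores = score_all(sample)
print(scores)
```

Because every signature shares the same call signature, adding a new model is a single `register` call, and every downstream comparison picks it up automatically.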

This holds true for datasets too

MetaGxData data packages for breast (n=10,004) and ovarian (n=3,752) cancers

4 of 9

Hard to fully replicate results!

  • Devil is in the details
  • Try to reproduce the figures of the main paper
    • Exact same results ~10%
    • Approximately the same ~50%
    • The remaining ~40%, well… I guess we are not smart enough to understand the methods section…
  • Start communicating with the authors early on; most are willing to help
  • Tons of unit testing and documentation
  • Make your code and documentation publicly available to get the community to scrutinize your work
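The "exact", "approximate" and "not reproduced" categories above can be turned into an automated check. The sketch below is one possible way to do that, not the procedure used in the lab: the function name, the 5% relative tolerance, and the numbers are all assumptions made for illustration.

```python
import math
from typing import Sequence


def classify_replication(published: Sequence[float],
                         recomputed: Sequence[float],
                         rel_tol: float = 0.05) -> str:
    """Label a replication attempt as 'exact', 'approximate' or 'failed'.

    rel_tol is a hypothetical threshold for "approximately the same".
    """
    if len(published) != len(recomputed):
        return "failed"
    pairs = list(zip(published, recomputed))
    # "Exact" here means equal up to floating-point noise.
    if all(math.isclose(p, r, rel_tol=1e-9, abs_tol=1e-12) for p, r in pairs):
        return "exact"
    if all(math.isclose(p, r, rel_tol=rel_tol) for p, r in pairs):
        return "approximate"
    return "failed"


# Invented numbers standing in for values read off a published figure.
published = [1.50, 2.10, 0.80]

same = classify_replication(published, [1.50, 2.10, 0.80])
close = classify_replication(published, [1.52, 2.05, 0.81])
off = classify_replication(published, [1.50, 1.20, 0.80])
print(same, close, off)
```

A check like this can run as a unit test each time the re-implementation changes, so a drift away from the published values is caught early.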

5 of 9

Same algorithm, different implementations, different results
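One common way for two implementations of the "same" algorithm to disagree is an undocumented default. The sketch below is an assumed illustration, not the case shown on this slide: it standardizes a value using the standard deviation with an n denominator versus an n - 1 denominator, both of which are routinely described as "the z-score".

```python
import statistics

# Toy expression values for one gene across eight samples (invented).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
sd_population = statistics.pstdev(values)  # divides by n
sd_sample = statistics.stdev(values)       # divides by n - 1

# Standardize the largest value under each convention.
z_population = (9.0 - mean) / sd_population
z_sample = (9.0 - mean) / sd_sample

print(f"z with n denominator:     {z_population:.4f}")
print(f"z with n - 1 denominator: {z_sample:.4f}")
```

The gap shrinks as the sample size grows, but with small cohorts it can move a sample across a classification cutoff, which is why the convention is worth stating in the methods section.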

6 of 9

Meta-analysis and comparative studies

With functions and data in hand, it is hard to resist the temptation to challenge your model further:

  • Is my model robust?
  • Is my model’s performance reproducible in multiple independent datasets?
  • How does my model compare to competitors?

survcomp R package to compare the prognostic value of published and new gene signatures
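One widely used measure of prognostic value is the concordance index (Harrell's C). The pure-Python sketch below shows how two signatures could be compared on the same cohort. It is not survcomp's implementation, and the follow-up times, event flags and risk scores are invented.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(risk: Sequence[float],
                      time: Sequence[float],
                      event: Sequence[bool]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(risk)), 2):
        # Order the pair so that patient a has the shorter follow-up.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # one ends in an observed event rather than censoring.
        if time[a] == time[b] or not event[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented cohort: follow-up times and whether the event was observed.
time = [2.0, 4.0, 6.0, 8.0]
event = [True, True, False, True]

# Invented risk scores from two hypothetical signatures.
c_a = concordance_index([0.9, 0.7, 0.4, 0.1], time, event)
c_b = concordance_index([0.2, 0.8, 0.5, 0.3], time, event)
print(c_a, c_b)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. A real comparison would also need confidence intervals and a paired test across independent datasets, which this sketch omits.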

7 of 9

Conclusion

Prototyping, implementing, documenting, testing, sharing, fixing, testing, extending, sharing, …

This cycle is vital in my lab, where code is scrutinized and tested by multiple members before public release

This has helped me improve my science and truly value the benefits of data and code sharing

8 of 9

Acknowledgements

BHK lab

Princess Margaret Cancer Centre

  • Deena Gendoo
  • Gregory Chen
  • Natchar Ratanasirigulchai

Collaborators

  • Markus Schroeder
  • Levi Waldron
  • Aleix Prat
  • Joel Parker

9 of 9

Thank you

for your attention!

Questions?