1 of 12

The Project Pythia Cookbook Initiative

Building an Inclusive Geoscience Community through Accessible, Reusable, and Reproducible Workflows

Brian E. J. Rose, John Clyne, Ryan May, James Munroe, Amelia Snyder, Orhan Eroglu, Kevin Tyle, Drew Camron, Max Grover, Julia Kent, Robert Ford

2 of 12

Project Pythia: what / why?

A Community Learning Resource for Geoscientists

  1. The geoscience community is moving to open source software and cloud computing for analysis, to better support open, reproducible science
  2. The Python ecosystem and cloud computing are complex and dynamic environments
  3. Geoscientists are not computer scientists (but are increasingly reliant on a complex and ever-changing collection of computing technologies)
  4. Little training material exists that focuses specifically on the needs of geoscientists
  • Wide. Open. Science. requires new skills!

3 of 12

Pythia Foundations - what every geoscientist should know

Binderized for one-click interactive learning

4 of 12

Pythia Cookbooks

Cookbooks are community-contributed collections of advanced or domain-specific tutorials and example workflows

Essential features of Pythia Cookbooks:

  • Explicitly build upon Foundations
  • Demonstrate real workflows on publicly available data
  • Binderized for interactive learning
  • Backed by automated testing infrastructure to ensure that the example code “just works” and stays relevant

Starting points for new geoscience analysis using the Python stack

5 of 12

Why Cookbooks?

What problems are Cookbooks trying to solve?

Jupyter Notebooks are an awesome way to share scientific workflows, but…

6 of 12

What problems are Cookbooks trying to solve?

Jupyter Notebooks are awesome, but…

  • Ambiguity: Jupyter notebooks don’t fully describe their own execution environment

A great tool for packaging Notebooks and conda environment descriptions into easy-to-navigate Web pages, with Binder links for execution

Cookbooks are executable and reproducible
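
To make the ambiguity concrete: a notebook file is JSON, and its metadata typically records only the kernel name and the language version, not the packages the code depends on. The sketch below uses only the Python standard library to list what a notebook does record about its environment. The function name `recorded_environment` and the sample notebook are illustrative assumptions, not part of Pythia's tooling.

```python
import json


def recorded_environment(notebook_json: str) -> dict:
    """Return the environment details a notebook file records about itself."""
    nb = json.loads(notebook_json)
    meta = nb.get("metadata", {})
    language_info = meta.get("language_info", {})
    return {
        "kernel": meta.get("kernelspec", {}).get("name"),
        "language": language_info.get("name"),
        "language_version": language_info.get("version"),
    }


# Hypothetical minimal notebook, written inline so the example is self-contained.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
        "language_info": {"name": "python", "version": "3.11.4"},
    },
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": "import xarray as xr",
        }
    ],
})

info = recorded_environment(SAMPLE_NOTEBOOK)
print(info)
```

Nothing in the result says which version of Xarray the code cell needs. That is the gap a conda environment description shipped alongside the notebook is meant to fill.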

7 of 12

What problems are Cookbooks trying to solve?

Jupyter Notebooks are awesome, but…

  • Obsolescence: most Notebooks found “in the wild” will not run and/or will not reproduce their own results

We need a CI service that can perform regular “health-checking” of notebook code!

Cookbooks are versioned and maintained
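
As a rough illustration of what a notebook “health check” does, the sketch below runs every code cell of a notebook in order and reports the first failure. It is a deliberately simplified stand-in that uses only the Python standard library and plain `exec`. A real CI job would run the notebook against a Jupyter kernel inside the declared environment, and this version cannot handle IPython magics. The names `check_notebook` and `SAMPLE_NOTEBOOK` are assumptions made for this example.

```python
import json


def check_notebook(notebook_json: str) -> list:
    """Run code cells in order; return a list of (cell_index, error) failures."""
    nb = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a single kernel session
    failures = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the notebook format allows a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            failures.append((index, f"{type(exc).__name__}: {exc}"))
            break  # later cells usually depend on earlier ones
    return failures


# Hypothetical notebook whose last cell has gone stale.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Demo"},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "x = 2 + 2"},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["y = x * ", "undefined_name"]},
    ],
})

failures = check_notebook(SAMPLE_NOTEBOOK)
for cell_index, message in failures:
    print(f"cell {cell_index} failed: {message}")
```

Running a check like this on a schedule, and not only when the notebook changes, is what catches breakage caused by new releases of the packages a notebook depends on.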

8 of 12

What problems are Cookbooks trying to solve?

Jupyter Notebooks are awesome, but…

  • Collaboration and Attribution: Notebooks don’t play very well with GitHub pull requests

We need to execute notebooks and generate + deploy a preview of the rendered book to facilitate review and merge cycles

Cookbooks are collaborative scholarly objects

9 of 12

What problems are Cookbooks trying to solve?

Jupyter Notebooks are awesome, but…

  • Findable and Accessible: using Notebooks to share knowledge about scientific workflows requires an audience!

We should have a community repository for sharing workflows that represent established best practices!

And it should be organized and filterable

Cookbooks are open and community-owned

10 of 12

What problems are Cookbooks trying to solve?

Jupyter Notebooks are awesome, but…

  • Scalability: tutorials that run in a limited sandbox don’t offer the clearest path to doing new science on real data

We need to be able to route notebook execution to the appropriate compute resource for its content!

Cookbooks are portable – bring the compute to the data
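
One way to picture routing is a small lookup driven by notebook metadata. The sketch below is purely hypothetical: the `compute_target` metadata key, the target names, and the `choose_target` function are all invented for illustration and do not describe how Pythia actually dispatches notebooks.

```python
import json

# Hypothetical compute targets; none of these names come from Pythia itself.
DEFAULT_TARGET = "binder"
KNOWN_TARGETS = {"binder", "cloud-hub", "hpc"}


def choose_target(notebook_json: str) -> str:
    """Pick a compute target from an assumed 'compute_target' metadata key."""
    nb = json.loads(notebook_json)
    requested = nb.get("metadata", {}).get("compute_target", DEFAULT_TARGET)
    if requested not in KNOWN_TARGETS:
        raise ValueError(f"unknown compute target: {requested!r}")
    return requested


# A notebook with no declared target falls back to the lightweight sandbox.
small_notebook = json.dumps({"metadata": {}, "cells": []})

# A notebook that reads a large cloud-hosted dataset asks to run near the data.
big_notebook = json.dumps({"metadata": {"compute_target": "cloud-hub"}, "cells": []})

print(choose_target(small_notebook))
print(choose_target(big_notebook))
```

The design point is that the notebook declares what it needs and the infrastructure decides where it runs, so the same content can move between a sandbox and a resource located near the data.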

11 of 12

Pythia Cookbook Gallery example

12 of 12

How might Project Pythia be useful to the Digital Earths Global Hackathon?

  1. Educational resource for participants who are new to Python, Xarray, Zarr, GitHub, and many more technologies

  2. Hackathon outcome: Useful workflows developed by participants could be turned into Pythia Cookbooks and benefit a much broader community