1 of 12

Project Pythia

A Community Learning Resource for Geoscientists

2 of 12

Background

Python has become the number one programming language for science [source: IEEE Spectrum 2019, opensource.com]

The “Scientific Python Ecosystem” is enormous and ever growing

Jupyter Notebooks have emerged as a simplified, web-based mechanism for programming, creating shareable workflows, and supporting back-end computation on “Big Iron”

There is an increasing demand for moving analysis workflows to public and private clouds, and for making HPC workflows more public and portable

3 of 12

A snapshot of the Scientific Python Ecosystem

Source: VanderPlas 2017, slide 52

4 of 12

So what is the problem?

  1. The Scientific Python Ecosystem is a double-edged sword, confronting the user with an overwhelming number of choices:
    • Which package(s) do I use?
    • Do they interoperate, and which versions are compatible?
    • Will they be around a year from now?
    • Can I trust the results?
    • Where do I turn for help?
    • How do I make a general-purpose function work with my data?
  2. Cloud computing for science is still an emerging technology with little standardization across providers, i.e., it’s complicated!
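As a small illustration of the "which versions are compatible?" question, the sketch below reports which versions of a few packages are installed in the current environment. The function name `installed_versions` and the package list are hypothetical choices made for this example; only the standard-library `importlib.metadata` module is used. It does not check compatibility by itself, it only gathers the information needed to start asking.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None.
    (Illustrative helper; not part of Project Pythia.)
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example package list; adjust to the packages your workflow needs.
    for name, version in installed_versions(["numpy", "xarray", "dask", "metpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Recording this output alongside a notebook is one simple way to make it easier for others to reproduce the environment in which it ran.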

5 of 12

Project Pythia Goals

  1. The Pythia Portal: Develop and deploy a searchable online web portal that provides geoscientists at any point in their career with the educational content and real-world examples needed to learn how to navigate and integrate the myriad packages within the Scientific Python Ecosystem

  2. Cloud-Deployable Pythia Platforms: Develop a lightweight, Binder-based (or Binder-like on HPC) platform that will make it possible to launch portal content in customizable executable environments in the Cloud with a “single click.”

6 of 12

The Pythia Portal

Training resources

  • Reproducible Jupyter Notebooks & scripts
  • Sample data
  • Tutorials (online and video)
  • Communication forums
  • Many, many links to external content

Sample content

  • Introductory
    • NumPy
    • Conda
    • Git & GitHub
    • Jupyter Notebooks
  • Geoscience-focused packages
    • Xarray
    • MetPy
    • GeoCAT
  • Scalable workflows
    • Dask
    • Using Cloud resources

Content developed and vetted through coursework at the University at Albany

7 of 12

The Pythia Platform

Binder-like utilities that support interactive cloud execution environments for each Jupyter Notebook

  • Example notebooks on Pythia Portal will be executable on cloud resources
  • Cloud resource instance may be determined by data locality
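To make the "single click" idea concrete, here is a minimal sketch of how a launch link for a notebook could be built. It assumes the public mybinder.org URL scheme (`/v2/gh/<org>/<repo>/<ref>?labpath=<path>`) as a stand-in; how the Pythia Platform itself will launch content is not specified here. The function name `binder_launch_url` and the organization, repository, and notebook names are hypothetical.

```python
from urllib.parse import quote


def binder_launch_url(org, repo, ref="main", notebook=None,
                      host="https://mybinder.org"):
    """Build a Binder-style launch URL for a GitHub repository.

    Assumes the public mybinder.org URL layout; a different
    BinderHub deployment can be targeted through `host`.
    """
    url = f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook:
        # Open a specific notebook in JupyterLab once the environment starts.
        url += f"?labpath={quote(notebook, safe='')}"
    return url


if __name__ == "__main__":
    # Hypothetical repository and notebook path, for illustration only.
    print(binder_launch_url("example-org", "example-repo",
                            notebook="notebooks/intro.ipynb"))
```

Embedding such a link next to each example notebook is what lets a reader go from reading to running without installing anything locally.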

Targeted Cloud resource providers:

  • AWS, GCP, Azure, XSEDE Jetstream2, and HPC (e.g. Casper/Cheyenne)

8 of 12

Pythia Portal + Pythia Platforms

9 of 12

Open Development

Project Pythia will be a community-owned resource and will follow an Open Development model. The user community is expected to contribute by:

  1. Providing feedback on Project Pythia resources
  2. Helping identify and prioritize content needs
  3. Helping develop new content or identify existing content for inclusion
  4. Responding to questions from other users
  5. Reporting or correcting problems
  6. … and more

All Project Pythia-developed content will be hosted on GitHub

10 of 12

Project Pythia sounds awesome! When can I or my students use it?

  • Draft version of the Pythia Portal available now:

https://projectpythia.github.io

  • Version 1.0 of Pythia Portal: summer of 2021

  • First release of Pythia Platform: summer of 2022

11 of 12

Summary

Project Pythia will be a community-owned educational resource for helping geoscientists at all levels of their career become proficient with the Scientific Python Ecosystem

A particular focus will be scalable, cloud-ready workflows

Community engagement will be essential for the success of the project

Please get involved!!!

https://projectpythia.github.io

12 of 12

Acknowledgements

NSF EarthCube program (award #2026899)

Pangeo community

Numerous technical staff doing the heavy lifting at NCAR, Unidata, University at Albany