1 of 8

NASA Openscapes

Science in The Cloud

Luis López et al.

Software Engineer @ NSIDC

2 of 8

A tale of two workflows

"This story is based on actual events. Characters and timelines have been changed for dramatic purposes."

  1. Coffee
  2. Open a data provider website
      • Only works on Chrome
  3. Click here, click there
      • Not very reproducible
  4. Download and analyze data using GIS tools
      • If the data required is not that big (terabytes? Forget it!)
      • Maybe compile some code
      • Install a library… oh wait what? A compiler error.

  1. Coffee
  2. Go to openscapes.2i2c.cloud
      • No need to install anything
      • Science!

3 of 8

The future: Cloud Deployments

Source: Ryan Abernathey/Pangeo

  • Big Data: datasets are growing too rapidly and legacy software tools for scientific analysis can’t handle them. This is a major obstacle to scientific progress.

  • Technology Gap: a growing gap between the technological sophistication of industry solutions (high) and scientific software (low).

  • Skills Gap: a growing gap between the technical skills required to use the cloud and the skills researchers typically have.

  • Reproducibility: a fragmentation of software tools and environments renders most geoscience research effectively unreproducible and prone to failure.

4 of 8

2i2c cloud infrastructure

Right to replicate

5 of 8

Openscapes environment

JupyterLab for the Geosciences

  • GitHub authentication
  • Session persistence
  • Deployed to us-west-2
  • Reproducible Conda environment
  • Extensibility
  • Multiple kernels are supported
  • Dask-kubernetes!

6 of 8

Integration between Openscapes and 2i2c

GitHub Action

A GitHub Action is triggered by any change to the Dockerfile or environment.yml. It creates a new conda-lock environment and builds a new base image from that environment (linux-64 only).
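
The trigger-and-lock step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual workflow: the function names are hypothetical, and matching files by basename anywhere in the repository is an assumption. The `conda-lock` flags used (`-f` for the environment file, `-p` for the platform) are the ones documented by the tool.

```python
from pathlib import PurePosixPath

# Files whose changes should trigger a rebuild (names taken from the slide).
TRIGGER_FILES = {"Dockerfile", "environment.yml"}


def needs_rebuild(changed_paths):
    """Return True if any changed path is one of the trigger files.

    Assumption: a trigger file counts wherever it sits in the repository,
    so paths are compared by basename only.
    """
    return any(PurePosixPath(p).name in TRIGGER_FILES for p in changed_paths)


def conda_lock_command(env_file="environment.yml", platform="linux-64"):
    """Build the conda-lock invocation that pins the environment for one platform."""
    return ["conda-lock", "-f", env_file, "-p", platform]


if __name__ == "__main__":
    # Hypothetical list of files changed by a commit.
    changed = ["docs/README.md", "ci/environment.yml"]
    if needs_rebuild(changed):
        print(" ".join(conda_lock_command()))
```

In a real pipeline the list of changed files would come from the CI system, and the resulting lock file would feed the Docker image build.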

Deploy

The updated Docker image can be deployed to the JupyterHub using its configuration API. A team can control its own environment and deploy it in a matter of minutes.

Update environment

We use a CI pipeline that can build multiple Jupyter kernels for our Pangeo deployment. If a team needs a particular Python version or library not included in our base environment, they can simply add their own with an easy “bring your own environment” approach.
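
One way to picture "bring your own environment" is as an extra Jupyter kernel spec that points at a team's conda environment. The sketch below is an assumption about how such a kernel could be registered, not the pipeline used by this deployment. The environment prefix, kernel name, and helper functions are hypothetical. The `kernel.json` layout (`argv`, `display_name`, `language`) follows the standard Jupyter kernel spec format.

```python
import json
import tempfile
from pathlib import Path, PurePosixPath


def kernel_spec(env_prefix, display_name):
    """Return a kernel.json dict that launches ipykernel from a conda environment."""
    python = str(PurePosixPath(env_prefix) / "bin" / "python")
    return {
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


def write_kernel_spec(kernels_dir, name, spec):
    """Write <kernels_dir>/<name>/kernel.json and return its path."""
    target = Path(kernels_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    path = target / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    # Hypothetical environment prefix and kernel name, for illustration only.
    spec = kernel_spec("/srv/conda/envs/team-py39", "Python (team-py39)")
    with tempfile.TemporaryDirectory() as tmp:
        print(write_kernel_spec(tmp, "team-py39", spec).read_text())
```

Each additional environment would get its own directory and `kernel.json`, which is how several kernels can appear side by side in the JupyterLab launcher.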

7 of 8

Observations (mine)

  • Ready-to-use cloud environments are very useful and valued by scientists (and developers).

  • There is a need for this kind of infrastructure on a permanent basis.

  • There are tradeoffs: the cloud is not a silver bullet.
    • Costs
    • Complexity

8 of 8

https://nasa-openscapes.github.io

https://github.com/nasa-openscapes

THANKS!

Artwork by Allison Horst.