1 of 11

Cloud Environment Opportunities:

Managed JupyterHub options for Cryosphere and Earthdata user communities

Amy Steiker, Andrew Barrett, & Luis López

NSIDC DAAC UWG

September 2023

2 of 11

When to Cloud?

  • What is the data volume?
  • How long will it take to download?
  • Can you store all that data (cost and space)?
  • Do you have the computing power for processing?
  • Does your team need a common computing environment?
  • Do you need to share data at each step or just an end product?
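The first three questions can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the function names, the 100 Mbps link speed, and the 2,000 GB of free disk are assumptions chosen for the example, not figures from this presentation, and real transfers rarely sustain the nominal bandwidth.

```python
def download_hours(volume_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to transfer volume_gb gigabytes at a sustained
    bandwidth_mbps megabits per second (decimal units: 1 GB = 8000 megabits)."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    megabits = volume_gb * 8 * 1000
    seconds = megabits / bandwidth_mbps
    return seconds / 3600


def fits_on_disk(volume_gb: float, free_disk_gb: float) -> bool:
    """True if the whole dataset fits in the available local storage."""
    return volume_gb <= free_disk_gb


if __name__ == "__main__":
    # Hypothetical dataset sizes (GB), link speed, and free disk space.
    for volume_gb in (10, 1_000, 50_000):
        hours = download_hours(volume_gb, bandwidth_mbps=100)
        ok = fits_on_disk(volume_gb, free_disk_gb=2_000)
        print(f"{volume_gb:>6} GB: ~{hours:,.1f} h to download, fits locally: {ok}")
```

If the estimated download time runs to days, or the data does not fit locally, that is a hint that working next to the data in the cloud may be worth considering.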

3 of 11

Solutions for Accessing Earthdata Cloud

Do it yourself

  • Create an AWS Account
  • Launch and connect to an EC2 instance
  • See Earthdata Cloud Primer for more setup and cost management information

Managed Cloud Service

  • Organizations like 2i2c provide cloud-hosted JupyterHubs for research and education
  • Your institution may also support smaller or larger scale options

https://2i2c.org/service/

Other platforms and services exist to customize your environment and reduce barriers to getting started

4 of 11

NASA Openscapes 2i2c Hub

Who is it for? Users learning to transition workflows to the cloud via Openscapes Science Champions and Openscapes training events; DAAC user support

Why was it established? A common, easy-to-access AWS us-west-2 environment did not previously exist within EOSDIS

Who is building or supporting this? NASA-funded grant; supported and maintained by NASA Openscapes Mentors

What are some key features? Easy web-based access using GitHub authentication; support for Python, MATLAB, and R; reproducible Conda environment with multiple kernels

How do I get started? Join an upcoming Openscapes event or Science Champions cohort!

5 of 11

CryoCloud

Who is it for? NASA cryospheric scientists seeking a tailored, community-driven shared cloud environment

Why was it established? To address cloud computing barriers identified by the ICESat-2 Science Team: cost opacity, deployment complexity, and lack of community knowledge

Who is building or supporting this? NASA-funded: ICESat-2 Project, TOPS

What are some key features? Similar features to the Openscapes Hub; a responsive, collaborative team that supports the needs of the cryosphere community

How do I get started? Sign up as part of UWG meeting prep; See the Getting Started page for more info

6 of 11

Earthdata Cloud Playground

Who is it for? Users seeking a “playground” to test their workloads in the cloud before transitioning to their own separately funded cloud environment; NASA Earthdata learning events

Why was it established? NASA Openscapes event feedback and subsequent mentor advocacy identified the need for a persistent Earthdata Cloud environment as a core service

Who is building or supporting this? Supported by NASA SMCE account; Cross-DAAC and ESDIS-provided infrastructure, management, and user resources via NASA Openscapes

What are some key features? Similar features to Openscapes Hub; persistent (long-term) support

How do I get started? “Friends and family” release coming soon; broader public release in 2024

7 of 11

ASF’s OpenScienceLab

Who is it for? Targeted at, but not limited to, SAR users; classroom settings where a shared compute environment is warranted

Why was it established? To address SAR data science challenges, including SAR-specific and interdependent Python packages, collaboration pitfalls caused by differing environments, large SAR data volumes, and the limited resources of SAR scientists

Who is building or supporting this? Alaska Satellite Facility

What are some key features? Free, limited access to a cloud-hosted JupyterHub; free, fast data transfer from ASF AWS archives to users’ storage; identical, fully configured, persistent software and hardware environments that can be shared by multiple users; an open library of data recipes

How do I get started? Email the OSL admin, or sign in through the portal: https://opensciencelab.asf.alaska.edu/portal/hub/login?next=%2Fportal%2Fhub%2Fhome

8 of 11

Coiled

Who is it for? Users who want fewer DevOps hurdles when working with AWS directly; Hub users looking to scale or transition to a more permanent cloud workflow

Why was it established? To address the cost and DevOps challenges that scientists face when trying to deploy Dask

Who is building or supporting this? The Coiled company

What are some key features? Launches cloud compute from a script on your local machine or a Hub, replicating your software environment; spending caps and cost observability for teams; computation observability; responsive support

How do I get started? Testing opportunity via NASA Openscapes. We’ll cover this in more detail during our demo session.

Read more about how 250TB of data were processed in ~20 min for ~$25 using Dask, Coiled, and xarray: Medium article
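To put those figures in perspective, the sketch below converts them into throughput and unit cost. It uses only the approximate numbers quoted above (250 TB, ~20 min, ~$25) and assumes decimal units (1 TB = 1000 GB); the function name is made up for illustration, and the results are order-of-magnitude estimates rather than benchmarks.

```python
def summarize_run(data_tb: float, minutes: float, cost_usd: float) -> tuple[float, float]:
    """Return (aggregate throughput in GB/s, cost in USD per TB) for a run.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    gb_per_second = data_tb * 1000 / (minutes * 60)
    usd_per_tb = cost_usd / data_tb
    return gb_per_second, usd_per_tb


if __name__ == "__main__":
    # Approximate figures as quoted on this slide.
    throughput, unit_cost = summarize_run(data_tb=250, minutes=20, cost_usd=25)
    print(f"Aggregate throughput: ~{throughput:.0f} GB/s")
    print(f"Cost: ~${unit_cost:.2f} per TB")
```

Throughput at this scale is only reachable by spreading the work across many cloud workers located next to the data, which is the case Dask and Coiled target.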

9 of 11

Nebari

Who is it for? Research teams with DevOps resources that are interested in setting up and managing a custom cloud deployment

Why was it established? Intended for more custom, autonomous, larger-scale JupyterHub deployments

Who is building or supporting this? Community-led, open source. See How To Contribute

What are some key features? Full-fledged JupyterHub deployment with Kubernetes (container orchestration) integration; managed by the person or organization that deploys it

How do I get started? See Nebari Getting Started documentation

10 of 11

Recap

  • First ask yourself: When to cloud?
    • You may continue to download data or work locally with the outputs of cloud-based services, and take advantage of the cloud only where it helps
  • If you “passed go”, there is a growing number of options for easily onboarding to a cloud environment
  • This list is not exhaustive! We want to hear from you about other options you are pursuing and how your cloud transition is going

11 of 11

Thank you for listening