Cloud Environment Opportunities:
Managed JupyterHub options for Cryosphere and Earthdata user communities
Amy Steiker, Andrew Barrett, & Luis López
NSIDC DAAC UWG
September 2023
When to Cloud?
Solutions for Accessing Earthdata Cloud
Do it yourself
Managed Cloud Service
https://2i2c.org/service/
Other platforms and services exist to customize cloud environments and reduce barriers to entry
NASA Openscapes 2i2c Hub
Who is it for? Users learning to transition workflows to the cloud via Openscapes Science Champions and Openscapes training events; DAAC user support
Why was it established? A common, easy-to-access AWS us-west-2 environment did not previously exist within EOSDIS
Who is building or supporting this? NASA-funded grant; supported and maintained by NASA Openscapes Mentors
What are some key features? Easy web-based access using GitHub auth; support for Python, MATLAB, and R; reproducible Conda environment with multiple kernels
How do I get started? Join an upcoming Openscapes event or Science Champions cohort!
CryoCloud
Who is it for? NASA cryospheric scientists seeking a tailored, community-driven shared cloud environment
Why was it established? Addressing cloud computing barriers identified by the ICESat-2 Science Team: cost opacity, deployment complexity, lack of community knowledge
Who is building or supporting this? NASA-funded: ICESat-2 Project, TOPS
What are some key features? Similar features to Openscapes Hub; Responsive, collaborative team to support needs of the Cryosphere community
How do I get started? Sign up as part of UWG meeting prep; See the Getting Started page for more info
Earthdata Cloud Playground
Who is it for? Users seeking a “playground” to test their workloads running in the cloud and then transition to their own separately-funded cloud environment; NASA Earthdata learning events
Why was it established? A need for a persistent Earthdata Cloud environment as a core service came about through NASA Openscapes event feedback and subsequent mentor advocacy.
Who is building or supporting this? Supported by NASA SMCE account; Cross-DAAC and ESDIS-provided infrastructure, management, and user resources via NASA Openscapes
What are some key features? Similar features to Openscapes Hub; persistent (long-term) support
How do I get started? “Friends and family” release coming soon; more public release in 2024
ASF’s OpenScienceLab
Who is it for? Targeted but not limited to SAR users; Classroom settings where a shared compute environment is warranted
Why was it established? Addressing SAR data science challenges including SAR-specific and interdependent Python packages, collaboration pitfalls with varying environments, large SAR data volume, limited resources of SAR scientists
Who is building or supporting this? Alaska Satellite Facility
What are some key features? Free limited access to a cloud-hosted JupyterHub; Free fast data transfer to users’ storage from ASF AWS archives; Identical, fully configured, persistent computing software and hardware environments that can be shared by multiple users; An open library of data recipes
How do I get started? Email OSL admin: https://opensciencelab.asf.alaska.edu/portal/hub/login?next=%2Fportal%2Fhub%2Fhome
Coiled
Who is it for? Users who want fewer DevOps hurdles when working with AWS directly; Hub users looking to scale or transition to a more permanent cloud workflow
Why was it established? Addressing the cost and DevOps challenges that scientists face when trying to deploy Dask
Who is building or supporting this? The Coiled company
What are some key features? Launches cloud compute from a script on your local machine or a Hub, replicating your software environment; spending caps and cost observability for teams; computation observability; responsive support
How do I get started? Testing opportunity via NASA Openscapes. We’ll cover this in more detail during our demo session.
Read more about how 250 TB of data were processed in ~20 min for ~$25 using Dask, Coiled, and xarray: Medium article
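To put those headline numbers in perspective, here is a rough back-of-the-envelope sketch. It uses only the approximate figures quoted above (250 TB, ~20 min, ~$25) and assumes decimal units (1 TB = 1000 GB); the function name `summarize_run` is hypothetical and not part of Coiled, Dask, or xarray.

```python
def summarize_run(data_tb: float, minutes: float, cost_usd: float) -> dict:
    """Derive aggregate throughput and unit cost from headline run figures.

    Assumes decimal units (1 TB = 1000 GB). The inputs are the approximate
    numbers quoted on the slide, so the outputs are approximate as well.
    """
    seconds = minutes * 60
    return {
        # Aggregate read rate across the whole cluster, in GB per second
        "throughput_gb_per_s": data_tb * 1000 / seconds,
        # Compute cost per terabyte processed, in US dollars
        "cost_usd_per_tb": cost_usd / data_tb,
    }


summary = summarize_run(data_tb=250, minutes=20, cost_usd=25)
print(f"~{summary['throughput_gb_per_s']:.0f} GB/s aggregate throughput")
print(f"~${summary['cost_usd_per_tb']:.2f} per TB processed")
```

Under these assumptions, the run works out to roughly 208 GB/s of aggregate throughput at about $0.10 per TB. Plugging in your own data volume and budget gives a quick first estimate, although actual costs depend on instance types, region, and workload.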
Nebari
Who is it for? Research teams with DevOps resources interested in setting up and managing a custom cloud deployment
Why was it established? Intended for more custom, autonomous, larger-scale JupyterHub deployments
Who is building or supporting this? Community-led, open source. See How To Contribute
What are some key features? Full-fledged JupyterHub deployment with Kubernetes (container orchestration) integration; managed by the person or organization who deploys it
How do I get started? See Nebari Getting Started documentation
Recap
Thank you for listening