Efforts to support end users in the journey to the cloud
Open Source Science Data Repositories Workshop
Amy Steiker • Alexis Hunzinger • Luis Lopez • Catalina Oaida Taglialatela • Aaron Friesz
and the NASA Openscapes Mentors
September 27 2022
NASA Award# 20-TWSC20-2-0003 Leads: Julia Stewart Lowndes & Erin Robinson
Openscapes artwork by Allison Horst; @allison_horst
slides: https://nasa-openscapes.github.io/about
We believe Open Science can accelerate data-driven solutions and increase diversity, equity, inclusion, and belonging in research and beyond.
NASA Openscapes
We are a mentor community across
NASA Earth science data centers (DAACs)
We are co-creating and teaching common tutorials to support researchers as they migrate analytical workflows to the Cloud
Agenda: short lightning talks
Perspectives from the NASA Openscapes DAAC mentors
Extra slides also showcase Earthdata Cloud Cookbook - Cheatsheets (Catalina Oaida Taglialatela & Cassandra Nickles, PO.DAAC); see https://nasa-openscapes.github.io/about.html#slides
Time | Topic | Presenter |
6 mins | DAAC internal training | Alexis Hunzinger, Christine Smit (GES DISC) |
6 mins | End-user training events | Amy Steiker (NSIDC DAAC) |
6 mins | earthaccess python library | Luis Lopez (NSIDC DAAC) |
What: Brief overview of some of the ways DAACs have started supporting end users' transition to the cloud paradigm
Why: Share and learn from each other, grow and improve DAAC support of cloud-archived science and applications users, while following open source, open science best practices.
DAAC Internal Training
Knowledge From Within
Giving DAAC Staff Hands-On Experience in the Cloud
Presenters:
Alexis Hunzinger, Chris Battisto
Helpers:
Allison Alcott, Binita KC, Christine Smit
Teach cloud basics and definitions
Grant access to cloud workspace
Interact with one cloud data access method: direct S3 access
Walk a mile in a user’s shoes
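Direct S3 access, the one access method covered in this workshop, can be sketched in Python roughly as follows. This is a minimal illustration and not the workshop's actual notebook: it assumes the third-party `s3fs` package, compute running in the same AWS region as the data (us-west-2 for Earthdata Cloud), and temporary credentials already retrieved as JSON from a DAAC credentials endpoint. The helper names `s3fs_kwargs` and `open_s3_object` are hypothetical.

```python
from typing import Dict


def s3fs_kwargs(creds: Dict[str, str]) -> Dict[str, str]:
    """Map temporary-credential JSON to s3fs.S3FileSystem keyword arguments.

    Assumes the JSON uses the keys accessKeyId, secretAccessKey and
    sessionToken; any other keys (such as an expiration time) are ignored.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def open_s3_object(s3_url: str, creds: Dict[str, str]):
    """Open one object for reading straight from S3, with no download step."""
    import s3fs  # third-party; imported lazily so s3fs_kwargs needs no extras

    fs = s3fs.S3FileSystem(**s3fs_kwargs(creds))
    return fs.open(s3_url, mode="rb")
```

The returned file-like object can then be handed to a reader such as xarray. Because the credentials are temporary, a long-running session would need to fetch new ones when they expire.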
GESDISC
How did we do it?
1. Educate
Present and trial cloud user resources at weekly meetings ahead of workshop
2. Prepare
Split participants by skill and experience level with Python and cloud computing
3. Interact
Teach with bite-sized lessons using Jupyter Notebooks
Encourage type-along during interactive workshop
Tutorial
Template
How did it go?
[Figure: participant confidence after the workshop (least confident to most confident) by skill/experience (novice to expert), for Cloud Understanding, Python, and Jupyter Notebook]
Beginner: 12
Intermediate: 14
Advanced: 7
Total: 33
Prerequisite Knowledge?
Cloud Understanding
What did we learn?
Learning curve is STEEP!
No one left an expert and we continue to help staff who are experimenting in the cloud
Continued support and education are critical
Necessary to host refresher workshops to exercise the knowledge and introduce new tools and methods
Provide resources that are easy to revisit
Website, slides, instructions, recordings are all useful for staff who spend more time with the material
Lay a foundation with cloud basics and terminologies
Introduce terms and concepts early, perhaps in a separate meeting or clinic, and continue defining them throughout the workshop
End-user Training Events
Outcomes & Lessons Learned
Cloud Training Events
Openscapes Year 1
Date | Focus Area / Goals |
November 2021 | Five-day collaborative open science learning experience aimed at exploring, creating, and promoting effective cloud-based science and applications workflows using NASA Earthdata Cloud data, tools, and services (among others). |
December 2021 | Half-day workshop focused on enabling analysis in the cloud using NASA Earth science data. |
March 2022 | Preparing for Surface Water and Ocean Topography (SWOT) and enabling the (oceanography) science team to process and handle the large volumes of SWOT SSH data in the cloud. |
April 2022 | Exposing ECOSTRESS data users to ECOSTRESS version 2 (v2) data products in the cloud. Learning objectives focus on how to find and access ECOSTRESS v2 data from Earthdata Cloud, either by downloading the data or by accessing it in the cloud. |
April, May 2022 | A series of Jupyter Notebooks, written in Python, demonstrating how to get started with NASA Earthdata in the cloud. Topics include: Cloud Data Access in AWS, Cloud Optimized Data, Data Discovery using STAC via NASA’s CMR-STAC API, Working with Cloud Data. |
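As a rough illustration of the "Data Discovery using STAC via NASA's CMR-STAC API" topic, the sketch below builds a STAC item-search request and posts it with the Python standard library. It is not taken from the training notebooks: the helper names are hypothetical, and the provider and collection identifiers in the usage note are placeholders to be replaced with real catalog values.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence

CMR_STAC_ROOT = "https://cmr.earthdata.nasa.gov/stac"


def stac_search_body(
    collections: Sequence[str],
    bbox: Sequence[float],
    start: str,
    end: str,
    limit: int = 10,
) -> Dict[str, Any]:
    """Build a STAC item-search body: collections, bbox, datetime range, limit."""
    return {
        "collections": list(collections),
        "bbox": list(bbox),  # west, south, east, north in degrees
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,
    }


def search(provider: str, body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """POST the body to <root>/<provider>/search and return the matching items."""
    request = urllib.request.Request(
        f"{CMR_STAC_ROOT}/{provider}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("features", [])
```

A call might look like `search("SOME_PROVIDER", stac_search_body(["SOME_COLLECTION_ID"], (-10, 20, 10, 30), "2021-05-01T00:00:00Z", "2021-05-31T23:59:59Z"))`. Each returned item lists its data files under `assets`.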
Outcome: These events markedly raise cloud comfort level
"[We need] Better documentation/tutorials for how to access data over the cloud. It would have been extremely difficult to do any of this without the help of the hackathon."
Before
After
Outcome: Understanding the why
“...It was really eye-opening to not be constrained by my local computer….”
"... More realistically, I will probably use many of these tools on my local machine unless I'm working with big datasets that really benefit from cloud computing.”
Learners’ takeaways predominantly centered around improved conceptual understanding of why and when to use, or not use, the cloud…
… While also recognizing that there is a significant learning curve and time investment required for adoption
Credit: Open Architecture for scalable cloud-based data analytics. From Abernathey, Ryan (2020): Data Access Modes in Science.
Common Pain Points
→ Leads to difficulties reusing a given workflow
"Cloud-based tools are not mature enough to use in a research-focused applications. It seems like the API's are being developed to be extremely flexible and powerful, but the use-cases for any particular researcher are much more narrow. The API examples use an already complex toolchain (Jupyter, cloud, python3 stack, etc.) to call a complex API (harmony or cmr) and perform a simple task"
→ Learners struggle to know when to use a given workflow or tool/service
Moving forward
"It would be great to see a tutorial or detailed example of how to set up our own jupyter environment. Is there a way we can track how much the work we're doing using this 2i2c environment costs, to give us a better idea of eventual charges for data processing?"
Open Science Enablement
Collaboration tools & methods, supporting interagency and intercloud workflows
Advanced Cloud Processing
Spinning up larger, parallel resources for big data analysis, optimizing & standardizing code
Spinning up a permanent cloud environment
Leveraging 2i2c environment, understanding cost, funding mechanisms
Continuing to support Openscapes 2i2c Hub
Reducing barriers to cloud entry; Meeting users where they are; power in a shared environment
earthaccess
NASA Data Search and Access in Python
Luis López et al.
Software Engineer @ NSIDC
earthaccess
Overview
Reproducible workflows are extremely important in the age of cloud data access, cloud computing, and open science.
In this context, we are developing earthaccess, a Python library that aims to simplify data discovery and access for those using the PyData ecosystem (xarray, dask, numpy).
Using this library eliminates the need to know the intricacies of NASA’s Application Programming Interfaces (APIs) and cloud data storage systems.
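A minimal sketch of what this simplification can look like is shown below. It assumes the top-level functions found in later earthaccess releases (`login`, `search_data`, `open`), which may differ from the interface available at the time of this talk. It also assumes an Earthdata Login account. The helper names `build_query` and `find_and_open` are hypothetical.

```python
from typing import Any, Dict, Tuple


def build_query(
    short_name: str,
    bounding_box: Tuple[float, float, float, float],
    temporal: Tuple[str, str],
    count: int = 10,
) -> Dict[str, Any]:
    """Collect search parameters in one place so a workflow is easy to rerun."""
    return {
        "short_name": short_name,  # dataset short name, for example "ATL06"
        "bounding_box": bounding_box,  # lower-left lon/lat, upper-right lon/lat
        "temporal": temporal,  # (start date, end date)
        "count": count,  # maximum number of granules to return
    }


def find_and_open(query: Dict[str, Any]):
    """Authenticate, search for granules, and open them as file-like objects."""
    import earthaccess  # third-party; imported lazily so build_query needs no extras

    earthaccess.login()  # uses Earthdata Login credentials
    granules = earthaccess.search_data(**query)
    return earthaccess.open(granules)
```

For example, `find_and_open(build_query("ATL06", (-46.5, 61.0, -42.5, 63.0), ("2020-01-01", "2020-03-31")))` would return file-like objects that can be passed to readers in the PyData ecosystem, without the caller handling search endpoints or storage credentials directly.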
The Problem: API Fragmentation
In order to programmatically access NASA datasets, users must be familiar with:
Software Engineer
Geo Scientist
API fragmentation in the cloud
API fragmentation in the notebook
from * import *
Image credit: Patrick Quinn
earthaccess: when to use
earthaccess: simplifying access
See it in action!! Analyzing Sea Level Rise Using Earth Data in the Cloud
Next Steps
from earthaccess.nsidc import atl06
Thanks!
Thanks to the people who made this possible!
Not pictured: more people!
Earthdata Cloud Cookbook:
Workflow and Vocab Cheatsheets
Catalina Oaida Taglialatela (PO.DAAC), Cassie Nickles (PO.DAAC), Julie Lowndes (Openscapes), Amy Steiker (NSIDC), Aaron Friesz (LP DAAC), Alexis Hunzinger (GES DISC)
Earthdata Cloud Cookbook
Cheatsheets & Guides
Tools & Services Roadmap
https://nasa-openscapes.github.io/earthdata-cloud-cookbook/cheatsheet.html#tools-services-roadmap
demo
Workflow Cheatsheet
https://nasa-openscapes.github.io/earthdata-cloud-cookbook/cheatsheet.html#workflow-cheatsheet
demo
Earthdata Cloud Cookbook
new Cheatsheets & Guides
Why: Increase accessibility to data & resources; many tools, lots of (new) jargon
What: Conceptual, practical, or reference guides to help users find the paths and tools most useful for a given need; users are at different points in the learning process, so we offer a range of guides & cheatsheets
How: Developed with NASA Openscapes and other DAAC mentors - consistency across DAACs, in messaging, information, and user experience
Where: Implementation in:
Earthdata Cloud Cookbook
Supporting NASA Earth science research
teams’ migration to the cloud
https://nasa-openscapes.github.io/earthdata-cloud-cookbook/
A place to learn, share, and experiment with NASA Earthdata on the Cloud. We know this has a lot of moving parts; we are iterating as we go and welcome feedback and contributions.
Closing