How to accelerate user value through open, collaborative development
Sharing earthaccess community success stories and challenges
Amy Steiker1, Matt Fisher2, Joseph H. Kennedy3, Daniel Kaufman4, Luis A. López1, and the earthaccess community
ESDIS SE TIM
August 22, 2024
1NSIDC DAAC, 2National Snow and Ice Data Center, 3ASF, 4ASDC
Goals of this session
⚠️ When we say “community”, we mean the whole community! We take community to mean fully public, open, and inclusive of anyone interested in engaging, not just groups internal to EOSDIS.
First, what is earthaccess?
earthaccess origins
Why now?
NASA Openscapes
Learning from cross-DAAC Hackathons & Tutorials
"Cloud-based tools are not mature enough to use in a research-focused applications. [...] The API examples use an already complex toolchain (Jupyter, cloud, python3 stack, etc.) to call a complex API (harmony or cmr) and perform a simple task"
Luis presenting the then-named earthdata Python library, Nov 17 2022!
Tom commenting as a 2022 Openscapes Science Champion, having attended the 2021 Hackathon
Community-first approach:
Reducing barriers to enable growth
Since 4 June 2024: Median time to first response decreased from almost 3 days to just over 2 hours
🚀 earthaccess released in alpha as “earthdata” in a public NSIDC GitHub repo
🧑🏽🏫 2021 Hackathon
🚀 First formal release available in PyPI and Conda Forge under the name “earthaccess”
earthaccess Slack channel created in Openscapes workspace
👥 First earthaccess biweekly hackday event
🚀 v0.10.0 release with seven new contributors
Today: 9 total maintainers (5 outside collaborators); 14 total with repo Triage, Write, or Maintain access
📄 Documentation updates!
Dev Seed
NSIDC
ASF
ASDC
GES DISC
USGS
Coiled
NASA HQ
UNH / NSIDC UWG
Citizen scientist
ORNL
Openscapes
OB DAAC
ESDIS
How you can do it too
The Contributor Funnel
Clear entrypoint
Quick-start
Contributing guide
Welcoming & thanking
Maintainer guide
Governance
Tour of earthaccess’ Contributor Funnel
Want to participate? Check out our contributor docs: https://earthaccess.readthedocs.io/en/latest/contributing/
Outcomes
Diversity of perspective / opinion = better outcomes
More contributors = move faster
Quality + good vibes = more fun
Resources
Open source guide: https://opensource.guide
“Deep Quality”: https://diataxis.fr/quality
earthaccess source: https://github.com/nsidc/earthaccess
Future goals 🚀
Community engagement
Software sustainability and maintenance
Check out our GitHub Issues/Discussions and ROSES proposal for more!
Earthdata Search and earthaccess
$ conda install -c conda-forge \
    earthaccess
$ # or: python -m pip install…
$ python
>>> import earthaccess
>>> results = earthaccess.search_data(
...     short_name='ATL06',
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
EXPORT
earthaccess CLI: unified bulk download script
$ # download binary
$ earthaccess download \
    --short-name ATL06 \
    --bounding_box "(-10, 20, 10, 50)" \
    --temporal "1999-02" "2019-03" \
    --count 10
CLI extension to earthaccess
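The CLI above is a proposal, so the following is only a minimal sketch of how such a command could stay a thin wrapper over the existing Python API, assuming Python's standard argparse. The parser layout, the `parse_bbox` and `search_kwargs` helpers, and the `local_path="."` choice are hypothetical; `earthaccess.login`, `earthaccess.search_data`, and `earthaccess.download` are existing earthaccess functions.

```python
import argparse
import ast


def parse_bbox(text: str) -> tuple:
    """Parse "(-10, 20, 10, 50)" into four floats: west, south, east, north."""
    try:
        values = tuple(float(v) for v in ast.literal_eval(text))
    except (ValueError, TypeError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"invalid bounding box: {text!r}") from exc
    if len(values) != 4:
        raise argparse.ArgumentTypeError("bounding box needs exactly 4 numbers")
    return values


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout for the proposed `earthaccess download` command.
    parser = argparse.ArgumentParser(prog="earthaccess")
    sub = parser.add_subparsers(dest="command", required=True)
    dl = sub.add_parser("download", help="search CMR and download matching granules")
    dl.add_argument("--short-name", required=True)
    # Accept both spellings that appear on the slide.
    dl.add_argument("--bounding-box", "--bounding_box", dest="bounding_box", type=parse_bbox)
    dl.add_argument("--temporal", nargs=2, metavar=("START", "END"))
    dl.add_argument("--count", type=int, default=10)
    return parser


def search_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed flags onto keyword arguments for earthaccess.search_data()."""
    kwargs = {"short_name": args.short_name, "count": args.count}
    if args.bounding_box is not None:
        kwargs["bounding_box"] = args.bounding_box
    if args.temporal is not None:
        kwargs["temporal"] = tuple(args.temporal)
    return kwargs


def run(argv=None) -> None:
    args = build_parser().parse_args(argv)
    import earthaccess  # deferred: needs the package, network access, and EDL credentials

    earthaccess.login()
    results = earthaccess.search_data(**search_kwargs(args))
    earthaccess.download(results, local_path=".")
```

Keeping the CLI as a mapping from flags to `search_data` keywords means the command and the library cannot drift apart in what they accept.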
EXPORT
earthaccess plugin interface
Harmony
HyP3
SlideRule
…
SBAS for InSAR pair picking
>>> import earthaccess
>>> results = earthaccess.search_data(
...     short_name='ATL06',
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
>>> transform = earthaccess.reproject(
...     results,
...     crs='EPSG:3413'
... )
>>> transform = earthaccess.watch(transform)  # Async
[================================================]100%
>>> earthaccess.download(transform)
← abstracts
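The plugin interface is also a future goal, so there is no real API to show yet. As a purely hypothetical sketch (every name here, including `ServicePlugin`, `register`, `transform`, and `FakeReprojectPlugin`, is invented for illustration and is not part of earthaccess), one way a single front-end call could route requests to interchangeable service backends such as Harmony or HyP3 is a small registry keyed by plugin name:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class ServicePlugin(Protocol):
    """Hypothetical contract a service plugin (e.g. wrapping Harmony or HyP3) might satisfy."""

    name: str

    def supports(self, operation: str) -> bool: ...

    def submit(self, granules: List[str], operation: str, **options) -> str: ...


# Registry of installed plugins, keyed by name.
_PLUGINS: Dict[str, ServicePlugin] = {}


def register(plugin: ServicePlugin) -> None:
    _PLUGINS[plugin.name] = plugin


def transform(granules: List[str], operation: str, **options) -> Tuple[str, str]:
    """Route a request to the first registered plugin that supports it.

    Returns (plugin name, job id) so a caller could later poll or download.
    """
    for plugin in _PLUGINS.values():
        if plugin.supports(operation):
            return plugin.name, plugin.submit(granules, operation, **options)
    raise LookupError(f"no registered plugin supports {operation!r}")


@dataclass
class FakeReprojectPlugin:
    """Offline stand-in for a real service: it records requests instead of calling one."""

    name: str = "fake-reproject"
    submitted: list = field(default_factory=list)

    def supports(self, operation: str) -> bool:
        return operation == "reproject"

    def submit(self, granules: List[str], operation: str, **options) -> str:
        self.submitted.append((list(granules), operation, options))
        return f"job-{len(self.submitted)}"
```

For example, after `register(FakeReprojectPlugin())`, calling `transform(["granule-a"], "reproject", crs="EPSG:3413")` returns `("fake-reproject", "job-1")`. The user-facing call never names the backend, which is the sense in which the interface "abstracts" the services.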
Expanded language support
Join us for our biweekly hackdays, Tuesdays 11 am to 1 pm MST
Connect with us!
Not pictured: more people!
Discussion
Art: Allison Horst
Less code makes reproducible science and open science more accessible.
Backup slides
APIs in the Cloud Era
Credit: Patrick Quinn / ESDIS
[Figure: Software Engineer vs. Not a Software Engineer over TIME]
earthaccess is a Python library that simplifies data discovery and access to NASA Earthdata by providing an abstraction layer over the Common Metadata Repository (CMR) and Earthdata Login (EDL)
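To make the abstraction layer concrete: a search like the `search_data` call on the slides ultimately becomes a query against CMR's public granule search endpoint. The sketch below only builds that URL (it makes no request) using documented CMR search parameters (`short_name`, `bounding_box`, `temporal`, `page_size`). The helper names and the month-to-date expansion are assumptions, and earthaccess's actual request construction may differ.

```python
import calendar
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def month_bounds(start_month: str, end_month: str) -> str:
    """Expand ("YYYY-MM", "YYYY-MM") into a CMR temporal range string.

    Assumption: the start month begins at its first instant and the end month
    runs through its last day; earthaccess's own date parsing may differ.
    """
    end_year, end_mon = (int(part) for part in end_month.split("-"))
    last_day = calendar.monthrange(end_year, end_mon)[1]
    return f"{start_month}-01T00:00:00Z,{end_month}-{last_day:02d}T23:59:59Z"


def cmr_granule_query(short_name, bounding_box, temporal, count) -> str:
    """Build a CMR URL roughly equivalent to the slide's search (hypothetical helper)."""
    params = {
        "short_name": short_name,
        # CMR order: lower-left lon, lower-left lat, upper-right lon, upper-right lat.
        "bounding_box": ",".join(str(v) for v in bounding_box),
        "temporal": month_bounds(*temporal),
        "page_size": count,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


print(cmr_granule_query("ATL06", (-10, 20, 10, 50), ("1999-02", "2019-03"), 10))
```

Hand-assembling and paging through queries like this, then handling EDL authentication for the returned links, is the bookkeeping earthaccess takes off the user's plate.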
Earthdata Authentication - Old vs New
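One common pre-earthaccess approach was a hand-maintained `~/.netrc` entry for the Earthdata Login host, which tools then had to be wired up to read. With earthaccess, a single `earthaccess.login()` call handles this, and `earthaccess.login(strategy="netrc")` reads the same file. As a minimal offline sketch of what that entry looks like, using only Python's standard `netrc` module (the helper names and the placeholder credentials are hypothetical):

```python
import netrc
import os
import tempfile

# Earthdata Login (EDL) host that credentials are matched against.
EDL_HOST = "urs.earthdata.nasa.gov"


def write_netrc(path: str, username: str, password: str) -> None:
    """Write a single EDL entry in standard .netrc format, readable only by the owner."""
    with open(path, "w") as fh:
        fh.write(f"machine {EDL_HOST} login {username} password {password}\n")
    os.chmod(path, 0o600)


def read_edl_credentials(path: str):
    """Return (login, password) for the EDL host, or None if no entry exists."""
    auth = netrc.netrc(path).authenticators(EDL_HOST)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


# Placeholder credentials in a throwaway directory; never commit real ones.
with tempfile.TemporaryDirectory() as tmp:
    netrc_path = os.path.join(tmp, ".netrc")
    write_netrc(netrc_path, "example_user", "example_password")
    creds = read_edl_credentials(netrc_path)

print(creds)
```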
The power of open science only reaches its full potential if we have easy-to-use workflows that facilitate research in an inclusive, efficient and reproducible way.
Unfortunately, as it stands today, scientists and students alike face a steep learning curve adapting to systems that have grown too complex, and they end up spending more time on the technicalities of the tools, the cloud, and APIs than on their science.
Problem: data accessibility
Fun fact: Benjamin works at NASA GISS https://www.drbenjamincook.net/
Problem: API fragmentation
In order to programmatically access NASA datasets, new users must be familiar with:
[Figure: Software Engineer vs. Not a Software Engineer]
Image Credit: Allison Horst
Timeline of events
21 Sept 2021: earthaccess was initially released in alpha as “earthdata” in a public NSIDC GitHub repo
15-19 Nov 2021: Openscapes Cloud Hackathon
8 Dec 2022: First formal release available in PyPI and Conda Forge under the name “earthaccess”
1 Aug 2023: v0.5.3 release included documentation overhaul
22 Sept 2023: earthaccess Slack channel created in Openscapes workspace
6 Feb 2024: First earthaccess biweekly hackday event
29 Feb 2024: v0.9.0 release with four new contributors
4 June 2024: Monthly metrics report: Median time to first response = 2 days, 21:09:50
30 June 2024: Monthly metrics report: Median time to first response = 1:37:57
19 July 2024: v0.10.0 release with seven new contributors
31 July 2024: Monthly metrics report: Median time to first response = 2:03:32
16 August 2024: 9 total maintainers (5 outside collaborators); 14 total with repo access (4 outside collaborators)
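The metric tracked in these monthly reports is a median over per-issue waits. As a minimal sketch (the function names and sample issues are hypothetical, and how the actual reports were generated is not stated here), it can be computed as follows, along with the size of the reported drop between 4 June and 31 July 2024:

```python
from datetime import datetime, timedelta
from statistics import median


def median_time_to_first_response(issues):
    """Median wait between an issue opening and its first response.

    `issues` holds (opened_at, first_response_at) datetime pairs. Assumption:
    issues with no response yet (None) are left out rather than counted.
    """
    waits = [answered - opened for opened, answered in issues if answered is not None]
    return median(waits)


def to_timedelta(days=0, hms="0:00:00"):
    """Build a timedelta from a day count and an 'H:MM:SS' string."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Hypothetical issues, only to exercise the function.
t0 = datetime(2024, 7, 1, 9, 0)
sample = [
    (t0, t0 + timedelta(hours=1)),
    (t0, t0 + timedelta(hours=3)),
    (t0, t0 + timedelta(days=2)),
    (t0, None),  # still waiting; excluded under the assumption above
]
print(median_time_to_first_response(sample))  # 3:00:00

# Reported monthly medians from the timeline (4 June vs. 31 July 2024).
before = to_timedelta(days=2, hms="21:09:50")
after = to_timedelta(hms="2:03:32")
print(f"{before} -> {after}: about {before / after:.0f}x faster")
```

Using the median rather than the mean keeps one long-ignored issue from dominating the number.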
earthaccess in action
Time series of sea level rise
Reproducible science with 3 lines of code
1. Data Discovery
2. Data Access
3. Science
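A minimal sketch of those three steps with existing earthaccess calls (`login`, `search_data`, `open`) plus xarray. The wrapper name is hypothetical, and running it needs network access, an Earthdata Login account, and a collection whose granules xarray can open directly (for example a gridded netCDF product; the slide does not name one):

```python
def discover_access_analyze(short_name: str, temporal: tuple):
    """Discovery, access, and science in a handful of earthaccess calls.

    Imports are deferred so that defining this function does not require
    earthaccess or xarray to be installed.
    """
    import earthaccess
    import xarray as xr

    earthaccess.login()  # authenticate once with Earthdata Login

    # 1. Data discovery: query CMR for matching granules.
    results = earthaccess.search_data(short_name=short_name, temporal=temporal)
    # 2. Data access: open the granules as file-like objects, no manual download step.
    files = earthaccess.open(results)
    # 3. Science: hand the files straight to xarray.
    return xr.open_mfdataset(files)
```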
Problem: convoluted notebooks
Reproducible workflows are extremely important in the age of cloud computing and open science. In this context, we developed a Python library that aims to simplify data discovery and access for those using NASA Earth data with the PyData ecosystem (xarray, dask, numpy).
earthaccess eliminates the need to know the intricacies of NASA’s Application Programming Interfaces (APIs) and cloud data storage systems.
Contributor story: @Sherwin-14
🪄 “Good first issue” is magic ✨
Deep Quality
https://diataxis.fr/quality/
Security
😈
“What’s the worst that can happen?”