1 of 44

How to accelerate user value through open, collaborative development

Sharing earthaccess community success stories and challenges

Amy Steiker¹, Matt Fisher², Joseph H. Kennedy³, Daniel Kaufman⁴, Luis A. López¹, and the earthaccess community

ESDIS SE TIM

August 22, 2024

¹NSIDC DAAC, ²National Snow and Ice Data Center, ³ASF, ⁴ASDC

2 of 44

Goals of this session


  • Raise awareness of the earthaccess community model: what works, what we want to do better, and how to get involved

  • Hear from you about your community* development experiences:
    • Identify user value “accelerators” and “decelerators”: What are the elements of community development that enable us to quickly deliver user value? What hinders our progress?

⚠️ When we say “community”, we mean the whole community! We take “community” to mean fully public, open, and inclusive of anyone interested in engaging, not just groups internal to EOSDIS.

3 of 44

First, what is earthaccess?

4 of 44


5 of 44

earthaccess origins

6 of 44

Why now?

NASA Openscapes

  • A mentor team across NASA Distributed Active Archive Centers (DAACs)

  • Co-creating and teaching common tutorials alongside researchers as they migrate analytical workflows to the Cloud

7 of 44

Learning from cross-DAAC Hackathons & Tutorials

"Cloud-based tools are not mature enough to use in a research-focused applications. [...] The API examples use an already complex toolchain (Jupyter, cloud, python3 stack, etc.) to call a complex API (harmony or cmr) and perform a simple task"

Luis presenting the library, then called earthdata, on Nov 17, 2022!

Tom commenting as a 2022 Openscapes Science Champion, having participated in the 2021 Hackathon

8 of 44

Community-first approach:

Reducing barriers to enable growth

9 of 44

Between 4 June and 31 July 2024, median time to first response dropped from almost 3 days to just over 2 hours

🚀 earthaccess released in alpha as “earthdata” in the public NSIDC GitHub repo

🧑🏽‍🏫 2021 Hackathon

🚀 First formal release available on PyPI and conda-forge under the name “earthaccess”

earthaccess Slack channel created in the Openscapes workspace

👥 First earthaccess biweekly hackday event

🚀 v0.9.0 release with four new contributors

🚀 v0.10.0 release with seven new contributors

Today: 9 total maintainers (5 outside collaborators); 14 total with Triage, Write, or Maintain repo access

📄 Documentation updates!

10 of 44

11 of 44

Dev Seed

NSIDC

ASF

ASDC

GES DISC

USGS

Coiled

NASA HQ

UNH / NSIDC UWG

Citizen scientist

ORNL

Openscapes

OB DAAC

ESDIS

12 of 44

How you can do it too

13 of 44

The Contributor Funnel


14 of 44

The Contributor Funnel


Clear entrypoint

Quick-start

Contributing guide

Welcoming & thanking

Maintainer guide

Governance

15 of 44

Tour of earthaccess’ Contributor Funnel


Want to participate? Check out our contributor docs: https://earthaccess.readthedocs.io/en/latest/contributing/

16 of 44

Benefits

Diversity of perspective / opinion = better outcomes

More contributors = faster progress

Quality + good vibes = more fun


17 of 44

Resources

Open source guide: https://opensource.guide

“Deep Quality”: https://diataxis.fr/quality

earthaccess source: https://github.com/nsidc/earthaccess


18 of 44

Future goals 🚀

19 of 44

Community engagement

  • Develop scaffolding to reduce participation barriers

  • Education and outreach:
    • Community calls
    • Expanding tutorials
    • Hackweek events

Software sustainability and maintenance

  • Simplify contribution processes & development setup
  • Provide easy test execution
  • Streamline release process
  • End-to-end integration tests and documentation rendering

Check out our GitHub Issues/Discussions and ROSES proposal for more!

20 of 44

Earthdata Search and earthaccess

$ conda install -c conda-forge earthaccess
$ # or: python -m pip install…
$ python
>>> import earthaccess
>>> results = earthaccess.search_data(
...     short_name='ATL06',
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
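For comparison, the search above stands in for a raw request to NASA's Common Metadata Repository (CMR). The sketch below assembles roughly such a request by hand; it is not earthaccess's internal code. It assumes CMR's granule search endpoint and its `short_name`, `bounding_box` (west, south, east, north), `temporal` (comma-separated ISO 8601 start and end), and `page_size` parameters; `build_cmr_granule_query` is a hypothetical helper, and the exact query earthaccess sends may differ.

```python
from typing import Sequence, Tuple
from urllib.parse import urlencode

# CMR granule search endpoint (production), JSON response format.
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_granule_query(
    short_name: str,
    bounding_box: Sequence[float],
    temporal: Tuple[str, str],
    page_size: int = 10,
) -> str:
    """Hypothetical helper: compose a raw CMR granule-search URL by hand.

    bounding_box: (west, south, east, north) in decimal degrees.
    temporal: (start, end) as ISO 8601 timestamps.
    """
    params = {
        "short_name": short_name,
        "bounding_box": ",".join(str(v) for v in bounding_box),
        "temporal": ",".join(temporal),
        "page_size": page_size,  # rough analogue of count=10
    }
    # urlencode percent-encodes the commas and colons; CMR decodes them.
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"
```

Even this leaves out authentication, paging through results, and pulling download links out of the JSON response, which is the kind of bookkeeping earthaccess is meant to take on.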

21 of 44

Earthdata Search and earthaccess


EXPORT

22 of 44

earthaccess CLI: unified bulk download script

$ # download binary
$ earthaccess download \
    --short-name ATL06 \
    --bounding_box "(-10, 20, 10, 50)" \
    --temporal "1999-02" "2019-03" \
    --count 10
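The command above is a proposal rather than a shipped tool. Below is a minimal sketch of how such a `download` subcommand might parse the flags shown and translate them into `earthaccess.search_data` keyword arguments, using Python's standard `argparse`. `build_parser`, `parse_bbox`, and `to_search_kwargs` are hypothetical names, and the bounding box is assumed to arrive as one quoted string such as `"(-10, 20, 10, 50)"`.

```python
import argparse
from typing import Any, Dict, Tuple


def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse "(-10, 20, 10, 50)" or "-10,20,10,50" as (west, south, east, north)."""
    parts = [p for p in text.strip().strip("()").split(",") if p.strip()]
    if len(parts) != 4:
        raise argparse.ArgumentTypeError("expected four comma-separated numbers")
    west, south, east, north = (float(p) for p in parts)
    return west, south, east, north


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the mockup."""
    parser = argparse.ArgumentParser(prog="earthaccess")
    sub = parser.add_subparsers(dest="command", required=True)
    dl = sub.add_parser("download", help="search CMR and download matching granules")
    dl.add_argument("--short-name", required=True)  # stored as args.short_name
    dl.add_argument("--bounding_box", type=parse_bbox)
    dl.add_argument("--temporal", nargs=2, metavar=("START", "END"))
    dl.add_argument("--count", type=int, default=10)
    return parser


def to_search_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Map parsed flags onto earthaccess.search_data keyword arguments."""
    kwargs: Dict[str, Any] = {"short_name": args.short_name, "count": args.count}
    if args.bounding_box is not None:
        kwargs["bounding_box"] = args.bounding_box
    if args.temporal is not None:
        kwargs["temporal"] = tuple(args.temporal)
    return kwargs
```

A real entry point would then call `earthaccess.search_data(**kwargs)` and hand the results to `earthaccess.download`. Accepting the box as four separate numbers (`nargs=4`, `type=float`) would avoid shell quoting; argparse accepts values like `-10` as long as no option itself looks like a negative number.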

CLI extension to earthaccess

EXPORT

23 of 44

earthaccess plugin interface

Harmony

HyP3

SlideRule

…

SBAS for InSAR pair picking

>>> import earthaccess
>>> results = earthaccess.search_data(
...     short_name='ATL06',
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
>>> transform = earthaccess.reproject(
...     results,
...     crs='EPSG:3413'
... )
>>> transform = earthaccess.watch(transform)  # Async
[================================================] 100%
>>> earthaccess.download(transform)

← abstracts

24 of 44

Expanded language support

25 of 44

Join us for our bi-weekly hackdays, Tuesdays 11-1 MST

  • Fostering new contributions through small-group work organized around specific topics or features. Please reach out if you are interested in joining!
  • See our Announcement and ongoing discussions for more info.

Connect with us!

Not pictured: more people!

26 of 44

Discussion

27 of 44

Discussion


  • What are the elements of community development that enable us to quickly deliver user value?

  • What hinders our progress?

  • How do we better engage the user community in ESDIS software development?

28 of 44

Art: Allison Horst

Less code makes reproducible science and open science more accessible.

29 of 44

Backup slides

30 of 44

APIs in the Cloud Era

Credit: Patrick Quinn / ESDIS

Figure: software engineers vs. non-software engineers over time

31 of 44


earthaccess is a Python library that simplifies data discovery and access for NASA Earthdata by providing an abstraction layer over the Common Metadata Repository (CMR) and Earthdata Login (EDL).

  • Authentication: earthaccess handles authentication with NASA EDL.

  • Search: earthaccess wraps CMR in a Pythonic interface.

  • Access: earthaccess can download or open data from both cloud-hosted and on-premises datasets with the same code.
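Below is a minimal sketch of the three operations in sequence, assuming the public `earthaccess.login`, `earthaccess.search_data`, `earthaccess.download`, and `earthaccess.open` functions. The wrapper name `get_granules`, its parameters, and the injectable `ea` argument are illustrative assumptions rather than part of the library. The `ea` argument lets the logic be exercised without network access or credentials.

```python
from typing import Any, Optional, Sequence


def get_granules(
    short_name: str,
    count: int = 5,
    stream: bool = False,
    local_path: str = "./data",
    ea: Optional[Any] = None,
) -> Sequence[Any]:
    """Authenticate, search, then stream or download the same results.

    `ea` defaults to the real earthaccess module; any object exposing the
    same four functions can be passed instead (e.g., an offline stand-in).
    """
    if ea is None:
        import earthaccess as ea  # deferred: only needed for real runs

    ea.login()  # Authentication: Earthdata Login (EDL), default strategy
    results = ea.search_data(short_name=short_name, count=count)  # Search: CMR
    if stream:
        # Access without local copies: returns file-like objects
        return ea.open(results)
    # Access via local copies: returns the downloaded file paths
    return ea.download(results, local_path=local_path)
```

The two access paths differ only in the last call. `download` writes local copies and returns their paths, while `open` returns file-like objects that libraries such as xarray can read directly.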

32 of 44

Earthdata Authentication - Old vs New


33 of 44

Earthdata Authentication - Old vs New


34 of 44

The power of open science only reaches its full potential if we have easy-to-use workflows that facilitate research in an inclusive, efficient and reproducible way.

Unfortunately, as things stand today, scientists and students alike face a steep learning curve adapting to systems that have grown too complex, and they end up spending more time on the technicalities of tools, cloud, and APIs than on their important science.

Problem: data accessibility

Fun fact: Benjamin works at NASA GISS https://www.drbenjamincook.net/

35 of 44

Problem: API fragmentation

To access NASA datasets programmatically, new users must be familiar with:

  • Earthdata Login (EDL)
    • How to use it with OAuth, curl, Wget, etc.
    • .netrc
  • Common Metadata Repository (CMR)
    • How to query for what we want
    • How to read the metadata that CMR returns.
  • Cloud
    • AWS
    • S3 buckets, S3 credentials
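One piece of the EDL plumbing listed above is the `.netrc` file. Tools such as curl and Wget, as well as earthaccess's `netrc` login strategy, can read Earthdata Login credentials from a `.netrc` entry for `urs.earthdata.nasa.gov`. The sketch below writes and reads such an entry with Python's standard `netrc` module. The helper names are hypothetical, and the sketch uses a throwaway file with placeholder credentials rather than a real `~/.netrc`.

```python
import netrc
import os
import tempfile
from typing import Tuple

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def netrc_entry(username: str, password: str) -> str:
    """Format one .netrc line for Earthdata Login."""
    return f"machine {EDL_HOST} login {username} password {password}\n"


def read_edl_credentials(path: str) -> Tuple[str, str]:
    """Return (login, password) for EDL from the .netrc file at `path`."""
    auth = netrc.netrc(path).authenticators(EDL_HOST)
    if auth is None:
        raise KeyError(f"no entry for {EDL_HOST} in {path}")
    login, _account, password = auth
    return login, password


# Demonstration with a temporary file and placeholder credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    with open(demo_path, "w") as fh:
        fh.write(netrc_entry("example_user", "example_password"))
    demo_credentials = read_edl_credentials(demo_path)
```

A real `~/.netrc` should be readable only by its owner (for example, `chmod 600 ~/.netrc`). On POSIX systems, Python's `netrc` module applies a permissions check when it reads that default file.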

Figure: software engineers vs. non-software engineers

Image Credit: Allison Horst

36 of 44

Timeline of events


21 Sept 2021: earthaccess was initially released in alpha as “earthdata” in the public NSIDC GitHub repo

15-19 Nov 2021: Openscapes Cloud Hackathon

8 Dec 2022: First formal release available on PyPI and conda-forge under the name “earthaccess”

1 Aug 2023: v0.5.3 release included a documentation overhaul

22 Sept 2023: earthaccess Slack channel created in the Openscapes workspace

6 Feb 2024: First earthaccess biweekly hackday event

29 Feb 2024: v0.9.0 release with four new contributors

4 June 2024: Monthly metrics report: Median time to first response = 2 days, 21:09:50

30 June 2024: Monthly metrics report: Median time to first response = 1:37:57

19 July 2024: v0.10.0 release with seven new contributors

31 July 2024: Monthly metrics report: Median time to first response = 2:03:32

16 August 2024: 9 total maintainers (5 outside collaborators); 14 total with repo access (4 outside collaborators)
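The first-response figures above are medians. The sketch below shows a minimal version of that statistic. It assumes the median is taken over the wait between an issue or pull request being opened and its first response, with unanswered items skipped; which comments count as a response is defined by the reporting tool and is an assumption here. The two reported medians are also converted so the scale of the change is explicit.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional, Tuple


def median_time_to_first_response(
    items: Iterable[Tuple[datetime, Optional[datetime]]],
) -> timedelta:
    """Median wait from opened_at to first_response_at; unanswered items skipped."""
    waits = [responded - opened for opened, responded in items if responded is not None]
    return median(waits)  # even count: mean of the two middle waits


# Medians reported in the monthly metrics (4 June and 31 July 2024)
JUNE_4 = timedelta(days=2, hours=21, minutes=9, seconds=50)
JULY_31 = timedelta(hours=2, minutes=3, seconds=32)

speedup = JUNE_4 / JULY_31  # timedelta / timedelta -> float ratio
```

By this arithmetic, the 4 June median (about 69 hours, or 2.88 days) is roughly 34 times the 31 July median (about 2.06 hours).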

37 of 44


38 of 44

earthaccess in action

Time series of sea level rise

39 of 44

Reproducible science with 3 lines of code

40 of 44

1. Data Discovery

2. Data Access

3. Science
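The three numbered steps can be written as one reusable function, which makes the workflow easy to rerun. This is a minimal sketch, assuming the public `earthaccess.login`, `earthaccess.search_data`, and `earthaccess.open` functions. `reproducible_analysis`, the `analyze` callback standing in for the science step, and the injectable `ea` argument are hypothetical.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def reproducible_analysis(
    short_name: str,
    temporal: Tuple[str, str],
    analyze: Callable[[Sequence[Any]], Any],
    count: int = 10,
    ea: Optional[Any] = None,
) -> Any:
    """Discovery -> access -> science, as one rerunnable function."""
    if ea is None:
        import earthaccess as ea  # deferred: only needed for real runs

    ea.login()
    # 1. Data discovery: query CMR for matching granules
    results = ea.search_data(short_name=short_name, temporal=temporal, count=count)
    # 2. Data access: file-like objects, cloud-hosted or on-prem alike
    files = ea.open(results)
    # 3. Science: whatever analysis the caller supplies
    return analyze(files)
```

For a gridded netCDF or HDF5 collection, `analyze` might be `xarray.open_mfdataset`, which accepts the file-like objects that `earthaccess.open` returns. Collections stored as grouped HDF5 files, such as ATL06, would also need a `group` argument.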

41 of 44

Problem: convoluted notebooks

Reproducible workflows are extremely important in the age of cloud computing and open science. In this context, we developed a Python library that aims to simplify data discovery and access for those using NASA Earthdata with the PyData ecosystem (xarray, dask, numpy).

earthaccess eliminates the need to know the intricacies of NASA’s Application Programming Interfaces (APIs) and cloud data storage systems.

42 of 44

Contributor story: @Sherwin-14

🪄 “Good first issue” is magic ✨


43 of 44

Deep Quality

https://diataxis.fr/quality/

  • Does it anticipate user needs?
  • Does it feel good to use?
  • Is it beautiful?
  • Does it flow?
  • Is it accessible?

44 of 44

Security

😈

“What’s the worst that can happen?”

  • Git is “distributed”
  • Open == auditable
  • Separate software & its deployment
  • Cloud or on-prem CI?