1 of 33

OSS Knowledge Management

2 of 33

Background Timeline

December 2018

Request for Information: Strategic Plan for Scientific Data and Computing

Feedback on:

  • Community envisioned a single, web-accessible service for discovering & requesting data across SMD divisions
  • Metadata standards

2019

SMD’s Strategy for Data Management and Computing for Groundbreaking Science

Goal 1, Develop and Implement Capabilities to Enable Open Science, includes the recommendation to develop and implement a centralized data discovery and access capability.

August 2018

NASA Archives Processing and Data Exploitation Workshop

Discussion on needs including:

  • Centralized data discovery & access across SMD
  • Interoperability & interdisciplinary team access to data
  • Sharing data stewardship expertise across SMD
  • Cross-divisional thesauri

October 2018

Workshop on Maximizing the Scientific Return of NASA Data

Discussion on:

  • Data discovery in the big data era
  • Metadata standards
  • Open science & data access
  • Breaking down silos between disciplines

Slide courtesy Kaylin Bugbee

3 of 33

Goal 1: Develop and Implement Capabilities to Enable Open Science

  • 1.1 Develop and implement a consistent open data and software policy tailored for SMD
  • 1.2 Upgrade capabilities at existing archives to support machine readable data access using open formats and data services
  • 1.3 Develop and implement a SMD data catalog to support discovery and access to complex scientific data across divisions
  • 1.4 Increase transparency into how science data are being used through a free and open unified journal server

Goal 2: Continuous Evolution of Data and Computing Systems

  • 2.1 Establish standardized approaches for all new missions and sponsored research that encourage the adoption of advanced techniques
  • 2.2 Integrate investment decisions in High-End Computing with the strategic needs of the research communities
  • 2.3 Invest in capabilities to use commercial cloud environments for open science
  • 2.4 Invest in the tools and training necessary to enable breakthrough science through application of AI/ML

Goal 3: Harness the Community and Strategic Partnerships for Innovation

  • 3.1 Develop community of practice and standards group
  • 3.2 Partner with academic, commercial, governmental and international organizations
  • 3.3 Promote opportunities for continuous learning as the field evolves through collaboration

SMD Strategy for Data Management and Computing for Groundbreaking Science 2019-2024

4 of 33

Morning agenda – setting the stage for collaboration

  • Overview of the Science Discovery Engine — Kaylin Bugbee
  • Overview of the Astrophysics Data System — Alberto Accomazzi
  • Overview of Open Science Guidelines Process — Mark Parsons
  • Provocateurs:
    • Ensuring sufficient quality metadata for interdisciplinary discovery - Anne Raugh, Matthew Tiscareno
    • Coordinating semantic resources (vocabularies, taxonomies, ontologies) — Ryan McGranaghan, Michael Kurtz
    • How to handle researcher submitted data — Joshua Pepper, David Holibaugh Baker, Jordan Padams
  • Open discussion and Q&A
  • Break
  • Organize afternoon breakouts
    • Lightning talks
    • Pitches
    • Voting
    • Suggest ideas now at https://bit.ly/nasaidea

5 of 33

Open data and open science guidelines

28 September 2022

NASA Open Source Science Data Repositories Workshop

Mark A. Parsons

https://orcid.org/0000-0002-7723-0950

mark.parsons@uah.edu @chutneyboy

University of Alabama in Huntsville

6 of 33

Why

  • Open data is not enough
  • Open science requires a new culture and a supporting infrastructure

“The basic irony of standards is the simple fact that there is no standard way to create a standard, nor is there even a standard definition of ‘standard’.”

- Andy Russell and Lee Vinsel, NYT, 2019-02-16

7 of 33

A Policy-Driven Vision

SPD-41a:

III.B.a SMD-funded data should follow the FAIR Guiding Principles for scientific data management and stewardship.

III.C.a When released, SMD-funded software should follow best practices in the relevant open source and research communities.

8 of 33

A Pragmatic Mission (to start)

Help make NASA’s science discovery systems work better.

Help SMD repositories meet the new demands of open science.

9 of 33

Who

  • Alan Smale (Astrophysics)
  • David Ciardi (Astrophysics)
  • Bruce Berriman (Astrophysics)
  • Robert Candey (Heliophysics)
  • Brian Thomas (Heliophysics)
  • Dan Berrios (Biological & Physical Sciences)
  • Thomas Morgan (Planetary Science)
  • Steve Hughes (Planetary Science)
  • Bob Downes (Earth Science)
  • Sara Lubkin (Earth Science)
  • Steve Crawford (HQ)
  • Mark Parsons (IMPACT)

All y’all!

10 of 33

What — Objectives

Establish an SMD-wide ‘standards’ guidelines process to help implement the NASA Information Policy:

  • Review and agree on which ‘standards’ are needed to achieve the policy objectives, including the FAIR Principles etc.
  • Foster broad collaboration around and common usage of conventions, agreements, leading practices, specifications, as well as formal standards to create a culture of interoperability.
  • Identify mutually satisfactory ways to align all divisional standards goals with the broader SMD goals.
  • Determine which standards shall be adopted and how (e.g., which profiles, vocabularies, versions, related protocols, data formats, etc.).
  • Identify where standards are missing and how each gap should be addressed.

11 of 33

How — Principles

  1. Use or adopt existing standards where possible.
  2. There is no one (format) standard to rule them all. Disciplinary standards should be respected, but there will be some level of required commonality or crosswalking.
  3. Any standard must solve a problem and be actively adopted.
  4. Bottom up standards are preferred to top down mandates where possible.
  5. The details of exactly how to actually implement standards are as important as the standard itself.
  6. Reduce total effort. Make it easy for data providers.
  7. The concerns of data providers must be addressed.
  8. Emphasize adding value over meeting requirements. Carrots are better than sticks.

Very Important Slide!

12 of 33

Explicit “community standards” referenced

13 of 33

Scope: Where on the spectrum below?

Figure based on the work of Peter Pulsifer, Carleton University, Ottawa, Canada

14 of 33

Scope: Where on the spectrum below?

“Guidelines”

15 of 33

Reviewed Existing Processes, including

  • ESDIS Standards Coordination Office
  • International Virtual Observatory Alliance
  • International Planetary Data Alliance
  • Space Physics Archive Search and Extract Consortium Data Model (SPASE)
  • Internet Engineering Task Force
  • Research Data Alliance

Seeking a balance of control and flexibility

16 of 33

A Draft Process

1a. Identify Need

1b. Identify Champion(s)

2. Assess issue and current standards adopted

3. Prepare work proposal

4. (public?) Review proposal

5. Assemble team and do work

6. Propose guideline & implementation (RFC)

7. Policy and/or Technical review

8. Public review

9. Publish final RFC

10. Publish summary guideline

Diagram labels: go?; Elevate issues as needed

17 of 33

DOIs for data citation as a test case

SPD-41a requires NASA data to be citable “using a persistent identifier”. But which identifier and how?

So we

  • Assembled expert team
  • Developed work proposal including existing community guidance
  • Surveyed existing practice – 28 of 32 archives responded. Most using DOIs registered through DataCite (in addition to other identifiers). The issue often seems to be mapping a set of local IDs to something considered citable, i.e. “DOI-worthy”. Most have published guidance for researchers, archives or both.
  • Developed detailed request for comment
  • WG and Policy Officer reviewed
  • Now out for comment. Please review and comment! https://github.com/nasa/smd-open-science-guidelines

18 of 33

Who

  • Dan Berrios, BPS
  • Robert M. Candey, HPD
  • Mitch Gordon, PSD
  • Nathan James, ESD
  • Steve Joy, PSD
  • Mark Parsons, NASA IMPACT, University of Alabama in Huntsville
  • Josh Peek, ASD
  • Anne C. Raugh, University of Maryland, College Park (PSD)
  • Aaron Roberts, HPD
  • Gerald Steeman, STI

19 of 33

Guideline Background and Context

  • Policy requires SMD-funded data to be citable with a PID.
  • DOIs are commonly used for this in NASA. DataCite is currently the only DOI Registration Authority tailored to data and has three membership types:
    • Member only (don’t create DOIs),
    • Direct member,
    • Consortium member.
  • Some archives are direct members. STI has a consortium membership for NASA and is developing services.
  • Three scenarios for registering DOIs to enable data citation in the literature:
    • Planned
    • Provider request
    • User request

20 of 33

Guideline description (section H)

  • Data intended or used for citation in the scientific literature should have a DOI registered through DataCite.
  • DOI requests for SMD-funded data should be processed through the responsible repository.
  • Metadata requirements should be met by the repository in collaboration with providers.
  • The repository should be responsible for maintaining the metadata for the digital object as well as the landing page and resolvability of the DOI.
  • Repositories establish their own guidelines on what is required to register a DOI.
  • Three options for archives to register DOIs:
    • Request through STI (few)
    • Work through the NASA Consortium Membership (majority)
    • Become a Direct Member (few activists).
  • Repositories and STI coordinate to implement an agency registry of DOIs.
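
As a rough illustration of the repository's metadata responsibility described above, here is a minimal Python sketch of a check a repository might run before requesting a DOI. It assumes the six properties the DataCite Metadata Schema lists as mandatory (identifier, creator, title, publisher, publication year, resource type). The dictionary keys, function names, and the example DOI are hypothetical and are not taken from the guideline or from any NASA or DataCite service.

```python
import re

# Mandatory DataCite properties, expressed with hypothetical dictionary keys.
# Check the current DataCite Metadata Schema before relying on this list.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)

# Loose syntactic check for a DOI name: "10.", a numeric prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def resolver_url(doi):
    """Build the doi.org resolver URL that should lead to the landing page."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a syntactically valid DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# Hypothetical record for illustration only; not a real dataset or DOI.
example_record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Example, A."],
    "titles": ["Example data set"],
    "publisher": "Example Repository",
    "publicationYear": 2022,
    "resourceType": "Dataset",
}

if __name__ == "__main__":
    print(missing_fields(example_record))
    print(resolver_url(example_record["identifier"]))
```

A check like this covers only completeness and syntax. Whether a given object is "DOI-worthy" remains a repository decision, as the guideline states.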

21 of 33

Some lessons learned

  • Scoping is essential
  • Context is critical
  • Champions and facilitators are critical
  • Revealed underlying friction of (any) data policy – user needs vs. NASA needs with repositories in the middle.

22 of 33

Friction is inevitable and necessary

23 of 33

Figure based on Yarmey and Baker (2013). https://doi.org/10.2218/ijdc.v8i1.252

Working at multiple scales

— glocally —

builds knowledge and interconnection to address shared problems.

27 of 33

Figure based on Yarmey and Baker (2013). https://doi.org/10.2218/ijdc.v8i1.252

Working at multiple scales

— glocally —

builds knowledge and interconnection to address shared problems.

SPASE

SPASE

29 of 33

Figure based on Yarmey and Baker (2013). https://doi.org/10.2218/ijdc.v8i1.252

Working at multiple scales

— glocally —

builds knowledge and interconnection to address shared problems.

?

?

30 of 33

“Standardization is dynamic, not static; it means not to stand still, but to move forward together.”

1920s motto for the Engineering Standards Committee (precursor to ANSI)

Created by Manuel Waelder, Noun Project

31 of 33

Thank You

Contact me at mark.parsons@uah.edu @chutneyboy

32 of 33

Organizing Breakouts

Session voting:

https://bit.ly/votenasa

33 of 33

Instructions to Breakouts

Start the afternoon in breakouts, then come together to report before the next session.

5 min: Identify helpers

  • Moderator (person who proposed the topic)
  • Note taker (but everyone should contribute)
  • Reporter to summarize to the whole group

10 min: Clarify the general issue

20 min: Break it down

25 min: Identify approaches to a solution

5 min: Synthesize and prepare to report back