1 of 22

Open-Source Science at NASA

Kevin Murphy

Chief Science Data Officer

OSSI Data Repositories Workshop

September 27, 2022

2 of 22

Overview

  • Open Meetings & Core Values
  • SMD Strategy for Data and Computing
    • How it started
    • What we’ve accomplished
    • What we’re doing now
  • Next Steps

3 of 22

Open Meetings

  • CSDO meetings are open to all.
  • This is a working meeting - there will be active discussions.
  • Meetings are open in order to provide insight into the work being done and to gain perspectives

4 of 22

Open Source Science Core Values

  • As open as possible, as restricted as necessary, always secure
  • Increase the accessibility, inclusion, and reproducibility of SMD scientific activities
  • When possible, minimize the burden

5 of 22

SMD Strategy for Data Management and Computing for Groundbreaking Science

  • SMDWG KICKOFF (February 2018): SMD identifies strategic data management and science computing as a priority
  • RFI ACTIVITY (July-September 2018): 67 RFI responses with five common themes
  • WORKSHOP #1 (August 2018): Archives Processing and Data Exploitation Meeting (GRC)
  • WORKSHOP #2 (October 2018): Maximizing the Scientific Return of NASA Data (DC)
  • STRATEGY DEVELOPMENT (December 2019): SMD’s Strategy for Data Management and Computing

  • Open Science
  • Open Software
  • Open Data
  • Open Results
  • Open Tools

6 of 22

What is the SMD Strategy for Data and Computing?

An SMD-approved strategy to enable transformational open science through continuous evolution of SMD’s science data and computing systems.

Goal 1: Develop and Implement Capabilities to Enable Open Science

Goal 2: Continuous Evolution of Data and Computing Systems

Goal 3: Harness the Community and Strategic Partnerships for Innovation

7 of 22

Goal 1: Develop and Implement Capabilities to Enable Open Science

  • 1.1 Develop and implement a consistent open data and software policy tailored for SMD
  • 1.2 Upgrade capabilities at existing archives to support machine readable data access using open formats and data services
  • 1.3 Develop and implement a SMD data catalog to support discovery and access to complex scientific data across divisions
  • 1.4 Increase transparency into how science data are being used through a free and open unified journal server

Goal 2: Continuous Evolution of Data and Computing Systems

  • 2.1 Establish standardized approaches for all new missions and sponsored research that encourage the adoption of advanced techniques
  • 2.2 Integrate investment decisions in High-End Computing with the strategic needs of the research communities
  • 2.3 Invest in capabilities to use commercial cloud environments for open science
  • 2.4 Invest in the tools and training necessary to enable breakthrough science through application of AI/ML

Goal 3: Harness the Community and Strategic Partnerships for Innovation

  • 3.1 Develop community of practice and standards group
  • 3.2 Partner with academic, commercial, governmental and international organizations
  • 3.3 Promote opportunities for continuous learning as the field evolves through collaboration

SMD Strategy for Data Management and Computing for Groundbreaking Science 2019-2024

8 of 22

SPD-41: Scientific Information Policy

SPD-41 brings together existing NASA and Federal guidance.

SPD-41 was released in August 2021.

SPD-41a was released in November with proposed additions.

An RFI was released to the community and closed on March 4, 2022.

  • SPD-41: The Scientific Information Policy - https://go.usa.gov/xtNTJ

  • Scientific Information Policy Website - https://go.usa.gov/xtNTt

9 of 22

Overview of the implementation of SPD-41a

Future implementation plans include:

  • Software release policy
  • Guidance for awards, contracts, ROSES, and Announcements of Opportunity (AOs); PIs should include these costs in proposals
  • Incorporating text into AOs
  • Incentives for the community to make the transition, e.g., ROSES22 F8, Supplement for Open Source Software

SPD-41a is forward-looking: it is meant to apply to work going forward. Existing missions and investigations should adopt parts of this policy consistent with available resources.

10 of 22

SMD Science Discovery Engine

Create an SMD discovery capability to enable open source science. Scope includes:

  • Astrophysics: NAVO registry
  • BPS: GeneLab, Life Sciences Data Archive
  • Earth Science: Common Metadata Repository
  • Heliophysics: SPASE registry, Events Knowledgebase
  • Planetary Science: PDS API
  • + Models, software, tools and other contextual information from all 5 divisions
  • Over 1 million documents & metadata included at this time.
  • Incorporated three SMD-relevant facets into the interface:
    • Platforms
    • Instruments
    • Missions
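
As a rough illustration of what faceted discovery means in practice, the sketch below counts and filters a few made-up metadata records by the three facets listed above. The records, field names, and function names are hypothetical and are not taken from the SDE or its underlying search platform; the sketch only shows the general idea of facets.

```python
from collections import Counter

# Hypothetical metadata records; titles, field names, and values are illustrative only.
RECORDS = [
    {"title": "Doc A", "platform": "Satellite", "instrument": "Imager", "mission": "Mission X"},
    {"title": "Doc B", "platform": "Satellite", "instrument": "Spectrometer", "mission": "Mission X"},
    {"title": "Doc C", "platform": "Rover", "instrument": "Imager", "mission": "Mission Y"},
]

# The three facets named on the slide, as assumed field names.
FACETS = ("platform", "instrument", "mission")


def facet_counts(records, facets=FACETS):
    """Return {facet: Counter(value -> number of matching records)}."""
    return {f: Counter(r[f] for r in records if f in r) for f in facets}


def filter_by(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


if __name__ == "__main__":
    print(facet_counts(RECORDS)["mission"])
    print([r["title"] for r in filter_by(RECORDS, platform="Satellite", instrument="Imager")])
```

A search interface would typically show the counts next to each facet value and re-run the filter as the user selects values.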

11 of 22

SMD Science Discovery Engine

12 of 22

SDE Project Timeline

Summer 2020

Pre-formulation

Developed project charter.

Established SDE team.

Oct 2020 - Sept 2021

Formulation

Established SDE working group.

Surveyed divisions and technology approaches.

Selected technology solution for Version 0.

Oct 2021 - Sept 2022

Version 0 Development

Established SMD instance of Sinequa in the Enterprise Data Platform.

Indexed content from all 5 divisions.

Created vocabulary lists for inclusion in the SDE.

Oct 2022 - Sept 2023

Version 1 Development

Public release of SDE.

Incorporation of more SMD content and vocabulary lists into the SDE.

Improved UI/UX and API access.

13 of 22

The NASA Astrophysics Data System (ADS)

ADS is a NASA-funded project that provides discovery services for scholarly literature in astronomy and physics.

  • 15M metadata records, most of them traditional publications
  • 6M full-text documents from all major publishers
  • A citation graph with over 8M nodes and 142M edges
  • (Anonymous) usage data for 50k regular users

https://ui.adsabs.harvard.edu
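
To make the "citation graph" terminology concrete, here is a minimal sketch in which papers are nodes and each citation is a directed edge from the citing paper to the cited paper. The paper identifiers and function names are hypothetical, and this is not how ADS stores or serves its graph; it only illustrates the data structure.

```python
from collections import defaultdict

# Hypothetical citation edges: (citing paper, cited paper).
EDGES = [("P2", "P1"), ("P3", "P1"), ("P3", "P2"), ("P4", "P3")]


def build_graph(edges):
    """Return two adjacency maps: paper -> papers it cites, and paper -> papers citing it."""
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in edges:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def citation_counts(edges):
    """Return {paper: number of distinct papers that cite it} for every node in the graph."""
    _, cited_by = build_graph(edges)
    nodes = {paper for edge in edges for paper in edge}
    return {paper: len(cited_by.get(paper, ())) for paper in nodes}


if __name__ == "__main__":
    print(sorted(citation_counts(EDGES).items()))
```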

14 of 22

We’re on track to accomplish much more.

15 of 22

What is Transform to Open Science (TOPS)?

TOPS is a 5-year NASA SMD initiative to foster adoption of Open Science practices across the scientific community.

Strategic Objectives:

  • Increase understanding & adoption of open science
  • Accelerate major scientific discoveries
  • Broaden participation by historically underrepresented communities

16 of 22

What is TOPS doing?

  • NASA has allocated $3 million/year to fund projects related to Open Science Training via the “TOPST” ROSES 22 element.
    • Develop ScienceCore
    • OpenCore summer schools
    • OpenCore virtual cohorts
  • OpenCore is a community-developed introduction to open science
  • CSDO is participating in the Office of Science and Technology Policy (OSTP) Subworking group on the Year of Open Science
  • Maintaining a presence on GitHub to share resources and ensure an open and transparent working environment

17 of 22

Context for Data and Computing Architecture Study

The CSDO is conducting two activities to develop cyberinfrastructure to support the Strategy for Data Management and Computing and SPD-41:

1. Defining Core Data and Computing Services Requirements

Common SMD IT policies, software and computing capabilities to support:

  • Moving to hybrid cloud environments: computing, storage, cybersecurity, networking, and business processes
  • Open-Source Science/SPD-41 requirements: Research Data and Software Archive, User Registration, Data Set Search, Journal Search, AI/ML models, and more

2. Data and Computing Architecture Study

  • Study to evaluate architecture options for scientific data and computing elements of Core Services infrastructure.
  • Produce recommendations for a Hybrid Cloud Infrastructure for SMD (mixed computing, storage, and services environment made up of on-premises infrastructure, private cloud services, high-end computing, and a public cloud)

Core Services funding initiates in FY24 and ramps up fully in FY25.

18 of 22

Timeline for Core Services

Fiscal years shown: FY22, FY23, FY24, FY25, FY26 (with a "today" marker on the chart)

  • Develop Core Services requirements and cost models
  • Study: Scientific Data and Computing Architecture
  • Refine and approve
  • Initiate HQ and Center Offices
  • Transition plan execution
  • Develop and deploy initial capabilities
  • Core Services Operational
  • CSDO supports pilot cloud environments for divisions
  • Divisions continue to support their existing data and computing activities

Open-Source Science Initiative Council and SMaC will guide and approve Core Services.

19 of 22

Goal 1: Develop and Implement Capabilities to Enable Open Science

  • 1.1 Develop and implement a consistent open data and software policy tailored for SMD
  • 1.2 Upgrade capabilities at existing archives to support machine readable data access using open formats and data services
  • 1.3 Develop and implement a SMD data catalog to support discovery and access to complex scientific data across divisions
  • 1.4 Increase transparency into how science data are being used through a free and open unified journal server

Goal 2: Continuous Evolution of Data and Computing Systems

  • 2.1 Establish standardized approaches for all new missions and sponsored research that encourage the adoption of advanced techniques
  • 2.2 Integrate investment decisions in High-End Computing with the strategic needs of the research communities
  • 2.3 Invest in capabilities to use commercial cloud environments for open science
  • 2.4 Invest in the tools and training necessary to enable breakthrough science through application of AI/ML

Goal 3: Harness the Community and Strategic Partnerships for Innovation

  • 3.1 Develop community of practice and standards group
  • 3.2 Partner with academic, commercial, governmental and international organizations
  • 3.3 Promote opportunities for continuous learning as the field evolves through collaboration

20 of 22

Open Science Advancement Summary

  • Divisions are moving towards expansion of cloud activities (including with support from NGAP)
  • Multiple Divisions engaging in new AI/ML activities
  • Divisions are facilitating improved information and knowledge discovery (e.g., through engagement with the Science Discovery Engine)

Images: Noun Project

21 of 22

Where we’re heading…

  • Receive feedback from the community on the implementation of the strategy for Open Science
  • Identify tangible activities for collaboration and development between NASA science data repositories
  • Build networks to support our Open Science vision

22 of 22

Backups