Open-Source Science at NASA
Kevin Murphy
Chief Science Data Officer
OSSI Data Repositories Workshop
September 27, 2022
Overview
Open Meetings
Open Source Science Core Values
SMD Strategy for Data Management and Computing for Groundbreaking Science
SMDWG KICKOFF: SMD identifies strategic data management and science computing as a priority (February 2018)
RFI ACTIVITY: 67 RFI responses with five common themes (July-September 2018)
WORKSHOP #1: Archives Processing and Data Exploitation Meeting (GRC) (August 2018)
WORKSHOP #2: Maximizing the Scientific Return of NASA Data (DC) (October 2018)
STRATEGY DEVELOPMENT: SMD’s Strategy for Data Management and Computing (December 2019)
Open Science
Open Software
Open Data
Open Results
Open Tools
What is the SMD Strategy for Data and Computing?
An SMD-approved strategy to enable transformational open science through continuous evolution of SMD’s science data and computing systems.
Goal 1: Develop and Implement Capabilities to Enable Open Science
Goal 2: Continuous Evolution of Data and Computing Systems
Goal 3: Harness the Community and Strategic Partnerships for Innovation
No. | Goal 1: Develop and Implement Capabilities to Enable Open Science | No. | Goal 2: Continuous Evolution of Data and Computing Systems | No. | Goal 3: Harness the Community and Strategic Partnerships for Innovation |
1.1 | Develop and implement a consistent open data and software policy tailored for SMD | 2.1 | Establish standardized approaches for all new missions and sponsored research that encourage the adoption of advanced techniques | 3.1 | Develop community of practice and standards group |
1.2 | Upgrade capabilities at existing archives to support machine-readable data access using open formats and data services | 2.2 | Integrate investment decisions in High-End Computing with the strategic needs of the research communities | 3.2 | Partner with academic, commercial, governmental, and international organizations |
1.3 | Develop and implement an SMD data catalog to support discovery and access to complex scientific data across divisions | 2.3 | Invest in capabilities to use commercial cloud environments for open science | 3.3 | Promote opportunities for continuous learning as the field evolves through collaboration |
1.4 | Increase transparency into how science data are being used through a free and open unified journal server | 2.4 | Invest in the tools and training necessary to enable breakthrough science through application of AI/ML | | |
SMD Strategy for Data Management and Computing for Groundbreaking Science 2019-2024
SPD-41: Scientific Information Policy
SPD-41 brings together existing NASA and Federal guidance.
SPD-41 was released in August 2021.
SPD-41a was released in November with proposed additions.
An RFI was released to the community and closed on March 4, 2022.
Overview of the implementation of SPD-41a
Future implementation plans include:
SPD-41a is forward-looking: it is meant to apply to work going forward. Existing missions and investigations should adopt parts of this policy consistent with available resources.
SMD Science Discovery Engine
Create an SMD discovery capability to enable open source science. Scope includes:
SDE Project Timeline
Summer 2020
Pre-formulation
Developed project charter.
Established SDE team.
Oct 2020 - Sept 2021
Formulation
Established SDE working group.
Surveyed divisions and technology approaches.
Selected technology solution for Version 0.
Oct 2021 - Sept 2022
Version 0 Development
Established SMD instance of Sinequa in the Enterprise Data Platform.
Indexed content from all 5 divisions.
Created vocabulary lists for inclusion in the SDE.
Oct 2022 - Sept 2023
Version 1 Development
Public release of SDE.
Incorporation of more SMD content and vocabulary lists into the SDE.
Improved UI/UX and API access.
The NASA Astrophysics Data System (ADS)
ADS is a NASA-funded project that provides discovery services for scholarly literature in astronomy and physics.
https://ui.adsabs.harvard.edu
We’re on track to accomplish much more.
What is Transform to Open Science (TOPS)?
TOPS is a 5-year NASA SMD initiative to foster adoption of Open Science practices across the scientific community.
Strategic Objectives:
What is TOPS doing?
Context for Data and Computing Architecture Study
The CSDO is conducting two activities to develop cyberinfrastructure to support the Strategy for Data Management and Computing and SPD-41:
1. Defining Core Data and Computing Services Requirements
Common SMD IT policies, software, and computing capabilities to support:
2. Data and Computing Architecture Study
Core Services funding initiates in FY24 and ramps up fully in FY25.
Timeline for Core Services
FY22 | FY23 | FY24 | FY25 | FY26 |
Develop and deploy initial capabilities
Transition plan execution
Initiate HQ and Center Offices
Refine and approve
Develop Core Services requirements and cost models
Study: Scientific Data and Computing Architecture
CSDO supports pilot cloud environments for divisions
Divisions continue to support their existing data and computing activities
Core Services Operational
Open-Source Science Initiative Council and SMaC will guide and approve Core Services.
No. | Goal 1: Develop and Implement Capabilities to Enable Open Science | No. | Goal 2: Continuous Evolution of Data and Computing Systems | No. | Goal 3: Harness the Community and Strategic Partnerships for Innovation |
1.1 | Develop and implement a consistent open data and software policy tailored for SMD | 2.1 | Establish standardized approaches for all new missions and sponsored research that encourage the adoption of advanced techniques | 3.1 | Develop community of practice and standards group |
1.2 | Upgrade capabilities at existing archives to support machine-readable data access using open formats and data services | 2.2 | Integrate investment decisions in High-End Computing with the strategic needs of the research communities | 3.2 | Partner with academic, commercial, governmental, and international organizations |
1.3 | Develop and implement an SMD data catalog to support discovery and access to complex scientific data across divisions | 2.3 | Invest in capabilities to use commercial cloud environments for open science | 3.3 | Promote opportunities for continuous learning as the field evolves through collaboration |
1.4 | Increase transparency into how science data are being used through a free and open unified journal server | 2.4 | Invest in the tools and training necessary to enable breakthrough science through application of AI/ML | | |
SMD Strategy for Data Management and Computing for Groundbreaking Science 2019-2024
Open Science Advancement Summary
Images: Noun Project
Where we’re heading…
Backups