Information Architecture and Data Utilization

ISWAT O2 Roadmap Paper

Arnaud Masson and Shing Fung

O2 Moderators

ISWAT plenary session

13 September 2021

Information architecture and Data Utilization Roadmap paper | A. Masson | ISWAT 2021 | Slide 1/12

O2: Information Architecture and Data Utilization

https://www.iswat-cospar.org/

O2 Information Architecture Mini-Workshop 15-17 March 2021

Outline of the cluster roadmap paper

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Kamodo

  6. Machine learning and ML-ready data

International Heliophysics Data Environment Alliance (IHDEA)

  • 2019: creation of IHDEA during a meeting at NASA GSFC
  • 2020: setup of working groups (WGs), hosted by the Paris Observatory (led by B. Cecconi)
  • 2021: annual IHDEA meeting, hosted by ESA, 27 Sep. to 1 Oct. Feel free to join!
    https://cosmos.esa.int/web/ihdea

Goal and objectives

The goal of the IHDEA is to encourage the use of common standards and services to boost data sharing and enhance science.

  • Its objectives are:
    o Active involvement of international heliophysics and space weather data providers
    o Development of standards-based data systems with uniform and well-defined terminology
    o Coordinated, user-friendly data access and analysis tools to serve diverse communities
    o Adequate documentation of data products and sources
    o Flexible, interoperable (e.g., HAPI), and interconnected data archives, modeling centers, and virtual observatories
    o Effective communication among national and international partners, data providers, data tool developers, and data users

  • The role of the IHDEA is to engage the community, foster communication, and identify the standards and services that will best serve the science needs.

(courtesy of S. Fung)

Who are we?

Researchers, modelers, information technology experts, developers, and data engineers dedicated to advancing heliophysics* and improving the heliophysics data environment.

(courtesy of S. Fung)

https://ihdea.net

Outline

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Kamodo

  6. Machine learning and ML-ready data

2. PyHC Integration Strategy Workshop

Grand challenge of the workshop: How can we better enable the creation of heliophysics software that interoperates well?

Day 1 (August 30): Identify challenges to developing heliophysics software

Day 2 (August 31): Discuss how to solve these challenges

2. PyHC Integration Strategy Workshop

Day 1 [3 hrs]
Deliverable: ranked list of challenges in PyHC

  • Sustainability of projects/software (people, money)
  • Reducing overlapping functionality in packages
  • Improving PyHC discoverability through Jupyter notebooks and documentation
  • APIs (providing ways to integrate existing data access APIs into the core PyHC libraries)
  • Attracting new developers and including small Python projects in major libraries
  • Getting new users involved

2. PyHC Integration Strategy Workshop

Day 2 [3 hrs]

Review of challenges from the first day [30 min]

Breakout sessions: discussion of how to solve some key challenges [1 h 30 min]

The outcome of this workshop will be presented in a paper (O2-06) led by J. Barnum (University of Colorado).

Outline

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Data visualization through Kamodo

  6. Machine learning and ML-ready data

3. SPASE Metadata as a Heliophysics Science-Enabling Tool

Paper outline (led by S. Fung)

  • Infrastructure supporting the heliophysics data environment
  • SPASE metadata model, including dataset citation through DOIs
  • Science-enabling data services and tools
  • Science analysis
  • Future outlook
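The SPASE metadata model item, with its dataset citation through DOIs, can be illustrated with a small record. The sketch below uses Python's standard `xml.etree.ElementTree` to read a SPASE NumericalData description and pull out its ResourceID, name, and DOI. The element names reflect our reading of the SPASE 2.x schema, which places the DOI inside ResourceHeader, and should be checked against the current schema. The record, its ResourceID and its DOI are made-up placeholders, and `summarize_numerical_data` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# SPASE 2.x XML namespace.
NS = {"spase": "http://www.spase-group.org/data/schema"}

# Made-up record: the ResourceID and DOI below are placeholders.
SAMPLE = """<Spase xmlns="http://www.spase-group.org/data/schema">
  <Version>2.4.0</Version>
  <NumericalData>
    <ResourceID>spase://Example/NumericalData/Hypothetical/PT1M</ResourceID>
    <ResourceHeader>
      <ResourceName>Hypothetical 1-min magnetic field data</ResourceName>
      <DOI>https://doi.org/10.0000/hypothetical</DOI>
    </ResourceHeader>
  </NumericalData>
</Spase>"""


def summarize_numerical_data(xml_text):
    """Return the ResourceID, ResourceName and DOI of a NumericalData record.

    The DOI is None when the record does not carry one.
    """
    root = ET.fromstring(xml_text)
    resource = root.find("spase:NumericalData", NS)
    if resource is None:
        raise ValueError("no NumericalData resource in this SPASE record")
    return {
        "resource_id": resource.findtext("spase:ResourceID", namespaces=NS),
        "name": resource.findtext("spase:ResourceHeader/spase:ResourceName", namespaces=NS),
        "doi": resource.findtext("spase:ResourceHeader/spase:DOI", namespaces=NS),
    }


print(summarize_numerical_data(SAMPLE)["doi"])  # https://doi.org/10.0000/hypothetical
```

A stable ResourceID paired with a DOI is what lets a dataset be both located by services and cited in publications.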

3. SPASE Metadata as a Heliophysics Science-Enabling Tool

Section 5 will provide science analysis use cases, including:

5.1 Event analysis

5.2 Statistical studies

5.3 Data science analysis

5.4 Data-model comparison

In coordination with ISWAT team H1-01 (ambient solar wind validation team)

The last section covers the future outlook:

Prospects for supporting data search by phenomenon, using the Annotation resource

Support for keyword search

e.g., phenomenon keyword → publications → apply OCR to harvest data intervals → retrieve and deliver data
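The interval-harvesting step in this chain might begin with a plain pattern match over the OCR output. The sketch below assumes OCR has already produced text and that intervals are written as `YYYY-MM-DD HH:MM to YYYY-MM-DD HH:MM`. Real publications use many more date formats, so the pattern `INTERVAL` and the function `harvest_intervals` are hypothetical and deliberately narrow.

```python
import re
from datetime import datetime

# Hypothetical pattern for intervals written as
# "YYYY-MM-DD HH:MM to YYYY-MM-DD HH:MM" (space or 'T' between date and time).
INTERVAL = re.compile(
    r"(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2})\s*(?:to|until|-)\s*"
    r"(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2})"
)


def _parse(stamp):
    """Parse 'YYYY-MM-DD HH:MM' or 'YYYY-MM-DDTHH:MM' as a naive datetime."""
    return datetime.strptime(stamp.replace("T", " "), "%Y-%m-%d %H:%M")


def harvest_intervals(text):
    """Return (start, stop) datetime pairs found in OCR'd publication text."""
    intervals = []
    for match in INTERVAL.finditer(text):
        try:
            start, stop = _parse(match.group(1)), _parse(match.group(2))
        except ValueError:
            continue  # OCR noise such as month 13 or hour 25
        if start < stop:  # drop reversed or zero-length matches
            intervals.append((start, stop))
    return intervals
```

Each harvested pair could then become the start and stop of a data request to the archive holding the relevant dataset.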

Outline

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Data visualization through Kamodo

  6. Machine learning and ML-ready data

4. HAPI

The Heliophysics Application Programmer's Interface (HAPI) is a standard for accessing distributed time-series data to increase interoperability.

HAPI standardizes the two main parts of a data service: the request interface and the response data structures.
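To make the two parts concrete, the sketch below builds a `/data` request URL and parses a CSV response using the column layout described by an `/info` response. It uses the HAPI 3.x query names (`dataset`, `start`, `stop`, `parameters`). To our understanding, HAPI 2.x used `id`, `time.min` and `time.max` instead, so check the spec version a server implements. The server URL, dataset ID, and the helper names `hapi_data_url`, `column_names` and `parse_hapi_csv` are hypothetical. Real work would normally use one of the existing HAPI clients.

```python
import csv
import io
import json
from urllib.parse import urlencode


def hapi_data_url(server, dataset, start, stop, parameters=None):
    """Build a /data request URL using HAPI 3.x query names."""
    query = {"dataset": dataset, "start": start, "stop": stop}
    if parameters:
        query["parameters"] = ",".join(parameters)
    # Keep ':' and ',' unescaped so the URL stays human-readable.
    return server.rstrip("/") + "/data?" + urlencode(query, safe=":,")


def column_names(info_json):
    """Expand the parameter list of an /info response into CSV column names.

    A parameter with "size": [n] occupies n consecutive CSV columns.
    If /data is called with a parameters= subset, /info should be
    called with the same subset so that the columns line up.
    """
    names = []
    for param in json.loads(info_json)["parameters"]:
        n = 1
        for dim in param.get("size", []):
            n *= dim
        if n == 1:
            names.append(param["name"])
        else:
            names.extend(f"{param['name']}[{i}]" for i in range(n))
    return names


def parse_hapi_csv(text, names):
    """Parse a CSV /data response; column 0 is the ISO 8601 time stamp.

    Assumes every non-time column is numeric (double or integer).
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(names, fields))
        for key in names[1:]:
            row[key] = float(row[key])
        rows.append(row)
    return rows
```

For example, `hapi_data_url("https://example.org/hapi", "EXAMPLE_MAG", "2021-09-13T00:00Z", "2021-09-14T00:00Z", ["B_GSE"])` yields `https://example.org/hapi/data?dataset=EXAMPLE_MAG&start=2021-09-13T00:00Z&stop=2021-09-14T00:00Z&parameters=B_GSE`. Both the server and the dataset ID are placeholders. Because every HAPI server shares this request and response shape, the same few functions work against any provider.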

Client libraries are available in IDL, Java, MATLAB, and Python.

Multiple data providers in the US and Europe have added HAPI access alongside their existing interfaces, so their data can now also be served via HAPI.

In 2018, HAPI was adopted as a COSPAR-recommended standard for time-series data delivery.

Outline

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Kamodo

  6. Machine learning and ML-ready data

5. Kamodo

Kamodo is a tool for onboarding simulation output and making it usable by the community.

Supports popular models such as SWMF (GM/BATSRUS, IE/Ridley_serial, UA/GITM), IT/TIE-GCM, IT/CTIPe (versions 3, 4), IRI (NetCDF files in run-on-request), and Tsyganenko (T89, T96, TS04), natively in Python 3

ENLIL, SWMF/GM, CTIPe (version 2.0), OpenGGCM

Written in Python 3 and works with popular Python packages (SciPy, PlasmaPy, …)
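As we understand it, Kamodo exposes model variables as callable functions that carry units, rather than as raw arrays tied to each model's file format. The plain-Python sketch below illustrates that idea for a 1-D gridded output with linear interpolation. It does not use Kamodo's actual API. The class `GriddedVariable` and the density profile are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class GriddedVariable:
    """Hypothetical wrapper exposing 1-D gridded model output as a function."""

    name: str
    units: str
    grid: list    # strictly increasing coordinate values
    values: list  # model output at each grid point

    def __call__(self, x):
        """Linearly interpolate the model output at coordinate x."""
        g = self.grid
        if not g[0] <= x <= g[-1]:
            raise ValueError(f"{self.name}: {x} is outside the grid [{g[0]}, {g[-1]}]")
        i = bisect_left(g, x)
        if g[i] == x:  # exact grid point
            return self.values[i]
        x0, x1 = g[i - 1], g[i]
        y0, y1 = self.values[i - 1], self.values[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up radial density profile, for illustration only.
density = GriddedVariable("n", "1/cm^3", [1.0, 2.0, 4.0], [10.0, 5.0, 1.0])
print(density(3.0), density.units)  # 3.0 1/cm^3
```

Once outputs from different models share this callable interface, user code can evaluate or compare them at the same coordinates without knowing how each model stores its grid.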

At least three papers are planned:

  • Simulation data visualization through Kamodo (O2-01 paper, D. DeZeeuw)
  • Kamodo model output access in geospace (O2-01 paper, Lutz)
  • Kamodo-based tools for model data accessibility (O2-01 paper, R. Ringuette)

Outline

  1. Introduction: complementarity of IHDEA (long-term) and ISWAT

  2. PyHC Integration Strategy workshop outcome (O2-06 paper, J. Barnum)

  3. SPASE Metadata as a Heliophysics Science-Enabling Tool, incl. DOI (O2-02 paper, S. Fung)

  4. The HAPI Standard for Accessing Time Series Data (O2-05 paper, J. Vandegriff)

  5. Data visualization through Kamodo

  6. Machine learning and ML-ready data

6. Machine learning and ML ready data

A paper, "Predicting solar transient events using machine learning: research advances and data preparation efforts" (led by V. Sadykov), is in preparation.

The paper is a collaboration with other clusters (S3-01, S3-03, H3-01, O3).
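The slide does not describe the data preparation used in the paper led by V. Sadykov. As a hedged illustration of what "ML-ready" preparation of event data often involves, the sketch below turns a uniformly sampled series into fixed-length input windows. Each window is labeled by whether an event occurs within a forecast horizon after it. The function `make_windows` and its `window` and `horizon` parameters are hypothetical choices, not taken from the paper.

```python
def make_windows(series, event_indices, window, horizon):
    """Turn a uniformly sampled series into (inputs, label) pairs.

    Each input is `window` consecutive samples; its label is 1 if any
    event index falls within the `horizon` samples that follow the
    window, else 0. Only windows whose full horizon fits in the series
    are kept.
    """
    events = set(event_indices)
    samples = []
    for start in range(len(series) - window - horizon + 1):
        end = start + window  # first sample after the input window
        label = int(any(t in events for t in range(end, end + horizon)))
        samples.append((series[start:end], label))
    return samples


# Toy series of 10 samples with one event at index 6.
pairs = make_windows(list(range(10)), [6], window=3, horizon=2)
print([label for _, label in pairs])  # [0, 0, 1, 1, 0, 0]
```

Keeping only windows whose whole horizon lies inside the series avoids labeling a window "no event" simply because the data ran out.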