1) WorldFAIR+ and CDIF: vision, progress, and next steps: https://bit.ly/wfpluscdif
Simon Hodson, Steve Richard
Making Data Work – WorldFAIR – WorldFAIR+
Making Data Work
(2018-2022)
WorldFAIR
(2022-2024)
WorldFAIR+
(2024+)
WorldFAIR Outputs
Enabling Global FAIR Data: https://doi.org/10.5281/zenodo.11242702
CODATA-WorldFAIR Policy Recommendations: Enabling Global FAIR Data
What is CDIF?
Discovery Profile
Description Profile: DDI CDI for Data Structure, Variable Cascade, Provenance…
CDIF, Next Steps
CDIF as a ‘curated collection of “FAIR-enabling resources”’
WorldFAIR+
Vision:
Potential Case Studies and partnerships:
WorldFAIR+: how to get involved?
‘WorldFAIR+’, CDIF Implementation Projects
2) Progress with funded projects (CDIF4XAS, Climate-Adapt4EOSC, FAIR4DDE)
Simon Hodson, Steve Richard
CDIF-4-XAS: Overview
Products
X-Ray Absorption Challenge
From Matthew Newville, 2023
Proposed solution:
Spectra Data
Implementation
"schema:about": {"@id": "xas:485749"},� "schema:description": "metadata about documentation for se_na2so4",� "dcterms:conformsTo": [� {"@id": "cdif:profile_basic_1.0"},� {"@id": "cdif:profile_xasCDIF"}� ]
Self describing modularization
CDIF in Climate-Adapt4EOSC
CDIF-4-XAS: Next Steps
CDIF in DDE
Mapping CDIF discovery -- DDE Metadata
3) Updates on recent funding proposals and implications
Simon Hodson
Updates on recent funding proposals
CDIF4EOSC
4) Plans for upcoming Dagstuhl workshop
Simon Hodson
Dagstuhl Workshop: the Provenance Chain
5) Priority topics: CDIF, Croissant and AI; dealing with binary data formats; context, provenance and data quality.
Steve Richard, Slava Tykhonov, Simon Hodson
Scan to access slides and links
Croissant for Machine Learning
Croissant Format Specification https://docs.mlcommons.org/croissant/docs/croissant-spec.html
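To give a feel for the shape of a Croissant description, here is a simplified Python sketch that builds a Croissant-style JSON-LD record as a plain dictionary. The dataset name, file name, URL, and field are hypothetical, the context is abbreviated, and required properties such as checksums are omitted, so this would not pass a Croissant validator; the specification linked above is the authoritative reference.

```python
import json

# Abbreviated context; the full Croissant context defines many more terms.
context = {
    "@vocab": "https://schema.org/",
    "sc": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "dct": "http://purl.org/dc/terms/",
}

# All names, identifiers and URLs below are hypothetical placeholders.
dataset = {
    "@context": context,
    "@type": "sc:Dataset",
    "name": "example_dataset",
    "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
    # Resource layer: the files that make up the dataset.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "data.csv",
            "contentUrl": "https://example.org/data.csv",
            "encodingFormat": "text/csv",
        }
    ],
    # Structure layer: how records and fields map onto those files.
    "cr:recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "records",
            "cr:field": [
                {
                    "@type": "cr:Field",
                    "@id": "records/energy",
                    "cr:dataType": "sc:Float",
                    "cr:source": {
                        "cr:fileObject": {"@id": "data.csv"},
                        "cr:extract": {"cr:column": "energy"},
                    },
                }
            ],
        }
    ],
}


def field_ids(description: dict) -> list:
    """Collect the @id of every field in every record set."""
    return [
        field["@id"]
        for record_set in description.get("cr:recordSet", [])
        for field in record_set.get("cr:field", [])
    ]


serialized = json.dumps(dataset, indent=2)
print(field_ids(dataset))
```

The point of the sketch is the layering: dataset-level metadata, a list of file resources, and record sets whose fields point back at those files.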
Responsible AI: CroissantML and DDI
(Data Documentation Initiative)
Responsible AI
“As AI advances at rapid speed there is increased recognition among researchers, practitioners and policy makers that we need to explore, understand, manage, and assess its economic, social, and environmental impacts. One of the main instruments to operationalise responsible AI (RAI) is dataset documentation.
This is how Croissant helps address RAI:
Croissant is designed to be modular and extensible. One such extension is the Croissant RAI vocabulary, which addresses 7 specific use cases, starting with the data life cycle, data labeling, and participatory scenarios to AI safety and fairness evaluation, traceability, regulatory compliance and inclusion. More details are available in the Croissant RAI specification. We welcome additional extensions from the community to meet the needs of specific data modalities (e.g. audio or video) and domains (e.g. geospatial, life sciences, cultural heritage).”
Croissant spec v1.0
CDIF-driven DDI variable cascade integrated into CroissantML
Note: CroissantML defines an AI-ready metadata layer; CDIF is the graph representation of expert knowledge
Multilingual properties in Semantic Croissant: “energy”
Short Description
Energy is the capacity to do work or perform tasks. It is a fundamental concept in physics and is often measured in units such as joules or kilowatt‑hours. Energy can be transferred from one object to another, and can be transformed from one form to another. It is essential for powering machines, lighting homes, and powering transportation systems.
AI-generated concept description powered by CDIF and based on factual data (MCP)
Binary Formats
Context, Provenance and Quality
6) Governance, release management and licensing
Discussion
Governance, release management and licensing
7) Suggestions, recommendations, opportunities for collaboration, AOB?
Discussion