1 of 12

Working Group: Data

Meeting 2 - Recap

Global Initiative on AI for Health

Luis Oala and Marc Lecoultre

2 of 12

Raison d'être

“Make healthcare data accessible through a unified, ML-ready format that meets the specific requirements of healthcare”

Croissant Working Group. (2024). Croissant: A metadata format for ml-ready datasets. Advances in Neural Information Processing Systems, 37, 82133-82148.

6 of 12

Goals (short, medium and long term)

  1. (G1) Establish data indexing good practices for healthcare AI datasets. Incorporate harmonized data product standards supporting diverse healthcare data types. Define standardized formats for imaging, genomics, and EHR data. Establish quality management protocols for data assets.
  2. (G2) Build a catalog meta-index across data vendors enabling unified browsing and loading of ML-ready healthcare datasets using Croissant.
    1. Ontology compatibility (e.g. ICD/SNOMED/UMLS…)
    2. Searchability (e.g. graph-based or embedding-based search; predictive vs. numerical embeddings, numerical preferred)
  3. (G3) Develop transaction protocols for secure ML asset exchange
    • Data authentication (e.g. synthetic data attribution, Digital signing of data)
    • Usage rights (User license for data assets)
    • Data communication protocol (MCP)
    • User authentication
  4. (G4) Cross-sectorial, cross-discipline federated test of the protocol in the real world
    • Across different user roles of the protocol (Data vendor, data user, etc.)
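
As an illustration of the kind of record G2 has in mind, the following minimal sketch builds a Croissant-style JSON-LD description of one dataset using only the Python standard library. The dataset name, URL, and the choice to carry ICD-10 codes in `keywords` are assumptions made for illustration. The record is simplified and has not been validated against the Croissant specification.

```python
import json


def build_croissant_record(name, url, license_url, icd10_codes):
    """Return a simplified, Croissant-style JSON-LD dict for one dataset.

    Assumption: ontology codes are carried as schema.org keywords.
    How ontologies are actually mapped is left open here.
    """
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "Dataset",
        "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": name,
        "url": url,
        "license": license_url,
        "keywords": [f"ICD-10:{code}" for code in icd10_codes],
    }


# Hypothetical dataset, for illustration only.
record = build_croissant_record(
    name="demo-ecg",
    url="https://example.org/datasets/demo-ecg",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    icd10_codes=["I10"],
)
print(json.dumps(record, indent=2))
```

Keeping the record as plain JSON-LD means any vendor could publish it next to its data without adopting a particular library.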

7 of 12

  • (G1) Data indexing good practices
  • (G2) Catalog meta-index
    • Ontology compatibility
    • Searchability
  • (G3) Transaction protocols for secure ML asset exchange
    • Data authentication
    • Usage rights
    • Data communication protocol
    • User authentication
  • (G4) Protocol test in real-world

Timeline: short, medium, long term

Deliverables

  • (D1) Guidance document
  • (D2) Bio and Health Croissant data format specification
    • (D2.1) Distributed index through data vendors
    • (D2.2) OCI Implementation: Meta-index hosted at WHO
    • (D2.3) OCI Implementation: Search interface
    • (D2.4) ICD/SNOMED Ontology mapping specification
  • (D3) Transaction protocols for secure ML asset exchange
    • (D3.1) Data authentication specification
    • (D3.2) Usage rights guidance
    • (D3.3) Data communication protocol
    • (D3.4) User authentication specification
    • (D3.5) OCI reference implementation
  • (D4) Report on real-world protocol test by partners
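
To make the data authentication item (D3.1) more concrete, here is a minimal sketch of tagging and verifying a data asset with the Python standard library. It uses an HMAC, which is a symmetric stand-in and not a true digital signature. An actual specification would more likely rely on asymmetric signatures, and the function names and the demo key below are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Content fingerprint: SHA-256 digest of the raw bytes, hex-encoded."""
    return hashlib.sha256(data).hexdigest()


def sign(data: bytes, key: bytes) -> str:
    """Authentication tag over the data (HMAC-SHA256, symmetric key)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time; False if data or tag was altered."""
    return hmac.compare_digest(sign(data, key), tag)


# Hypothetical payload and key, for illustration only.
payload = b"patient_id,systolic\n001,120\n"
key = b"demo-shared-secret"
tag = sign(payload, key)
print(fingerprint(payload))
print(verify(payload, key, tag))
print(verify(payload + b"tampered", key, tag))
```

The fingerprint identifies the content, while the tag ties that content to whoever holds the key. A specification would need both properties, plus key distribution, which this sketch leaves out.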

Legend: feasible with bootstrap; funding contingent

8 of 12

Resource planning: what is our funding situation?

Currently, work is supported by in-kind contributions from participating organizations. For sustainability and to achieve certain milestones (D2.4-D4), additional resources will be needed, comprising:

  • Organization Contributions
    • WHO
      • Permanent liaison at WHO to facilitate buy-in and proliferation of the data exchange
      • Web tenant to host the data portal
      • Data owner, vendor and user outreach support inside WHO network including LMICs
    • ITU
      • Permanent liaison to develop the DMXP spec
      • AWS credits to host and run data and model index and exchange
    • WIPO
      • Permanent legal-technical contributor to develop licensing and provenance model for health data and model exchange
  • Funding and Other Resources Requests
    • Core Team
      • Technical Lead (1.0 FTE): Protocol architecture and standardization
      • Implementation Lead (1.0 FTE): Implementation coordination and deployment
      • ML Engineers (2.0 FTE): Model integration and federation
      • Data Engineers (2.0 FTE): System integration and privacy preservation
      • Clinical Partners (0.5 FTE): Use case validation and workflow integration
  • Incidentals
    • Development and testing environments
      • Cloud compute (EUR 100k/year)
      • Contributor travel to GI-AI4H meetings (EUR 3k/meeting/person)

9 of 12

Operations

Work on the four goals (G1-G4) will be carried out jointly with the sister group “Bio and Health Croissant”. This joint effort will combine the dynamic execution of the open source community with the expertise, strategic guidance and stakeholder engagement of the GI-AI4H sister organizations.

  • Meetings
    • Weekly: Review of Kanban board, definition and assignment of new tasks (Operational meeting)
    • Monthly: Planning meeting
    • Quarterly: Strategy and prioritization meeting
  • Mailing list
    • TBD
  • Contacts
    • Secretariat at GI-AI4H: Simão Campos, simao.campos@itu.int
    • Operations and open source engagement:
      • Marc Lecoultre, ml@mllab.ai
      • Christina Parry, christina.parry@sagebase.org
  • Other coordinates

10 of 12

Group composition: Who are the members?

  • Luis Oala, Dotphoton, CH
  • Marc Lecoultre, Mllab.ai, CH
  • Ferath Kherif, CHUV, CH
  • Pradeep Balachandran, Engineering Consultant (Digital Health), India
  • Hend Abou El Nasr, Professor at Faculty of Dentistry, Cairo University; Consultant, Dubai Health Authority, Dubai, UAE, Hend.abouelnasr@dentistry.cu.edu.eg
  • Dominick Romano, CEO Drainpipe, USA
  • Dhanushi Hettiarachchi, CEO Ophtha Innovations, Sri Lanka
  • Nigel Foo, PhD Student NUS, Singapore
  • Thomas Wiegand, Fraunhofer HHI/TU Berlin, Germany
  • Sage Bionetworks (health data hosting nonprofit)
    • Christina Parry, USA
    • Susheel Varma, USA
    • Luca Foschini, USA
  • PhysioNet (academic research portal)
    • Chrystinne Fernandes, USA
    • Hyung-Chul Lee, South Korea
    • Tom Pollard, USA
  • Secretariat: Simão Campos, ITU, CH

11 of 12

Short description

The WG Data addresses the critical challenge of high data-related transaction costs in healthcare AI development, evaluation and deployment. Current healthcare systems face significant barriers in implementing AI solutions due to complex data preparation requirements, manual validation processes, and fragmented deployment approaches. We aim to specify a Data and Model Exchange Protocol (DMXP) for secure data and model exchange across agents, using a tiered approach based on the following goals:

  1. (G1) establishing data indexing good practices,
  2. (G2) building a catalog meta-index across data vendors for unified browsing of ML-ready healthcare datasets,
  3. (G3) developing transaction protocols for ML assets, and
  4. (G4) running a cross-sectorial, cross-discipline federated test of the protocol in the real world.

Building on the work of the FG-AI4H Open Code Initiative (OCI) and recent advances in data specification standards, our work aims to lower barriers to healthcare AI adoption while ensuring privacy, quality, and equitable access to data.
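
The catalog meta-index idea in G2 can be sketched as follows: per-vendor catalogs are merged into one lookup keyed by ontology code, so a user can browse across vendors with a single query. This is a minimal sketch under stated assumptions. The `CatalogEntry` fields, the vendor names, and the dataset names are hypothetical and are not part of any agreed specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    vendor: str    # who hosts the dataset (hypothetical names below)
    dataset: str   # dataset identifier within that vendor's catalog
    codes: tuple   # ontology codes describing the content, e.g. ICD-10


def build_meta_index(catalogs):
    """Merge several vendor catalogs into one code -> entries lookup."""
    index = defaultdict(list)
    for entries in catalogs:
        for entry in entries:
            for code in entry.codes:
                index[code].append(entry)
    return dict(index)


def find_datasets(index, code):
    """Return (vendor, dataset) pairs tagged with the given code."""
    return [(e.vendor, e.dataset) for e in index.get(code, [])]


# Two hypothetical vendor catalogs, for illustration only.
vendor_a = [CatalogEntry("vendor_a", "ecg_demo", ("I10",))]
vendor_b = [CatalogEntry("vendor_b", "bp_demo", ("I10", "E11"))]

index = build_meta_index([vendor_a, vendor_b])
print(find_datasets(index, "I10"))
print(find_datasets(index, "E11"))
```

An exact-match lookup like this is the simplest case. Graph-based or embedding-based search would replace `find_datasets` while keeping the merged index as the common entry point.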

12 of 12

Why are we, as a group, here to join the initiative?

Do any health/telecommunication/IP issues align with our group’s priorities?

  • Core Function
    • Advances strategic objectives across WHO, ITU, and WIPO through standardization and harmonization of data access
    • Uses tiered approach to deliver value across different resource scenarios
  • Organizational Benefits
    • WHO: Enables equitable AI health data access and standardized quality assessment
    • ITU: Contributes to technical standardization and supports developing regions
    • WIPO: Protects IP while enabling secure data exchange and collaboration
  • Collaboration
    • Partners with other working groups to maximize impact
    • Coordinates with Assessment, Ethics, and Regulatory teams
    • Provides implementation support through workshops and pilot projects