1 of 62

2 of 62

Bridging AI international policy and practice

AIDV WG

Francis Crawley, Natalie Meyers, Rodrigo Roa, Seonyoung Kim, Patricia Buendia, Madhava Jay

Oct 15, 25th RDA Plenary Meeting, Brisbane, 2025

https://www.rd-alliance.org/groups/artificial-intelligence-and-data-visitation-aidv-wg/activity/

3 of 62

Acknowledgement of Country

We acknowledge and celebrate the First Australians on whose traditional lands we meet, and we pay our respect to their elders past and present.

4 of 62

Welcome to new RDA members!

OPENNESS

COMMUNITY-DRIVEN

CONSENSUS

NON-PROFIT AND TECHNOLOGY-NEUTRAL

HARMONISATION

INCLUSIVITY

6 Guiding Principles are at the heart of the RDA community.

JOIN THE RDA: www.rd-alliance.org/register/

All RDA members are expected to adhere to the RDA Code of Conduct to foster a welcoming and inclusive environment.

5 of 62

EOSC-Future/RDA AIDV Working Group and its core outputs

Co-Chairs: Natalie Meyers & Francis Crawley

6 of 62

Shaping Responsible AI

Four Recommendations from the AIDV Working Group:

Supporting/Other Outputs     

11:35

7 of 62

The Shifting Paradigm: From Data Transfer to Data Visitation

  • Increasing volume and sensitivity of data (e.g., health, genomics).
  • Challenges of traditional data transfer (security, governance, duplication).
  • Data visitation: Enabling analysis in situ without physical movement.
  • Potential benefits: Collaboration, security, reduced burden on data owners.

11:40

8 of 62

DV4RDA TIGER Project: Putting RDA Principles into Practice

Natalie Meyers

11:45

9 of 62

Bridging Data Silos and Privacy

  • The Challenge: Biomedical data holds profound potential, but is often trapped in silos due to complex security and privacy concerns (e.g., HIPAA, GDPR).
  • The Solution: The DV4RDA project directly aligns with RDA's mission to champion interoperability, reproducibility, and accessibility by providing a practical framework to address these hurdles.

10 of 62

DV4RDA: Embodiment of RDA Core Principles

  • Community-Driven: Global RDA Members collaborated to implement policies and evaluate real-world data technology.
  • Delivering Value: The project champions producing tangible, implementable outputs through a practical data platform.
  • Harmonization: The project’s "computation-to-data" model aligns with RDA working groups on data security and health data, such as:
    • Trusted Research Environments for Sensitive or Confidential Data
    • Health Data Commons GORC Profile
    • Building Immune Digital Twins

11 of 62

Goals of the DV4RDA project

  • Framework for Implementing Data Visitation technologies
  • Demonstrate and Evaluate Data Visitation in Practice
  • Enhance Data Security and Privacy
  • Promote FAIR Data Principles
  • Accelerate Research
  • Foster Global Collaboration

12 of 62

Participants' Recruitment

13 of 62

DV4RDA Participants

14 of 62

Actionable RDA Outputs

  • Policy Language: Proposed expansion of language for IRB Protocols and Informed Consent Forms (ICFs) to explicitly include "Data Visitation" as a protection method.
  • Security Framework: Developed a comprehensive "RDA AIDV Template - System Security Assessment Plan" for future Data Visiting Technologies.
  • Technical Tool: Implemented per-subject consent checking during the data quality control (QC) process for DV4RDA.
  • Bill of Rights: Expanded online documentation with AIDV policy information.

15 of 62

4 Short Talks

Presenters: Rodrigo Roa, Seonyoung Kim, Patricia Buendia, Madhava Jay

16 of 62

Policy into Practice & the Data Observatory Experience

Rodrigo Roa, Executive Director

Data Observatory

Santiago, Chile

11:50

17 of 62

Who are we?

Data Observatory (DO) is a non-profit public–private–academic institution created by the Government of Chile, Amazon Web Services (AWS), and Adolfo Ibáñez University.

Its mission is to acquire, process, and make available large volumes of data with scientific, technological, and social impact.

18 of 62

Research & Work Areas

  • Astronomy
  • Earth Observation
  • Natural Resources
  • Society

Open data platforms and infrastructures with a FAIR approach (Findable, Accessible, Interoperable, Reusable), developed in collaboration and partnership with public and private institutions.

19 of 62

FAIR Strategy and SURDATA Alliance

The Data Observatory (DO) leads Chile’s FAIR Data Policy Implementation Strategy in coordination with the Research Data Alliance (RDA) and CODATA, and serves as Chile’s DataCite consortium for DOI provision.

This work evolved into SURDATA, a regional alliance that promotes collaboration across science, government, industry, and civil society to strengthen data interoperability, research, and innovation, supporting the sustainable development of Chile and Latin America.

ROADMAP

  • FAIR Strategy Launch: March 2025
  • Framework Agreement and Stakeholder Mapping: Apr - May 2025
  • Stakeholder invitation: May 2025
  • 1st General Assembly: Oct 2025
  • Launch of thematic working groups: Nov 2025 - March 2026
  • 2nd General Assembly: April 2026

20 of 62

LatamGPT

ROADMAP

  • DO Participation begins: December 2024
  • Public Launch by Science Minister: February 2025
  • Cloud and Data Infrastructure: Mar - Jun 2025
  • Model Training: Jun - Dec 2025
  • New versions of LatamGPT: 2026

Data Sources

2.6M+ documents

21 countries

DO-AWS

Infrastructure, engineering

Experts from GenAI and Data

Processing

Participants

30 Institutions from Latin America and over 60 experts involved

Data Storage and Integration

- Classification of trained data

Data Collection and Cleaning

- Sources in Spanish, English & Portuguese

Analysis, Processing and Modeling

- Cloud-based training of the LatamGPT

Key Aspects

  • 1st Latin American language model reflecting the region’s cultural, social, and linguistic diversity
  • Open model, trained with regional data
  • Goal: to develop a 70B-parameter LLM, comparable to OpenAI’s GPT-3.5

Release v1.0 LatamGPT

Dec 2025

Training, Capacity Building and Support

- Workshops and expert consulting

21 of 62

The Pulse of AI in Latin America (ILIA 2025)

The region is at a turning point.

According to ILIA 2025, Latin America and the Caribbean are moving forward with strong interest, but also with deep asymmetries. Brazil and Chile lead the way, while countries such as Costa Rica, Ecuador, and the Dominican Republic are rapidly emerging.

The challenge: to move from enthusiasm to real investment, and from plans to implementation.

Generative AI and open source development are consolidating as the region’s most powerful drivers of democratization.

Source: ILIA 2025

22 of 62

From Data to Action: How the DO Strengthens Data & AI in Latin America

ILIA conclusion: we have a lot of data but limited availability.

Without openness and standardization, data cannot generate real value.

This is where institutions like the Data Observatory become strategic: curating, processing, and making available reliable, interoperable, and FAIR data to support research, innovation, and public policy.

The Data Observatory helps close this gap through open platforms that bring open science and digital sovereignty to life. In parallel, through SURDATA, we foster regional collaboration and data interoperability to strengthen Latin America’s research and innovation ecosystem.

Acquisition

Cleansing

Storage

Processing

Analysis

Visualization

Interpretation

23 of 62

contacto@dataobservatory.net

Thank you - Gracias!!!!!

24 of 62

Questions for Rodrigo Roa about DO, SURDATA, ILIA, or LatamGPT?

https://www.rd-alliance.org/groups/artificial-intelligence-and-data-visitation-aidv-wg/outputs/

12:00

25 of 62

Use Case: Policy and Compliance for Data Visitation

Seonyoung Kim, PhD

Bernard Becker Medical Library

Washington University in St. Louis

12:10

26 of 62

What is Data Visitation and Why Adopt it?

  • Traditional data sharing:
    • Moves sensitive data to external servers
    • Faces increasing privacy and regulatory constraints (e.g., HIPAA, GDPR, NIH GDS)
  • Data Visitation (DV)
    • Data remains in its original secure storage location
    • Analysis tools or code access the data within this environment, enabling analysis without moving the data
    • Only aggregate, de-identified results leave the environment
    • Stronger protection for participant privacy
    • Reduced risk of unauthorized access or breach
    • Supports compliance with regulations
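
The DV pattern above can be sketched in a few lines. This is a minimal illustration, not any project's implementation: the `visit` function, the sample records, and the `MIN_CELL_SIZE` threshold are all hypothetical. The analysis runs next to the records, and only aggregate results for sufficiently large groups are released.

```python
from statistics import mean

# Hypothetical disclosure threshold: groups smaller than this are suppressed.
MIN_CELL_SIZE = 5

# Individual-level records that stay inside the data holder's environment.
_LOCAL_RECORDS = [
    {"age": 30, "group": "A"}, {"age": 32, "group": "A"},
    {"age": 34, "group": "A"}, {"age": 36, "group": "A"},
    {"age": 38, "group": "A"}, {"age": 40, "group": "A"},
    {"age": 50, "group": "B"}, {"age": 60, "group": "B"},
]


def mean_age_by_group(records):
    """Example analysis: group size and mean age per group."""
    groups = {}
    for record in records:
        groups.setdefault(record["group"], []).append(record["age"])
    return {g: {"n": len(ages), "mean_age": mean(ages)} for g, ages in groups.items()}


def visit(analysis, records=_LOCAL_RECORDS):
    """Run the analysis in place and release only aggregates for large groups."""
    result = analysis(records)
    # Small groups are dropped so that summaries cannot single out individuals.
    return {g: stats for g, stats in result.items() if stats["n"] >= MIN_CELL_SIZE}


released = visit(mean_age_by_group)
print(released)
```

Group B has only two members, so it is withheld; the caller never sees a row of `_LOCAL_RECORDS`.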

27 of 62

Data Visitation in the IRB Protocol

Key Placement Areas:

  1. Data Collection & Management – Describe where data lives and note DV use
  2. Confidentiality & Privacy – Emphasize
    • No transfer of individual-level data
    • Secure, auditable execution environments
    • Only de-identified summaries leave
  3. Data Analysis Plan – Refer back to DV method
  4. Storage of Data/Specimens – Reiterate “data stays in place”

Goal: Frame DV as a risk-mitigating strategy

28 of 62

Guidance for Informed Consent in AI & Data Visitation

  • Provides a framework for involving individuals in decisions about their personal data use in AI development and research.
  • Promotes autonomous choice through informed consent and alternative engagement methods.
  • Targets stakeholders across governments, academia, industry, civil society, and international organizations.
  • Advocates for flexible, practical consent mechanisms to support autonomy in the age of AI.

“A reconsideration of the classic form of informed consent is necessary in light of AI. We need to support autonomy through practical, flexible consent mechanisms.”

Dr. Kristy Hackett, Institute on Ethics & Policy for Innovation, McMaster University

29 of 62

Evolving Consent Models in AI and Data Visitation

  • Traditional one-time consent is not enough for AI and Data Visitation
  • Dynamic consent allows ongoing, adaptable participant control
  • FRIES & TEASE models support active, informed, and reversible decisions
    • FRIES model (Freely given, Reversible, Informed, Enthusiastic, Specific)
    • TEASE model (Traffic lights, Establish ongoing dialogue, Aftercare, Safewords, Explicate limits)
  • Emphasizes autonomy, transparency, and trust
  • Promotes inclusion through community-based and culturally sensitive consent

30 of 62

Data Visitation in the Informed Consent Form (ICF)

Plain language explanation to participants:

  • Your data stays put – never copied or moved
  • Analysis comes to the data – approved tools run in secure servers
  • Only safe results leave – de-identified summaries only
  • Why it matters – minimize unauthorized access, build trust

Place it in “Confidentiality” section

31 of 62

Per-Subject Informed Consent Verification During QC

  • Built on the dbGaP model
  • Standardized using the Informed Consent Ontology (ICO)
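
A per-subject consent check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the DV4RDA code: the consent codes follow the style of dbGaP consent groups (GRU for General Research Use, HMB for Health/Medical/Biomedical), the `PERMITTED_USES` mapping is simplified, and a real implementation would map codes to ICO terms instead of the placeholder strings used here.

```python
# Simplified, hypothetical mapping from consent code to permitted uses.
PERMITTED_USES = {
    "GRU": {"general", "biomedical", "disease_specific"},
    "HMB": {"biomedical", "disease_specific"},
}


def check_consent(subjects, intended_use):
    """Split subject IDs into those whose consent covers the use and those flagged."""
    passed, flagged = [], []
    for subject in subjects:
        # Unknown or missing consent codes permit nothing.
        allowed = PERMITTED_USES.get(subject.get("consent"), set())
        if intended_use in allowed:
            passed.append(subject["id"])
        else:
            flagged.append(subject["id"])
    return passed, flagged


subjects = [
    {"id": "S1", "consent": "GRU"},
    {"id": "S2", "consent": "HMB"},
    {"id": "S3", "consent": None},
]
passed, flagged = check_consent(subjects, "general")
print("passed:", passed, "flagged:", flagged)
```

Running the check during QC means subjects with missing or insufficient consent are flagged before any analysis touches their records.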

32 of 62

Implementation and Call to Action

  • IRB and ICF language now provide a clear framework for compliant Data Visitation
  • Institutions can adopt this language to demonstrate privacy-by-design practices
  • Aligns with NIH DMS & GDS policies, HIPAA security standards, and GDPR principles
  • Enables ethical AI applications while preserving participant autonomy
  • Call to Action: We invite RDA members and institutions to adopt and reference this guidance in their own policies and protocol templates


33 of 62

Questions for Seonyoung Kim about Informed Consent in DV?

12:20

34 of 62

Use Case: FAIRlyz Implementation of AIDV Policies

Patricia Buendia

12:25

35 of 62

Video Presentations

  1. Video Slide Deck Presentation (video, 2.49 minutes; also on YouTube)
  2. FAIRlyz Demo of New Features (video, 3:25 minutes; also on YouTube)

36 of 62

FAIRlyz: An Infrastructure for Secure Data Visitation

*FAIR data is Findable, Accessible, Interoperable, Reusable

*FAIRLYZ adds anaLYZable as a 5th principle to the FAIR principles

*Data visitation refers to moving the analysis to the data

Manage Data

simply and securely

+

Validate Data

with semi-automated QC through data visitation

+

Share Data

based on FAIR principles

37 of 62

FAIRLYZ: Rethinking the Data Workflow

  • EHRs
  • Omics
  • Public Repositories
  • Researchers

Data Consumer

via Data Visitation

QC Reports

Study Data Registry

Ontology and Omics models

  • Research Institutions
  • Funding Agencies
  • Collaborators
  • AI
  • Access the registry to review
  • Public or Private
  • Data Curation
  • Provide QC Score

Data Contributor

38 of 62

Value Proposition: QC Before Commitment

  • The Pain Point: Researchers invest time and resources navigating DUA, IRB, and downloads only to discover unusable data. Meanwhile, repositories bear the cost of storing and maintaining access portals for datasets that never get used.
  • Our Solution: FAIRlyz QC allows data sharers to run in-place QC and share results before legal steps or repository uploads, aiding them with data curation.
  • The Result: Transparency ensures QC at the source, saving time and speeding scientific discovery. Repositories wait and accept only QC-ed data.
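
To make "QC before commitment" concrete, here is a hedged sketch of an in-place QC report that a data contributor could share ahead of any DUA or upload. The `qc_report` function, its two metrics, and the equal-weight score are illustrative assumptions, not the FAIRlyz scoring method.

```python
def qc_report(rows, required_fields):
    """Compute completeness and uniqueness for a dataset without exporting rows."""
    total = len(rows)
    # Count empty cells among the required fields.
    missing = sum(
        1 for row in rows for field in required_fields if row.get(field) in (None, "")
    )
    # Count rows that repeat an earlier row on the required fields.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    cells = total * len(required_fields)
    completeness = 1 - missing / cells if cells else 0.0
    uniqueness = 1 - duplicates / total if total else 0.0
    return {
        "rows": total,
        "completeness": completeness,
        "uniqueness": uniqueness,
        # Hypothetical score: equal-weight average of both metrics, 0 to 100.
        "score": 100 * (completeness + uniqueness) / 2,
    }


rows = [
    {"id": "a", "value": 1},
    {"id": "b", "value": None},
    {"id": "a", "value": 1},
    {"id": "c", "value": 3},
]
report = qc_report(rows, ["id", "value"])
print(report)
```

Only the summary dictionary is shared; a prospective data consumer can judge usability from it before investing in access requests.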

39 of 62

Future Focus

AI-Powered QC

New UI will integrate an AI agent chatbot to guide researchers through complex QC tasks intuitively.

Enhanced Security

Transitioning to a locally trained AI model to eliminate reliance on external APIs, ensuring data remains isolated and secure.

Ecosystem Growth

Release of open-source plugins and integration with federated learning networks to encourage community contribution and maximum extensibility.

40 of 62

Call to Action

Seeking Institutional Data Owners Worldwide:

  • Validation: Help us ensure FAIRlyz meets real-world institutional needs
  • Customization Funding: Support tailored enhancements that reflect your unique data challenges.

Seeking Developers:

  • Developer collaborators for our upcoming open-source plugin development

Seeking Researchers:

  • Partners to test our new AI-powered QC features.

41 of 62

Questions for Patricia Buendia about the FAIRlyz implementation of AIDV policies?

12:35

42 of 62

SyftBox: a General Purpose Solution for Data Visitation and Equitable Data Sharing

Lightning Talk

Madhava Jay

12:40

43 of 62

Madhava Jay

🧬 Rare Disease Patient

Software Engineer @ OpenMined

🚀 Help solve data access with open source

🌏 Brisbane, Australia

📧 madhava@openmined.org

44 of 62

Mission

Building the public network for non-public information

45 of 62

  • Founded in 2017
  • Tech Nonprofit and 501(c)(3)
  • We build open-source privacy-preserving technologies

  • > 30 Team Members
  • > 230 GitHub Repos
  • ~ 20k Slack Community

46 of 62

47 of 62

48 of 62

This lightning talk

1. Problems with data sharing

2. A general purpose solution

3. A use-case for equitable genomics

49 of 62

The Motivating Problem

Data’s true power comes from collaboration.

But many data owners are forced to choose between giving up data ownership through copying and centralization, or simply not participating.

Due to legal and ethical constraints, copying data across borders is often unacceptable, resulting in no action.

We need a new way to collaborate fairly and securely.

50 of 62

Data Visitation

Remotely study data on a computer at another organisation

  • Data Scientist: can answer a “specific” question, and only that question
  • Datasite: retains governance over the information they steward, and never shares a copy of the data
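
One way to picture "a specific question, and only that question" is an approval step keyed on the exact code submitted. The sketch below is a toy illustration and does not use the SyftBox API: the `Datasite` class, its `approve` and `run` methods, and the `analyze` entry point are all hypothetical names.

```python
import hashlib


class Datasite:
    """Toy data holder that executes only code its owner has approved."""

    def __init__(self, records):
        self._records = records   # never returned to the caller
        self._approved = set()    # SHA-256 digests of approved source code

    @staticmethod
    def _digest(source):
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def approve(self, source):
        """Data owner reviews the code and approves this exact text."""
        self._approved.add(self._digest(source))

    def run(self, source):
        """Run the submitted code only if it matches an approved digest."""
        if self._digest(source) not in self._approved:
            raise PermissionError("code has not been approved by the data owner")
        namespace = {}
        # A real system would sandbox this step; exec is used for brevity only.
        exec(source, namespace)
        return namespace["analyze"](self._records)


question = "def analyze(records):\n    return len(records)\n"
other_question = "def analyze(records):\n    return records\n"

site = Datasite(records=[{"id": 1}, {"id": 2}, {"id": 3}])
site.approve(question)
answer = site.run(question)
print(answer)
```

Submitting `other_question`, which would return the raw records, raises `PermissionError` because its digest was never approved.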

51 of 62

Federated Learning

[Diagram: one Data Scientist and four Datasites, each paired with an FL Project]
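
As a rough illustration of the federated learning idea, the sketch below uses federated averaging (FedAvg) on a one-variable linear model. The technique is swapped in for illustration and the slides do not specify an algorithm; the function names, learning rate, and sample data are assumptions. Each datasite computes an update on its own data, and only model parameters are sent back and averaged.

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step for y = w*x + b, computed on a datasite's own data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b), n


def federated_average(updates):
    """Average parameters across datasites, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dims = len(updates[0][0])
    return tuple(sum(wts[i] * n for wts, n in updates) / total for i in range(dims))


def train(datasites, rounds=200):
    """Coordinator loop: broadcast weights, collect local updates, average them."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in datasites]
        weights = federated_average(updates)
    return weights


# Two datasites hold disjoint samples of the same relationship y = 2x + 1.
datasites = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7), (4, 9)],
]
w, b = train(datasites)
print(round(w, 3), round(b, 3))
```

The coordinator only ever handles `(w, b)` pairs and sample counts; the `(x, y)` points stay with each datasite.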

52 of 62

SyftBox.net

An open-source, privacy-first, decentralized network for secure data collaboration

53 of 62

The SyftBox Platform

  • Apache 2.0 Open-source
  • End-to-end encrypted
  • Permissionless network
  • Supports any data format
  • Runs any program or code
  • Enables federated analysis across multiple datasites
  • Low latency and large file transfer support

54 of 62

https://github.com/OpenMined/syftbox

Try it out!

55 of 62

BioVault.net

A free, open-source, permissionless network for collaborative genomics

Built on SyftBox

56 of 62

Problems with equitable genomics and data sharing

  • Genomic data is private and very sensitive
  • Researchers face lengthy requests and institutional reviews
  • Difficulties sharing data across datasites due to different policies
  • Participants lack transparency on how their data is used
  • Many communities, especially in the Global South, lack expertise and resources to analyze their own data
  • Unequal and inequitable benefit sharing between haves and have-nots

57 of 62

Our solution - data visitation for genomics

We allow data owners to make their data available for remote analysis without uploading or exposing the raw data.

Because we’re built on SyftBox and Nextflow, researchers can easily run arbitrary analysis and complex data pipelines.

58 of 62

Video Slide

59 of 62

60 of 62

Pilot Programmes

Dr Carika Weldon (Bermuda)

Founder of CariGenetics

  • BioVault is enabling researchers in the Global South to participate in genomics and derive equitable benefits from their data
  • We are also partnering with Human Genome Project II to help deliver infrastructure and capacity building in genomics

Dr Rana Dajani (Jordan)

Professor at Hashemite University

61 of 62

SyftBox is looking for partners and pilots

  • We are on a mission to deliver equitable access to data
  • We have resources to help solve your data access problems
  • Contact us to learn more: madhava@openmined.org

62 of 62

Any Questions for Madhava about SyftBox?

Or for the Panel?

12:50