1 of 47

AIDA Data Hub

Services for Research and Clinical Innovation in Data Driven Precision Health.

National data infrastructure supporting the Analytic Imaging Diagnostic Arena (AIDA)

Hosted by LiU and the Center for Medical Image Science and Visualization (CMIV)
Funded by SciLifeLab Bioinformatics platform (NBIS)

2024-12-11: AIDA Data Hub Data Science Platform, SciLifeLab seminar series

2 of 47

AIDA & AIDA Data Hub

AIDA Community - medtech4health.se/aida

National collaboration arena in AI research and innovation in medical imaging diagnostics.

AIDA Data Hub - datahub.aida.scilifelab.se

The data infrastructure supporting AIDA.

3 of 47

  • Vetenskapsrådet (Swedish Research Council): research funding agency
  • Government: executive branch
  • Knut and Alice Wallenberg Foundation: private research funder
  • VINNOVA: innovation agency
  • SciLifeLab: life science research infrastructure/center
  • AIDA: collaboration arena in Swedish medical imaging diagnostics AI innovation
  • AIDA Data Hub: data infrastructure supporting AIDA
  • NBIS: bioinformatics platform

4 of 47

AIDA mission

Bridge the gap between research promise and patient benefit, through a clinic-native research agenda for innovation.

[Figure: "Research sandbox" and "Clinical wilderness"]

5 of 47

AIDA Data Hub Staff

  • Caroline Bivik Stadler: AIDA Project Lead
  • Varshith Konda: Systems development
  • Pontus Freyhult: IT Architect
  • Betul Eren: Data sharing
  • Erik Ylipää: Support lead
  • Emre Balsever: Systems development
  • Claes Lundström: AIDA lead
  • Joel Hedlund: AIDA Data Hub Lead
  • Minh-Ha Le: Systems development

6 of 47

AIDA Data Hub

E-infrastructure for research and clinical innovation in data driven precision health.

Data services

  • Access high quality datasets
  • FAIR sharing of DOI citable datasets
  • Extract and enrich clinical data for research

Data Science Platform for Sensitive data

  • Secure long-term primary storage and compute.
  • Advanced usage patterns: collaborate, annotate, federate, train AI, ...

Support

  • Data sharing, ethics, legal, and policy
  • AI development & system design

7 of 47

AIDA Data Hub: Data sharing

Share data with AIDA and the world.

Make high-quality datasets more FAIR and citable using DOIs and search-engine-optimized dataset landing pages.

We cover the costs of extracting prioritized data for sharing on the AIDA Data Hub.

Manage your own sharing, or delegate handling and paperwork to us.

8 of 47

9 of 47

Data In

         Datasets  Scans       Annotations  Size
Total    48        81,075,744  39,514       55.15 TB
         14        6,123       39,514       1.97 TB
         11        13,240      34,186       10.86 TB
         37        81,062,504  5,328        44.29 TB
         2         106,448     1,006,448    124.32 GB

10 of 47

Data Out

Metrics:

  • Countries: 42
  • External sharing events: 242

11 of 47

AIDA Data Hub: Data Science Platform

Secure data science platform co-located with national flagship compute systems.

Supporting advanced data usage patterns: long-term primary storage, collaboration, annotation, sharing, federation, AI training, ...

Customers make security decisions as appropriate for their projects, e.g. whether to allow outgoing connections to home institution servers or collaborators.

User fees fund sustainable operations and development. Discounts incentivize data sharing and maximize high-impact research.

Based on Bigpicture/GDI technologies.

12 of 47

13 of 47

14 of 47

Status

Sensitive Data Services 2.0 extension.

10 MSEK of hardware installed in the NAISS/NSC data center, next to the upcoming Arrhenius system.

2 GPU servers with 4× L40S each, 8 with 4× L4 each.
40 CPU servers with 32 cores and 1 TB RAM each.
3 PB Ceph storage.

OpenStack, Ceph, k8s, scalable as needed.

Aiming to align the agreement model with NAISS for ease of cross-platform use.

Continuous rollout of features and guarantees; sensitive data compliance planned for December.

Demo today: DGX-2-like service.

15 of 47

Demo: DGX-2-like service

  1. Launch a GPU-enabled virtual machine in a secure environment, using a customer self-service portal with two-factor authentication (2FA).
  2. Install software from public repositories that are trusted by the platform.
  3. Upload own data.
  4. Inspect data in a remote desktop.
  5. Use a Jupyter notebook to train an AI model, and monitor progress graphically.
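As a hedged illustration of steps 1 and 5, a first notebook cell might confirm that the VM actually sees its GPUs before any training starts. This sketch assumes the VM image includes NVIDIA's `nvidia-smi` tool, which the slides do not state; `parse_gpu_csv` and `list_gpus` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, str]]:
    """Parse `nvidia-smi --format=csv,noheader` output into (name, memory) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus() -> list[tuple[str, str]]:
    """Return (name, total memory) for each GPU visible to this VM, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # NVIDIA tools not installed: no usable GPU driver here
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # tool present but failed, e.g. driver not loaded
    return parse_gpu_csv(result.stdout)


for index, (name, memory) in enumerate(list_gpus()):
    print(f"GPU {index}: {name}, {memory}")
```

An empty result would suggest the VM was launched with a flavor without GPUs, before any time is spent on training.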

16 of 47

New and better

Onboarding: Life Science Login using your home organization account, ORCID, etc.

Self-service portal for booking and managing resources; start and stop them when you want.

VPN not required for most use cases.

More compute flavors: More GPUs, newer GPUs, CPU compute.

More storage: petabytes instead of terabytes.

Faster, easier software installation from Ubuntu apt repositories, GitHub, pip, and Docker Hub, through an inspecting HTTP proxy.
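A minimal sketch of how a Python session on the platform might route downloads through such a proxy. The proxy address is a hypothetical placeholder (the slides do not publish one). The sketch only sets the standard proxy environment variables that tools such as pip, curl and git read, and builds an explicit `urllib` opener. If the proxy inspects HTTPS traffic, clients would likely also need to trust its CA certificate; how the platform handles that is not described here.

```python
import os
import urllib.request

# Hypothetical proxy address; the real endpoint is not given in the slides.
PROXY_URL = "http://proxy.example.internal:3128"


def export_proxy_env(proxy_url: str) -> dict[str, str]:
    """Set the proxy variables that pip, curl, git and urllib read from the environment."""
    for var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[var] = proxy_url
    # getproxies() reports the proxies urllib will use, keyed by URL scheme.
    return urllib.request.getproxies()


def proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Explicitly route urllib requests (http and https) through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


proxies = export_proxy_env(PROXY_URL)
print(proxies["https"])
opener = proxied_opener(PROXY_URL)
```

Subprocesses started afterwards (for example `pip install` from a notebook) inherit these variables, so they would go through the same proxy.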

17 of 47

Still missing

Compliance work.

Contract templates.

Sensitive data sharing.

18 of 47

Demo: DGX-2-like service

19 of 47

Data Science Platform Launch party!

To celebrate the successful establishment of our Data Science Platform, we are arranging a two-day conference at CMIV on 19-20 March 2025, with national and international speakers and a launch celebration dinner.

Registration will open in January 2025!

https://datahub.aida.scilifelab.se/events/2025-03-19-data-science-platform-launch-party/

20 of 47

AIDA Data Hub

Thank you!

Services for Research and Clinical Innovation in Data Driven Precision Health

National data infrastructure supporting the Analytic Imaging Diagnostic Arena (AIDA)

Hosted by LiU and the Center for Medical Image Science and Visualization (CMIV)
Part of SciLifeLab Bioinformatics platform (NBIS)

21 of 47

22 of 47

Questions?

23 of 47

Extra slides in case of questions

24 of 47

Hardware

Compute

2 L40S GPU servers, each with 4 GPUs (48 GB VRAM per GPU), 32 CPU cores, 512 GB RAM, and 8 TB of local high-speed storage

6 L4 GPU servers, each with 4 GPUs (24 GB VRAM per GPU), 32 CPU cores, 512 GB RAM, and 8 TB of local high-speed storage

40 CPU compute servers, each with 32 cores, 1 TB RAM, and 6.4 TB of local high-speed storage

Storage

3 PB of raw Ceph storage (3168 TB HDD, 156 TB high-speed storage)
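The totals implied by these specifications can be worked out directly. This sketch takes its counts from this slide; the Status slide lists 8 L4 servers rather than 6, so the counts are parameters. Converting raw Ceph capacity to usable capacity depends on the data protection scheme, which the slides do not state. The 3-way replication (Ceph's default pool size) and 4+2 erasure coding below are illustrative assumptions, and the sketch simplistically treats HDD and high-speed capacity as one pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuServerType:
    gpu_model: str
    servers: int
    gpus_per_server: int
    vram_gb_per_gpu: int

    @property
    def total_gpus(self) -> int:
        return self.servers * self.gpus_per_server

    @property
    def total_vram_gb(self) -> int:
        return self.total_gpus * self.vram_gb_per_gpu


def usable_replicated_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def usable_erasure_coded_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per object."""
    return raw_tb * k / (k + m)


gpu_servers = [
    GpuServerType("L40S", servers=2, gpus_per_server=4, vram_gb_per_gpu=48),
    GpuServerType("L4", servers=6, gpus_per_server=4, vram_gb_per_gpu=24),
]
for s in gpu_servers:
    print(f"{s.gpu_model}: {s.total_gpus} GPUs, {s.total_vram_gb} GB VRAM")
# L40S: 8 GPUs, 384 GB VRAM
# L4: 24 GPUs, 576 GB VRAM

cpu_servers, cores_per_server, ram_tb_per_server = 40, 32, 1
print(f"CPU nodes: {cpu_servers * cores_per_server} cores, "
      f"{cpu_servers * ram_tb_per_server} TB RAM")
# CPU nodes: 1280 cores, 40 TB RAM

raw_tb = 3168 + 156  # HDD + high-speed raw capacity, in TB
print(raw_tb)  # 3324
print(usable_replicated_tb(raw_tb, replicas=3))  # 1108.0
print(usable_erasure_coded_tb(raw_tb, k=4, m=2))  # 2216.0
```

The gap between roughly 1.1 PB and 2.2 PB usable shows why the protection scheme matters when reading a "3 PB raw" figure.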

25 of 47

Establishment

First: basic services for technical experts.

Progressively more advanced services for a progressively broader audience.

Service delivery roadmap and iterative development priorities will be based on continuous stakeholder dialogue.

26 of 47

Customer model

Generally: an activity with a legal basis for processing personal data, such as a clinic or company.

Typically: an ethically approved research project at a research institute, represented by a qualified researcher (PI).

Customer segmentation: you cannot see other customers, and they cannot see you.

Customer makes security decisions appropriate for their project.

27 of 47

Business model

Funded by user fees.

Service portfolio priced for sustainable operations and development.

Yearly membership fee provides basic service for typical research projects.

Additional services cost extra, e.g. GPU, primary storage, etc.

Discounts offered to maximize high impact research output.

Fee waivers for data sharing parties who help build the data commons / data lake.


28 of 47

Basic service

Tentative fee: 50 kSEK/yr.

Up to 2 TB quota on private project storage (no backup) accessible through e.g. Windows file sharing.

Multifactor login using Life Science AAI and your home organization account.

Access to shared datasets upon approval; these do not count toward the project storage quota.

29 of 47

Add-on services

Tentative prices, pay as you go.

Backed up primary storage: ~2.5 kSEK/TB/yr

Large volume project storage: ~1.5 kSEK/TB/yr

Large scale CPU compute: 24 kSEK/CPU/yr

GPU compute: 80 kSEK/GPU/yr

Data sharing: free of charge. Help build the data lake / data commons.
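The tentative prices on the Basic service and Add-on services slides combine into a simple yearly estimate. A minimal sketch, assuming add-ons scale linearly with quantity on top of the 50 kSEK basic fee and ignoring discounts and data-sharing fee waivers, which the slides do not quantify; the item names are hypothetical. For example, one GPU plus 10 TB of backed-up primary storage comes to 50 + 80 + 10 × 2.5 = 155 kSEK/yr.

```python
# Tentative prices from the Basic service and Add-on services slides, in kSEK per year.
BASIC_FEE_KSEK = 50.0
ADD_ON_PRICES_KSEK = {
    "backed_up_storage_tb": 2.5,     # backed-up primary storage, per TB
    "large_volume_storage_tb": 1.5,  # large-volume project storage, per TB
    "cpu": 24.0,                     # large-scale CPU compute, per CPU
    "gpu": 80.0,                     # GPU compute, per GPU
}


def yearly_cost_ksek(**usage: float) -> float:
    """Estimate a yearly fee: basic membership plus linear add-on charges.

    Assumes pay-as-you-go add-ons scale linearly with quantity; discounts and
    data-sharing fee waivers are not modeled.
    """
    unknown = set(usage) - set(ADD_ON_PRICES_KSEK)
    if unknown:
        raise ValueError(f"unknown add-on(s): {sorted(unknown)}")
    return BASIC_FEE_KSEK + sum(
        ADD_ON_PRICES_KSEK[item] * quantity for item, quantity in usage.items()
    )


print(yearly_cost_ksek())  # 50.0
print(yearly_cost_ksek(gpu=1, backed_up_storage_tb=10))  # 155.0
```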

30 of 47

Building the data commons

Incentivize FAIR sharing of health data for secondary use in Open Science, to help build the data lake / data commons.

Ethical and legal support for preparing high-quality datasets.

Support for handling access requests.

Support for publishing and advertising datasets, for increased academic impact.

Discounts to data sharing parties.

31 of 47

Data sharing

FAIR data sharing with the world.

Make high-quality datasets citable using Digital Object Identifiers (DOIs) and search-engine-optimized landing pages.

Personal data or anonymized data.

Manage access requests using the Resource Entitlement Management System (REMS), or delegate handling to the AIDA Data Hub Data Access Committee.

Based on Bigpicture/GDI technologies.

32 of 47

Upcoming services

Secure remote desktop: intended default interface.

Authorized data import/export: PI can delegate import/export rights.

Telerad destination & DICOM router: receive images from specified scanners.

OpenEHR proxy: approved communication with EHR systems.

Sectra PACS: project-private Sectra PACS.

33 of 47

Authentication and authorization

Multi-factor authentication with your home organization account using Life Science Login.

Customer-managed authorization using the Life Science Login Perun groupware.

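The split described above, where Life Science Login handles authentication and the customer manages authorization through Perun groups, can be sketched as a group-membership check. This assumes the platform learns the user's group memberships at sign-in (for example from a token claim); the `Project` and `is_authorized` names and all group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    # Groups whose members may use the project. In the setup the slide describes,
    # the customer (not the platform) manages these memberships in Perun.
    allowed_groups: set[str] = field(default_factory=set)


def is_authorized(user_groups: set[str], project: Project) -> bool:
    """Authorize if the user belongs to at least one of the project's groups."""
    return not project.allowed_groups.isdisjoint(user_groups)


# Hypothetical group names, as they might arrive after a Life Science Login sign-in.
project = Project("example-project", {"example-project-members"})
print(is_authorized({"example-project-members"}, project))  # True
print(is_authorized({"another-project-members"}, project))  # False
```

The design point is that granting or revoking access becomes a group change made by the customer, with no platform-side code or admin action involved.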

34 of 47

Bigpicture Petabyte platform for European digital pathology AI

AIDA Data Hub is leading the repository infrastructure development, which is carried out in collaboration with sensitive data teams at the NBIS Systems Development unit and CSC.fi.

Large-scale archive operations started in March 2023.

35 of 47

EUCAIM Federated infrastructure for cancer imaging data

AIDA Data Hub is contributing data collaboration workspaces with cancer imaging data for use in EUCAIM, based on Bigpicture federated node technologies.

Collaboration with sensitive data teams at the NBIS Systems Development unit.

36 of 47

ASHA - Använda Standardiserade Hälsodata som Accelerator (Using Standardized Health Data as an Accelerator)

RÖ-led VINNOVA systems demonstrator for data lake systems for primary and secondary use.
AIDA Data Hub provides spaces for secondary use.

37 of 47

SCAPIS Image data sharing through AIDA Data Hub

All SCAPIS imaging data (~100 TB) will be shared through AIDA Data Hub as 24 datasets.

Legal agreements are being prepared.

Launch originally planned for AIDA Days in Gothenburg in October.

Tech solution is production ready.

Demo today.

38 of 47

Process overview

You ask SCAPIS for access to datasets.

SCAPIS tells us to give you access.

You get the data from AIDA Data Hub.

39 of 47

In more detail

  1. Researcher finds data
  2. Researcher applies for access
  3. SCAPIS approves access
  4. SCAPIS tells AIDA Data Hub to give access
  5. Researcher gets account at AIDA Data Hub
  6. Researcher downloads data
  7. Optional: Researcher joins AIDA and uses on-platform compute power
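The seven steps can be read as an ordered workflow in which each step depends on the one before; in particular, AIDA Data Hub only grants access after SCAPIS has approved the request and instructed the hub. A minimal sketch of that ordering; the class, step, researcher and dataset names are hypothetical and not part of any AIDA Data Hub or SCAPIS API.

```python
from enum import IntEnum


class Step(IntEnum):
    """Steps of the SCAPIS data access process, in order."""
    FOUND_DATA = 1
    APPLIED = 2
    APPROVED_BY_SCAPIS = 3
    HUB_INSTRUCTED = 4
    ACCOUNT_CREATED = 5
    DOWNLOADED = 6
    USING_PLATFORM_COMPUTE = 7  # optional final step


class AccessRequest:
    def __init__(self, researcher: str, dataset: str) -> None:
        self.researcher = researcher
        self.dataset = dataset
        self.step = Step.FOUND_DATA

    def advance(self, to: Step) -> None:
        """Move to the next step; skipping steps (e.g. access before approval) is rejected."""
        if to != self.step + 1:
            raise ValueError(f"cannot go from {self.step.name} to {to.name}")
        self.step = to


request = AccessRequest("researcher@example.org", "example-scapis-dataset")
for step in (Step.APPLIED, Step.APPROVED_BY_SCAPIS, Step.HUB_INSTRUCTED,
             Step.ACCOUNT_CREATED, Step.DOWNLOADED):
    request.advance(step)
print(request.step.name)  # DOWNLOADED
```

Stopping at `DOWNLOADED` is a complete process; `USING_PLATFORM_COMPUTE` only applies to researchers who also join AIDA.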

40 of 47

1. Researcher finds data

Use a normal web browser to search for good data.

The top hit is a landing page that describes a dataset on the platform.

The landing page is easy to find because it is made easy for computers to understand, i.e. "search engine optimized", using schema.org JSON-LD.

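A sketch of the kind of markup this refers to: a schema.org `Dataset` description embedded in the landing page as a JSON-LD `<script>` block. The properties used (`name`, `description`, `identifier`, `url`, `license`, `image`) are standard schema.org `Dataset` properties, but the exact set AIDA Data Hub publishes is not shown in the slides; the function name and all values, including the DOI, are hypothetical placeholders.

```python
import json
from typing import Optional


def dataset_jsonld(name: str, description: str, doi: str, landing_page: str,
                   license_url: str, image_url: Optional[str] = None) -> str:
    """Return a <script> tag carrying schema.org Dataset metadata as JSON-LD."""
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",  # DOI expressed as a resolvable URL
        "url": landing_page,
        "license": license_url,
    }
    if image_url is not None:
        metadata["image"] = image_url  # sample image search engines may display
    return ('<script type="application/ld+json">\n'
            + json.dumps(metadata, indent=2)
            + "\n</script>")


print(dataset_jsonld(
    name="Example imaging dataset",
    description="Hypothetical dataset used only to illustrate the markup.",
    doi="10.0000/example",
    landing_page="https://example.org/datasets/example",
    license_url="https://example.org/licenses/example",
    image_url="https://example.org/datasets/example/sample.png",
))
```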

41 of 47

1. Researcher finds data

The landing page has basic information on the dataset.

It explains why you should bother applying for access.

Note: Google picks up our sample images and already shows them in its search results.

The "Apply for access" button takes you to SCAPIS.


42 of 47

2. Researcher applies for access

The researcher goes through normal SCAPIS procedures to apply for access to the dataset.


43 of 47

3. SCAPIS approves access

SCAPIS goes through the normal access request evaluation procedures, and approves the request.


44 of 47

4. SCAPIS tells AIDA Data Hub to give access

SCAPIS instructs AIDA Data Hub to give the researcher access to the dataset.


45 of 47

5. Researcher gets account at AIDA Data Hub

High security service.

Three-factor authentication: SSL VPN with 2FA, plus an SSH public key.

Next-generation SDS (Sensitive Data Services) will support Life Science AAI, using your normal institutional login.

AIDA DGX-2 Service: service for best-in-class researchers in Swedish medical imaging diagnostic AI. Secure enough for medical personal data.

46 of 47

6. Researcher downloads data

Insert live demo here.


47 of 47

7. Optional: Researcher joins AIDA and uses platform compute power

Current interface is "ssh tunnel + bash".

SDS 2.0 will offer more types of interfaces, suitable for a wider range of professions and skill levels.
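A common way to use an "ssh tunnel + bash" interface is SSH local port forwarding, for instance to reach a Jupyter server running on the platform from a browser on your own machine. A sketch with a hypothetical user, host and port; the slides do not give connection details, and any VPN or additional authentication the service requires would come first. `ssh_tunnel_command` is a hypothetical helper name.

```python
import shlex


def ssh_tunnel_command(user: str, host: str, local_port: int, remote_port: int,
                       remote_host: str = "localhost") -> list[str]:
    """Build an ssh command that forwards local_port to remote_host:remote_port.

    -N  open no remote shell, only forward the port
    -L  local port forwarding: local_port:remote_host:remote_port
    """
    return ["ssh", "-N", "-L", f"{local_port}:{remote_host}:{remote_port}",
            f"{user}@{host}"]


# Hypothetical user and host; e.g. forward a Jupyter server listening on port 8888.
cmd = ssh_tunnel_command("researcher", "login.example.org", 8888, 8888)
print(shlex.join(cmd))  # ssh -N -L 8888:localhost:8888 researcher@login.example.org
```

While such a command runs (for example via `subprocess.run(cmd)`), opening `http://localhost:8888` locally would reach the remote server, assuming one is listening there.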
