AIDA Data Hub
Services for Research and Clinical Innovation in Data Driven Precision Health.
National data infrastructure supporting the Analytic Imaging Diagnostic Arena (AIDA)
Hosted by LiU and the Center for Medical Image Science and Visualization (CMIV)
Funded by SciLifeLab Bioinformatics platform (NBIS)
2024-12-11 - AIDA Data Hub Data Science Platform, for the SciLifeLab seminar series
AIDA & AIDA Data Hub
AIDA Community - medtech4health.se/aida
National collaboration arena in AI research and innovation in medical imaging diagnostics.
AIDA Data Hub - datahub.aida.scilifelab.se
The data infrastructure supporting AIDA.
Vetenskapsrådet (Swedish Research Council) - Research funding agency
Government - Executive branch
Knut and Alice Wallenberg Foundation - Private research funder
VINNOVA - Innovation agency
SciLifeLab - Life science research infrastructure/center
AIDA - Collaboration arena in Swedish medical imaging diagnostics AI innovation
AIDA Data Hub - Data infrastructure supporting AIDA
NBIS - Bioinformatics platform
AIDA mission
Bridge the gap between research promise and patient benefit, through a clinic-native research agenda for innovation.
Clinical wilderness
Research sandbox
AIDA Data Hub Staff
Caroline Bivik Stadler - AIDA Project Lead
Varshith Konda - Systems development
Pontus Freyhult - IT Architect
Betul Eren - Data sharing
Erik Ylipää - Support lead
Emre Balsever - Systems development
Claes Lundström - AIDA lead
Joel Hedlund - AIDA Data Hub Lead
Minh-Ha Le - Systems development
AIDA Data Hub
E-infrastructure for research and clinical innovation in data driven precision health.
Data services
Data Science Platform for Sensitive data
Support
AIDA Data Hub: Data sharing
Share data with AIDA and the world.
Make high-quality datasets more FAIR and citable using DOIs and search-engine-optimized dataset landing pages.
We cover the costs of extracting prioritized data for sharing on the AIDA Data Hub.
Manage your own sharing, or delegate handling and paperwork to us.
Data In
      | Datasets | Scans    | Annotations | Size
Total | 48       | 81075744 | 39514       | 55.15 TB
      | 14       | 6123     | 39514       | 1.97 TB
      | 11       | 13240    | 34186       | 10.86 TB
      | 37       | 81062504 | 5328        | 44.29 TB
      |          |          |             |
      | 2        | 106448   | 1006448     | 124.32 GB
Data Out
AIDA Data Hub: Data Science Platform
Secure data science platform co-located with national flagship compute systems.
Supporting advanced data usage patterns: long-term primary storage, collaboration, annotation, sharing, federation, AI training...
Customers make security decisions as appropriate, e.g. allowing outgoing connections to home institution servers, collaborators...
User fees for sustainable operations and development. Discounts to incentivize data sharing and maximize high impact research.
Based on Bigpicture/GDI technologies.
Status
Sensitive Data Services 2.0 extension.
10 MSEK of hardware installed in the NAISS/NSC data center, next to the upcoming Arrhenius.
2 GPU servers with 4x L40s, 8 with 4x L4.
40 CPU servers with 32 cores and 1 TB RAM.
3 PB Ceph storage.
OpenStack, Ceph, k8s, scalable as needed.
Aiming to align agreement model with NAISS, for ease of cross-platform use.
Continuous rollout of features and guarantees, planned sensitive data compliance in Dec.
Demo today: DGX-2 like service.
Demo: DGX-2 Like service
New and better
Onboarding: Life Science Login using your home organization account, ORCID, etc.
Self-service portal for booking/managing resources, start/stop when you want.
VPN not required for most use cases.
More compute flavors: More GPUs, newer GPUs, CPU compute.
More storage. PB instead of TB.
Faster, easier software installations from Ubuntu apt repositories, GitHub, pip, and DockerHub, through an inspecting HTTP proxy.
Still missing
Compliance work.
Contract templates.
Sensitive data sharing.
Demo time.
https://datahub.aida.scilifelab.se/data-science-platform/examples/gpu-sd-iaas-jupyter
Data Science Platform Launch party!
To celebrate the successful establishment of our Data Science Platform we are arranging a two-day conference on Mar 19-20 2025 at CMIV with national and international speakers and a launch celebration dinner.
Registration will open in Jan 2025!
https://datahub.aida.scilifelab.se/events/2025-03-19-data-science-platform-launch-party/
AIDA Data Hub
Thank you!
Services for Research and Clinical Innovation in Data Driven Precision Health
National data infrastructure supporting the Analytic Imaging Diagnostic Arena (AIDA)
Hosted by LiU and the Center for Medical Image Science and Visualization (CMIV)
Part of SciLifeLab Bioinformatics platform (NBIS)
Questions?
Extra slides in case of questions
Hardware
Compute
2 L40s GPU servers with 4 GPUs (48 Gbyte VRAM/GPU), 32 CPU cores, 512 Gbyte RAM, 8 Tbyte local high speed storage
6 L4 GPU servers with 4 GPUs (24 Gbyte VRAM/GPU), 32 CPU cores, 512 Gbyte RAM, 8 Tbyte local high speed storage
40 CPU compute servers with 32 CPU cores, 1 Tbyte RAM, 6.4 Tbyte local high speed storage
Storage
3 PB of raw storage for Ceph (3168 Tbyte HDD, 156 Tbyte high speed storage)
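The per-server figures on this slide can be added up to get platform-wide totals. The sketch below does only that arithmetic. The class and function names are hypothetical, and it assumes 1 Tbyte of RAM equals 1024 Gbyte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerGroup:
    """One line of the hardware slide (names here are illustrative only)."""
    name: str
    servers: int
    cpu_cores: int           # per server
    ram_gbyte: int           # per server
    gpus: int = 0            # per server
    vram_gbyte_per_gpu: int = 0


# Figures copied from the hardware slide.
GROUPS = [
    ServerGroup("L40s GPU", servers=2, cpu_cores=32, ram_gbyte=512,
                gpus=4, vram_gbyte_per_gpu=48),
    ServerGroup("L4 GPU", servers=6, cpu_cores=32, ram_gbyte=512,
                gpus=4, vram_gbyte_per_gpu=24),
    # Assumption: 1 Tbyte RAM is counted as 1024 Gbyte.
    ServerGroup("CPU compute", servers=40, cpu_cores=32, ram_gbyte=1024),
]

# Raw Ceph storage: HDD plus high speed storage, in Tbyte.
RAW_STORAGE_TBYTE = 3168 + 156


def totals(groups):
    """Sum GPUs, VRAM, CPU cores and RAM over all server groups."""
    return {
        "gpus": sum(g.servers * g.gpus for g in groups),
        "vram_gbyte": sum(g.servers * g.gpus * g.vram_gbyte_per_gpu
                          for g in groups),
        "cpu_cores": sum(g.servers * g.cpu_cores for g in groups),
        "ram_gbyte": sum(g.servers * g.ram_gbyte for g in groups),
    }


if __name__ == "__main__":
    for key, value in totals(GROUPS).items():
        print(f"{key}: {value}")
    print(f"raw_storage_tbyte: {RAW_STORAGE_TBYTE}")
```

With the numbers on this slide, this gives 32 GPUs, 960 Gbyte of VRAM, 1536 CPU cores, 45056 Gbyte of RAM and 3324 Tbyte of raw storage.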
Establishment
First: Basic services for technical experts.
Progressively more advanced services for a progressively broader audience.
Service delivery roadmap and iterative development priorities will be based on continuous stakeholder dialogue.
Customer model
Generally: An activity with a legal basis for processing, such as a clinic or company.
Typically: An ethically approved research project at a research institute, represented by a competent researcher (PI).
Customer segmentation: you cannot see other customers, they cannot see you.
Customer makes security decisions appropriate for their project.
Business model
Funded by user fees.
Service portfolio priced for sustainable operations and development.
Yearly membership fee provides basic service for typical research projects.
Additional services cost extra, e.g. GPU, primary storage, etc.
Discounts offered to maximize high impact research output.
Fee waivers for data sharing parties who help build the data commons / data lake.
Basic service
Tentative fee: 50 kSEK/yr.
Up to 2 TB quota on private project storage (no backup) accessible through e.g. Windows file sharing.
Multifactor login using Life Science AAI and your home organization account.
Access to shared datasets on approval; these do not count toward the project storage quota.
Add-on services
Tentative prices, pay as you go.
Backed up primary storage: ~2.5 kSEK/TB/yr
Large volume project storage: ~1.5 kSEK/TB/yr
Large scale CPU compute: 24 kSEK/CPU/yr
GPU compute: 80 kSEK/GPU/yr
Data sharing: Free of charge. Help build the data lake / data commons.
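To see how the tentative basic fee and add-on prices combine, a yearly cost can be estimated as below. This is a minimal sketch, not an official price calculator. The function and constant names are hypothetical, the prices are the tentative figures from these slides, and discounts and fee waivers are not modelled.

```python
# Tentative prices from the slides, in kSEK per year.
BASIC_FEE_KSEK = 50.0            # yearly membership (basic service)
BACKED_UP_STORAGE_PER_TB = 2.5   # approximate ("~") on the slide
LARGE_VOLUME_STORAGE_PER_TB = 1.5  # approximate ("~") on the slide
CPU_PER_UNIT = 24.0              # per "CPU" as stated on the slide
GPU_PER_UNIT = 80.0              # per GPU


def yearly_cost_ksek(backed_up_tb=0.0, large_volume_tb=0.0, cpus=0, gpus=0):
    """Estimate the yearly cost in kSEK: basic fee plus add-on services.

    Assumption: add-ons are simply summed on top of the basic fee.
    Discounts and data sharing fee waivers are not included.
    """
    if min(backed_up_tb, large_volume_tb, cpus, gpus) < 0:
        raise ValueError("quantities must be non-negative")
    return (
        BASIC_FEE_KSEK
        + backed_up_tb * BACKED_UP_STORAGE_PER_TB
        + large_volume_tb * LARGE_VOLUME_STORAGE_PER_TB
        + cpus * CPU_PER_UNIT
        + gpus * GPU_PER_UNIT
    )


if __name__ == "__main__":
    # Example: one GPU and 10 TB of large volume project storage.
    print(yearly_cost_ksek(large_volume_tb=10, gpus=1))
```

For example, a project with one GPU and 10 TB of large volume project storage would come to 50 + 15 + 80 = 145 kSEK per year under these assumptions.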
Building the data commons
Incentivize FAIR sharing of health data for secondary use in OpenScience, to help build the data lake / data commons.
Ethical and legal support for preparing high-quality datasets.
Support for handling access requests.
Support for publishing and advertising, for increased academic impact.
Discounts to data sharing parties.
Data sharing
FAIR data sharing with the world.
Make high-quality datasets citable using Digital Object Identifiers and Search Engine Optimized landing pages.
Personal data or anonymized data.
Manage access requests using Resource Entitlement Management System, or delegate handling to the AIDA Data Hub Data Access Committee.
Based on Bigpicture/GDI technologies.
Upcoming services
Secure remote desktop: Intended default interface.
Authorized data import/exports: PI can delegate import/export rights.
Telerad destination & DICOM router: Receive images from specified scanners.
OpenEHR proxy: Approved communication with EHR systems.
Sectra PACS: Project private Sectra PACS.
Authentication and authorization
Multi-factor authentication with your home organization account using Life Science Login.
Customer-managed authorization using the Life Science Login Perun groupware.
Bigpicture: Petabyte platform for European digital pathology AI
AIDA Data Hub leads repository infrastructure development, which is carried out in collaboration with sensitive data teams at the NBIS Systems Development unit and CSC.fi.
Large scale archive operations started Mar 2023.
EUCAIM: Federated infrastructure for cancer imaging data
AIDA Data Hub contributes data collaboration workspaces for use in EUCAIM with cancer imaging data, based on Bigpicture federated node technologies.
Collaboration with sensitive data teams at the NBIS Systems Development unit.
ASHA - Använda Standardiserade Hälsodata som Accelerator (Using Standardized Health Data as an Accelerator)
RÖ-led VINNOVA systems demonstrator for data lake systems for primary and secondary use.
AIDA Data Hub provides spaces for secondary use.
SCAPIS image data sharing through AIDA Data Hub
All SCAPIS imaging data to be shared through AIDA Data Hub (~100 TB) as 24 datasets.
Legal agreements being prepared.
Launch originally planned for AIDA Days in Gothenburg in Oct.
Tech solution is production ready.
Demo today.
Process overview
You ask SCAPIS for access to datasets.
SCAPIS tells us to give you access.
You get the data from AIDA Data Hub.
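The three steps above form a small approval workflow, which can be modelled as a state machine. The sketch below is purely illustrative: the class, state and method names are hypothetical and do not describe the actual SCAPIS or AIDA Data Hub systems.

```python
from enum import Enum, auto


class AccessState(Enum):
    REQUESTED = auto()  # researcher has asked SCAPIS for access
    APPROVED = auto()   # SCAPIS has approved the request
    GRANTED = auto()    # AIDA Data Hub has given access on SCAPIS' instruction


class AccessRequest:
    """Hypothetical model of one researcher's request for one dataset."""

    def __init__(self, researcher, dataset):
        self.researcher = researcher
        self.dataset = dataset
        self.state = AccessState.REQUESTED

    def approve(self):
        """Step performed by SCAPIS after its normal evaluation."""
        if self.state is not AccessState.REQUESTED:
            raise ValueError("only a pending request can be approved")
        self.state = AccessState.APPROVED

    def grant(self):
        """Step performed by AIDA Data Hub once SCAPIS says so."""
        if self.state is not AccessState.APPROVED:
            raise ValueError("access can only be granted after approval")
        self.state = AccessState.GRANTED

    def can_download(self):
        """The researcher can fetch the data only once access is granted."""
        return self.state is AccessState.GRANTED


if __name__ == "__main__":
    request = AccessRequest("example researcher", "example dataset")
    request.approve()
    request.grant()
    print(request.can_download())
```

The point of the ordering checks is that the data provider never grants access on its own: approval by the data owner always comes first.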
In more detail
1. Researcher finds data
Use a normal web browser to search for good data.
The top hit is a landing page that describes a dataset on the platform.
The landing page is easy to find because it is made easy for computers to understand, i.e. "search engine optimised", using schema.org JSON-LD.
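As an illustration of what such machine-readable markup can look like, the sketch below builds a schema.org Dataset description and wraps it in the script tag that search engines read. It is a minimal example under stated assumptions, not the markup actually used on the AIDA Data Hub landing pages. The helper names and all field values (name, URL, DOI) are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, url, doi):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": f"https://doi.org/{doi}",
    }


def as_script_tag(jsonld):
    """Wrap the JSON-LD in the script element embedded in an HTML page."""
    body = json.dumps(jsonld, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All values below are hypothetical placeholders, not a real dataset.
example = dataset_jsonld(
    name="Example imaging dataset",
    description="Placeholder description of an example dataset.",
    url="https://example.org/datasets/example",
    doi="10.0000/example",
)

if __name__ == "__main__":
    print(as_script_tag(example))
```

A real landing page would typically carry more properties, such as license, keywords and creator, which are also defined for the schema.org Dataset type.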
The landing page has basic information on the dataset.
It explains why you should bother applying for access.
Note: Google already picks up our sample images and shows them in its search results.
The "Apply for access" button takes you to SCAPIS.
2. Researcher applies for access
The researcher goes through normal SCAPIS procedures to apply for access to the dataset.
3. SCAPIS approves access
SCAPIS goes through the normal access request evaluation procedures, and approves the request.
4. SCAPIS tells AIDA Data Hub to give access
SCAPIS instructs AIDA Data Hub to give the researcher access to the dataset.
5. Researcher gets account at AIDA Data Hub
High security service.
Three-factor authentication: 2FA SSL VPN + SSH public key.
NG SDS will support Life Science AAI, using your normal institutional login.
AIDA DGX-2 Service
Service for best-in-class researchers in Swedish medical imaging diagnostic AI.
Secure enough for medical personal data.
6. Researcher downloads data
Insert live demo here.
7. Optional: Researcher joins AIDA and uses platform compute power
Current interface is "ssh tunnel + bash".
SDS 2.0 will offer more types of interface, suitable for a wider range of professions and competencies.