1 of 8

Artificial Intelligence in Medicine

AIM1 - Data Harmonization

Joint Analysis Pipeline for Data Harmonization in MRI and PET

A. Retico (PI)

AIM, AIM1 Meeting - February 17, 2021 https://agenda.infn.it/event/25883/

AIM1.T1 AIM1.T3

AIM, CSN5, 2019-2021

2 of 8

AIM1 - multi-site data harmonization

Data gathered by different sites and/or acquisition systems carry a local “fingerprint”, often to the detriment of the much more subtle information of interest.

This problem is akin to the management of systematic errors.

Typical application cases: MRI, X-ray (RX), PET, neuropsychological (NPSY) tests

Autism Brain Imaging Data Exchange (ABIDE)

  • 2226 subjects
    • 1060 ASDs: 907 M, 153 F
    • 1166 TDCs: 879 M, 287 F
  • Age at scan: 5 – 64 years
  • 40 different acquisition sites


3 of 8

Data normalization strategy already implemented in AIM

  • Intra-subject Normalization (Pisa, Ferrari et al. 2020, AIIM 108, 101926. https://doi.org/10.1016/j.artmed.2020.101926):
    • volumetric features are divided by the total intracranial volume
    • cortical surfaces are divided by the area of the total white matter
    • cortical thicknesses are divided by the mean cortical thickness across the entire brain.
  • Adjust for known batches using ComBat [1,2] (Bologna)
  • Adjust for known batches using ComBat [1,2] (Bari, Lombardi A., Amoroso N., Diacono D., Monaco A., Tangaro S., Bellotti R. Extensive Evaluation of Morphological Statistical Harmonization for Brain Age Prediction, Brain Sciences, 2020, 10(6), 364)
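
The three intra-subject normalizations listed above can be sketched in a few lines. This is only an illustration, not the code used in Ferrari et al. 2020: the function name, the argument names and the numbers are hypothetical, and the mean cortical thickness is taken here as the unweighted mean over the regions supplied, which is an assumption.

```python
from statistics import mean


def normalize_subject(volumes, surface_areas, thicknesses, tiv, wm_area):
    """Apply the three intra-subject normalizations to one subject.

    Hypothetical interface: each of the first three arguments is a dict
    mapping a region name to a raw morphometric value for this subject.
    """
    if tiv <= 0 or wm_area <= 0:
        raise ValueError("tiv and wm_area must be positive")
    # Assumption: unweighted mean over the regions that were passed in.
    mean_thickness = mean(thicknesses.values())
    return {
        # volumetric features / total intracranial volume
        "volumes": {roi: v / tiv for roi, v in volumes.items()},
        # cortical surfaces / total white matter area
        "surface_areas": {roi: a / wm_area for roi, a in surface_areas.items()},
        # cortical thicknesses / mean cortical thickness
        "thicknesses": {roi: t / mean_thickness for roi, t in thicknesses.items()},
    }


# Made-up values, for illustration only.
subject = normalize_subject(
    volumes={"hippocampus": 4000.0, "amygdala": 1500.0},
    surface_areas={"precentral": 5000.0},
    thicknesses={"precentral": 2.4, "postcentral": 2.0},
    tiv=1_500_000.0,
    wm_area=100_000.0,
)
```

After this step every feature is a dimensionless ratio, so differences in overall head size no longer dominate the comparison between subjects. Site effects are not addressed by this step and are left to ComBat.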

[1] Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007 Jan 1;8(1):118-27.

[2] Fortin, J.P.; Parker, D.; Tunç, B.; Watanabe, T.; Elliott, M.A.; Ruparel, K.; Roalf, D.R.; Satterthwaite, T.D.; Gur, R.C.; Gur, R.E.; et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017, 161, 149–170.


4 of 8

Recently…

Dataset of 10477 typical subjects [3-96 years]

ComBat-GAM

Not limited to a linear model for age trends: it introduces a Generalized Additive Model (GAM)


Model terms (figure labels): non-linear function of age, sex and TIV; location; scale
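
To make the location and scale terms concrete, here is a deliberately simplified sketch of a location/scale site adjustment for a single feature, roughly following the form y = alpha + f(covariates) + location_site + scale_site × error. It is not ComBat-GAM itself: a cubic polynomial in age stands in for the GAM smooth, sex and TIV are omitted, the site parameters are plain least-squares estimates with no empirical Bayes shrinkage, and all names and simulated numbers are hypothetical.

```python
import numpy as np


def harmonize_location_scale(y, site, age, degree=3):
    """Remove per-site location and scale effects from one feature.

    Simplified sketch: a polynomial in age replaces the GAM smooth and
    there is no empirical Bayes pooling across features.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    sites, site_idx = np.unique(np.asarray(site), return_inverse=True)
    n_sites = len(sites)

    # Design matrix: one indicator per site plus a polynomial age basis.
    site_onehot = np.eye(n_sites)[site_idx]
    age_z = (age - age.mean()) / age.std()
    age_basis = np.column_stack([age_z ** k for k in range(1, degree + 1)])
    design = np.hstack([site_onehot, age_basis])

    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    site_coef, age_coef = coef[:n_sites], coef[n_sites:]

    # Grand mean across sites (weighted by site size); the site-specific
    # deviation from it is the location effect that gets removed.
    counts = site_onehot.sum(axis=0)
    alpha = (counts * site_coef).sum() / counts.sum()

    # Scale effect: site residual spread relative to the pooled spread.
    resid = y - design @ coef
    delta = np.array([resid[site_idx == i].std() for i in range(n_sites)])
    delta /= resid.std()

    # Keep the age trend, drop the site offset, rescale the residuals.
    return alpha + age_basis @ age_coef + resid / delta[site_idx]


# Simulated data: three sites with different offsets and noise levels.
rng = np.random.default_rng(0)
site = np.repeat(["A", "B", "C"], 200)
age = rng.uniform(5, 64, size=site.size)
offset = {"A": 0.0, "B": 2.0, "C": -1.5}
spread = {"A": 1.0, "B": 2.0, "C": 0.5}
y = (3 * np.sin(age / 20)
     + np.array([offset[s] for s in site])
     + np.array([spread[s] for s in site]) * rng.standard_normal(site.size))
y_adj = harmonize_location_scale(y, site, age)


def site_means(values):
    return np.array([values[site == s].mean() for s in ("A", "B", "C")])


def site_sds(values):
    return np.array([values[site == s].std() for s in ("A", "B", "C")])
```

Comparing `site_means(y)` with `site_means(y_adj)`, and likewise `site_sds`, shows the site offsets and spreads being pulled together while the age trend is kept. The real method also shares information across features when estimating the site parameters, which this sketch does not attempt.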


5 of 8

Age trends for selected ROIs (Pomponio et al. 2020)


6 of 8

Final considerations from Pomponio et al. 2020

  • The authors made publicly available:
    • a visualization tool provided as a product of the LIFESPAN dataset (https://rpomponio.shinyapps.io/neuro_lifespan/)
    • a package that enables users to apply ComBat-GAM on their own datasets (https://github.com/rpomponio/neuroHarmonize)
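
A possible way to call the package on our own features is sketched below. The `harmonizationLearn(data, covars, smooth_terms=[...])` call and the required `SITE` column follow the package README at the time of writing and should be checked against the installed version. The wrapper `run_combat_gam`, the helper `check_inputs` and the `AGE`/`SEX_M` column names are hypothetical choices made for this example.

```python
from collections import Counter


def check_inputs(n_feature_rows, sites, min_per_site=2):
    """Hypothetical pre-flight check before harmonization.

    A per-site scale cannot be estimated from a single subject, so sites
    with fewer than `min_per_site` subjects are rejected.
    """
    if n_feature_rows != len(sites):
        raise ValueError("features and site labels must have the same length")
    small = sorted(s for s, n in Counter(sites).items() if n < min_per_site)
    if small:
        raise ValueError(f"sites with too few subjects: {small}")


def run_combat_gam(features, sites, ages, sex_male):
    """Harmonize a (subjects x ROIs) NumPy array with neuroHarmonize.

    Imports are local so that the helper above stays usable even when
    pandas or neuroHarmonize are not installed.
    """
    import pandas as pd
    from neuroHarmonize import harmonizationLearn

    check_inputs(len(features), sites)
    # Covariates: 'SITE' is required by the package; the other columns
    # must be numeric. AGE is modelled as a smooth (non-linear) term.
    covars = pd.DataFrame({"SITE": sites, "AGE": ages, "SEX_M": sex_male})
    model, features_adj = harmonizationLearn(
        features, covars, smooth_terms=["AGE"]
    )
    return model, features_adj
```

In a train/test setting the returned model would be stored and applied to held-out subjects rather than refitted. The README describes a companion function for that purpose.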


7 of 8

Proposed pipeline for AIM1 joint analysis on Data Harmonization

  • Milestone 2021 related to AIM1
    • AIM.1: Evaluation of the impact of the different strategies implemented for data harmonization, and identification of the optimal strategies for multicenter MRI, mammography and PET studies, respectively

  • We are probably ready to define a common strategy for MRI and PET data harmonization. Breast imaging will be handled in a separate work.


8 of 8

Proposed pipeline for AIM1 joint analysis on Data Harmonization

Paper outline (“Impact of data harmonization on ML performance in multicenter studies: MRI, fMRI and PET applications”):

  • Introduction on the need for data harmonization (DH) in multicenter studies
  • Overview of most used techniques (focusing on MRI, fMRI and PET)
  • Focus on the impact of DH in machine learning (ML) applications (limiting to case-control classification)
  • Each clinical question deserves a specific harmonization strategy
  • (harmonizing raw data vs. harmonizing extracted features; which DH strategy?)
  • Case studies:
    • selection of available/accessible “large” datasets
    • selection of clinical questions that can be answered with ML (classification/regression)
    • evaluation of the effect of DH on performance (choice of appropriate evaluation metrics)
  • Discussion and definition of “general guidelines” for DH
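
As a toy illustration of the "effect of DH on performance" item, the sketch below compares the case-control AUC of a single score before and after a crude site adjustment. Per-site mean centering is used only as a stand-in for a real harmonization method, and it is only safe here because both sites have the same case/control balance. The data are invented and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def center_by_site(scores, sites):
    """Subtract each site's mean score (crude stand-in for harmonization)."""
    by_site = defaultdict(list)
    for s, site in zip(scores, sites):
        by_site[site].append(s)
    site_mean = {site: mean(values) for site, values in by_site.items()}
    return [s - site_mean[site] for s, site in zip(scores, sites)]


# Invented data: site B adds a constant offset of 2 to every score.
scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7,   # site A: 3 controls, 3 cases
          2.1, 2.2, 2.3, 2.5, 2.6, 2.7]   # site B: 3 controls, 3 cases
labels = [0, 0, 0, 1, 1, 1] * 2
sites = ["A"] * 6 + ["B"] * 6

auc_raw = auc(scores, labels)                                  # 0.75
auc_harmonized = auc(center_by_site(scores, sites), labels)    # 1.0
```

The site offset hides a perfectly separable case-control difference in the pooled data, and removing it recovers the separation. In a real case study the harmonization parameters would have to be estimated on training data only, so that the reported performance is not inflated by information from the test subjects.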
