1 of 27

ENIGMA and COINSTAC

Theo G.M. van Erp, Ph.D.

Department of Psychiatry and Human Behavior
University of California Irvine, Irvine, CA, USA

ENIGMA Chairs Meeting 2020

2 of 27

ENIGMA

Enhancing NeuroImaging Genetics through Meta-Analysis

1400+ scientists

43 countries

50 working groups

Paul Thompson

Jason Stein

ENIGMA’s first consortium paper

Stein et al., Nature Genetics, Volume 44, Number 5, May 2012

3 of 27

51 working groups (95 chairs), 31 disorders; 1,400 members, 400 institutions, 43 countries

4 of 27

5 of 27

33 clinical working groups

6 of 27

ENIGMA Disorder Working Group

Conduct prospective meta-/mega-analyses to:

  1. Rank order case-control (disease-related) effect sizes of brain measures

  2. Examine factors that may moderate these effect sizes

  3. Compare brain effects across disorders

Schizophrenia

Jess & Theo

Bipolar Disorder

Chris, Derrek, & Ole

Major Depressive Disorder

Lianne & Dick

7 of 27

ENIGMA’s Analysis Approach

  • Structural image analysis using FreeSurfer 5.1 or 5.3 (some 6.0dev) – earlier studies also used FSL
  • Extract deep brain structure volumes, and mean cortical thickness and surface area measures within Desikan-Killiany Atlas ROIs (Desikan et al. 2006)
  • Use a common set of quality assurance procedures, including review of distributions for outliers and visual inspection of deep brain structure and cortical surface labels
  • Run statistical analyses in R (lm) to generate effect sizes (Cohen’s d and partial correlations)
  • Meta-analyses using inverse variance-weighted random-effects model as implemented in the R package metafor (version 1.9-7) with FDR correction for multiple comparisons.
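
As a rough illustration of the last two steps, the following Python sketch computes Cohen's d for one site, pools site-level effects with an inverse variance-weighted random-effects model, and applies an FDR correction. It is not the ENIGMA R code: the DerSimonian-Laird estimator of between-site variance is an assumption chosen for brevity (metafor offers several estimators), the FDR step is assumed to be Benjamini-Hochberg, and all function names are hypothetical.

```python
import math

def cohens_d(mean_case, mean_ctrl, sd_case, sd_ctrl, n_case, n_ctrl):
    """Cohen's d (case minus control) using the pooled standard deviation."""
    pooled_var = ((n_case - 1) * sd_case**2 + (n_ctrl - 1) * sd_ctrl**2) / (
        n_case + n_ctrl - 2
    )
    return (mean_case - mean_ctrl) / math.sqrt(pooled_var)

def d_variance(d, n_case, n_ctrl):
    """Common large-sample approximation to the sampling variance of d."""
    return (n_case + n_ctrl) / (n_case * n_ctrl) + d**2 / (2 * (n_case + n_ctrl))

def random_effects(effects, variances):
    """Random-effects pooling; tau^2 from DerSimonian-Laird (an assumption)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Re-weight each site by the inverse of (sampling variance + tau^2).
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch each site would contribute one `(d, variance)` pair per brain measure, and `fdr_bh` would be applied across the p-values of all measures tested.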

Cortical thickness and surface area are thought to be influenced by separate sets of genes (Panizzon et al. 2009; Winkler et al. 2010).

8 of 27

9 groups published their cortical papers

9 of 27

ENIGMA initial meta-analysis: dependent on FreeSurfer common structure and human agreement on covariates of interest

Standard CSV files of variables

R scripts

zip/tar file of results returned to central analysis team for meta-analysis

Automation via Linux shell scripts
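
The "results returned to the central analysis team" step of this manual workflow can be pictured with a short Python sketch. It is purely illustrative: the function name, the directory layout, and the choice of a gzip-compressed tar archive are assumptions, and the actual ENIGMA scripts (R and shell) are not reproduced here.

```python
import tarfile
from pathlib import Path

def bundle_results(results_dir, archive_path):
    """Pack every CSV file in results_dir into a gzip-compressed tar archive.

    Hypothetical helper: only summary output (CSV) is bundled, so no raw
    imaging data leaves the site.
    """
    results_dir = Path(results_dir)
    csv_files = sorted(results_dir.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"no CSV results found in {results_dir}")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in csv_files:
            # Store files under their bare names, without local directory paths.
            archive.add(path, arcname=path.name)
    return [path.name for path in csv_files]
```

A site would call something like `bundle_results("output", "site_results.tar.gz")` after its local scripts finish; both paths here are placeholders.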

10 of 27

Automating ENIGMA: COINSTAC

  • Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC)
  • Enables federated data analysis

11 of 27

A. D. Sarwate, S. M. Plis, J. A. Turner, M. R. Arbabshirani, and V. D. Calhoun, "Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation," Front Neuroinform, vol. 8, p. 35, 2014, 3985022.

S. M. Plis, A. D. Sarwate, D. Wood, C. Dieringer, D. Landis, C. Reed, S. R. Panta, J. A. Turner, J. M. Shoemaker, K. W. Carter, P. Thompson, K. Hutchison, and V. D. Calhoun, "COINSTAC: A Privacy Enabled Model and Prototype for Leveraging and Processing Decentralized Brain Imaging Data," Frontiers in Neuroscience, vol. 10, Aug 19 2016.

COINSTAC: Local analysis, globally aggregated

12 of 27

The process begins when the lead site defines the local and global analyses.

Each site downloads the local analysis (a Docker/Singularity image within COINSTAC) and runs it on its own data. The intermediate results from all sites are then combined in the global analysis, which runs at the lead site.
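
The split between a local and a global analysis can be sketched in a few lines of Python. This is a toy example under stated assumptions, not a COINSTAC computation: the function names and the dictionary format of the intermediate results are hypothetical. It shows that a pooled mean and standard deviation can be recovered from site-level summaries alone, without any raw values leaving a site.

```python
import math

def local_analysis(values):
    """Runs at a site: reduce raw measurements to summary statistics only."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    return {"n": n, "mean": mean, "ss": ss}

def global_analysis(site_summaries):
    """Runs at the lead site: combine site summaries into pooled mean and SD."""
    total_n = sum(s["n"] for s in site_summaries)
    pooled_mean = sum(s["n"] * s["mean"] for s in site_summaries) / total_n
    # Total sum of squares = within-site part + between-site part.
    within = sum(s["ss"] for s in site_summaries)
    between = sum(s["n"] * (s["mean"] - pooled_mean) ** 2 for s in site_summaries)
    pooled_sd = math.sqrt((within + between) / (total_n - 1))
    return {"n": total_n, "mean": pooled_mean, "sd": pooled_sd}
```

Only the small dictionaries returned by `local_analysis` would cross site boundaries in this sketch; the lead site never sees individual measurements.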

1

Lead Site

From: Turner, MD., Gazula, H., et al., ENIGMA COINSTAC: Increasing neuroimaging data diversity with managed privacy, https://doi.org/10.13140/RG.2.2.18471.37285 

13 of 27

Participating sites (collaborating sites with similar data) can see how their data will be used, because all of the proposed analyses are visible to them.

If they agree, they may join the consortium to participate in the meta-analysis, and map their data to the analysis.

2

Participating Sites

14 of 27

3

Once the participating sites have joined the consortium and mapped their data, the lead site launches all of the local analyses.

Analyses at the participating sites run automatically and return the intermediate results to the lead site.

15 of 27

ENIGMA COINSTAC cross-diagnosis collaborations

ENIGMA SZ

rsfMRI

DTI

Structural

Regressions

Multivariate

Multimodal ML

ENIGMA BD

ENIGMA MDD

ENIGMA-COINSTAC: Advanced Worldwide Transdiagnostic Analysis Of Valence System Brain Circuits

16 of 27

Neural Circuitry Associated with Negative Symptom Domains

Strauss et al. 2019; JAMA Psychiatry

MPIs: Turner, Van Erp, & Calhoun

1R01MH121246

17 of 27

Overview of COINSTAC Analysis

  • The consortium leader:
    • names the consortium
    • selects an analysis pipeline
  • Members:
    • join the consortium
    • download a dockerized version of the consortium-specific analysis pipeline to their local COINSTAC instances
    • map their data files for analysis (e.g., csv files with the relevant clinical and demographic data)
  • The consortium leader:
    • launches the pipeline.
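
The sequence above can be modeled as a small state check. The Python class below is a hypothetical illustration only: it is not the COINSTAC API, and the rule that every member must map its data before launch is an assumption drawn from the workflow described in these slides.

```python
class Consortium:
    """Hypothetical model of the consortium workflow; not the COINSTAC API."""

    def __init__(self, name, pipeline):
        # The consortium leader names the consortium and selects a pipeline.
        self.name = name
        self.pipeline = pipeline
        self.members = {}  # site name -> mapped data file (None until mapped)

    def join(self, site):
        """A member site joins; its data is not mapped yet."""
        self.members.setdefault(site, None)

    def map_data(self, site, csv_path):
        """A member points the pipeline at its local CSV file."""
        if site not in self.members:
            raise KeyError(f"{site} has not joined {self.name}")
        self.members[site] = csv_path

    def unmapped_sites(self):
        """Sites that joined but have not mapped their data."""
        return sorted(s for s, path in self.members.items() if path is None)

    def launch(self):
        """Start the pipeline only when every member has mapped its data."""
        if not self.members:
            raise RuntimeError("no sites have joined")
        missing = self.unmapped_sites()
        if missing:
            raise RuntimeError(f"waiting for data mapping from: {', '.join(missing)}")
        return f"launched {self.pipeline} for {len(self.members)} sites"
```

Keeping the mapped path per site makes it easy for the leader to see which sites are still pending before launching.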

18 of 27

COINSTAC ENIGMA Negative Symptom Domain Analysis Pipeline

  • At each site:
    • Computes SANS factor scores
    • Runs regressions between SANS factor scores and deep brain structure volumes and cortical thickness and surface area measures
  • At consortium leader site:
    • Runs meta-analyses
  • We will compare the COINSTAC meta-analysis output with the meta-analysis done via the method of distributing code and obtaining analysis output
  • The first “real” ENIGMA COINSTAC test in early June involved seven sites from Australia to Germany
    • This identified issues with the transfer of large files
    • And some lack of robustness in the R scripts

  • We ran a successful analysis with 7 sites on 7/30/2020.
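
To make the site-level regression step concrete, here is a minimal Python sketch of a single-predictor regression of a brain measure on a symptom factor score, with the t statistic converted to a correlation-type effect size. It is a simplification under stated assumptions: the actual pipeline uses R scripts and presumably adjusts for covariates, and the function names here are hypothetical. With covariates in the model, the same conversion with the residual degrees of freedom yields a partial correlation.

```python
import math

def simple_regression(x, y):
    """OLS fit of y = a + b*x; returns slope, its standard error, t, and df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # two estimated parameters: intercept and slope
    se = math.sqrt(rss / df / sxx)
    return {"slope": slope, "se": se, "t": slope / se, "df": df}

def t_to_r(t, df):
    """Convert a regression t statistic to a (partial) correlation."""
    return t / math.sqrt(t * t + df)
```

Each site would return one such effect size (plus its sample size) per brain measure, which the consortium leader site could then combine in the meta-analysis.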

19 of 27

Using COINSTAC

Where to download:

https://github.com/trendscenter/coinstac/releases

The latest binary release is currently v4.7.5.

After downloading and uncompressing, double-click the “coinstac” icon.

20 of 27

COINSTAC Login Screen

21 of 27

Computations

Download Docker/Singularity image to local compute node

Remove Docker/Singularity image from local compute node

Add a new Docker/Singularity image

22 of 27

Pipeline Menu

R code to run analyses at individual sites + R code to run meta-analysis

Create a new pipeline

23 of 27

Consortia Menu

Start new consortium

Leave a consortium you joined

Join a consortium

Map data to a consortium you joined

24 of 27

Map Local Data

25 of 27

Launch the Pipeline

See the Remote and Local Pipeline Progress and get your results.

See that all sites have mapped their data successfully

26 of 27

International multisite collaborations

Power - Statistical power to tackle new kinds of questions (‘cracking the brain’s genetic code’), impossible to solve before

Computational efficiency - Massive distributed computing, leverage infrastructure/data across 400 institutions across 40+ countries; beginning new cooperative machine learning, deep learning, new mathematics

Open Science & Reproducibility - Studies are designed to ensure replication (as seen from forest plots). Proposals are registered reports. Analyses run are public information.

Team Science - Experts from all fields working together bring more expertise and make for a much stronger publication, and everyone learns and gains!

Advancing Science - Rather than trying to answer the same question independently, merge data already collected to drive new investigations

27 of 27

Acknowledgments

NSF: 1631819

NIH COINSTAC: R01DA040487

NIH COINSTAC ENIGMA: 1R01MH121246

NIH COINSTAC GIGA/ABCD: R01DA049238

Turner Lab