1 of 58

Open Data Science and Reproducibility

Mikolaj A. Pawlak MD PhD

mpawlak@ump.edu.pl

Department of Neurology and Cerebrovascular Disorders

Poznan University of Medical Sciences

Research Methodology Conference, February 3rd, 2018

2 of 58

Topics

  • The Problem
  • Solutions
  • Your Role in Clinical Science
  • Ideas for practical implementation

3 of 58

The Problem

  • Lack of confidence in science
  • Irreproducible results
  • Inconclusive research
  • Abnormal structure of scientific community
  • Publication bias

6 of 58

Irreproducibility of preclinical cancer research

  • The problem starts when the data have to be translated into operational research and products
  • Preclinical work should lead to Phase I & II clinical trials
  • Success rate falling to 18%
  • Health outcomes depend on the results of published research

7 of 58

Believe it or not

9 of 58

  • poor experimental design
  • inappropriate analysis
  • questionable research practices
  • Cultural factors:
    • highly competitive research environment
    • high value placed on novelty
    • publication in high-profile journals

11 of 58

Scale and the consequences

The idea that the same experiments always get the same results, no matter who performs them, is one of the cornerstones of science’s claim to objective truth.

13 of 58

Solutions

  • Raw data sharing
  • Open data projects
  • Open Access publishing
  • Reproducible research

14 of 58

Measures that might improve reproducibility

  • Greater openness and transparency
  • Better use of input and advice from other experts
  • Reporting guidelines
  • Post-publication peer review
  • Pre-registration of protocols and plans for analysis
  • Better use of standards and quality control measures

17 of 58

Swedish system for monitoring multiple sclerosis

19 of 58

Examples of open data

  • NIH Pediatric database
  • ADNI
  • 1000 Functional Connectomes
  • INDI
  • Human Connectome Project
  • Brain Development Database
  • Nationale Kohorte

20 of 58

National Database for Autism Research

  • The National Database for Autism Research (NDAR) is an NIH-funded research data repository
  • Aims to accelerate progress in autism spectrum disorder (ASD) research through data sharing, data harmonization, and the reporting of research results
  • Serves as a scientific community platform and portal to multiple other research repositories, allowing for aggregation and secondary analysis of data

https://ndar.nih.gov/

22 of 58

1000 Functional Connectomes

23 of 58

Rockland sample

24 of 58

ADHD 200

  • International project aimed at finding imaging features of ADHD
  • Machine learning competition based on brain imaging data

25 of 58

Autism Brain Imaging Data Exchange (ABIDE)

  • 539 ASD
  • 579 NC
  • Structural brain data and resting state BOLD fMRI
  • behavioral data
  • 17 sites

26 of 58

NIH Pediatric database

  • Healthy children aged 5-18 years
  • Learning about the dynamics of brain structure change in healthy population

28 of 58

Brain Development Database

29 of 58

Brain Development Database IXI

IXI - Information eXtraction from Images (EPSRC GR/S21533/02)

The images in NIFTI format can be downloaded from here:

31 of 58

Alzheimer's Disease Neuroimaging Initiative

  • Largest project focused on data acquisition and sharing in Alzheimer's disease
  • Initial budget ~$60M
  • Currently in its 3rd iteration

32 of 58

ADNI perspectives

Dementia is not just a neurological symptom, it is a population challenge

35 of 58

Parkinson's Progression Markers Initiative

  • Database for brain samples
  • The aim is to create a reference repository for assessing clinical progression in Parkinson's disease

37 of 58

The Open Access Series of Imaging Studies (OASIS)

  • Cross-sectional collection of 416 subjects aged 18 to 96
  • For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included
  • Longitudinal collection of 150 subjects aged 60 to 96; each subject was scanned on two or more visits, separated by at least one year, for a total of 373 imaging sessions

39 of 58

Human Connectome Project

S - 1200 subjects

S - HQ data at 3T (Skyra HCP) and 7T

W - not everything can be done in one go

O - two-day neuropsychological evaluation

T - multiple sites interested

40 of 58

Big Data - computing needs

  • cloud-based tools
  • NITRC-type repositories
  • Virtual research units

41 of 58

Your Role in Clinical Science

  • Question scientific results
  • Learn methods
  • Get the raw data and analyze it yourself
  • Replicate previous findings and publish them

42 of 58

R

  • Open and free www.r-project.org
  • Numerous solutions to common and uncommon biomedical problems
  • Collaborative learning (Coursera & edX)
  • Packages with examples
  • Publishable scripts enable reproducible research and literate programming
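
The idea of a publishable, rerunnable script can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration, and it is not taken from the talk: the data, the function names (provenance_record, bootstrap_mean) and the choice of what to record are all assumptions. The script fingerprints the raw data, fixes the random seed, and stores both next to the result so that someone else can check they are rerunning the same analysis on the same input.

```python
import hashlib
import json
import platform
import random
import sys


def provenance_record(data_bytes, seed):
    """Collect the facts another person needs to rerun the analysis."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.system(),
    }


def bootstrap_mean(values, seed, n_resamples=1000):
    """Average of resampled means; deterministic for a given seed."""
    rng = random.Random(seed)  # local generator, no hidden global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


# Hypothetical raw data standing in for a shared CSV file.
raw = b"subject,score\n1,4.2\n2,5.1\n3,3.9\n"
values = [float(line.split(",")[1]) for line in raw.decode().splitlines()[1:]]

record = provenance_record(raw, seed=2018)
record["bootstrap_mean"] = bootstrap_mean(values, seed=2018)
print(json.dumps(record, indent=2))
```

Running the script twice on the same bytes with the same seed gives the same bootstrap mean and the same data hash; a changed input file shows up immediately as a different hash.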

44 of 58

Survival analysis

45 of 58

Survival data is all around you
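
To make the survival analysis slides concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The talk does not specify a method or a data set, so the estimator choice, the function name kaplan_meier and the follow-up times below are assumptions made for illustration only. At each time with at least one observed event, the running survival probability is multiplied by (1 - events / number at risk); censored subjects leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject
    events:    1 if the event was observed, 0 if the subject was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have equal length")
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in months; 0 marks a censored subject.
times = [2, 3, 3, 5, 8, 8, 12]
status = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, status):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

In practice a maintained package (for example the survival package in R) would be used instead; the sketch only shows that the estimate is simple enough to recompute from shared raw data.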

46 of 58

Neurodebian

  • Virtual computing environment for brain image processing
  • Multiple software packages
  • OS independent
  • Data-rich
  • Practical tutorials

49 of 58

Ideas for Practical Implementation

  • Get involved in a reproducibility project
  • Learn open science skills (stats, image processing)
  • Focus on clinical problems (hard outcomes - survival, incidence)
  • Publish an Open Access paper for free (Springer Open Choice)

51 of 58

Number of open access publications

BMC Medicine 2012, 10:124, doi:10.1186/1741-7015-10-124

52 of 58

Nationale Kohorte

53 of 58

Open Science Prize

54 of 58

Summary

  • Active participation in science is critical for Clinician Scientists
  • Tools are open and data is free
  • Ideas for research come from real clinical problems
  • Intro to Research elective course

mpawlak@ump.edu.pl
