1 of 34

ReproSchema: Harmonize data by design

satra@mit.edu

2 of 34

  • Collaborative partnership between ABCD and ReproNim investigators (https://www.repronim.org)
  • Organized by Angie Laird, David Kennedy, Satra Ghosh, JB Poline
  • 13-week Online Course, followed by 5-day Project Week
  • No registration fees, all materials open and accessible
  • October 16 – November 20, 2020 (Weeks 1 – 6)
  • January 15 – February 26, 2021 (Weeks 7 – 13)
  • March 8 – 12, 2021 (Project Week)
  • Syllabus: https://www.abcd-repronim.org/syllabus.html
  • We are currently accepting student applications (Aug 28) and applications for teaching assistants (Aug 21)

ABCD-ReproNim: An ABCD Course for Reproducible Analyses

The ABCD-ReproNim Course was designed to provide a comprehensive background to data from the ABCD Study® while delivering hands-on, interactive instruction to enable rigorous and reproducible data analyses.

The ABCD-ReproNim Course is supported by an award from the National Institute on Drug Abuse (R25-DA051675).

ABCD-ReproNim training is targeted to students, postdoctoral fellows, and early career faculty.

3 of 34

Thank you!

Sanu Ann Abraham

Daniel Low

Anisha Keshavan

Zaliqa Rosli

Remi Gau

Arno Klein

Jon Clucas

Child Mind Institute

MindLogger Team

McGill U., LORIS

U Louvain, COBIDAS

Octave Bioscience

MIT

Harvard, MIT

Hauke Bartsch

U Bergen, ABCD

4 of 34

What is ReproSchema?

It is a ReproNim Project to:

  • solve a set of problems associated with clinical and behavioral data collection
  • relate data across studies and projects in one or more laboratories
  • create an open and reusable library of clinical and behavioral assessments

5 of 34

Assessments

Data Collection

Data Submission

Mostly In-Lab

Public/Proprietary

Collaborators/Public/Private

MindLogger

6 of 34

The Challenges

  • Curation
    • Knowledge of assessment, standards, schemas, technologies, licensing
    • Intent of a protocol or assessment
  • Standards, formats, ontologies
    • Number, evolution, stability
  • Standardized activities and protocols
    • No public library of assessments
  • Provenance
    • Data loses reference to collection instrument
    • Linking information lacking between data collection plan and data collection
  • Extensibility
    • Domains evolve: New assessments, sensors
    • Other use cases crop up, where many of the requirements overlap

7 of 34


NDA

10 != 43

8 of 34


9 of 34

10 of 34

11 of 34

12 of 34

  • A graph model for encoding data collection protocols and activities
  • Uses Linked Data principles to:
    • provide persistent URLs
    • track provenance
    • harmonize prior to collection
  • A validation schema
  • An open collection of clinical and behavioral assessments
  • Managed through a GitHub project to track versions and releases
  • Allows a mechanism to create your own protocols and activities first
  • A VueJS-based user interface for using ReproSchema forms
    • Client-side application to preserve privacy
    • Allows a server-side backend to collect data
  • Python and JavaScript tools
    • Validate ReproSchema elements
    • Convert between different representations
      • from/to REDCap
      • Linked Data formats (e.g., JSON-LD, Turtle, N-Triples)

ReproSchema Tools

13 of 34

What can you do now

Use the ReproSchema library.

Convert REDCap activities or submit new ones to the library.

Create your own versioned data collection protocol in your own GitHub repo.

Collect data using a browser on computer, tablet, or smartphone.

Use MindLogger or REDCap.

Use multilingual assessments.

Extend language support to existing assessments.

14 of 34

15 of 34

16 of 34

Technical dive

The Schema

The Validator

The Formats

The Infrastructure

Versioning/Persistence

17 of 34

Protocol
├── Activity
│   ├── Field
│   │   └── ResponseOption
│   └── Field
│       └── ResponseOption
│   ...
├── Activity
│   ...
...

ResponseActivity
├── Response
│   ...
├── Response
│   ...

18 of 34

Technical dive: The schema

Ontologies: CEDAR, SDO, SKOS, W3C-PROV, NIDM

Protocol
├── Activity
│   ├── Field
│   │   └── ResponseOption
│   ├── Field
│   │   └── ResponseOption
│   ├── Activity
│   │   ├── Field
│   │   │   └── ResponseOption
│   │   ...
│   ...
├── Activity
│   ...
...

Schema classes/objects

  • Protocol
  • Activity
  • Field
  • AdditionalProperty
  • ResponseOption
  • Choice
  • ComputeSpecification
  • AdditionalNoteObj

  • ResponseActivity
  • Response
  • Participant
  • SoftwareAgent
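
The nesting shown in the tree can be sketched with plain Python dataclasses. This is only an illustration of the containment structure, not the ReproSchema data model itself: the attribute names (`choices`, `items`, `activities`) and the helper `count_fields` are hypothetical, and only four of the listed classes are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ResponseOption:
    # Allowed values for a field, e.g. [0, 1, 2, 3] (hypothetical attribute).
    choices: List[int] = field(default_factory=list)


@dataclass
class Field:
    id: str
    response_options: ResponseOption


@dataclass
class Activity:
    id: str
    # An activity holds fields and, as in the tree, possibly nested activities.
    items: List[Union[Field, "Activity"]] = field(default_factory=list)


@dataclass
class Protocol:
    id: str
    activities: List[Activity] = field(default_factory=list)


def count_fields(node: Union[Protocol, Activity, Field]) -> int:
    """Count Field leaves, recursing through nested activities."""
    if isinstance(node, Field):
        return 1
    children = node.activities if isinstance(node, Protocol) else node.items
    return sum(count_fields(child) for child in children)


options = ResponseOption(choices=[0, 1, 2, 3])
inner = Activity("inner_activity", [Field("q3", options)])
outer = Activity("outer_activity", [Field("q1", options), Field("q2", options), inner])
protocol = Protocol("demo_protocol", [outer])
print(count_fields(protocol))  # 3
```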

19 of 34

20 of 34

Technical dive: The validator

  • ReproSchema Protocols, Activities, Fields, ResponseOptions, Responses
    • Written using JSON-LD to simplify human consumption
    • Corresponds to a valid RDF graph for machine interaction and query
  • Uses SHACL shapes graph for validation
  • Validation tools:
    • PySHACL
    • ReproSchema
    • Future: JSON-schema
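
To give a feel for what a constraint check does, here is a standard-library stand-in. It is not SHACL and not the ReproSchema validator: the `REQUIRED_KEYS` table is a made-up rule set for illustration, whereas the real constraints live in the SHACL shapes graph and are checked with tools such as PySHACL.

```python
import json

# Hypothetical constraints for illustration only; the real rules are defined
# in the ReproSchema SHACL shapes graph, not in this dictionary.
REQUIRED_KEYS = {
    "reproschema:Activity": {"@context", "@id", "@type", "prefLabel", "schemaVersion"},
}


def check_required_keys(doc: dict) -> list:
    """Return a sorted list of required keys that are missing from the document."""
    required = REQUIRED_KEYS.get(doc.get("@type"), set())
    return sorted(required - set(doc))


activity = json.loads("""
{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0-rc1/contexts/generic",
  "@id": "PHQ9_schema",
  "@type": "reproschema:Activity",
  "prefLabel": "PHQ-9 Assessment"
}
""")
print(check_required_keys(activity))  # ['schemaVersion']
```

A shapes graph expresses the same kind of rule declaratively, so the constraints travel with the schema rather than with any one tool.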

21 of 34

Technical dive: The formats

@prefix ex: <http://example.org/ns#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Bob
    a schema:Person ;
    schema:givenName "Robert" ;
    schema:familyName "Junior" ;
    schema:birthDate "1971-07-07"^^xsd:date ;
    schema:deathDate "1968-09-10"^^xsd:date ;
    schema:address ex:BobsAddress .

ex:BobsAddress
    schema:streetAddress "1600 Amphitheatre Pkway" ;
    schema:postalCode 9404 .

{
  "@context": { "@vocab": "http://schema.org/" },
  "@id": "http://example.org/ns#Bob",
  "@type": "Person",
  "givenName": "Robert",
  "familyName": "Junior",
  "birthDate": "1971-07-07",
  "deathDate": "1968-09-10",
  "address": {
    "@id": "http://example.org/ns#BobsAddress",
    "streetAddress": "1600 Amphitheatre Pkway",
    "postalCode": 9404
  }
}
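
To see why the JSON-LD and Turtle forms describe the same graph, the following sketch flattens the JSON-LD document into subject/predicate/object triples. It is a deliberately simplified stand-in for a JSON-LD processor: the function `to_triples` is hypothetical, it handles only `@vocab`, `@id`, `@type`, literals, and nested nodes with an `@id`, and it leaves the dates as plain strings rather than typed `xsd:date` values.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(node: dict, vocab: str) -> list:
    """Flatten a JSON-LD node into (subject, predicate, object) tuples."""
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, RDF_TYPE, vocab + value))
        elif isinstance(value, dict):
            # Nested node: link to it, then emit its own triples.
            triples.append((subject, vocab + key, value["@id"]))
            triples.extend(to_triples(value, vocab))
        else:
            triples.append((subject, vocab + key, value))
    return triples


doc = json.loads("""
{
  "@context": { "@vocab": "http://schema.org/" },
  "@id": "http://example.org/ns#Bob",
  "@type": "Person",
  "givenName": "Robert",
  "familyName": "Junior",
  "birthDate": "1971-07-07",
  "deathDate": "1968-09-10",
  "address": {
    "@id": "http://example.org/ns#BobsAddress",
    "streetAddress": "1600 Amphitheatre Pkway",
    "postalCode": 9404
  }
}
""")
triples = to_triples(doc, doc["@context"]["@vocab"])
for triple in triples:
    print(triple)
print(len(triples))  # 8, one per statement in the Turtle version
```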

22 of 34

Technical dive: The formats

@prefix ex: <http://example.org/ns#> .
@prefix sdo: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Bob
    a sdo:Person ;
    sdo:givenName "Robert" ;
    sdo:familyName "Junior" ;
    sdo:birthDate "1971-07-07"^^xsd:date ;
    sdo:deathDate "1968-09-10"^^xsd:date ;
    sdo:address ex:BobsAddress .

ex:BobsAddress
    sdo:streetAddress "1600 Amphitheatre Pkway" ;
    sdo:postalCode 9404 .

{
  "@context": { "@vocab": "http://schema.org/" },
  "@id": "http://example.org/ns#Bob",
  "@type": "Person",
  "givenName": "Robert",
  "familyName": "Junior",
  "birthDate": "1971-07-07",
  "deathDate": "1968-09-10",
  "address": {
    "@id": "http://example.org/ns#BobsAddress",
    "streetAddress": "1600 Amphitheatre Pkway",
    "postalCode": 9404
  }
}

23 of 34

Technical dive: The formats

ex:Bob
    a sdo:Person ;
    sdo:givenName "Robert" ;
    sdo:familyName "Junior" ;
    sdo:birthDate "1971-07-07"^^xsd:date ;
    sdo:deathDate "1968-09-10"^^xsd:date ;
    sdo:address ex:BobsAddress .

ex:BobsAddress
    sdo:streetAddress "1600 Amphitheatre Pkway" ;
    sdo:postalCode 9404 .

24 of 34

Example: Patient Health Questionnaire Schema

25 of 34

{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0-rc1/contexts/generic",
  "@id": "PHQ9_schema",
  "@type": "reproschema:Activity",
  "prefLabel": "PHQ-9 Assessment",
  "description": "PHQ-9 assessment schema",
  "schemaVersion": "1.0.0-rc1",
  "version": "0.0.1",
  "citation": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1495268/",
  "preamble": {
    "en": "Over the last 2 weeks, how often have you been bothered by any of the following problems?",
    "es": "Durante las últimas 2 semanas, ¿con qué frecuencia le han molestado los siguientes problemas?"
  },
  "compute": [
    {
      "variableName": "phq9_total_score",
      "jsExpression": "phq9_1 + phq9_2 + phq9_3 + phq9_4 + phq9_5 + phq9_6 + phq9_7 + phq9_8 + phq9_9"
    }
  ],
  "ui": {
    "inputType": "section",
    "shuffle": false
    ...
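
The `compute` entry above defines `phq9_total_score` as the sum of items 1 to 9. A rough Python equivalent of that `jsExpression` is sketched below; the response values are invented for illustration, and the function `phq9_total_score` is not part of any ReproSchema tool.

```python
# Hypothetical responses keyed by the variable names used in the activity.
responses = {
    f"phq9_{i}": score
    for i, score in enumerate([1, 0, 2, 3, 0, 1, 1, 0, 2], start=1)
}


def phq9_total_score(responses: dict) -> int:
    """Sum items phq9_1 through phq9_9, mirroring the jsExpression."""
    return sum(responses[f"phq9_{i}"] for i in range(1, 10))


print(phq9_total_score(responses))  # 10
```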

26 of 34

Multilingual support
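
Text such as the PHQ-9 `preamble` is stored as a map from language code to string. A minimal sketch of how a client might pick the string to display follows; the function `localized` and the fall-back-to-English behavior are assumptions for illustration, not a description of what reproschema-ui does.

```python
def localized(text_by_language: dict, language: str, fallback: str = "en") -> str:
    """Return the string for `language`, or the `fallback` language if it is missing."""
    if language in text_by_language:
        return text_by_language[language]
    return text_by_language[fallback]


preamble = {
    "en": "Over the last 2 weeks, how often have you been bothered by any of the following problems?",
    "es": "Durante las últimas 2 semanas, ¿con qué frecuencia le han molestado los siguientes problemas?",
}

print(localized(preamble, "es"))
print(localized(preamble, "fr"))  # no French entry, so the English text is used
```

Adding a language to an existing assessment then amounts to adding one more key to each such map.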

27 of 34

    "order": [
      "items/phq9_1",
      ...
      "items/phq9_10"
    ],
    "addProperties": [
      {
        "isAbout": "items/phq9_1",
        "variableName": "phq9_1",
        "valueRequired": true,
        "isVis": true
      },
      ...
      {
        "isAbout": "items/phq9_10",
        "variableName": "phq9_10",
        "isVis": "phq9_1 > 0 || phq9_2 > 0 || phq9_3 > 0 || phq9_4 > 0 || phq9_5 > 0 || phq9_6 > 0 || phq9_7 > 0 || phq9_8 > 0 || phq9_9 > 0"
      },
      {
        "isAbout": "items/phq9_total_score",
        "variableName": "phq9_total_score",
        "isVis": false
      }
    ]
  }
}
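
The `isVis` expression for `phq9_10` shows that item only when at least one of items 1 to 9 scored above zero. A Python reading of that rule is sketched here; treating an unanswered item as 0 is an assumption of this sketch, and the function name is hypothetical.

```python
def is_phq9_10_visible(responses: dict) -> bool:
    """True when any of phq9_1 .. phq9_9 is greater than zero."""
    # Assumption: an item with no response yet counts as 0.
    return any(responses.get(f"phq9_{i}", 0) > 0 for i in range(1, 10))


print(is_phq9_10_visible({"phq9_1": 0, "phq9_2": 0}))  # False
print(is_phq9_10_visible({"phq9_1": 0, "phq9_4": 2}))  # True
```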

28 of 34

{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0-rc1/contexts/generic",
  "@type": "reproschema:ResponseActivity",
  "@id": "uuid:efc5195a-734e-41f1-b1f8-51d17c23831a",
  "used": [
    "https://raw.githubusercontent.com/sensein/covid19/master/activity/covid19/items/covid19_clinical_history",
    "https://raw.githubusercontent.com/sensein/covid19/master/activity/covid19/covid19_schema",
    "https://raw.githubusercontent.com/sensein/covid19/master/protocol/Covid19_schema"
  ],
  "inLanguage": "en",
  "startedAtTime": "2020-08-07T17:41:05.367Z",
  "endedAtTime": "2020-08-07T17:41:10.249Z",
  "wasAssociatedWith": {
    "version": "0.0.1",
    "url": "https://sensein.github.io/covid19/",
    "@id": "https://github.com/ReproNim/reproschema-ui"
  },
  "generated": "uuid:318ad01a-b4a7-4235-97b2-b5bd0574328d"
}

ResponseActivity

29 of 34

{
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0-rc1/contexts/generic",
  "@type": "reproschema:Response",
  "@id": "uuid:318ad01a-b4a7-4235-97b2-b5bd0574328d",
  "wasAttributedTo": {
    "@id": "ceb1e03a-7ed0-4978-811e-7766cb8428fd",
    "subject_id": "satra"
  },
  "isAbout": "https://raw.githubusercontent.com/sensein/covid19/master/activity/covid19/items/covid19_clinical_history",
  "value": [4, 5, 6]
}
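
The ResponseActivity and Response records are tied together by identifier: the activity's `generated` value is the response's `@id`. The sketch below builds such a pair with the standard library. The helper names `iso_utc` and `make_response_pair` are hypothetical, and the `wasAssociatedWith` and `wasAttributedTo` blocks are left out to keep the example short.

```python
import uuid
from datetime import datetime, timezone

CONTEXT = "https://raw.githubusercontent.com/ReproNim/reproschema/1.0.0-rc1/contexts/generic"


def iso_utc(moment: datetime) -> str:
    """Format an aware datetime as UTC with milliseconds and a trailing Z."""
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def make_response_pair(item_url, value, used, started, ended, language="en"):
    """Build a (ResponseActivity, Response) pair sharing one response identifier."""
    response_id = f"uuid:{uuid.uuid4()}"
    response = {
        "@context": CONTEXT,
        "@type": "reproschema:Response",
        "@id": response_id,
        "isAbout": item_url,
        "value": value,
    }
    response_activity = {
        "@context": CONTEXT,
        "@type": "reproschema:ResponseActivity",
        "@id": f"uuid:{uuid.uuid4()}",
        "used": used,
        "inLanguage": language,
        "startedAtTime": iso_utc(started),
        "endedAtTime": iso_utc(ended),
        "generated": response_id,  # provenance link to the Response record
    }
    return response_activity, response


item = "https://raw.githubusercontent.com/sensein/covid19/master/activity/covid19/items/covid19_clinical_history"
activity_record, response_record = make_response_pair(
    item_url=item,
    value=[4, 5, 6],
    used=[item],
    started=datetime(2020, 8, 7, 17, 41, 5, 367000, tzinfo=timezone.utc),
    ended=datetime(2020, 8, 7, 17, 41, 10, 249000, tzinfo=timezone.utc),
)
print(activity_record["generated"] == response_record["@id"])  # True
print(activity_record["startedAtTime"])  # 2020-08-07T17:41:05.367Z
```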

Response

30 of 34

Technical dive: The infrastructure

The ReproSchema project is organized around several ReproNim GitHub repositories.

  • reproschema: Schema releases with context + validation spec
  • reproschema-library: The public assessment library
  • reproschema-ui: The user interface for data collection
  • reproschema-py: The Python library for validation, conversion
  • demo-protocol: An example protocol for demonstration

And one additional repo that is currently in my group's GitHub org:

  • voice-backend: An API service to receive data from the UI

31 of 34

Accessed via GitHub HTTP server; maintains relations via relative paths

Study Protocols

User or organization repositories

Deployed via Continuous Integration to GitHub Pages

Tested via Continuous Integration

Server side - GitHub

ReproNim Org

Lab, Institution, or Cloud Server

Client side
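
Because relations are kept as relative paths, a client can resolve an item reference against the URL of the activity that contains it. The sketch below uses the standard `urljoin` for this. The relative reference `items/covid19_clinical_history` is assumed by analogy with the `items/phq9_1` entries in the PHQ-9 example; the activity URL is the one listed under `used` in the ResponseActivity record.

```python
from urllib.parse import urljoin

activity_url = (
    "https://raw.githubusercontent.com/sensein/covid19/master/"
    "activity/covid19/covid19_schema"
)

# Assumed relative reference, following the "items/<name>" pattern of PHQ-9.
item_url = urljoin(activity_url, "items/covid19_clinical_history")
print(item_url)
```

The same activity files therefore work unchanged from a fork, a tagged release, or a local server, since only the base URL differs.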

32 of 34

Limitations and challenges

  • The library currently has ~57 assessments
  • Cannot easily search across items and activities yet
    • Since it's a graph, it can be searched via ReproLake or another graph database
  • No UI-based builder yet, although a simple version exists in MindLogger
  • Conversion from other formats may require custom code
  • How do you trust curation?
  • How does each item, activity and protocol connect to a clinical, psychological, and psychiatric space?
  • What processes are being studied?
  • What claims are being related to these?
  • Connections to other ontologies

33 of 34

How you can help

  • Curate the library
    • Fix/Refine/Validate existing assessments
    • Add new assessments
    • Add more languages
  • Try it out
    • Create your own protocol
  • Improve the converters
  • Improve the UI
    • Add more widgets
  • Contribute to related open-source projects
    • MindLogger
    • LORIS

34 of 34

Summary

  • Each item (a question), activity (a questionnaire), and protocol (a set of questionnaires) has a unique and persistent identifier.
  • An open and extensible library of curated assessments.
  • Versions of a given questionnaire can be tracked
    • e.g., PHQ-8 is a custom derivative of the Patient Health Questionnaire (PHQ-9).
  • Allows, supports, and tracks internationalization
    • e.g., the ABCD study requires Spanish and English forms.
  • Protocols, Activities, and Items are connected using a Linked Data approach
    • Can be queried using graph databases
    • Creates the ability to add contextual information to a questionnaire
      • e.g., PHQ-9 is about measuring symptoms of depression.
  • The schema focuses on information, not implementation.