CCDH Pilot Demonstration
CENTER for CANCER DATA HARMONIZATION
Outline
01 GOALS: Highlight the value of harmonization and demonstrate the use of CRDCH.
02 PROCESS: Harmonize data models and value sets for demonstrator queries.
03 OUTCOMES: Exemplar data package, workflow diagram, and documentation.
The role of CCDH in the wider CRDC environment
Understanding the Problem
Each node models things differently
Each node uses different values
For example:
For example, one node encodes race like this:
not reported
white
american indian or alaska native
black or african american
While another does it like this:
not allowed to collect
unknown
white
native hawaiian or other pacific islander
american indian or alaska native
asian
other
black or african american
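The gap between the two value sets above can be made concrete with a small comparison. This is only an illustrative sketch: the variable names are hypothetical and do not identify specific CRDC nodes, and the values are the ones listed above.

```python
# Race values as listed for the two example nodes above.
# Variable names are hypothetical placeholders, not real node identifiers.
first_node_race = {
    "not reported",
    "white",
    "american indian or alaska native",
    "black or african american",
}
second_node_race = {
    "not allowed to collect",
    "unknown",
    "white",
    "native hawaiian or other pacific islander",
    "american indian or alaska native",
    "asian",
    "other",
    "black or african american",
}

# Values both nodes can express, and values only one of them can express.
shared = first_node_race & second_node_race
only_first = first_node_race - second_node_race
only_second = second_node_race - first_node_race

print("shared:", sorted(shared))
print("only in first node:", sorted(only_first))
print("only in second node:", sorted(only_second))
```

Only three values are shared, so a query written against one node's vocabulary cannot be reused on the other without a mapping between them.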
The DST metadata WG aimed to identify the most needed elements for harmonization.
PILOT DEMONSTRATION
Illustrate what CRDCH-based harmonization looks like and the value it affords, by providing a set of concrete and well-documented examples of harmonized data records, framed around two demonstrator use cases.
Goals
Pilot Workflow
STEP 1
Define demonstrator use cases and queries
STEP 2
Select 'high value' elements to include in the data examples
STEP 3
Identify source data from different CRDC Nodes that meet query criteria
STEP 4
Manually transform/harmonize source data, and validate against CRDCH JSON Schema
STEP 5
Document common modeling and transform patterns in the data examples
D1: Cross-Node Specimen Data Retrieval
D2: Aggregation of Case Data Across Nodes
Step 1: Define Demonstrator Use Cases
STEP 1
Show harmonized data records returned by a specimen query across three CRDC Nodes (GDC, PDC, ICDC)
Use Case: Researcher looking for specimens to compare molecular characteristics of primary vs lung metastases of Osteosarcoma tumors.
Query: "Find frozen Osteosarcoma tumor specimens >30 mg that are lung metastases from patients with Stage 4 disease"
Show data for the same case across three CRDC Nodes (GDC, PDC, IDC) merged into a single harmonized record
Use Case: Researcher looking to assemble a cohort for a retrospective analysis, using Case data distributed across different systems.
Query: "Find patients in the TCGA-OV project who are 50 years of age or older and who have Stage IIIC ovarian cancer"
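The second demonstrator, merging data for the same case from several nodes and then querying the merged record, could be sketched roughly as follows. Everything here is an assumption made for illustration: the field names (`project`, `age_years`, `stage`, `imaging_modality`), the example values, and both function names are invented, and stage is kept as a plain string rather than a coded concept to keep the sketch short.

```python
def merge_case_records(records_by_node):
    """Merge per-node records for one case into a single dict.

    The first non-missing value seen for a field wins. Later values that
    disagree are reported in `conflicts` instead of overwriting it.
    """
    merged = {}
    conflicts = {}
    for node, record in records_by_node.items():
        for field, value in record.items():
            if value is None:
                continue  # this node does not capture the field
            if field not in merged:
                merged[field] = value
            elif merged[field] != value:
                conflicts.setdefault(field, {})[node] = value
    return merged, conflicts


def matches_d2_query(case):
    """TCGA-OV project, 50 years of age or older, Stage IIIC."""
    age = case.get("age_years")
    return (
        case.get("project") == "TCGA-OV"
        and age is not None
        and age >= 50
        and case.get("stage") == "Stage IIIC"
    )


# Invented example records for one case, already harmonized to shared field names.
records_by_node = {
    "GDC": {"project": "TCGA-OV", "age_years": 57, "stage": "Stage IIIC"},
    "PDC": {"project": "TCGA-OV", "age_years": 57, "stage": None},
    "IDC": {"project": "TCGA-OV", "imaging_modality": "CT"},
}

merged, conflicts = merge_case_records(records_by_node)
print(merged)
print(conflicts)
print(matches_d2_query(merged))
```

Reporting conflicts separately, rather than silently picking a winner, keeps disagreements between nodes visible to whoever assembles the cohort.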
Prioritize data elements that:
Step 2: Select 'High Value' Elements to Include
STEP 2
Included Elements from GDC.Sample
Data Element | Query Target | DST Report | Illustrative or of Interest |
sample_id | | x | |
sample_submitter_id | | | x |
project_id | | | x |
case_id | | | x |
case_submitter_id | | | x |
sample_type | | x | |
tissue_type | | x | |
tumor_code | x | x | |
tumor_descriptor | x | | |
current_weight | x | | |
days_to_collection | | x* | |
initial_weight | | | x |
biospecimen_anatomic_site | x | x | |
time_between_excision_and_freezing | | | x |
preservation_method | x | | |
freezing_method | | | x |
* Asterisk indicates that this element is not in the current DST report, but will be included once mappings to node attributes are updated
Different approaches used for the two Demonstrators:
Step 3: Identify or Generate Source Data Records
STEP 3
GDC.Sample
PDC.Sample
ICDC.Sample
Demonstrator 1 ‘Aggregate’ Sample Records from GDC, PDC, and ICDC. Full source data documented here.
Manually generated YAML (with JSON-translation available)
Step 4: Transform / Harmonize Source Data
STEP 4
Excerpt from the harmonized GDC Specimen example showing how specimen weight is represented
Element-by-element annotation of each transformation performed on source data
Step 5: Document Modeling and Transform Patterns
STEP 5
Harmonization Allows Cross-Node Querying
"Find frozen Osteosarcoma specimens >30 mg that are lung metastases from patients with Stage 4 disease"
Source GDC Data
Field | tumor_code | tumor_descriptor | current_weight | biospecimen_anatomic_site | preservation_method | ajcc_pathologic_stage |
Value | Osteosarcoma (OS) | Metastatic | 45 | Lung | Frozen | Stage IVA |

Source ICDC Data
Field | specific_sample_pathology | tumor_sample_origin | n/a (not captured) | sample_site | sample_preservation | stage_of_disease |
Value | Osteosarcoma | Metastatic | n/a | Lung | Snap Frozen | IV |

CRDCH Data
Node | Specimen.specific_sample_pathology | Specimen.tumor_status_at_collection | Specimen.quantity_measure | SpecimenCreationActivity.collection_site | SpecimenCreationActivity.activity_type | Diagnosis.stage |
GDC | C9145 (Osteosarcoma) | C3261 (Metastatic) | 45 milligrams | C12468 (Lung) | C178955 (Freezing) | C27979 (Stage IVA) |
ICDC | C9145 (Osteosarcoma) | C3261 (Metastatic) | n/a | C12468 (Lung) | C63521 (Quick Freeze) | C27971 (Stage IV) |
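Once both records use the same fields and codes, the demonstrator query can be written once and applied to every node. The following is a minimal sketch under stated assumptions: the `HarmonizedSpecimen` class, the `matches` function, and the `require_weight` switch are hypothetical and not part of CRDCH, and treating Stage IV and Stage IVA together as "Stage 4" is an assumption about the query's intent. The codes are those of the CRDCH rows above, written in the usual NCIt form (a single "C" followed by digits).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarmonizedSpecimen:
    """Flat, hypothetical view of the harmonized rows shown above."""
    node: str
    pathology: str
    tumor_status: str
    weight_mg: Optional[float]  # None when the node does not capture weight
    collection_site: str
    preservation: str
    stage: str


OSTEOSARCOMA = "C9145"
METASTATIC = "C3261"
LUNG = "C12468"
FROZEN = {"C178955", "C63521"}   # Freezing, Quick Freeze
STAGE_4 = {"C27979", "C27971"}   # Stage IVA, Stage IV (assumed grouping)

RECORDS = [
    HarmonizedSpecimen("GDC", "C9145", "C3261", 45.0, "C12468", "C178955", "C27979"),
    HarmonizedSpecimen("ICDC", "C9145", "C3261", None, "C12468", "C63521", "C27971"),
]


def matches(s, min_weight_mg=30.0, require_weight=True):
    """Frozen osteosarcoma lung metastasis, Stage 4, heavier than min_weight_mg."""
    if s.pathology != OSTEOSARCOMA or s.tumor_status != METASTATIC:
        return False
    if s.collection_site != LUNG or s.preservation not in FROZEN:
        return False
    if s.stage not in STAGE_4:
        return False
    if s.weight_mg is None:
        # Weight unknown: keep the record only if the caller allows it.
        return not require_weight
    return s.weight_mg > min_weight_mg


strict = [s.node for s in RECORDS if matches(s)]
lenient = [s.node for s in RECORDS if matches(s, require_weight=False)]
print(strict)
print(lenient)
```

The `require_weight` switch makes explicit a decision the harmonized data forces on the query author: whether a record with no captured weight should be excluded or returned for manual review.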
Value Harmonization:
Harmonization Allows Cross-Node Querying
"Find frozen Osteosarcoma specimens >30 mg that are lung metastases from patients with Stage 4 disease"
# type = Specimen
id: "f2f:05f1574e-2a28-50bc-bdc1-e4c6dee92fd1"
specific_tissue_pathology:
  coding:
    - code: C9145
      label: Osteosarcoma
      system: http://ncithesaurus.nci.nih.gov
tumor_status_at_collection: # type = CodeableConcept
  coding: # type = Coding
    - code: C3261
      label: Metastatic Neoplasm
      system: http://ncithesaurus.nci.nih.gov
quantity_measure: # type = QuantityMeasureObservation
  - observation_type: # type = CodeableConcept
      coding: # type = Coding
        - code: C25208
          label: Weight
          system: http://ncithesaurus.nci.nih.gov
    value_quantity: # type = Quantity
      value_decimal: 45
      unit: # type = CodeableConcept
        coding: # type = Coding
          - code: C28253
            label: Milligram
            system: http://ncithesaurus.nci.nih.gov
creation_activity: # type = SpecimenCreationActivity
  collection_site: # type = BodySite
    site: # type = CodeableConcept
      coding: # type = Coding
        - code: C12468
          label: Lung
          system: http://ncithesaurus.nci.nih.gov
processing_activity: # type = SpecimenProcessingActivity
  - activity_type: # type = CodeableConcept
      coding: # type = Coding
        - code: C48160
          label: Freezing
          system: http://ncithesaurus.nci.nih.gov
Harmonized GDC Sample Record
# type = Specimen
id: "f2f:ecc2bf49-204d-11e9-b7f8-0a80fada099c"
specific_tissue_pathology:
  coding:
    - code: C9145
      label: Osteosarcoma
      system: http://ncithesaurus.nci.nih.gov
tumor_status_at_collection: # type = CodeableConcept
  coding: # type = Coding
    - code: C3261
      label: Metastatic Neoplasm
      system: http://ncithesaurus.nci.nih.gov
# Sample weight is not captured by ICDC
creation_activity: # type = SpecimenCreationActivity
  collection_site: # type = BodySite
    site: # type = CodeableConcept
      coding: # type = Coding
        - code: C12468
          label: Lung
          system: http://ncithesaurus.nci.nih.gov
processing_activity: # type = SpecimenProcessingActivity
  - activity_type: # type = CodeableConcept
      coding: # type = Coding
        - code: C63521
          label: Quick Freeze
          system: http://ncithesaurus.nci.nih.gov
Harmonized ICDC Sample Record
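Because both records share one structure, the same accessor works on either of them. The sketch below reads specimen weight from records shaped like the two examples above. It makes several assumptions: the records are hand-transcribed Python dicts trimmed to the relevant fields rather than parsed YAML, the function name `weight_in_mg` is hypothetical, and no unit conversion is attempted.

```python
WEIGHT = "C25208"      # NCIt code used above for Weight
MILLIGRAM = "C28253"   # NCIt code used above for Milligram


def weight_in_mg(specimen):
    """Return the specimen weight in milligrams, or None if not recorded."""
    for observation in specimen.get("quantity_measure", []):
        type_codes = {c["code"] for c in observation["observation_type"]["coding"]}
        if WEIGHT not in type_codes:
            continue  # some other kind of quantity measurement
        quantity = observation["value_quantity"]
        unit_codes = {c["code"] for c in quantity["unit"]["coding"]}
        if MILLIGRAM in unit_codes:
            return float(quantity["value_decimal"])
    return None


# Trimmed transcriptions of the two harmonized records shown above.
gdc_specimen = {
    "id": "f2f:05f1574e-2a28-50bc-bdc1-e4c6dee92fd1",
    "quantity_measure": [
        {
            "observation_type": {"coding": [{"code": "C25208", "label": "Weight"}]},
            "value_quantity": {
                "value_decimal": 45,
                "unit": {"coding": [{"code": "C28253", "label": "Milligram"}]},
            },
        }
    ],
}
icdc_specimen = {
    "id": "f2f:ecc2bf49-204d-11e9-b7f8-0a80fada099c",
    # Sample weight is not captured by ICDC, so the field is absent.
}

print(weight_in_mg(gdc_specimen))
print(weight_in_mg(icdc_specimen))
```

Matching on codes rather than labels means the accessor does not depend on how a node spells "Weight" or "Milligram".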
Structural Harmonization:
Canonical Terminology Recommendations
Terminological Recommendations
Demonstrator 1
Demonstrator 2
CRDCH Field Name | Recommended Terminologies |
Subject.sex | HL7 Gender Harmony Project |
Subject.race | OMB / USCDI |
Subject.vital_status | LOINC |
Subject.ethnicity | OMB / USCDI |
Diagnosis.morphology | ICD-O-3, MONDO (NCIT). Likely needs further requirements analysis |
Diagnosis.primary_site | Uberon |
Diagnosis.stage[CancerStageObservationSet.observation[CancerStageObservation.valueEnum]] | Harmonize to AJCC, use NCIT codes |
Diagnosis.condition | ICD-10-CM, MONDO (NCIT). Likely needs further requirements analysis |
CRDCH Field Name | Recommended Terminologies |
Specimen.general_tissue_morphology | NCIT |
Specimen.source_material_type | Mixed bag of concepts, recommend refactor |
Specimen.specific_tissue_morphology | ICD-O-3, MONDO (NCIT). Likely needs further requirements analysis |
Specimen.tumor_status_at_collection | NCIT |
SpecimenCreationActivity.collection_site[BodySite.site] | Uberon, SNOMED |
SpecimenProcessingActivity.method_type | Ontology for Biomedical Investigations (OBI), NCIT |
Diagnosis.condition | ICD-10-CM, MONDO (NCIT). Likely needs further requirements analysis |
Diagnosis.stage[CancerStageObservationSet.observation[CancerStageObservation.valueEnum]] | Harmonize to AJCC, use NCIT for AJCC stage terms |
Diagnosis.stage[CancerStageObservationSet.observation[CancerStageObservation.method_type]] | NCIT |
Diagnosis.primary_site | Uberon, SNOMED |
Quantity.unit (used in many CRDCH elements) | UO (Units of Measurement Ontology) |
Lessons Learned
Data Examples
Terminologies
Discussion Questions for Breakouts
Pilot Deliverables
THANKS
CRDC Data Nodes and Collaborators: CDA, CDS, ICDC, IDC, GDC, PDC, DCF.
Center for Biomedical Informatics & Information Technology: Gilberto Fragoso, Robinette Renner, Sherri de Coronado.
Frederick National Laboratory for Cancer Research: Todd Pihl, Mark Jensen, Resham Kulkarni.
Samvit Solutions: Smita Hastak, Wendy Ver Hoef.
CCDH Team:
JHU: Christopher Chute (PI), Davera Gabriel, Dazhi Jiao, Harold Solbrig, Joe Flack, Tricia Francis
LBL: Chris Mungall (PI), Mark Miller, Nomi Harris, Sujay Patil, William Duncan
RENCI: James Balhoff (PI), Gaurav Vaidya
U Chicago: Samuel Volchenboum (PI), Brian Furner, Debra Venckus, Jooho Lee, Kathryn Blumhardt
U Colorado: Melissa Haendel (PI), Anne Thessen, Julie McMurry, Matthew Brush, Monica Munoz-Torres, Nicole Vasilevsky, Shahim Essaid