1 of 12

DDI-CDI and Other Standards

2 of 12

Overall goal

To define general patterns for translating between different data models (SDMX, schema.org, etc.), using DDI-CDI as the “link” between those standards

3 of 12

Making datasets “integration-ready”

  • Describe source datasets using the DDI-CDI vocabulary
  • This requires mapping specific objects in the source dataset to individuals of specific classes in the DDI-CDI vocabulary. 
    • The mapping connects the main “entities” and “relations” of the source model (e.g., SDMX artifacts or schema.org classes or properties) to the relevant subset of DDI-CDI classes and properties in the target model
  • Alongside the DDI-CDI data descriptions, the dataset is converted into a wide-form table (one column for each dimension, measure, and attribute component), serialized as CSV.
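
The wide-form serialization described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the component names (SERIES, REF_AREA, and so on) and the function name to_wide_csv are assumptions chosen for the example, and in practice the component lists would be read from the DDI-CDI description of the dataset.

```python
import csv
import io

# Hypothetical component lists (assumed for illustration only).
DIMENSIONS = ["SERIES", "REF_AREA", "TIME_PERIOD"]
MEASURES = ["OBS_VALUE"]
ATTRIBUTES = ["UNIT_MEASURE"]


def to_wide_csv(observations, dimensions, measures, attributes):
    """Serialize observations as wide-form CSV text.

    Each observation becomes one row; each dimension, measure, and
    attribute component becomes one column, in that order.
    """
    columns = [*dimensions, *measures, *attributes]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for obs in observations:
        # Keys that are not declared components are dropped;
        # missing components are written as empty cells.
        writer.writerow({col: obs.get(col, "") for col in columns})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"SERIES": "SI_POV_DAY1", "REF_AREA": "MEX", "TIME_PERIOD": "2016",
         "OBS_VALUE": "7.2654", "UNIT_MEASURE": "PT"},
    ]
    print(to_wide_csv(sample, DIMENSIONS, MEASURES, ATTRIBUTES))
```

Fixing the column order from the component lists keeps the CSV layout aligned with the accompanying data description, so a consumer can interpret each column from the DDI-CDI metadata alone.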

4 of 12

Producing the target integrated dataset

  • The target data structure is also specified using the same DDI-CDI profile.
  • Mappings between source and target represented variables
  • Mappings between source and target codes for each variable whose value domain is given by a code list.
  • Automating data integration (Python scripts)
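
As a rough sketch of how the two kinds of mappings listed above could drive an automated integration step: the mapping tables, the target variable names, and the function integrate_row below are hypothetical and only illustrate the pattern of renaming variables first and then recoding values for variables whose value domain is a code list.

```python
# Hypothetical variable mapping: source variable -> target variable.
VARIABLE_MAP = {
    "GEOGRAPHY": "REF_AREA",
    "OBS_VALUE": "OBS_VALUE",
    "UNIT_MEASURE": "UNIT_MEASURE",
}

# Hypothetical code mappings, keyed by target variable. Only variables
# whose value domain is a code list appear here.
CODE_MAP = {
    "REF_AREA": {"country/MEX": "484"},
    "UNIT_MEASURE": {"dcs:UNDATA_SDG_PT": "PT"},
}


def integrate_row(row, variable_map, code_map):
    """Translate one source record into the target structure."""
    target = {}
    for source_var, value in row.items():
        target_var = variable_map.get(source_var)
        if target_var is None:
            continue  # source variable has no counterpart in the target
        codes = code_map.get(target_var)
        if codes is not None:
            if value not in codes:
                raise KeyError(f"no target code for {source_var}={value!r}")
            value = codes[value]
        target[target_var] = value
    return target


if __name__ == "__main__":
    source_row = {
        "GEOGRAPHY": "country/MEX",
        "OBS_VALUE": "7.2654",
        "UNIT_MEASURE": "dcs:UNDATA_SDG_PT",
    }
    print(integrate_row(source_row, VARIABLE_MAP, CODE_MAP))
```

Raising an error on an unmapped code, rather than passing the source value through, is a design choice made here so that gaps in the code mappings surface early.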

5 of 12

DDI-CDI Profile

6 of 12

Example 1: Global SDG Indicators: From SDMX to DDI-CDI

  • Source: SDMX artifacts describing the global SDG dataset, available from the SDMX Global Registry:
    1. Concept scheme
    2. Data flow
    3. Data structure definition
    4. Code lists
  • Translate this SDMX dataset description into DDI-CDI (output: SDG Indicators schema in .ttl)
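
To give a feel for the translation step, the sketch below writes one SDMX-style code list as Turtle. It is an illustration under stated assumptions, not the actual connector: the cdi: namespace URI and the use of cdi:CodeList and cdi:Code as target classes are assumptions, the ex: namespace and the ex:inCodeList property are placeholders, and the code list identifier CL_AGE is hypothetical.

```python
# Assumed namespace for the DDI-CDI RDF vocabulary (verify before use).
CDI = "http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/"
# Placeholder namespace for the generated resources.
EX = "http://example.org/sdg/"


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def codelist_to_turtle(codelist_id, codes):
    """Render a code list (code id -> label) as Turtle text."""
    lines = [
        f"@prefix cdi: <{CDI}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{codelist_id} a cdi:CodeList .",
    ]
    for code_id, label in codes.items():
        lines += [
            "",
            f"ex:{codelist_id}_{code_id} a cdi:Code ;",
            f"    skos:notation {turtle_literal(code_id)} ;",
            f"    rdfs:label {turtle_literal(label)} ;",
            # ex:inCodeList is a placeholder property, not a DDI-CDI term.
            f"    ex:inCodeList ex:{codelist_id} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(codelist_to_turtle("CL_AGE", {"Y0T14": "under 15 years old"}))
```

A real conversion would more likely build the graph with an RDF library and use the properties defined in the DDI-CDI profile; plain string formatting is used here only to keep the example dependency-free.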

7 of 12

The SDMX information model

[Diagram: components of the SDMX information model: Concept Scheme, Concept, Code List, Code, Data Flow, Data Structure Definition, Dimension, TimeDimension, Measure, Attribute, Group]

8 of 12

SDMX-to-DDI-CDI “connector” mapping

9 of 12

Example 2: Global SDG Indicators: From Data Commons schema to DDI-CDI

10 of 12

Data Commons schema

11 of 12

MCF files describing the global SDG dataset have been produced as part of the UN Data modernization project

    • dcs:UNDATA_SDG_Series
      • dcs:StatisticalVariable
      • schema:Enumeration
    • dcs:UNDATA_SDG_UnitOfMeasure
    • dcs:UNDATA_SDG_MeasurementMethodEnum

STATISTICAL VARIABLE                  | TIME_PERIOD | OBSERVATION_PERIOD | OBS_VALUE | UNIT_MEASURE      | MEASUREMENT_METHOD | GEOGRAPHY
--------------------------------------|-------------|--------------------|-----------|-------------------|--------------------|------------
dcs:undata/sdg/SI_POV_DAY1.AGE--Y0T14 | 2016        | P1Y                | 7.2654    | dcs:UNDATA_SDG_PT | dcs:UNDATA_SDG_G_G | country/MEX
dcs:undata/sdg/SI_POV_DAY1.AGE--Y0T14 | 2018        | P1Y                | 6.41098   | dcs:UNDATA_SDG_PT | dcs:UNDATA_SDG_G_G | country/MEX
dcs:undata/sdg/SI_POV_DAY1.AGE--Y0T14 | 2020        | P1Y                | 7.28747   | dcs:UNDATA_SDG_PT | dcs:UNDATA_SDG_G_G | country/MEX
dcs:undata/sdg/SI_POV_DAY1.AGE--Y0T14 | 2022        | P1Y                | 4.55222   | dcs:UNDATA_SDG_PT | dcs:UNDATA_SDG_G_G | country/MEX

12 of 12

Node: dcid:undata/sdg/SI_POV_DAY1.AGE--Y0T14
typeOf: dcs:StatisticalVariable
name: "Proportion of population below international poverty line [under 15 years old]"
measuredProperty: dcs:value
populationType: dcs:UNDATA_SDG_SI_POV_DAY1
age: dcs:UNDATA_SDG_AgeEnum_Y0T14
footnote: "Includes data from the following sources: Poverty and Inequality Portal, World Bank"

Node: dcid:UNDATA_SDG_AgeEnum
typeOf: schema:Class
subClassOf: schema:Enumeration
name: "Age Enumeration"

Node: dcid:UNDATA_SDG_AgeEnum_Y0T14
typeOf: dcs:UNDATA_SDG_AgeEnum
name: "under 15 years old"

Node: dcid:UNDATA_SDG_SI_POV_DAY1
typeOf: dcs:UNDATA_SDG_Series
name: "Proportion of population below international poverty line"
source: "https://unstats.un.org/sdgs/dataportal/database"
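
Before nodes like these can be mapped to DDI-CDI, they have to be read into a structure a script can work with. The following is a minimal sketch of such a reader, assuming each node starts with a "Node:" line and each property holds a single value; the function name parse_mcf is hypothetical, and multi-valued properties and other details of the MCF format are deliberately not handled.

```python
def parse_mcf(text):
    """Parse simple MCF text into {node id: {property: value}}.

    Assumes one value per property. Blank lines and lines starting
    with '#' are skipped.
    """
    nodes = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as URLs or
        # prefixed identifiers keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "Node":
            current = nodes.setdefault(value, {})
        elif current is not None:
            # Drop the surrounding double quotes of string values.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            current[key] = value
    return nodes


if __name__ == "__main__":
    sample = (
        "Node: dcid:UNDATA_SDG_AgeEnum_Y0T14\n"
        "typeOf: dcs:UNDATA_SDG_AgeEnum\n"
        'name: "under 15 years old"\n'
    )
    print(parse_mcf(sample))
```

With the nodes in dictionary form, the typeOf value of each node (for example dcs:StatisticalVariable or an enumeration class) can be used to decide which DDI-CDI class it should be mapped to.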

Code List