1 of 9

The DDI Standards and Technology: Adapting to Change

Introduction and History

Arofan Gregory

17 October 2023

2 of 9

Before we begin…

  • If you are interested in knowing more about DDI, we have a series of webinars (like this one) on different topics
    • Slides and recording can be found at:

https://codata.org/initiatives/data-skills/ddi-training-webinars/

  • At the end of November, the European DDI User Conference (EDDI) will take place in Ljubljana, Slovenia.
    • This is a great place to learn all about DDI and engage with the community!

https://www.eddi-conferences.eu/

3 of 9

DDI and the Technology Landscape

  • The Data Documentation Initiative (DDI) is a metadata standard, focusing on social, behavioural, and economic research data, and official statistics
    • The fundamental information needed for data collection, production, management, dissemination, and use doesn’t change rapidly
    • The scope of DDI standards has grown over time, but the information payload has been relatively stable
  • DDI metadata is machine-readable and machine-actionable
    • The interface between the metadata model and the machine is dependent on technology
    • Technology has been a major driver of change in the DDI standards and products – DDI must stay current to be useful!

4 of 9

An Historical Perspective and the Future

  • DDI has been through several changes over time
    • DDI 1.*/2.*/Codebook
    • DDI 3.*/Lifecycle
    • RDF Vocabularies
    • DDI-CDI (upcoming)
  • Technology changes have been reflected as the DDI products have developed and matured
  • Now, the technology landscape is changing massively with the advent of AI
    • ChatGPT and similar tools are currently being hyped in the media
    • Other forms of AI are relevant and have been with us for some time
  • How much does this matter to DDI?

5 of 9

Technology and DDI 1.*/2.*/Codebook

  • DDI started out as an XML Document Type Definition (DTD)
  • This was state-of-the-art at the time
  • XML allowed documents to be treated as structured sources of information
  • For DDI, this allowed both machine-readability and a degree of machine-actionability
    • Platform- and application-independent archival format
    • Transformation between statistical packages
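
A minimal sketch of what "machine-actionable" can mean here, assuming a simplified, un-namespaced fragment that borrows DDI Codebook-style element names (var, labl, catgry, catValu). The fragment and the helper names read_variables and to_spss_labels are illustrative only; this is not a complete or valid DDI instance, and real files carry much more structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment using DDI Codebook-style names.
CODEBOOK_XML = """
<codeBook>
  <dataDscr>
    <var name="AGE">
      <labl>Age of respondent</labl>
    </var>
    <var name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def read_variables(xml_text):
    """Return {variable name: {"label": str, "categories": {code: label}}}."""
    root = ET.fromstring(xml_text.strip())
    variables = {}
    for var in root.iter("var"):
        # "labl" without a path prefix matches direct children only,
        # so category labels are not mistaken for the variable label.
        label = var.findtext("labl", default="")
        categories = {
            cat.findtext("catValu"): cat.findtext("labl")
            for cat in var.findall("catgry")
        }
        variables[var.get("name")] = {"label": label, "categories": categories}
    return variables


def to_spss_labels(variables):
    """Emit SPSS-style VARIABLE LABELS commands from the parsed metadata."""
    return [
        f"VARIABLE LABELS {name} '{info['label']}'."
        for name, info in variables.items()
    ]


variables = read_variables(CODEBOOK_XML)
spss_lines = to_spss_labels(variables)
```

Because the structure is explicit, the same parsed metadata could in principle feed label syntax for other statistical packages as well, which is the kind of transformation the slide refers to.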

6 of 9

Technology and DDI 3.*/Lifecycle

  • DDI 3.0 was developed using W3C XML Schema (XSD)
    • Replaced DTDs for both DDI 3.0 and DDI 2.0/Codebook
    • This was best practice in the XML world at the time
  • Supported strong datatyping – not just “character” content
  • Provided support for object-oriented languages (XML Beans, etc.)
    • Primarily Java (JAXB)
    • Others also supported (C#, etc.)
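
To illustrate why strong datatyping matters to application code, here is a rough sketch of converting element content into typed values according to a declared type name. It is not XSD validation and does not use any data-binding tool; the PARSERS table and the parse_typed helper are hypothetical, and details such as time zones on xs:date are ignored.

```python
from datetime import date
from decimal import Decimal


def parse_boolean(text):
    """Accept the four lexical forms that xs:boolean allows."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical mapping from a few XSD built-in type names to Python parsers.
PARSERS = {
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:date": date.fromisoformat,  # time zone suffixes are not handled here
    "xs:boolean": parse_boolean,
    "xs:string": str,
}


def parse_typed(type_name, text):
    """Convert character content into a typed value, or raise on bad input."""
    return PARSERS[type_name](text.strip())


release_date = parse_typed("xs:date", "2023-10-17")
case_count = parse_typed("xs:integer", " 1500 ")
```

With DTDs, both values above would simply be character content; with typed schemas, malformed content can be rejected before it reaches the application.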

7 of 9

DDI and RDF

  • The “Semantic Web” and the “Web of Linked Data” have slowly gained traction, especially in the academic community
  • Many important internet standards rely on RDF and related technologies, published by the W3C and elsewhere
  • DDI has stayed abreast of these developments:
    • DISCO – The DDI Discovery Vocabulary (not released)
    • XKOS – The Extended Knowledge Organization System (based on W3C SKOS)
    • DDI Controlled Vocabularies in SKOS/XKOS
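
As a rough illustration of a controlled-vocabulary term expressed with SKOS, the sketch below builds a single concept as JSON-LD using only the Python standard library. The example.org URIs, the label, and the make_concept helper are invented for illustration and are not taken from any published DDI vocabulary; only the SKOS namespace and the terms Concept, prefLabel, and inScheme come from the W3C specification.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"


def make_concept(uri, scheme_uri, label, lang="en"):
    """Build one SKOS concept as a JSON-LD document (a plain dict)."""
    return {
        "@context": {"skos": SKOS},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": label, "@language": lang},
        "skos:inScheme": {"@id": scheme_uri},
    }


# Hypothetical URIs under example.org, used purely as placeholders.
concept = make_concept(
    "http://example.org/vocab/mode/interview",
    "http://example.org/vocab/mode",
    "Interview",
)
concept_json = json.dumps(concept, indent=2)
```

Any JSON-LD-aware RDF toolkit should be able to load a document of this shape as triples, which is what makes such vocabularies usable as linked data.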

8 of 9

DDI Cross-Domain Integration (DDI-CDI)

  • With the upcoming DDI-CDI specification this trend continues
    • Currently a “Candidate Release”
    • Based on a canonical UML Model (“model-driven”)
    • Provides an OWL description in Turtle and JSON-LD to support RDF implementations (as well as XML)
    • Designed for integration and use with many RDF specifications (PROV-O, SKOS, XKOS, DCAT, etc.)
  • DDI-CDI is also looking at expressions in Python, ShEx, SHACL, and others
  • DDI Lifecycle will also have an RDF representation

9 of 9

But What About ChatGPT?

  • Machine learning is a huge new thing
    • Some say it threatens to end life as we know it (SkyNet is here!)
    • Some say it will replace everyone with chat-bots
    • The hype is (hopefully) overblown!
  • Some think it may actually have some practical application
    • The question is how it can be used, and what impact it will have on data collection, processing, management, and dissemination applications
    • This question remains open
  • Today we will look at the current state of the art with metadata creation for DDI Codebook and DDI Lifecycle based on real examples
  • Then we will look at what the future might hold…