1 of 27

Statistical Classification management with DDI-L

Kaia Kulla

Statistics Estonia


Outline

  • Background
    • Neuchâtel Model -> GSIM -> DDI-L
  • Object types in the Classification Model and their properties in DDI-L (consistent with GSIM), examples
    • Classification Family and Series
    • Statistical Classification
    • Classification Item, Level
    • Classification Index and Index Entry
    • Correspondence Table and Map
  • Summary: Advantages of DDI-L


Background: Classification in DDI

  • Neuchâtel Terminology Model: Classification Database object types and their attributes v 2.1 (2004; revised 2013)
  • GSIM Statistical Classification Model v1.1 (2013)
    • GSIM - Generic Statistical Information Model
      • The first internationally endorsed reference framework for statistical information
  • DDI 3.2 Code List and Copenhagen Mapping based on GSIM (2014)
  • DDI 3.3 Statistical Classification and related objects (2020)
    • Classification management based on GSIM / Neuchâtel Model


GSIM Classification Model


Classification Family

  • A Classification Family is a group of Classification Series related from a particular point of view.
  • The Classification Family is related by being based on a common Concept (e.g. economic activity).


Classification Family in DDI-L, example

  • [Screenshot: a Classification Family in DDI-L]
  • Reused object: Classification Series


Classification Series

  • A Classification Series is an ensemble of one or more Statistical Classifications, based on the same concept, and related to each other as versions or updates.
  • Typically, these Statistical Classifications have the same name.
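
The containment described above (a Family groups Series, a Series groups versions of one classification) can be sketched with plain Python data classes. This is a minimal sketch, not the DDI-L schema: the class names, fields, the exclusive end date and the "Example activity classification" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class StatisticalClassification:
    """One version in a Classification Series (hypothetical minimal fields)."""
    name: str
    version: str
    valid_from: date
    valid_to: Optional[date] = None  # exclusive end date; None means still valid

    def is_valid_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


@dataclass
class ClassificationSeries:
    """Versions of the same classification, based on one concept."""
    name: str
    concept: str
    versions: list[StatisticalClassification] = field(default_factory=list)

    def version_valid_on(self, day: date) -> Optional[StatisticalClassification]:
        # This sketch assumes versions in a series do not overlap in time.
        for version in self.versions:
            if version.is_valid_on(day):
                return version
        return None


@dataclass
class ClassificationFamily:
    """Classification Series grouped by a common concept."""
    concept: str
    series: list[ClassificationSeries] = field(default_factory=list)


activity = ClassificationSeries(
    name="Example activity classification",
    concept="economic activity",
    versions=[
        StatisticalClassification("Example activity classification", "v1",
                                  date(2000, 1, 1), date(2010, 1, 1)),
        StatisticalClassification("Example activity classification", "v2",
                                  date(2010, 1, 1)),
    ],
)
family = ClassificationFamily(concept="economic activity", series=[activity])
print(activity.version_valid_on(date(2015, 6, 30)).version)  # v2
```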


Classification Series in DDI-L, example

  • [Screenshot: a Classification Series in DDI-L]
  • Reused objects: Unit Types classified, Current Classification, Statistical Classifications, Owner


Statistical Classification

  • A Statistical Classification is an exhaustive and structured set of mutually exclusive and well-described categories, often presented in a hierarchy that is reflected by the numeric or alphabetical codes assigned to them, used to standardize concepts and compile statistical data.

[Eurostat’s Concepts and Definitions Database]

  • A Statistical Classification is a set of categories which may be assigned to one or more variables registered in statistical surveys or administrative files, and used in the data collection, production and dissemination of statistics.


Types of Statistical Classification

  • A Statistical Classification may be
    • versionable
      • valid from a particular date for a period that may or may not be specified
    • floating
      • a validity period should be defined for all Classification Items, which allows the structure and content of the items to be displayed at different points in time
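
For a floating classification, the view at a given date follows directly from the item validity periods. A minimal sketch, assuming each item carries a start date and an optional exclusive end date; the `ClassificationItem` fields and the item codes are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # exclusive end date; None means still valid


def items_valid_on(items, day):
    """Return the items of a floating classification as they stood on `day`, in code order."""
    current = [
        item for item in items
        if item.valid_from <= day and (item.valid_to is None or day < item.valid_to)
    ]
    return sorted(current, key=lambda item: item.code)


# Hypothetical items: "10" is retired at the start of 2015 and "12" is introduced.
items = [
    ClassificationItem("10", "Old category", date(2000, 1, 1), date(2015, 1, 1)),
    ClassificationItem("11", "Stable category", date(2000, 1, 1)),
    ClassificationItem("12", "New category", date(2015, 1, 1)),
]
print([i.code for i in items_valid_on(items, date(2010, 1, 1))])  # ['10', '11']
print([i.code for i in items_valid_on(items, date(2020, 1, 1))])  # ['11', '12']
```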


Structure of the Classification

  • A Statistical Classification may have
    • a flat, linear structure or
    • may be hierarchically structured, such that all categories at lower Levels are sub-categories of categories at the next Level up.

  • Categories in Statistical Classifications are represented in the information model as Classification Items.
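
The rule that every lower-Level category is a sub-category of a category at the next Level up can be checked mechanically. The sketch assumes a hypothetical representation (a dictionary from item code to a pair of Level number and parent code) and uses codes in the style of NACE Rev. 2 purely for illustration. A flat classification is the special case where every item is at Level 1 with no parent.

```python
def check_hierarchy(items):
    """items: {code: (level, parent_code or None)}. Returns a list of problems found."""
    problems = []
    for code, (level, parent) in items.items():
        if level == 1:
            if parent is not None:
                problems.append(f"{code}: top-level item must not have a parent")
        elif parent not in items:
            problems.append(f"{code}: parent {parent!r} not found")
        elif items[parent][0] != level - 1:
            # The parent must sit exactly one Level above the child.
            problems.append(f"{code}: parent {parent} is not at level {level - 1}")
    return problems


nace_style = {"A": (1, None), "01": (2, "A"), "01.1": (3, "01"), "01.11": (4, "01.1")}
broken = {"A": (1, None), "01.11": (4, "A")}
print(check_hierarchy(nace_style))  # []
print(check_hierarchy(broken))      # ['01.11: parent A is not at level 3']
```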


Statistical Classification in DDI-L, example

  • [Screenshot: a Statistical Classification in DDI-L]
  • Reused objects: Maintenance Units, Contact Person(s), Publications, Classification Indexes, Predecessor, Successor, Derived From, Variant of, Levels, Classification Items


Classification Item

  • A Classification Item represents a Category at a certain Level within a Statistical Classification.
  • It defines the content and the borders of the category.
  • An object/unit can be classified to one and only one Classification Item at each Level of a Statistical Classification.
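
Because each Classification Item has at most one parent, a unit coded at the lowest Level falls into exactly one item at every higher Level. A sketch, assuming a hypothetical `parent_of` dictionary; the NACE-style codes are an illustrative example only.

```python
def codes_at_each_level(code, parent_of):
    """Walk from a classified code up to the top Level and return the codes top-down."""
    chain = []
    while code is not None:
        chain.append(code)
        code = parent_of.get(code)  # top-level items have no entry, so the walk stops
    return list(reversed(chain))


parent_of = {"01.11": "01.1", "01.1": "01", "01": "A"}
print(codes_at_each_level("01.11", parent_of))  # ['A', '01', '01.1', '01.11']
```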


Classification Item in DDI-L, example

  • [Screenshot: a Classification Item in DDI-L]
  • Reused objects: Parent, Defining Concept, Excludes, Successor


Level

  • A Statistical Classification has a structure composed of one or several Levels.
  • A Level is often associated with a Concept, which defines it.
  • In a hierarchical classification the Classification Items of each Level but the highest are aggregated to the nearest higher Level.
  • A linear classification has only one Level.
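
Aggregating to the nearest higher Level amounts to adding each item's value to its parent. A minimal sketch with made-up counts and a hypothetical NACE-style `parent_of` mapping; applying the function repeatedly rolls the data up one Level at a time.

```python
from collections import Counter


def aggregate_one_level_up(counts, parent_of):
    """Sum values for the items of one Level into their parents at the next Level up."""
    totals = Counter()
    for code, n in counts.items():
        totals[parent_of[code]] += n
    return dict(totals)


parent_of = {"01.11": "01.1", "01.12": "01.1", "01.21": "01.2", "01.1": "01", "01.2": "01"}
level4 = {"01.11": 5, "01.12": 3, "01.21": 2}  # made-up counts at the lowest Level
level3 = aggregate_one_level_up(level4, parent_of)
level2 = aggregate_one_level_up(level3, parent_of)
print(level3)  # {'01.1': 8, '01.2': 2}
print(level2)  # {'01': 10}
```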


Level in DDI-L, example

  • [Screenshot: a Level in DDI-L]
  • Reused object: Defining Concept


Classification Index

  • A Classification Index is an ordered list (alphabetical, in code order, etc.) of Classification Index Entries.
  • A Classification Index can relate to one particular Statistical Classification or to several.
  • A Classification Index shows the relationship between text found in statistical data sources (responses to survey questionnaires, administrative records) and one or more Statistical Classifications.
  • A Classification Index may be used to assign the codes for Classification Items to observations in statistical collections.
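
Using an index to assign codes can be sketched as an exact lookup on normalised text. This sketch rests on assumptions: the `ClassificationIndex` class, the normalisation rules (Unicode NFKC, case folding, collapsed whitespace) and the entries are hypothetical, and practical coding tools usually need fuzzy or rule-based matching as well.

```python
import unicodedata


def normalise(text):
    """Apply Unicode NFKC, case-fold and collapse whitespace before matching."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


class ClassificationIndex:
    def __init__(self, entries):
        # entries: (index text, Classification Item code) pairs, kept in alphabetical order
        self.entries = sorted(entries, key=lambda entry: normalise(entry[0]))
        self._codes = {}
        for text, code in self.entries:
            self._codes.setdefault(normalise(text), set()).add(code)

    def codes_for(self, response):
        """Candidate codes for a free-text response; an empty list means no match."""
        return sorted(self._codes.get(normalise(response), set()))


index = ClassificationIndex([
    ("Wheat growing", "01.11"),
    ("Rice growing", "01.12"),
    ("Barley growing", "01.11"),
])
print([text for text, _ in index.entries])  # ['Barley growing', 'Rice growing', 'Wheat growing']
print(index.codes_for("  barley GROWING"))   # ['01.11']
print(index.codes_for("fishing"))            # []
```

Returning a list rather than a single code leaves room for text that an index maps to more than one item, which a coder would then resolve manually.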


Classification Index in DDI-L, example

  • [Screenshot: a Classification Index in DDI-L]
  • Reused objects: Maintenance Unit, Contact Person(s), Publications


Classification Index Entry

  • A Classification Index Entry is a word or a short text (e.g. the name of a locality, an economic activity or an occupational title) describing a type of object/unit or object property to which a Classification Item applies, together with the code of the corresponding Classification Item.
  • Each Classification Index Entry typically refers to one item of the Statistical Classification.
  • Although a Classification Index Entry may be associated with a Classification Item at any Level of a Statistical Classification, Classification Index Entries are normally associated with items at the lowest Level.


Classification Index Entry in DDI-L, example

  • [Screenshot: a Classification Index Entry in DDI-L]
  • Reused object: Classification Item


Correspondence Table

  • A Correspondence Table expresses the relationship between two Statistical Classifications.
  • These are typically:
    • two versions from the same Classification Series;
    • Statistical Classifications from different Classification Series;
    • a variant and the version on which it is based; or,
    • different versions of a variant.
  • In the first and last examples, the Correspondence Table facilitates comparability over time.
  • Correspondence relationships are shown in both directions.
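
Showing a correspondence in both directions means each pair of source and target items can be looked up from either side. A minimal sketch with hypothetical source codes X1 to X3 and target codes Y1 to Y3; the `build_lookups` helper is an illustration, not part of DDI-L.

```python
from collections import defaultdict


def build_lookups(maps):
    """maps: (source item code, target item code) pairs from one Correspondence Table."""
    source_to_target = defaultdict(set)
    target_to_source = defaultdict(set)
    for source, target in maps:
        source_to_target[source].add(target)
        target_to_source[target].add(source)
    return dict(source_to_target), dict(target_to_source)


maps = [("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2"), ("X3", "Y3")]
forward, backward = build_lookups(maps)
print(sorted(forward["X3"]))   # ['Y2', 'Y3']
print(sorted(backward["Y1"]))  # ['X1', 'X2']
```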


Correspondence Table in DDI-L, example

  • [Screenshot: a Correspondence Table in DDI-L]
  • Reused objects: Maintenance Units, Contact Person(s), Publications, Owners, Source Classification, Target Classification, Source Level, Target Level


Map

  • A Map expresses the relation between a Classification Item in a source Statistical Classification and a corresponding Classification Item in the target Statistical Classification.
  • The Map should specify whether the relationship between the two Classification Items is partial or complete.
  • Depending on the relationship type of the Correspondence Table, there may be several Maps for a single source or target item.
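
The partial/complete flag and the relationship type can be derived from the list of pairs under one simplifying assumption that is not part of the source model: the Correspondence Table is exhaustive, so a Map is treated as complete only when neither of its items appears in any other Map. The `Map` class, helper names and codes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Map:
    source_item: str
    target_item: str
    partial: bool


def derive_maps(pairs):
    """Flag a Map as partial when either of its items takes part in another Map."""
    source_count = Counter(source for source, _ in pairs)
    target_count = Counter(target for _, target in pairs)
    return [
        Map(source, target, partial=source_count[source] > 1 or target_count[target] > 1)
        for source, target in pairs
    ]


def relationship_type(pairs):
    """Classify the table as 1:1, 1:n, n:1 or n:m from how often each item recurs."""
    many_targets = any(n > 1 for n in Counter(s for s, _ in pairs).values())
    many_sources = any(n > 1 for n in Counter(t for _, t in pairs).values())
    if many_targets and many_sources:
        return "n:m"
    if many_targets:
        return "1:n"
    if many_sources:
        return "n:1"
    return "1:1"


pairs = [("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2")]
print([m.partial for m in derive_maps(pairs)])  # [True, True, False]
print(relationship_type(pairs))                 # n:1
```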


Mapping in DDI-L, example

  • [Screenshot: a Map in DDI-L]
  • Reused objects: Source Item, Target Item


Summary


Advantages of DDI-L

  • Time saving
  • Quality control
  • Reusability
  • Consistency
  • Life-cycle tracking
  • Provenance


References to the material used
