CEDAR tech doc

This document lists the current CEDAR data and software resources.

Created: August 13, 2014

Last modified: January 21, 2015

Data

DataDump

Source: https://github.com/CEDAR-project/DataDump

The source CEDAR dataset, containing the following formats:

Original raw Excel spreadsheets, under xls
Originals in ODS with markup, under xls-marked
Table conversions to raw (i.e. non-harmonized) RDF Turtle, under raw-rdf
Table conversions to harmonized RDF Turtle, under release
Mappings used to harmonize (in Excel format), under mappings
Mappings turned into rules as RDF annotations, under rules
Exportable/importable table markup, under marking

Links from CEDAR ACs to gemeentegeschiedenis.nl

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/cedar2gg-links.ttl

Links between CEDAR municipalities (standardized as Amsterdamse Code URIs) and the same municipalities in gemeentegeschiedenis.nl (in RDF).

Links from CEDAR HISCOs to ICONCLASS

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/gspread-ic-hisco.ttl

Links between CEDAR occupations (standardized as HISCO URIs) and the exactly or approximately matching occupations in ICONCLASS (in RDF).

Links from DSS to CEDAR HISCOs

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/Victor_datathon_links.ttl

Links from entities in the Dutch Ships and Sailors (DSS) dataset (historical records on ships, locations, captains, etc.) to CEDAR HISCO URIs.

Queries

Source: https://github.com/CEDAR-project/Queries

SPARQL queries that can be run against the data published by the CEDAR project, using the YASQE interface at http://lod.cedar-project.nl/cedar/data.html
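For readers who prefer to send a query from a script rather than paste it into YASQE, the following is a minimal sketch of how a query string can be packed into a SPARQL 1.1 Protocol GET request. The source does not state an endpoint address, so ENDPOINT is a placeholder, and the query shown is a generic one, not one taken from the Queries repository.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint; replace it with the real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# ASSUMPTION: generic illustrative query, not one from the Queries repository.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format depends on the Accept header that the endpoint supports.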

Homepage

Source: https://github.com/CEDAR-project/Homepage

CEDAR homepage served at http://lod.cedar-project.nl/cedar/

Vocab

Source: https://github.com/CEDAR-project/Vocab

RDF description of vocabularies used by several components in CEDAR, including:

HISCO terms
Marital status terms
Tablink terms

Software

Linked Edit Rules

Source: https://github.com/albertmeronyo/linked-edit-rules

Instance: http://www.linkededitrules.org/

Linked Edit Rules (LER) is a methodology to publish, link, combine and execute edit rules on the Web as Linked Data in order to verify the consistency of statistical datasets. The rules can also be applied to check the consistency of the CEDAR data.
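To illustrate what an edit rule checks, here is a small sketch in plain Python. It does not reproduce how LER represents or executes rules as Linked Data; the field names (men, women, total), the rule names and the sample records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]
# An edit rule is modelled here as a predicate that a valid record satisfies.
EditRule = Callable[[Record], bool]

# ASSUMPTION: hypothetical rules over hypothetical fields.
RULES: Dict[str, EditRule] = {
    "total_is_sum": lambda r: r["total"] == r["men"] + r["women"],
    "non_negative": lambda r: all(v >= 0 for v in r.values()),
}


def violations(record: Record, rules: Dict[str, EditRule]) -> List[str]:
    """Return the names of all rules that the record breaks."""
    return [name for name, rule in rules.items() if not rule(record)]


# Placeholder records: the second one has an inconsistent total.
records = [
    {"men": 120, "women": 130, "total": 250},
    {"men": 80, "women": 95, "total": 170},
]
report = [violations(r, RULES) for r in records]
print(report)
```

Keeping each rule as a named, independent predicate mirrors the idea that rules can be published and combined separately from the data they check.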

Stardog-R extension

Source: https://github.com/albertmeronyo/stardog-r

Instance: N/A

A Stardog extension that wraps R calls as custom SPARQL functions. R functionality can be used to query statistical data or write statistical constraints (such as Linked Edit Rules).

Integrator

Source: https://github.com/CEDAR-project/Integrator

Instance: N/A

The Integrator is the main platform from which the CEDAR pipeline is executed. All developed scripts are integrated and sequentially executed on the source data, following the planned workflow.

MESS

Source: https://github.com/CEDAR-project/mess

Instance: N/A

MESS (Microsoft Excel Style-extractor and Styler) is a tool that automatically extracts styles and their locations from Excel spreadsheets, and either stores them in standalone files or applies them to non-styled spreadsheet files. This avoids re-styling the entire CEDAR dataset if new versions of the tables arrive in the future (e.g. with errors corrected).
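The extract-then-apply idea can be sketched as follows. This is not the MESS implementation and it does not read real Excel files: a sheet is modelled as a plain dictionary, and the cell values and style names are hypothetical placeholders.

```python
import json
from typing import Dict

# ASSUMPTION: a sheet is modelled as {cell reference: {"value": ..., "style": ...}}.
Sheet = Dict[str, dict]


def extract_styles(sheet: Sheet) -> Dict[str, str]:
    """Collect the style name of every styled cell, keyed by its location."""
    return {ref: cell["style"] for ref, cell in sheet.items() if cell.get("style")}


def apply_styles(sheet: Sheet, styles: Dict[str, str]) -> Sheet:
    """Return a copy of the sheet with the stored styles attached by location."""
    return {
        ref: {**cell, "style": styles.get(ref, cell.get("style"))}
        for ref, cell in sheet.items()
    }


# Placeholder table that has already been styled by hand.
styled = {
    "A1": {"value": "Municipality", "style": "Header"},
    "A2": {"value": "Place A", "style": "RowHeader"},
    "B2": {"value": 100, "style": "Data"},
}
# A corrected, unstyled version of the same table.
corrected = {
    "A1": {"value": "Municipality"},
    "A2": {"value": "Place A"},
    "B2": {"value": 101},
}

stored = json.dumps(extract_styles(styled), sort_keys=True)  # standalone style file
restyled = apply_styles(corrected, json.loads(stored))
print(restyled["B2"])
```

Because styles are keyed by cell location, this approach only works when the corrected table keeps the same layout as the styled one.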

NOTICE: this repository is deprecated, and its functionality is now part of the Integrator. Consider a future release as a standalone application.

HISCO2CEDARLabels

Source: https://github.com/CEDAR-project/HISCO2CEDARLabels

Instance: N/A

This script enriches HISCO with CEDAR occupation labels. Since mappings between CEDAR occupations and HISCO codes already exist, this script reads such mappings to attach all known labels to each HISCO code URI.
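The enrichment step described above can be sketched like this. The base URI, the codes and the labels are placeholders, and the choice of skos:altLabel as the predicate is an assumption; the actual script may use a different property and input format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# ASSUMPTION: hypothetical base URI, placeholder codes and sample labels.
HISCO_BASE = "http://example.org/hisco/"
mappings: List[Tuple[str, str]] = [
    ("bakker", "C1"),
    ("broodbakker", "C1"),
    ("smid", "C2"),
    ("bakker", "C1"),  # duplicate mapping row, should be attached only once
]


def labels_by_code(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group all distinct labels under the code they are mapped to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, code in rows:
        if label not in grouped[code]:
            grouped[code].append(label)
    return dict(grouped)


def to_turtle(grouped: Dict[str, List[str]]) -> str:
    """Serialize one label triple per (code URI, label) pair."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for code in sorted(grouped):
        for label in grouped[code]:
            # ASSUMPTION: skos:altLabel and the @nl language tag are illustrative.
            lines.append(f'<{HISCO_BASE}{code}> skos:altLabel "{label}"@nl .')
    return "\n".join(lines)


turtle = to_turtle(labels_by_code(mappings))
print(turtle)
```

Labels containing quotes or backslashes would need escaping before being written as Turtle literals; the sketch leaves that out.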

CEDAR2gg

Source: https://github.com/CEDAR-project/CEDAR2gg

Instance: N/A

Script that generates links between CEDAR municipality URIs and gemeentegeschiedenis.nl municipality URIs (the resulting links are listed in the Data section above).

CEDAR Harmonize

Source: https://github.com/CEDAR-project/Harmonize

Instance: http://lod.cedar-project.nl:8082/harmonize

Interface for manual harmonization of the CEDAR dataset. It includes the following modules:

Harmonization vocabulary manager, an RDF Data Cube vocabulary manager to create new concept schemes attached to dimensions (i.e. standard variables and their possible standard values)
Harmonization layer manager, a manual harmonization tool. It lists all CEDAR tables and, once one is selected, all the non-standard values of its variables. The interface lets the user map these non-standard values to standard ones, as defined in the previous module. For each value, a standard variable name (dimension) and a standard variable value (concept in the concept scheme) must be specified.
Query interface, a Web form that uses the previous standard vocabularies and harmonization mappings to homogeneously query the entire dataset, according to the specified variable names and values.

These modules can be easily detached in the future if they make sense as standalone applications.
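The relationship between the three modules can be sketched as a small data structure: a vocabulary of dimensions with their concepts, a mapping from each table's non-standard values to a (dimension, concept) pair, and a lookup that uses both. All table identifiers, raw values, dimensions and concepts below are hypothetical, and the real application stores this information as RDF rather than in Python dictionaries.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Standard(NamedTuple):
    dimension: str  # standard variable name
    concept: str    # standard value within that dimension's concept scheme


# ASSUMPTION: every identifier and value below is a hypothetical placeholder.
concept_schemes: Dict[str, Set[str]] = {
    "maritalStatus": {"married", "single"},
    "sex": {"male", "female"},
}

# (table id, non-standard value) -> standard (dimension, concept)
mappings: Dict[Tuple[str, str], Standard] = {
    ("table_1", "Gehuwd"): Standard("maritalStatus", "married"),
    ("table_2", "gehuwden"): Standard("maritalStatus", "married"),
    ("table_1", "M."): Standard("sex", "male"),
}


def invalid_mappings(
    mapped: Dict[Tuple[str, str], Standard], schemes: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return the keys whose target is not defined in the vocabulary."""
    return [
        key for key, std in mapped.items()
        if std.concept not in schemes.get(std.dimension, set())
    ]


def sources_for(dimension: str, concept: str) -> List[Tuple[str, str]]:
    """Find every (table, raw value) that harmonizes to the given standard value."""
    target = Standard(dimension, concept)
    return sorted(key for key, std in mappings.items() if std == target)


print(invalid_mappings(mappings, concept_schemes))
print(sources_for("maritalStatus", "married"))
```

The lookup shows why the mapping enables homogeneous querying: one standard value resolves to differently spelled raw values across tables.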

NOTICE: part of this functionality overlaps with Integrator’s input system (ODS files). Consider deprecating it.

hald

Source: https://github.com/CEDAR-project/hald

Instance: http://lod.cedar-project.nl/hald/

hald is a prototype natural language interface that translates CEDAR-related questions from English into SPARQL and displays the query results.

HISCO2RDF

Source: https://github.com/CEDAR-project/hisco2rdf

Instance: N/A

HISCO2RDF is an HTML scraper that reads HISCO data from the HISCO website and creates an RDF SKOS taxonomy out of it. Since most efforts are now focused on translating the original SQL HISCO database to RDF (in collaboration with Richard Zijdeman), this code is expected to be either deprecated or substantially rewritten.
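The scrape-then-serialize approach can be sketched with the Python standard library. The HTML snippet, its class names, the base URI and the codes are all invented for illustration; the real HISCO website markup and the actual script are likely to differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# ASSUMPTION: hypothetical markup, not the structure of the real HISCO website.
SAMPLE_HTML = """
<table>
  <tr><td class="code">C1</td><td class="label">Example occupation A</td></tr>
  <tr><td class="code">C2</td><td class="label">Example occupation B</td></tr>
</table>
"""


class CodeLabelParser(HTMLParser):
    """Collect (code, label) pairs from table rows with classed cells."""

    def __init__(self) -> None:
        super().__init__()
        self._field: Optional[str] = None
        self._row: dict = {}
        self.rows: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        if self._field in ("code", "label") and data.strip():
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr":
            # Keep the row only if both cells were found, then start afresh.
            if set(self._row) >= {"code", "label"}:
                self.rows.append((self._row["code"], self._row["label"]))
            self._row = {}


def to_skos(rows: List[Tuple[str, str]], base: str) -> str:
    """Serialize each scraped row as a skos:Concept in Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for code, label in rows:
        lines.append(
            f'<{base}{code}> a skos:Concept ; '
            f'skos:notation "{code}" ; skos:prefLabel "{label}"@en .'
        )
    return "\n".join(lines)


parser = CodeLabelParser()
parser.feed(SAMPLE_HTML)
skos = to_skos(parser.rows, "http://example.org/hisco/")  # hypothetical base URI
print(skos)
```

A full taxonomy would also need skos:broader links between concepts, which this sketch omits.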