CEDAR tech doc 2014

CEDAR tech doc

This document describes the current CEDAR data and software resources.

Created: August 13, 2014

Last modified: Jan 21, 2015

Data

DataDump 

Source: https://github.com/CEDAR-project/DataDump

The source CEDAR dataset, containing the following formats:

Links from CEDAR ACs to gemeentegeschiedenis.nl

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/cedar2gg-links.ttl

Links between CEDAR municipalities (standardized as Amsterdamse Code URIs) and the same municipalities in gemeentegeschiedenis.nl (in RDF).

Links from CEDAR HISCOs to ICONCLASS

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/gspread-ic-hisco.ttl

Links between CEDAR occupations (standardized as HISCO URIs) and the same, or approximately the same, occupations in ICONCLASS (in RDF).

Links from DSS to CEDAR HISCOs

Source: https://github.com/CEDAR-project/DataDump/blob/master/links/Victor_datathon_links.ttl

Links from entities in the Dutch Ships and Sailors (DSS) dataset (which contains historical records on ships, locations, captains, etc.) to CEDAR HISCO URIs.

Queries

Source: https://github.com/CEDAR-project/Queries 

SPARQL queries that can be run against the data published by the CEDAR project, using the YASQE interface at http://lod.cedar-project.nl/cedar/data.html
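As a rough illustration of how such a query could be sent outside the YASQE interface, the sketch below builds a SPARQL 1.1 Protocol GET request URL. The endpoint address and the sample query are placeholders (assumptions): this document names only the YASQE page, not the endpoint behind it, and the queries in the repository will differ.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL (assumption): replace with the real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Placeholder query (assumption): any query from the Queries repository could be used.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be passed to any HTTP client. Requesting a specific result format (for example via an Accept header) is left out of this sketch.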

Homepage

Source: https://github.com/CEDAR-project/Homepage 

CEDAR homepage served at http://lod.cedar-project.nl/cedar/ 

Vocab

Source: https://github.com/CEDAR-project/Vocab 

RDF description of vocabularies used by several components in CEDAR, including:

Software

Linked Edit Rules

Source: https://github.com/albertmeronyo/linked-edit-rules 

Instance: http://www.linkededitrules.org/

Linked Edit Rules (LER) is a methodology to publish, link, combine and execute edit rules on the Web as Linked Data, in order to verify the consistency of statistical datasets. These rules can also be applied to check the consistency of the CEDAR data.
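To give an idea of what an edit rule checks, here is a minimal sketch in plain Python. It does not use LER or its Linked Data encoding. The rule (a total must equal the sum of its parts) and the sample records are invented for illustration only.

```python
def check_total(record):
    """Edit rule (illustrative): total must equal male + female."""
    return record["total"] == record["male"] + record["female"]


# Invented sample observations (assumption), not taken from the CEDAR data.
records = [
    {"id": "obs1", "male": 120, "female": 130, "total": 250},
    {"id": "obs2", "male": 80, "female": 95, "total": 170},  # inconsistent on purpose
]

# Collect the identifiers of all records that violate the rule.
violations = [r["id"] for r in records if not check_total(r)]
print(violations)
```

In LER the rules themselves are published and linked as data. This sketch only shows the kind of consistency check such a rule expresses.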

Stardog-R extension

Source: https://github.com/albertmeronyo/stardog-r 

Instance: N/A

A Stardog extension that wraps R calls as custom SPARQL functions. R functionality can be used to query statistical data or write statistical constraints (such as Linked Edit Rules).

Integrator

Source: https://github.com/CEDAR-project/Integrator

Instance: N/A

The Integrator is the main platform from which the CEDAR pipeline is executed. All developed scripts are integrated and sequentially executed on the source data, following the planned workflow.

MESS

Source: https://github.com/CEDAR-project/mess

Instance: N/A

MESS (Microsoft Excel Style-extractor and Styler) is a tool that automatically extracts styles and their locations from Excel spreadsheets, and either stores them in standalone files or applies them to non-styled spreadsheet files. Its purpose is to avoid re-styling the entire CEDAR dataset when new versions of the tables arrive (e.g. with corrected errors).

NOTICE: this repository is deprecated, and its functionality is now part of the Integrator. A future release as a standalone application may be considered.

HISCO2CEDARLabels

Source: https://github.com/CEDAR-project/HISCO2CEDARLabels

Instance: N/A

This script enriches HISCO with CEDAR occupation labels. Since mappings between CEDAR occupations and HISCO codes already exist, the script reads those mappings and attaches all known labels to each HISCO code URI.
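The idea can be sketched as follows, assuming the mappings are available as (label, HISCO URI) pairs. The sample labels, the example.org URIs, and the choice of skos:altLabel as predicate are all assumptions made for illustration. The actual script may read a different input format and use a different predicate.

```python
from collections import defaultdict

# Assumed predicate: the real script may attach labels differently.
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"

# Hypothetical mappings (CEDAR occupation label, HISCO code URI).
MAPPINGS = [
    ("bakker", "http://example.org/hisco/code-1"),
    ("broodbakker", "http://example.org/hisco/code-1"),
    ("smid", "http://example.org/hisco/code-2"),
]


def group_labels(mappings):
    """Collect every known label under the HISCO URI it maps to."""
    labels = defaultdict(set)
    for label, uri in mappings:
        labels[uri].add(label)
    return labels


def to_ntriples(labels):
    """Serialize the grouped labels as N-Triples lines, in sorted order."""
    lines = []
    for uri in sorted(labels):
        for label in sorted(labels[uri]):
            # Escape backslashes and double quotes inside the literal.
            escaped = label.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{uri}> <{SKOS_ALT_LABEL}> "{escaped}" .')
    return lines


triples = to_ntriples(group_labels(MAPPINGS))
print("\n".join(triples))
```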

CEDAR2gg

Source: https://github.com/CEDAR-project/CEDAR2gg

Instance: N/A

Script that generates links between CEDAR municipality URIs and gemeentegeschiedenis.nl municipality URIs (the generated links are listed in the Data section above).
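How the script matches municipalities is not described here. One plausible approach, shown below purely as a sketch, is to join both sources on a shared Amsterdamse Code and emit one link per match. The example.org URIs, the sample codes, and the use of owl:sameAs are assumptions, not a description of the actual script.

```python
# Assumed link predicate: the published link set may use a different one.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical lookup tables keyed by Amsterdamse Code.
CEDAR = {
    "00001": "http://example.org/cedar/ac/00001",
    "00002": "http://example.org/cedar/ac/00002",
}
GG = {
    "00001": "http://example.org/gg/municipality-a",
    "00003": "http://example.org/gg/municipality-c",
}


def link_by_code(cedar, gg):
    """Pair URIs from both sources that share the same Amsterdamse Code."""
    links = []
    for code in sorted(cedar.keys() & gg.keys()):
        links.append(f"<{cedar[code]}> <{OWL_SAME_AS}> <{gg[code]}> .")
    return links


links = link_by_code(CEDAR, GG)
print("\n".join(links))
```

Codes present in only one of the two sources produce no link in this sketch.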

CEDAR Harmonize

Source: https://github.com/CEDAR-project/Harmonize

Instance: http://lod.cedar-project.nl:8082/harmonize

Interface for manual harmonization of the CEDAR dataset. It includes the following modules:

These modules can be easily detached in the future if they make sense as standalone applications.

NOTICE: part of this functionality overlaps with the Integrator's input system (ODS files). Consider deprecating it.

hald

Source: https://github.com/CEDAR-project/hald

Instance: http://lod.cedar-project.nl/hald/

hald is a prototype of a natural language interface that translates CEDAR-related human queries from English to SPARQL and displays the query results.

HISCO2RDF

Source: https://github.com/CEDAR-project/hisco2rdf

Instance: N/A

HISCO2RDF is an HTML scraper that reads HISCO data from the HISCO website and creates an RDF SKOS taxonomy out of it. Since most efforts are now focused on translating the original SQL HISCO database to RDF (in collaboration with Richard Zijdeman), this code is expected to be either deprecated or substantially changed.