CEDAR tech doc
This document covers up-to-date CEDAR data and software resources
Created: August 13, 2014
Last modified: Jan 21, 2015
DataDump
Source: https://github.com/CEDAR-project/DataDump
The source CEDAR dataset, available in the following formats:
Links from CEDAR ACs to gemeentegeschiedenis.nl
Source: https://github.com/CEDAR-project/DataDump/blob/master/links/cedar2gg-links.ttl
Links between CEDAR municipalities (standardized as Amsterdamse Code URIs) and the same municipalities in gemeentegeschiedenis.nl (in RDF).
Links from CEDAR HISCOs to ICONCLASS
Source: https://github.com/CEDAR-project/DataDump/blob/master/links/gspread-ic-hisco.ttl
Links between CEDAR occupations (standardized as HISCO URIs) and approximately or exactly matching occupations in ICONCLASS (in RDF).
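Because the file distinguishes approximate from exact matches, a consumer may want to handle the two kinds separately. The minimal sketch below assumes (not confirmed here) that the links use skos:exactMatch and skos:closeMatch, and that the Turtle file has already been parsed into (subject, predicate, object) tuples, for example by iterating over an rdflib Graph. All URIs in the example data are hypothetical.

```python
from collections import defaultdict

# Assumed predicates: the actual file may use different properties.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT = SKOS + "exactMatch"
CLOSE = SKOS + "closeMatch"


def split_matches(triples):
    """Group (subject, predicate, object) links by match strength.

    Returns a dict with "exact" and "approximate" keys, each mapping a
    HISCO URI to the sorted list of ICONCLASS URIs linked to it.
    Triples with any other predicate are ignored.
    """
    kinds = {EXACT: "exact", CLOSE: "approximate"}
    grouped = {"exact": defaultdict(set), "approximate": defaultdict(set)}
    for s, p, o in triples:
        kind = kinds.get(p)
        if kind is not None:
            grouped[kind][s].add(o)
    return {k: {s: sorted(objs) for s, objs in v.items()} for k, v in grouped.items()}


# Hypothetical example data (illustrative URIs, not taken from the file).
links = [
    ("http://example.org/hisco/61110", EXACT, "http://example.org/iconclass/A1"),
    ("http://example.org/hisco/61110", CLOSE, "http://example.org/iconclass/A"),
    ("http://example.org/hisco/83110", CLOSE, "http://example.org/iconclass/B"),
]
result = split_matches(links)
```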
Links from DSS to CEDAR HISCOs
Source: https://github.com/CEDAR-project/DataDump/blob/master/links/Victor_datathon_links.ttl
Links from entities in the Dutch Ships and Sailors (DSS) dataset (which contains historical records on ships, locations, captains, etc.) to CEDAR HISCO URIs.
Queries
Source: https://github.com/CEDAR-project/Queries
SPARQL queries that can be run against the data published by the CEDAR project, for example in the YASQE interface at http://lod.cedar-project.nl/cedar/data.html
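The queries can also be sent programmatically. The sketch below builds (but does not send) a standard SPARQL 1.1 Protocol GET request using only Python's standard library. The endpoint URL is a hypothetical placeholder, since the endpoint behind the YASQE interface is not given here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL: replace with the actual CEDAR SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request without sending it.

    The query travels in the "query" URL parameter, and JSON results
    are requested through the Accept header.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

req = build_sparql_request(ENDPOINT, QUERY)
# To execute it: pass req to urllib.request.urlopen and json.load the response.
```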
Homepage
Source: https://github.com/CEDAR-project/Homepage
CEDAR homepage served at http://lod.cedar-project.nl/cedar/
Vocab
Source: https://github.com/CEDAR-project/Vocab
RDF description of vocabularies used by several components in CEDAR, including:
Linked Edit Rules
Source: https://github.com/albertmeronyo/linked-edit-rules
Instance: http://www.linkededitrules.org/
Linked Edit Rules (LER) is a methodology to publish, link, combine and execute edit rules on the Web as Linked Data, in order to verify the consistency of statistical datasets. LER can also be applied to check the consistency of the CEDAR data.
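LER expresses edit rules as Linked Data, which this sketch does not attempt to reproduce. It only illustrates the execution idea in plain Python: each edit rule is a predicate over a record, and a record is inconsistent if it violates any rule. The two rules and the record fields (male, female, total) are hypothetical.

```python
# Hypothetical edit rules: each maps a rule name to a predicate over a record.
EDIT_RULES = {
    "total_equals_sum": lambda r: r["total"] == r["male"] + r["female"],
    "non_negative": lambda r: all(r[k] >= 0 for k in ("male", "female", "total")),
}


def check_record(record, rules=EDIT_RULES):
    """Return the names of the rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


def check_dataset(records, rules=EDIT_RULES):
    """Map each record index to its violated rules, skipping consistent records."""
    report = {}
    for i, record in enumerate(records):
        violations = check_record(record, rules)
        if violations:
            report[i] = violations
    return report


# Hypothetical census-like observations.
records = [
    {"male": 120, "female": 130, "total": 250},  # consistent
    {"male": 80, "female": 95, "total": 170},    # total does not match the sum
    {"male": -5, "female": 10, "total": 5},      # negative count
]
report = check_dataset(records)
```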
Stardog-R extension
Source: https://github.com/albertmeronyo/stardog-r
Instance: N/A
A Stardog extension that wraps R calls as custom SPARQL functions. This makes R functionality available for querying statistical data or writing statistical constraints (such as Linked Edit Rules).
Integrator
Source: https://github.com/CEDAR-project/Integrator
Instance: N/A
The Integrator is the main platform from which the CEDAR pipeline is executed. All developed scripts are integrated into it and executed in sequence on the source data, following the planned workflow.
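A minimal sketch of sequential pipeline execution, assuming each step is a function from the current data to the transformed data. The step functions are hypothetical stand-ins, not the actual CEDAR scripts, and the real Integrator may pass data between steps differently (e.g., through files).

```python
from typing import Callable, List, Tuple

# A step takes the current data and returns the transformed data.
Step = Callable[[list], list]


def run_pipeline(data, steps: List[Tuple[str, Step]]):
    """Apply each named step in order, feeding each output to the next step.

    Returns the final data and the names of the steps that ran,
    which can serve as a simple execution log.
    """
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


# Hypothetical steps standing in for the real CEDAR scripts.
def strip_whitespace(rows):
    return [[cell.strip() for cell in row] for row in rows]


def drop_empty_rows(rows):
    return [row for row in rows if any(row)]


def lowercase_labels(rows):
    return [[row[0].lower()] + row[1:] for row in rows]


PIPELINE = [
    ("strip_whitespace", strip_whitespace),
    ("drop_empty_rows", drop_empty_rows),
    ("lowercase_labels", lowercase_labels),
]

source = [["  Landbouwer ", " 12"], ["", "  "], ["Smid", "3 "]]
result, log = run_pipeline(source, PIPELINE)
```

Keeping the workflow as an ordered list makes the execution order explicit and easy to change without touching the steps themselves.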
MESS
Source: https://github.com/CEDAR-project/mess
Instance: N/A
MESS (Microsoft Excel Style-extractor and Styler) is a tool that automatically extracts styles and their locations from Excel spreadsheets, and either stores them in standalone files or applies them to unstyled spreadsheet files. Its purpose is to avoid re-styling the entire CEDAR dataset if new versions of the tables arrive in the future (e.g., with corrected errors).
NOTICE: this repository is deprecated; its functionality is now part of the Integrator. Consider releasing it as a standalone application in the future.
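The extract-then-reapply idea can be sketched with an in-memory sheet represented as a dict from cell reference to value and style. This representation is an assumption for illustration: MESS works on real Excel files, which would need a spreadsheet library to read and write, and the style attributes used here (bold, fill) are hypothetical.

```python
import json

# Hypothetical in-memory sheet: cell reference -> {"value": ..., "style": {...}}.
styled_sheet = {
    "A1": {"value": "Gemeente", "style": {"bold": True, "fill": "FFFF00"}},
    "B1": {"value": "Totaal", "style": {"bold": True}},
    "A2": {"value": "Amsterdam", "style": {}},
}


def extract_styles(sheet):
    """Return a mapping of cell reference to style, skipping unstyled cells."""
    return {ref: dict(cell["style"]) for ref, cell in sheet.items() if cell.get("style")}


def apply_styles(sheet, styles):
    """Return a copy of the sheet with the stored styles applied by location.

    Values come from the (possibly corrected) new sheet; only styles are
    taken from the stored mapping. Styles for cells absent from the new
    sheet are ignored.
    """
    return {
        ref: {"value": cell["value"], "style": dict(styles.get(ref, {}))}
        for ref, cell in sheet.items()
    }


# Store the styles in a standalone JSON document...
stored = json.dumps(extract_styles(styled_sheet), sort_keys=True)

# ...and later reapply them to a new, unstyled version of the table.
new_version = {
    "A1": {"value": "Gemeente", "style": {}},
    "B1": {"value": "Totaal", "style": {}},
    "A2": {"value": "Amsterdam (corrected)", "style": {}},
}
restyled = apply_styles(new_version, json.loads(stored))
```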
HISCO2CEDARLabels
Source: https://github.com/CEDAR-project/HISCO2CEDARLabels
Instance: N/A
This script enriches HISCO with CEDAR occupation labels. Since mappings between CEDAR occupations and HISCO codes already exist, the script reads these mappings and attaches all known labels to each HISCO code URI.
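The enrichment step amounts to grouping labels by HISCO URI. In this sketch the mappings are hypothetical (label, URI) pairs, and skos:altLabel with a Dutch language tag is an assumed choice of output property; the actual script may use a different property or input format.

```python
from collections import defaultdict

# Hypothetical mappings: (CEDAR occupation label, HISCO code URI) pairs.
mappings = [
    ("landbouwer", "http://example.org/hisco/61110"),
    ("boer", "http://example.org/hisco/61110"),
    ("smid", "http://example.org/hisco/83110"),
    ("landbouwer", "http://example.org/hisco/61110"),  # duplicate mapping
]


def labels_per_hisco(pairs):
    """Collect every distinct label mapped to each HISCO URI (sorted for stability)."""
    labels = defaultdict(set)
    for label, hisco_uri in pairs:
        labels[hisco_uri].add(label)
    return {uri: sorted(found) for uri, found in labels.items()}


def _literal(text, lang):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"@%s' % (escaped, lang)


def to_turtle(labels, lang="nl"):
    """Serialize the labels as Turtle using skos:altLabel (an assumed property)."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for uri in sorted(labels):
        literals = ", ".join(_literal(label, lang) for label in labels[uri])
        lines.append("<%s> skos:altLabel %s ." % (uri, literals))
    return "\n".join(lines)


enriched = labels_per_hisco(mappings)
turtle = to_turtle(enriched)
```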
CEDAR2gg
Source: https://github.com/CEDAR-project/CEDAR2gg
Instance: N/A
Script that generates links between CEDAR municipality URIs and gemeentegeschiedenis.nl municipality URIs (the resulting links are listed in the DataDump section above).
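A minimal sketch of link generation, assuming that both datasets can be keyed by the Amsterdamse Code and that links are expressed with owl:sameAs. The URI prefixes, codes and identifiers are hypothetical placeholders; the real script may use a different matching strategy or predicate.

```python
# Hypothetical URI patterns; the real CEDAR and gemeentegeschiedenis.nl URIs differ.
CEDAR_PREFIX = "http://example.org/cedar/municipality/"
GG_PREFIX = "http://example.org/gg/municipality/"


def generate_links(cedar_codes, gg_index):
    """Pair CEDAR municipalities with gemeentegeschiedenis.nl entries by shared code.

    cedar_codes: iterable of Amsterdamse Codes used in CEDAR.
    gg_index: dict mapping an Amsterdamse Code to a gemeentegeschiedenis.nl identifier.
    Returns (links, unmatched), where unmatched lists codes with no counterpart.
    """
    links, unmatched = [], []
    for code in sorted(set(cedar_codes)):
        gg_id = gg_index.get(code)
        if gg_id is None:
            unmatched.append(code)
        else:
            links.append((CEDAR_PREFIX + code, GG_PREFIX + gg_id))
    return links, unmatched


def to_ntriples(links):
    """Serialize the links as N-Triples using owl:sameAs (an assumed predicate)."""
    same_as = "<http://www.w3.org/2002/07/owl#sameAs>"
    return "\n".join("<%s> %s <%s> ." % (s, same_as, o) for s, o in links)


# Hypothetical input data.
links, unmatched = generate_links(
    ["00001", "00002", "00003"], {"00001": "GemeenteA", "00002": "GemeenteB"}
)
ntriples = to_ntriples(links)
```

Returning the unmatched codes separately makes it easy to review municipalities that need manual linking.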
CEDAR Harmonize
Source: https://github.com/CEDAR-project/Harmonize
Instance: http://lod.cedar-project.nl:8082/harmonize
Interface for manual harmonization of the CEDAR dataset. It includes the following modules:
These modules could easily be detached in the future if they prove useful as standalone applications.
NOTICE: part of this functionality overlaps with the Integrator’s input system (ODS files). Consider deprecating it.
hald
Source: https://github.com/CEDAR-project/hald
Instance: http://lod.cedar-project.nl/hald/
hald is a prototype natural language interface that translates CEDAR-related questions from English into SPARQL and displays the query results.
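The translation approach used by hald is not described in this document. As a hedged illustration only, one simple technique is template matching: a regular expression recognizes a question pattern and fills in a SPARQL template. The question pattern and the ex: vocabulary below are invented.

```python
import re

# Hypothetical question templates mapped to SPARQL templates.
# The ex: vocabulary is invented for illustration only.
TEMPLATES = [
    (
        re.compile(r"^how many (?P<occupation>\w+)s were there in (?P<place>\w+)\??$", re.I),
        """PREFIX ex: <http://example.org/cedar#>
SELECT (SUM(?count) AS ?total) WHERE {{
  ?obs ex:occupation "{occupation}" ;
       ex:municipality "{place}" ;
       ex:population ?count .
}}""",
    ),
]


def translate(question):
    """Return a SPARQL query for the first matching template, or None.

    Captured values are restricted to word characters by the regex,
    so they cannot break out of the quoted SPARQL literals.
    """
    for pattern, sparql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return sparql.format(**match.groupdict())
    return None


query = translate("How many farmers were there in Amsterdam?")
```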
HISCO2RDF
Source: https://github.com/CEDAR-project/hisco2rdf
Instance: N/A
HISCO2RDF is an HTML scraper that reads HISCO data from the HISCO website and builds an RDF SKOS taxonomy from it. Since current efforts focus on translating the original HISCO SQL database to RDF (in collaboration with Richard Zijdeman), this code is expected to be either deprecated or substantially rewritten.
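A minimal sketch of the scrape-and-convert idea, using only Python's standard html.parser module. The page layout (a table with code, title and parent-code columns), the sample rows and the HISCO namespace are hypothetical, since the structure of the HISCO website pages is not described here.

```python
from html.parser import HTMLParser

SKOS = "http://www.w3.org/2004/02/skos/core#"
HISCO_NS = "http://example.org/hisco/"  # hypothetical concept namespace


class TableRowParser(HTMLParser):
    """Collect the text of the <td> cells of each <tr> row (header <th> cells are ignored)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, such as the header
                self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def _literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos(rows):
    """Convert (code, title, parent code) rows to SKOS Turtle; an empty parent means a top concept."""
    lines = ["@prefix skos: <%s> ." % SKOS]
    for row in rows:
        if len(row) != 3:
            continue  # ignore rows that do not match the assumed layout
        code, title, parent = row
        lines.append("<%s%s> a skos:Concept ; skos:notation %s ; skos:prefLabel %s@en ."
                     % (HISCO_NS, code, _literal(code), _literal(title)))
        if parent:
            lines.append("<%s%s> skos:broader <%s%s> ." % (HISCO_NS, code, HISCO_NS, parent))
    return "\n".join(lines)


# Hypothetical page fragment; the real HISCO website layout may differ.
page = """
<table>
  <tr><th>Code</th><th>Title</th><th>Parent</th></tr>
  <tr><td>6</td><td>Agricultural workers</td><td></td></tr>
  <tr><td>61</td><td>Farmers</td><td>6</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(page)
turtle = to_skos(parser.rows)
```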