Web Archive Ontology (SIOC+CDM)

John G. Breslin and Guangyuan Piao, Unit for Social Semantics, Insight Centre for Data Analytics, NUI Galway

Ontology Prototype

We have created a prototype ontology for web archives based on two existing ontologies: Semantically-Interlinked Online Communities (SIOC) and the Common Data Model (CDM).

Figure 1: Initial Prototype of Web Archive Ontology, Linking to SIOC and CDM[1]

In Figure 1, we give an initial prototype for a general web archive ontology, linked to concepts in the CDM, but allowing flexibility in terms of archiving works, media, web pages, etc. through the “Item” concept. Items are versioned and linked to each other, as well as to concepts appearing in the archived items themselves.
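The versioning of items can be sketched as follows. This is a minimal illustration, assuming that snapshots of one page are ordered by capture time and that they are linked with the SIOC properties previous_version, next_version and latest_version listed in Table 1. The example.org URIs and the helper name `version_triples` are hypothetical and not part of the prototype.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def version_triples(snapshots: List[str]) -> List[Triple]:
    """Link snapshot URIs (oldest first) into a version chain.

    Property names follow the SIOC terms in Table 1; the "sioc:" prefix
    is written as a plain string here rather than a resolved namespace.
    """
    triples: List[Triple] = []
    if not snapshots:
        return triples
    latest = snapshots[-1]
    # Link each adjacent pair of snapshots in both directions.
    for older, newer in zip(snapshots, snapshots[1:]):
        triples.append((older, "sioc:next_version", newer))
        triples.append((newer, "sioc:previous_version", older))
    # Point every older snapshot at the most recent one.
    for snap in snapshots[:-1]:
        triples.append((snap, "sioc:latest_version", latest))
    return triples


# Hypothetical snapshot URIs of one archived page.
snapshots = [
    "http://archive.example.org/page/v1",
    "http://archive.example.org/page/v2",
    "http://archive.example.org/page/v3",
]

for s, p, o in version_triples(snapshots):
    print(s, p, o)
```

Representing the links as plain subject-predicate-object tuples keeps the sketch independent of any particular RDF library.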

For ease of display in this document, we show only some of the more commonly used concepts[3] rather than the full CDM[2]. We can also map to the other vocabulary terms listed in the last column of Table 1 below; some mappings and reused terms are shown in Figure 1.

Essentially, the top part of the model describes the archive / storage mechanism for an item in an area (Container) on a website (Site): where it originally came from (its source URL), who made it, when it was created or modified, when it was archived, its content stream, etc. The bottom part describes what the item actually is (for example, in CDM terms, the single exemplar of the manifestation of an expression of a work).

Also, the agent who makes the item and the agent who creates the work may differ (e.g. a bot may generate an HTML copy of a PDF publication written by Ms. Smith).
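The two halves of the model, and the distinction between the maker of an item and the creator of the work, can be sketched with plain data classes. This is only an illustration under assumptions: the class names echo the concepts named above (Site, Container, Item, Work, Expression, Manifestation), but the field names, URIs, dates and the helper `same_agent` are hypothetical and are not actual SIOC or CDM property names.

```python
from dataclasses import dataclass
from datetime import datetime


# --- Bottom part: what the item is (FRBR-style chain, as in CDM) ---
@dataclass
class Work:
    title: str
    creator: str  # agent who created the work


@dataclass
class Expression:
    work: Work
    language: str


@dataclass
class Manifestation:
    expression: Expression
    media_type: str


# --- Top part: where the item is stored and how it was archived ---
@dataclass
class Site:
    url: str


@dataclass
class Container:
    name: str
    site: Site


@dataclass
class Item:
    container: Container
    source_url: str
    maker: str  # agent who made this archived item
    created: datetime
    archived: datetime
    manifestation: Manifestation  # the item is an exemplar of this


def same_agent(item: Item) -> bool:
    """True if the item's maker is also the creator of the underlying work."""
    return item.maker == item.manifestation.expression.work.creator


# Hypothetical example: a bot makes an HTML copy of Ms. Smith's publication.
work = Work(title="Annual Report", creator="Ms. Smith")
html_copy = Item(
    container=Container("reports", Site("http://www.example.org/")),
    source_url="http://www.example.org/reports/annual.pdf",
    maker="conversion-bot",
    created=datetime(2015, 1, 10),
    archived=datetime(2015, 1, 12),
    manifestation=Manifestation(Expression(work, "en"), "text/html"),
)

print(same_agent(html_copy))  # False: the bot made the item, Ms. Smith the work
```

Keeping `maker` on the item and `creator` on the work is what lets the two agents differ without either record contradicting the other.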

Relevant Public Ontologies

In Table 1, we list some relevant public ontologies and terms of interest. Some terms can be reused, and others can be mapped to for interoperability purposes.

| Ontology Name | Overview | Why relevant? | What terms are useful? |
|---|---|---|---|
| FRBR | For describing functional requirements for bibliographic records. | To describe bibliographic records. | Expression, Work |
| FRBRoo | Expresses the conceptualisation of FRBR with an object-oriented methodology, as an alternative to the entity-relationship methodology. | In general, FRBRoo “inherits” all concepts of CIDOC-CRM and harmonises with it. | ClassicalWork, LegalWork, ScholarlyWork, Publication, Expression |
| BIBFrame | For bibliographic description, both on the Web and in the broader networked world. | To represent and exchange bibliographic data. | Work, Instance, Annotation, Authority |
| EDM | The Europeana Data Model models data in and supports functionality for Europeana, an internet portal that acts as an interface to millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. | Complements FRBRoo with additional properties and classes. | incorporate, isDerivativeOf, WebResource, TimeSpan, Agent, Place, PhysicalThing |
| CIDOC-CRM | For describing the implicit and explicit concepts and relationships used in the cultural heritage domain. | To describe cultural heritage information. | EndofExistence, Creation, Time-Span |
| EAC-CPF | Encoded Archival Context for Corporate Bodies, Persons and Families is used for encoding the names of creators of archival materials and related information. | Used closely in association with EAD to provide a formal method for recording the descriptions of record creators. | lastDateTimeVerified, Control, Identity |
| EU PO CDM | Ontology based on the FRBR model, for describing the relationships between resource types managed by the EU Publications Office and their views. | To describe records. | Expression, Work, Manifestation, Agent, Subject, Item |
| OAI-ORE | Defines standards for the description and exchange of aggregations of Web resources. | To describe relationships among resources (also used in EDM). | aggregates, Aggregation, ResourceMap |
| EAD | Standard used for hierarchical descriptions of archival records. | Terms are designed to describe archival records. | audience, abbreviation, certainty, repositorycode, AcquisitionInformation, ArchivalDescription |
| WGS84 Geo | For describing information about spatially located things. | Terms can be used with the Place ontology for describing place information. | lat, long |
| Media | For describing media resources on the Web. | To describe media contents for web archiving. | compression, format, MediaType |
| Places | For describing places of geographic interest. | To describe place information for events, etc. | City, Country, Continent |
| Event | For describing events. | To describe specific events in content. Can also be used to represent events at an administrative level. | agent, product, place, Agent, Event |
| SKOS | A common data model for sharing and linking knowledge organisation systems. | To capture similarities among ontologies and make the relationships explicit. | broader, related, semanticRelation, relatedMatch, Concept, Collection |
| SIOC | For describing social content. | Terms are general enough to be used for web archiving. | previous_version, next_version, earlier_version, later_version, latest_version, Item, Container, Site, embed_knowledge |
| Dublin Core | Provides a metadata vocabulary of “core” properties that can give basic descriptive information about any kind of resource. | Fundamental terms used with other ontologies. | creator, date, description, identifier, language, publisher |
| LOC METS Profile | The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. The METS profile expresses the requirements that a METS document must satisfy. | To describe and organise the components of a digital object. | controlled_vocabularies, external_schema |
| DCAT and DCAT-AP | A specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Its basic use case is to enable cross-data-portal search for datasets and to make public sector data more searchable across borders and sectors. | Enables the exchange of description metadata between data portals. | downloadURL, accessURL, Distribution, Dataset, CatalogRecord |
| Formex | A format for the exchange of data between the Publications Office and its contractors. In particular (but not only), it defines the logical markup for documents that are published in the different series of the Official Journal of the European Union. | Useful for annotating archived items as well as for exchange purposes. | Archived, Annotation, FT, Note |
| ODP | Ontology describing the metadata vocabulary for the Open Data Portal of the European Union. | To describe dataset portals. | datasetType, datasetStatus, accrualPeriodicity, DatasetDocumentation |
| LOC PREMIS | Used to describe preservation metadata. | Applicable to archives. | ContentLocation, CreatingApplication, Dependency |
| VIAF | Virtual International Authority File is an international service designed to provide convenient access to the world's major name authority files (lists of names of people, organisations, places, etc. used by libraries). Enables switching of the displayed form of names to the preferred language of a web user. | Useful for linking to name authority files and helping to serve different language communities in Europe. | AuthorityAgency, NameAuthority, NameAuthorityCluster |

Table 1: Relevant Ontologies and Terms
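How terms from Table 1 might be recorded as mappings for interoperability can be sketched as below. The sketch uses the SKOS relatedMatch property from the table. The specific pairings, the prefixed names and the helper `mapping_triples` are illustrative assumptions, not mappings asserted by the prototype, and exact term spellings should be checked against each vocabulary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mappings from terms reused in the prototype to similar
# terms in other vocabularies from Table 1. Prefixes are plain strings
# here, not resolved namespaces.
MAPPINGS: Dict[str, List[str]] = {
    "cdm:Work": ["frbr:Work", "bibframe:Work"],
    "cdm:Agent": ["edm:Agent", "event:Agent"],
    "cdm:Expression": ["frbr:Expression"],
}


def mapping_triples(mappings: Dict[str, List[str]]) -> List[Triple]:
    """Express each mapping as a skos:relatedMatch statement."""
    return [
        (source, "skos:relatedMatch", target)
        for source, targets in sorted(mappings.items())
        for target in targets
    ]


for s, p, o in mapping_triples(MAPPINGS):
    print(f"{s} {p} {o} .")
```

A looser property such as relatedMatch is used here because the sketch makes no claim that the paired terms are equivalent.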


[1] Created using CMAP http://cmap.ihmc.us/ and using terms from DC, FOAF, SKOS, SIOC and CDM

[2] https://joinup.ec.europa.eu/sites/default/files/ckeditor_files/files/10_SEMIC2014_Van%20Gemert_Kuster_The%20role%20of%20Publications%20Office%20of%20the%20European%20Union%20in%20semantic%20web%20and%20standardisation%20activities(2).pdf 

[3] Many properties also have inverse properties; for clarity, only one of each pair is shown here.