Ancient Space, Linked Data and Digital Research

Convenor: Dr Gabriel Bodard

This panel brings together four papers that take different approaches to the question of applying digital methods and technologies to the study of ancient geographical space. These methods range from automated identification of place-names and locations through pattern-matching and natural language processing, to recording and exploiting Semantic Web technologies and Geographic Information Science to enrich spatial datasets. These methods enhance research in several ways.

The papers and the discussion in this panel will introduce this variety of approaches to ancient space, and we hope will help to initiate new conversations and collaborations to improve our understanding of the political and physical geography of the Greco-Roman worlds.

Paper 1: Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus

Elton Barker (Open University), Leif Isaksen (University of Southampton), Eric C. Kansa (UC Berkeley)

Google has so far digitized over 12 million books in over 300 languages, most of which were previously available only in major libraries. The amount of data now available is enormous, which is very exciting but quite bewildering.

Google Ancient Places (GAP), a Google Digital Humanities Award recipient, is mining the Google Books corpus for classical material that has a geographic and historical basis. Traditionally, access to much antiquarian literature has been limited to scholars at prestigious institutions: facilitating access to large text repositories like Google Books will help open up disciplines such as History, Classics and Archaeology to anybody with an interest in the subject. Furthermore, references to ancient literature are often brief or fragmentary; aggregating short extracts can be of great value, and information on locating the full text is helpful for more traditional scholars.

Current services are extremely powerful in their extent but have a high rate of false positive and negative matches due to the problems of toponymic homonyms and synonyms (different places that share names, and single places with multiple names). We believe that leveraging services such as GeoNames[1] and Pleiades[2], along with metadata such as the location of other places in the text, should reduce such inaccuracy. By identifying spatial clustering at chapter, text and corpus scales we will be able to significantly reduce misidentifications in a fully-automated process.
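
The clustering idea can be illustrated with a minimal sketch. It is not GAP's actual algorithm: it assumes that each place-name in a chapter has already been looked up in a gazetteer, treats names with a single candidate as anchors, and resolves each ambiguous name to the candidate nearest the centre of those anchors. The candidate lists, labels and function names are hypothetical, the coordinates are approximate, and no GeoNames or Pleiades service is queried.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def centroid(points):
    """Naive mean of latitudes and longitudes (adequate for a small region)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def disambiguate(candidates):
    """Pick, for each name, the candidate closest to the unambiguous places.

    candidates maps a place-name to a list of (label, (lat, lon)) options.
    """
    anchors = [opts[0][1] for opts in candidates.values() if len(opts) == 1]
    if not anchors:
        raise ValueError("no unambiguous place-names to cluster around")
    centre = centroid(anchors)
    return {
        name: min(opts, key=lambda o: haversine_km(o[1], centre))[0]
        for name, opts in candidates.items()
    }

# Hypothetical candidates for one chapter; coordinates are approximate.
CHAPTER_CANDIDATES = {
    "Athens": [("Athens (Attica)", (37.97, 23.72))],
    "Delphi": [("Delphi (Phocis)", (38.48, 22.50))],
    "Thebes": [("Thebes (Boeotia)", (38.32, 23.32)),
               ("Thebes (Egypt)", (25.72, 32.61))],
}

resolved = disambiguate(CHAPTER_CANDIDATES)
print(resolved["Thebes"])
```

Here the two anchors lie in central Greece, so the Boeotian candidate is preferred over the Egyptian one. The same comparison could in principle be repeated at text and corpus scale by widening the set of anchors.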

While the proposed web-service will support many applications, this paper will explore the use of GAP in two specific research domains, archaeology (Open Context [3]) and Classics (HESTIA [4]). It will show how GAP can be a utility both for the scholar whose research has a historical or geographical basis, and for the tourist, for instance, wanting to download information on an ancient location to their smartphone: a case of literally putting knowledge into people’s hands.

1. http://www.geonames.org/

2. http://pleiades.stoa.org/

3. http://opencontext.org

4. http://www.open.ac.uk/arts/hestia

Paper 2: Supporting Productive Queries for Research (SPQR): the Semantic Web and ancient datasets

Tobias Blanke, Gabriel Bodard, Mark Hedges (King’s College London)

We shall address the integration of heterogeneous datasets in the humanities, specifically data relating to Classical antiquity, using a Linked Data approach.

The research requirements addressed by the SPQR project (http://spqr.cerch.kcl.ac.uk/) are driven by the outcomes of the LaQuAT (Linking and Querying Ancient Texts) project, which concluded that a relational database model for integrating these datasets was insufficiently flexible. It is in precisely this regard that Semantic Web and Linked Data approaches have great potential, as they allow researchers more flexibly to formalise resources and the links between them, and to create, explore and query these linked resources.

Closely allied to Linked Data has been work on ontologies, providing agreed meanings for both links and the resources they connect. Ontologies can act as the semantic mediator between heterogeneous databases, enabling researchers to explore, understand and extend these datasets more productively and so improve the contributions that the data can make to their research.

In this paper we shall discuss the challenges raised by our three main datasets (Inscriptions of Aphrodisias and Tripolitania in EpiDoc XML; Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens in FileMaker Pro and XML; Project Volterra in Microsoft Access and MySQL), and show some of the techniques we are developing to link these to other ancient resources in the Semantic Web, including the Pleiades gazetteer of ancient places, archaeological reports and museum collections.

These datasets have features that make them a particular challenge and opportunity: they are “hand-crafted”, resulting from much individual effort; they are often incomplete/ambiguous, and may contain errors/contradictions; they contain much embedded, implicit semantics, which is difficult for researchers to use or comprehend. By building repurposable tools for exploiting the Semantic Web, we hope to make these rich resources available both to non-specialist users and to machine processing.

Paper 3: Connecting Historical Authorities with Linked data, Contexts and Entities (Chalice)

Jo Walsh (University of Edinburgh)

Chalice is a project to mine text (or rather mine structure) from volumes of the English Place Name Survey, and extract entities to build a historic gazetteer with time-range information from documentary evidence stretching back to Anglo-Saxon charters. EPNS, a unique 80-volume work of placename scholarship started in 1925, draws on many different historical sources and combines years of collaborative scholarship to provide unparalleled insight into change in placenames over time. The Chalice gazetteer will be published as Linked Data, searchable through EDINA's Unlock Places gazetteer cross-search service. We shall establish links to common sources of placename reference on the Linked Data Web (GeoNames, the Ordnance Survey open data ontologies, OpenStreetMap), and thus make links to shapes as well as points describing 'the same place'. We shall contribute new placename data to GeoNames.org and give mappings between placenames and links to sameAs.org.

The state of the art in Geographic Information Retrieval still largely involves clusters of points; we hope to move forward by using mereological relations between things in space (and time) to reason about their state (a containment relationship between two objects, or two classes in a schema, tells us more than simple proximity plus metadata). Historic coverage in the gazetteer will improve the quality of future historic text-mining work; more placenames will be located and the likely locations of others suggested with less difference, creating less work. As part of Chalice, use cases for relevant historical database projects are being developed. The paper will present these to demonstrate in concrete terms how the Chalice gazetteer will support scholarship, focusing particularly on how a linked data gazetteer can support an RDF-based geographic data service such as Pleiades, which supports research on the classical world.
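
To make the contrast between containment and proximity concrete, the following is a minimal sketch of reasoning over parthood assertions. It is an illustration only, not the Chalice data model: the administrative units, their names and the helper functions are all hypothetical.

```python
# Hypothetical "is part of" assertions, child -> parent.
PART_OF = {
    "Township A": "Parish B",
    "Parish B": "Hundred C",
    "Hundred C": "County D",
    "Township E": "Parish F",
    "Parish F": "Hundred C",
}

def ancestors(place, part_of):
    """Return every unit that contains `place`, nearest first."""
    chain = []
    seen = {place}
    while place in part_of:
        place = part_of[place]
        if place in seen:  # guard against cyclic data
            break
        seen.add(place)
        chain.append(place)
    return chain

def is_part_of(inner, outer, part_of):
    """True if `outer` contains `inner`, directly or transitively."""
    return outer in ancestors(inner, part_of)

def smallest_common_container(a, b, part_of):
    """Return the smallest unit containing both places, or None."""
    b_chain = set(ancestors(b, part_of))
    for unit in ancestors(a, part_of):
        if unit in b_chain:
            return unit
    return None

print(ancestors("Township A", PART_OF))
print(smallest_common_container("Township A", "Township E", PART_OF))
```

Two townships may be close together as points yet belong to different parishes; the parthood chain records that difference, while distance alone does not.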

Paper 4: Mapping Roman Inscriptions from Britain

Valentina Asciutti (King’s College London)

With this paper we intend to combine epigraphic, iconographical and geographical data in order to fully analyze the metrical texts from Britannia, trace their history of discovery and pose questions relating to the status of literacy in Roman Britain.

Despite their paucity, the inscribed poems from Britannia show some interesting features, and vary widely in findspot, date, type and metre. Our analysis aims to trace the history of discovery of the texts, showing how, where and why they have moved through time. The result is a comprehensive study of the texts from the moment they were discovered until the present, illustrated by a set of maps that show the different stages and locations of the texts throughout time. Thus we analyze the texts as well as “draw” the history of their discovery through the creation of maps. The texts are geo-tagged by linking the tables containing the transcriptions to an output table of the GeoNames database (http://www.geonames.org/). This automatically ascribes both a unique ID to each entry, and also latitude/longitude coordinates. The resulting relational dataset is easily exportable to KML, where further visualizations and (non-semantic or linguistic) analysis are possible.
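
The linking and export steps might look roughly like the sketch below, which uses only the Python standard library. It is not the project's actual workflow: the record identifiers, the place IDs and the table layouts are hypothetical stand-ins, the coordinates are approximate, and the gazetteer is a small in-memory table rather than a real GeoNames export.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical transcription records, keyed to a findspot name.
INSCRIPTIONS = [
    {"text_id": "poem-1", "findspot": "Bath"},
    {"text_id": "poem-2", "findspot": "Carlisle"},
    {"text_id": "poem-3", "findspot": "Unknown villa"},
]

# Hypothetical gazetteer extract: placeholder IDs, approximate coordinates.
GAZETTEER = {
    "Bath": {"place_id": 1001, "lat": 51.38, "lon": -2.36},
    "Carlisle": {"place_id": 1002, "lat": 54.89, "lon": -2.94},
}

def geotag(inscriptions, gazetteer):
    """Join records to gazetteer rows by findspot; keep unmatched rows apart."""
    tagged, unmatched = [], []
    for row in inscriptions:
        place = gazetteer.get(row["findspot"])
        if place is None:
            unmatched.append(row)
        else:
            tagged.append({**row, **place})
    return tagged, unmatched

def to_kml(tagged):
    """Serialise geo-tagged records as KML placemarks (longitude first)."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in tagged:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["text_id"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{row['lon']},{row['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")

tagged, unmatched = geotag(INSCRIPTIONS, GAZETTEER)
kml_text = to_kml(tagged)
print(kml_text)
```

Records whose findspot has no gazetteer match are kept in a separate list rather than dropped, so that they can be reviewed by hand.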

While geo-referencing the inscriptions, we came across several problems (e.g. non-unique place names, different findspots in different accounts, imprecise locations). While the process of geo-referencing cannot itself add to knowledge about an inscription's location, it can nonetheless formally quantify what we do and do not know. Thus, when more detailed geographic information becomes available, for example from a GPS reading, there exists an established formal framework for linking it to the wider epigraphic corpus.

This paper outlines some high-level research problems arising from epigraphic research on Roman Britain, and suggests how a richer understanding of this material can be gained by geographically tagging and representing it.