1 of 15

Named entities and LOD

Around the Globe in Eight Technologies

VeDPH Summer School 2022

Tiziana Mancinelli

Authoritativeness and LOD

2 of 15

Named entities: what

Definition: A named entity is any real-world object that can be designated by a proper name. A named entity recognition (NER) model detects entities in a text and classifies them (e.g., as a person's name, a location name, an organization's name, or a date).

Typology: persons, locations, organizations, stars, ... (Tran, 2006; Ehrmann, 2008)

Recognition and classification tasks (Nadeau and Sekine, 2007; Friburger and Maurel, 2004)
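The two tasks (recognition, then classification) can be illustrated with a toy dictionary lookup. This is only a sketch, not a trained NER model: the lexicon, its class labels and the function name `tag_entities` are assumptions made up for the example.

```python
import re

# Hypothetical lexicon: surface form (lower case) -> entity class.
LEXICON = {"adam": "person", "graunt ermenie": "location"}

def tag_entities(text, lexicon=LEXICON):
    """Return (matched text, class, start offset) for each lexicon entry found in text."""
    found = []
    for form, label in lexicon.items():
        # Recognition: find the form as a whole word, ignoring case.
        pattern = r"\b%s\b" % re.escape(form)
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Classification: attach the class stored in the lexicon.
            found.append((m.group(0), label, m.start()))
    return sorted(found, key=lambda hit: hit[2])

hits = tag_entities("diex fist adam le premier pere")
print(hits)
```

A real recognizer also has to handle spelling variants and ambiguous names, which a plain lookup cannot do.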

3 of 15

<person xml:id="Adam"> …..<persName>Adam</persName>

</person>

diex fist <persName ref="#Adam">adam</persName> le premier pere ne fu ou ques de

Reference is a fundamental semiotic and hermeneutical concept

  • We can talk about the real world using natural languages because we know that some types of words are closely associated with real, specific objects
  • Proper names and technical terms are canonical examples of this kind of word

Annotating a named entity: a person

Many vocabularies exist for annotating people, and they also provide several ways of marking up names and nominal expressions

4 of 15

Named entities: TEI/XML

Named entities appear in most texts.

Why does the TEI have a module to describe them?

Because an entity (person, place, organisation) might be known by many names or might be referred to by some other description entirely.

teiHeader:

<person xml:id="Adam"> …..<persName>Adam</persName>

</person>

Text:

diex fist <persName ref="#Adam">adam</persName> le premier pere ne fu ou ques de
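How the `ref` pointer in the text leads back to the `<person>` entry can be sketched with Python's standard library. The `<doc>` wrapper, the sample string and the function name `resolve_person_refs` are assumptions for this example; a real TEI file uses `<TEI>`, `<teiHeader>` and `<text>` in the TEI namespace.

```python
import xml.etree.ElementTree as ET

# xml:id belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical <doc> wrapper around the two snippets from the slide.
SAMPLE = """
<doc>
  <person xml:id="Adam"><persName>Adam</persName></person>
  <p>diex fist <persName ref="#Adam">adam</persName> le premier pere ne fu ou ques de</p>
</doc>
"""

def resolve_person_refs(xml_text):
    """Return (surface form, canonical name) for every persName that carries a ref."""
    root = ET.fromstring(xml_text)
    # Index every <person> entry by its xml:id.
    people = {p.get(XML_ID): p for p in root.iter("person")}
    pairs = []
    for name in root.iter("persName"):
        ref = name.get("ref")
        if ref and ref.startswith("#"):
            person = people.get(ref[1:])  # drop the leading "#"
            if person is not None:
                pairs.append((name.text, person.findtext("persName")))
    return pairs

print(resolve_person_refs(SAMPLE))
```

The spelling found in the text ("adam") and the canonical form ("Adam") stay separate, which is why one entity can carry many names.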

people

places

5 of 15

Named entities: TEI/XML

<place xml:id="Armenia">

<placeName>Graunt Ermenie</placeName>

</place>


6 of 15

Places: gazetteer

A gazetteer is a geographical index or directory used in conjunction with a map or atlas. It typically contains information concerning the geographical makeup, social statistics and physical features of a country, region, or continent. The content of a gazetteer can include a subject's location, dimensions of peaks and waterways, population, gross domestic product and literacy rate. This information is generally divided into topics with entries listed in alphabetical order. (Wikipedia)
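In an edition, a gazetteer can start as a simple index from name variants to place entries. The sketch below builds one from the `<place>` entry shown earlier; wrapping it in `<listPlace>` and the function name `build_gazetteer` are assumptions for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <place> entry from the slides, wrapped in a <listPlace> for this example.
PLACES = """
<listPlace>
  <place xml:id="Armenia">
    <placeName>Graunt Ermenie</placeName>
  </place>
</listPlace>
"""

def build_gazetteer(xml_text):
    """Map each lower-cased placeName variant to the xml:id of its <place>."""
    index = {}
    for place in ET.fromstring(xml_text).iter("place"):
        for name in place.iter("placeName"):
            index[name.text.strip().lower()] = place.get(XML_ID)
    return index

gazetteer = build_gazetteer(PLACES)
print(gazetteer.get("graunt ermenie"))
```

Adding further `<placeName>` variants to the same `<place>` extends the index without touching the code.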


7 of 15

Evolution of the Web

8 of 15

“Words”: Global Entities

  • each entity gets its globally unique ID, just like each webpage has its address
  • because the web is open, IDs are not assigned by a central authority; anyone can create them, and the infrastructure guarantees their uniqueness
  • many entity “vocabularies” (ontologies) are available for use or extension
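Global IDs are usually written in a short prefixed form and expanded to full URIs. A minimal sketch of that expansion, using the namespaces that appear in this deck; the function name `expand` is an assumption for the example.

```python
# Prefixes used on these slides, mapped to their namespace URIs.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}

def expand(prefixed_name):
    """Turn a prefixed name such as 'dbr:Adam' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

print(expand("dbr:Adam"))     # http://dbpedia.org/resource/Adam
print(expand("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
```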

9 of 15

The evolution of the digital paradigm

Document Centric

Data Centric

10 of 15

What are Linked (open) data?

Linked data is a way of publishing structured data based on open web technologies and standards such as HTTP, RDF (Resource Description Framework) and URI (Uniform Resource Identifier). When the data being linked is open data, the result is called linked open data (LOD).

Let’s link the named entities to online resources

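<xenoData>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:tei="http://www.tei-c.org/ns/1.0">
    <rdf:Description tei:ref="#Adam" rdf:about="http://dbpedia.org/resource/Adam">
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
      <rdfs:label xml:lang="en">Adam</rdfs:label>
    </rdf:Description>
  </rdf:RDF>
</xenoData>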

11 of 15

What are Linked (open) data?

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dbo="http://dbpedia.org/ontology/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:tei="http://www.tei-c.org/ns/1.0">
  <rdf:Description tei:ref="#Adam" rdf:about="http://dbpedia.org/resource/Adam">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <rdfs:label xml:lang="en">Adam</rdfs:label>
  </rdf:Description>
</rdf:RDF>

https://www.w3.org/RDF/Validator/
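The RDF/XML above encodes statements of the form subject, predicate, object. The sketch below reads them out with Python's standard library. It is a deliberately simplified reader, not a full RDF parser: it only looks at child elements of `rdf:Description` and ignores property attributes such as `tei:ref`. The function name `child_triples` is an assumption for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:tei="http://www.tei-c.org/ns/1.0">
  <rdf:Description tei:ref="#Adam" rdf:about="http://dbpedia.org/resource/Adam">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <rdfs:label xml:lang="en">Adam</rdfs:label>
  </rdf:Description>
</rdf:RDF>
"""

def child_triples(xml_text):
    """Yield (subject, predicate, object) for the child elements of each rdf:Description."""
    root = ET.fromstring(xml_text)
    for desc in root.iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; join both parts into one URI.
            predicate = prop.tag[1:].replace("}", "")
            resource = prop.get("{%s}resource" % RDF)
            if resource is not None:
                yield (subject, predicate, resource)  # object is another resource
            else:
                yield (subject, predicate, (prop.text, prop.get(XML_LANG)))  # literal with language

triples = list(child_triples(DOC))
for triple in triples:
    print(triple)
```

The first statement links Adam to the class `foaf:Person`; the second gives the entity an English label.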

12 of 15

What are Linked (open) data?

  • Linked data is one of the technologies behind the so-called Semantic Web, a global space of interconnected data with semantically qualified relationships. Structured and linked together, the data form an ever-widening information lattice that software can read and interpret directly, extracting information through semantic queries.

  • Data and the relationships between them are described semantically through metadata and ontologies. Linking (or referencing) therefore uses relationships ("links") that have a precise meaning and state the type of connection between the two entities involved. Linked open data is thus an elegant and effective way to address problems of identity and provenance, semantics, integration and interoperability.
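A semantic query can be pictured as pattern matching over statements. The sketch below stores a few triples as tuples and filters them. The statements are illustrative only (check them against DBpedia before relying on them), and the function name `match` is an assumption for the example.

```python
# Illustrative statements in prefixed form: (subject, predicate, object).
TRIPLES = [
    ("dbr:Adam", "rdf:type", "foaf:Person"),
    ("dbr:Marco_Polo", "rdf:type", "foaf:Person"),
    ("dbr:Marco_Polo", "dbo:birthPlace", "dbr:Venice"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every term that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "Which resources are typed as persons?"
people = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="foaf:Person")]
print(people)

# "What do we know about Marco Polo?"
print(match(TRIPLES, s="dbr:Marco_Polo"))
```

Because each link names its predicate, the same data answers both questions without any change to its structure.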

13 of 15

DBpedia

DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.
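Such queries are written in SPARQL and sent over HTTP. The sketch below only builds the request URL and sends nothing. It assumes the public endpoint at https://dbpedia.org/sparql and the standard `query` parameter of the SPARQL protocol; the query text and the function name `build_request_url` are made up for the example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia SPARQL endpoint

# Ask for the English label of the resource used on the earlier slides.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Adam> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(QUERY)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching labels if the endpoint is reachable.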

14 of 15

The Power of the Network

[Figure: RDF graph linking DBpedia resources. Resources: dbr:Marco_Polo, dbr:Venice, dbr:Venetian_Lagoon, dbr:Ferdinand_Magellan, dbr:Giulia_Lama, dbr:Alpi_Eagles, dbr:Anthony_Quinn, dbc:Explorers_of_Asia, foaf:Person. Properties: rdf:type, rdfs:label, dbo:birthDate, dbo:birthPlace, dbo:deathPlace, dbo:nearestCity, dbo:hubAirport. Labels and literals: "Marco Polo"@it, "Поло, Марко"@ru, "Venezia"@it, "Venedig"@de, "Venice", "Venetian lagoon", "Magellan", "Giulia Lama", "Alpi Eagles", "Anthony Quinn", "person", "explorer of Asia", 1254.]

15 of 15

In conclusion - it is scary!

But FUN!