TREC Entity 2011 Guidelines

(v1.15, 2011-06-10)

Overview

The overall aim of the Entity track is to evaluate entity-related searches on Web data. The third edition of the track features two main tasks and a pilot task; the tasks are summarized in the table below. Changes from last year’s edition are highlighted in the description of the corresponding task.

| Task | Input | Output | Collection |
|---|---|---|---|
| (REF) Related Entity Finding (main) | Entity (name and HP); Target type; Narrative | Entity homepages (ClueWeb09 docIDs) | ClueWeb09 English |
| (REF-LOD) Related Entity Finding LOD (pilot) | Entity (name and HP); Target type; Narrative | Entity URIs (from Sindice dataset) | Sindice-2011 dataset; ClueWeb09 English |
| (ELC) Entity List Completion (main) | Entity (name and HP); Target type; Narrative; Example entities (name and HP) | Entity URIs (from Sindice dataset) | Sindice-2011 dataset; ClueWeb09 English |

Overview of Entity 2011 tasks.

Main task 1: Related Entity Finding

The problem of related entity finding (REF) is defined as follows:

Given an input entity (identified by its name and homepage), the type of the target entity, and the nature of their relation described in free text, find related entities that are of the target type and stand in the required relation to the input entity.

Input

For each request (query) the following information is provided:

Example input

An example information need, “find recording companies that now sell the Kingston Trio’s songs”, is formulated as follows:

<query>
   <num>23</num>
   <entity_name>The Kingston Trio</entity_name>
   <entity_URL>clueweb09-en0009-81-29533</entity_URL>
   <target_entity>recording company</target_entity>
   <narrative>What recording companies now sell the Kingston Trio's songs?</narrative>
</query>
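As an illustration, a topic in this format can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `parse_ref_topic` and the choice to return a plain dictionary are assumptions made for this example, not part of the guidelines.

```python
import xml.etree.ElementTree as ET

# The example topic from above, copied verbatim.
TOPIC_XML = """
<query>
   <num>23</num>
   <entity_name>The Kingston Trio</entity_name>
   <entity_URL>clueweb09-en0009-81-29533</entity_URL>
   <target_entity>recording company</target_entity>
   <narrative>What recording companies now sell the Kingston Trio's
   songs? </narrative>
</query>
"""


def parse_ref_topic(xml_text):
    """Return the child elements of one <query> as a tag -> text dict.

    Hypothetical helper: the guidelines do not prescribe any parser.
    """
    query = ET.fromstring(xml_text.strip())
    # Collapse runs of whitespace, since the narrative may wrap across lines.
    return {child.tag: " ".join((child.text or "").split()) for child in query}


topic = parse_ref_topic(TOPIC_XML)
print(topic["num"], topic["entity_name"], topic["target_entity"])
```

Normalizing whitespace keeps a narrative that wraps over several lines usable as a single query string.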

Output

Submission format

Each answer record must have the following format:

topicID Q0 docno rank score runID supportDoc entityName

where

Example output

23 Q0 clueweb09-en0003-03-28260 1 0.98 exampleRun clueweb09-en0001-02-28120 Sony_BMG

23 Q0 clueweb09-en0001-07-19878 2 0.94 exampleRun clueweb09-en0001-07-15300
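A submission line in this format can be checked before upload with a few lines of code. The following is a minimal sketch, assuming that fields are separated by whitespace and that entityName may be omitted (as in the second example record above); the names `RefRecord` and `parse_ref_record` are hypothetical.

```python
from typing import NamedTuple, Optional


class RefRecord(NamedTuple):
    topic_id: str
    docno: str
    rank: int
    score: float
    run_id: str
    support_doc: str
    entity_name: Optional[str]  # assumed optional, per the second example


def parse_ref_record(line):
    """Parse 'topicID Q0 docno rank score runID supportDoc [entityName]'."""
    parts = line.split()
    if len(parts) not in (7, 8):
        raise ValueError(f"expected 7 or 8 fields, got {len(parts)}")
    if parts[1] != "Q0":
        raise ValueError("second field must be the literal 'Q0'")
    name = parts[7] if len(parts) == 8 else None
    return RefRecord(parts[0], parts[2], int(parts[3]), float(parts[4]),
                     parts[5], parts[6], name)


first = parse_ref_record(
    "23 Q0 clueweb09-en0003-03-28260 1 0.98 exampleRun "
    "clueweb09-en0001-02-28120 Sony_BMG")
second = parse_ref_record(
    "23 Q0 clueweb09-en0001-07-19878 2 0.94 exampleRun "
    "clueweb09-en0001-07-15300")
print(first.rank, first.entity_name, second.entity_name)
```

Converting rank and score to numbers means a malformed value raises an error immediately rather than passing through as text.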

Document collection

English portion of the ClueWeb09 collection (about 500 million pages).

Topics and assessments

Entity homepages

Key changes

The key changes relative to the 2010 edition of the REF task are as follows:

Pilot task: Related Entity Finding, LOD-variant

In this pilot we investigate using Linked Open Data (LOD) URIs instead of homepages for entity identification.

The task and the topics are the same as for the main REF task. Below, we indicate only the differences with respect to the main REF task.

Input

<query>
   <num>23</num>
   <entity_name>The Kingston Trio</entity_name>
   <entity_URL>clueweb09-en0009-81-29533</entity_URL>
   <entity_URI>http://dbpedia.org/resource/The_Kingston_Trio</entity_URI>
   <target_entity>recording company</target_entity>
   <narrative>What recording companies now sell the Kingston Trio's songs?</narrative>
</query>

Output

Submission format

Each answer record must have the following format:

topicID Q0 URI rank score runID supportDoc entityName

Example output

23 Q0 http://dbpedia.org/resource/Sony_BMG 1 0.98 exampleRun clueweb09-en0001-02-28120 Sony_BMG

23 Q0 http://dbpedia.org/resource/Universal_Music_Group 2 0.94 exampleRun clueweb09-en0001-07-15300

Document collection

The LOD crawl used is the “Sindice dump” data set (courtesy of the Sindice team at DERI). Details and a sample of the collection will follow later this week.

The collection will be accompanied by tools for indexing and searching the collection.

Topics and assessments

Entity URIs

Main task 2: Entity List Completion

Entity List Completion (ELC) addresses essentially the same task as REF does: finding entities that are engaged in a specific relation with an input entity. There are two main differences from REF:

The ELC task then is defined as follows:

Given an information need and a list of known relevant entity homepages, return a list of relevant entity URIs from a specific collection of Linked Open Data.

Input

For each request (query) the following information is provided:

* We will make available the content of these ClueWeb documents to those who don’t have access/resources to process ClueWeb.

Example input

An example information need, “find recording companies that now sell the Kingston Trio’s songs”, is formulated as follows:

<query>
   <num>23</num>
   <entity_name>The Kingston Trio</entity_name>
   <entity_homepage id="clueweb09-en0009-81-29533">http://www.kingstontrio.com/html/home.htm</entity_homepage>
   <target_entity>recording company</target_entity>
   <target_type_dbpedia>RecordLabel</target_type_dbpedia>
   <narrative>What recording companies now sell the Kingston Trio's songs?</narrative>
   <examples>
      <entity>
         <homepage id="clueweb09-en0005-91-31233">http://www.capitolrecords.com/</homepage>
         <name>capitol records</name>
      </entity>
      <entity>
         <homepage id="clueweb09-en0007-52-02787">http://www.decca.be/en-index.asp</homepage>
         <homepage id="clueweb09-en0130-54-34988">http://www.deccarecords-us.com/</homepage>
         <name>decca records</name>
      </entity>
      ...
   </examples>
</query>
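Note that an example entity may list more than one homepage (decca records has two above). The sketch below shows one way to collect the example entities with Python's `xml.etree.ElementTree`. The helper name `example_entities` is hypothetical, and the embedded topic is shortened to the elements the sketch needs, with the elided entries ("...") left out so that the XML parses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example topic; elided entries are omitted.
ELC_XML = """
<query>
  <num>23</num>
  <entity_name>The Kingston Trio</entity_name>
  <target_type_dbpedia>RecordLabel</target_type_dbpedia>
  <examples>
    <entity>
      <homepage id="clueweb09-en0005-91-31233">
        http://www.capitolrecords.com/</homepage>
      <name>capitol records</name>
    </entity>
    <entity>
      <homepage id="clueweb09-en0007-52-02787">
        http://www.decca.be/en-index.asp</homepage>
      <homepage id="clueweb09-en0130-54-34988">
        http://www.deccarecords-us.com/</homepage>
      <name>decca records</name>
    </entity>
  </examples>
</query>
"""


def example_entities(xml_text):
    """Map each example entity name to a list of (docID, URL) homepages."""
    query = ET.fromstring(xml_text.strip())
    result = {}
    for entity in query.iterfind("examples/entity"):
        name = entity.findtext("name", default="").strip()
        # An entity may carry several <homepage> elements; keep them all.
        result[name] = [(hp.get("id"), (hp.text or "").strip())
                        for hp in entity.iterfind("homepage")]
    return result


examples = example_entities(ELC_XML)
for name, pages in examples.items():
    print(name, [doc_id for doc_id, _ in pages])
```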

Output

Submission format

Each answer record must have the following format:

topicID Q0 URI rank score runID entityName

Example output

23 Q0 http://dbpedia.org/resource/Sony_BMG 1 0.98 exampleRun Sony_BMG

23 Q0 http://dbpedia.org/resource/Universal_Music_Group 2 0.94 exampleRun
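Records in this format can be generated from a scored candidate list. The sketch below is only an illustration under several assumptions that the guidelines do not state explicitly: ranks start at 1 and follow decreasing score, entityName is optional, and spaces in names are replaced by underscores (as in "Sony_BMG"). The function name `format_elc_run` and the scores used are hypothetical.

```python
def format_elc_run(topic_id, run_id, scored):
    """Build 'topicID Q0 URI rank score runID [entityName]' lines.

    scored: iterable of (uri, score, name_or_None) tuples.
    """
    # Assumption: rank 1 goes to the highest-scoring entity.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    lines = []
    for rank, (uri, score, name) in enumerate(ranked, start=1):
        fields = [str(topic_id), "Q0", uri, str(rank), f"{score:.2f}", run_id]
        if name:
            # Assumption: names must not contain whitespace.
            fields.append(name.replace(" ", "_"))
        lines.append(" ".join(fields))
    return lines


# Illustrative candidates and scores, not real system output.
candidates = [
    ("http://dbpedia.org/resource/Universal_Music_Group", 0.94, None),
    ("http://dbpedia.org/resource/Sony_BMG", 0.98, "Sony BMG"),
]
run_lines = format_elc_run(23, "exampleRun", candidates)
print("\n".join(run_lines))
```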

Document collections

Topics and assessments

Key changes