1 of 22

AER: Autoregressive Entity Retrieval

The Natural Language Processing Reading Group Reviews:

2 of 22

Housekeeping

  • Anything the mod team needs to mention ahead of time

3 of 22

Reviewer

Adithya

4 of 22

Paper Metadata

5 of 22

Problem Statement

Encyclopedias like Wikipedia are structured around entities. We need to retrieve entities given a query, which makes this a knowledge-intensive task.

Current approaches are classifiers over the entity set and have some disadvantages (a sketch of this baseline appears at the end of this slide):

  • Query and entity interact only through a dot product, so there is no fine-grained interaction between them
  • The memory footprint grows linearly with the number of entities
  • A hard set of negative examples has to be subsampled at training time

Aim of the paper:

  • Capture the relation between context and entity name directly
  • Make the memory footprint scale with the vocabulary size rather than the number of entities
  • Remove the need for subsampling negative data
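
For contrast, here is a minimal sketch of the dense-retrieval baseline criticized above, assuming a bi-encoder with a precomputed entity vector index (sizes and vectors below are made up):

    import numpy as np

    # Illustrative sizes; at Wikipedia scale (~6M entities, 768-dim vectors)
    # the precomputed entity index alone takes tens of gigabytes.
    num_entities, dim = 10_000, 768
    rng = np.random.default_rng(0)

    # One dense vector per entity: memory grows linearly with the entity set.
    entity_index = rng.standard_normal((num_entities, dim)).astype(np.float32)
    query_vec = rng.standard_normal(dim).astype(np.float32)

    # The query interacts with each entity only through a single dot product,
    # so there is no fine-grained (token-level) interaction between them.
    scores = entity_index @ query_vec
    top_k = np.argsort(-scores)[:10]
    print(top_k)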

6 of 22

Terms to know

  • Entity set ε is the set of Wikipedia articles
  • KB - Knowledge Base = Wikipedia
  • Each entity e in ε is assigned a unique sequence of tokens (its Wikipedia article title)
  • Entity disambiguation: the input x is annotated with a mention, and we need to retrieve the entity it refers to based on the context of x
  • Document retrieval: x is a query and entities are documents identified by their unique titles

7 of 22

Formulation

  • Each entity is ranked with an autoregressive score: score(e | x) = p_θ(y | x) = ∏_t p_θ(y_t | y_<t, x), where y is the token sequence of the entity name
    • Trained by maximizing the log-likelihood of the target entity name
    • No negative sampling is needed to approximate the loss normalizer
  • To avoid expensive scoring of every element in the entity set, they use beam search: the top-k entities are found with k beams
    • Decoding is constrained to the set of valid entity identifiers using a trie (prefix tree), as sketched below
    • Because only beam search is used, the time cost is independent of the entity set size; entity names average ~6 tokens and the beam width is 10
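
A minimal sketch of the prefix trie and the "allowed next tokens" lookup used to constrain beam search; the token ids and the EOS id below are made up, and the real implementation plugs an equivalent lookup into the seq2seq decoder's beam search:

    # Build a trie over the token sequences of all entity names.
    EOS = 0  # hypothetical end-of-sequence token id

    def build_trie(entity_token_ids):
        """entity_token_ids: iterable of token-id lists, one per entity name."""
        trie = {}
        for ids in entity_token_ids:
            node = trie
            for tok in ids + [EOS]:
                node = node.setdefault(tok, {})
        return trie

    def allowed_next_tokens(trie, prefix):
        """Tokens that may legally follow `prefix`, so that decoding stays
        inside the set of valid entity identifiers."""
        node = trie
        for tok in prefix:
            if tok not in node:
                return []          # prefix is not part of any entity name
            node = node[tok]
        return list(node.keys())

    # Example with made-up token ids for three entity names.
    trie = build_trie([[5, 7], [5, 8, 2], [9, 3]])
    print(allowed_next_tokens(trie, []))      # [5, 9]  -> valid first tokens
    print(allowed_next_tokens(trie, [5]))     # [7, 8]
    print(allowed_next_tokens(trie, [5, 7]))  # [0]     -> only EOS: "5 7" is a complete name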

8 of 22

Formulation

  • End-to-end entity linking - detect entity mentions and link them to KB entities
    • Span boundaries are annotated with special tokens in the generated output
    • Generation follows this diagram (a sketch of parsing the markup follows below)
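
A hedged illustration of the annotated output format: mention spans wrapped in bracket tokens with the linked entity title in parentheses. The exact delimiter tokens are an assumption based on the paper's examples:

    import re

    # Hypothetical generated output for end-to-end entity linking.
    generated = "In 1503, [ Leonardo ] ( Leonardo da Vinci ) began painting the Mona Lisa."

    # Recover (mention, entity) pairs from the generated markup.
    pairs = re.findall(r"\[ (.*?) \] \( (.*?) \)", generated)
    print(pairs)  # [('Leonardo', 'Leonardo da Vinci')]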

9 of 22

Types of Tasks

10 of 22

Results

11 of 22

🏺 Archaeologist

Adithya

12 of 22

Prior Paper 3

Title: CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Authors: Manoj Prabhakar Kannan Ravi, Kuldeep Singh, Isaiah Onando Mulang, Saeedeh Shekarpour, Johannes Hoffart, and Jens Lehmann

Publication Venue: European Chapter of the Association for Computational Linguistics

Date: April, 2021

Publication: https://aclanthology.org/2021.eacl-main.40.pdf

13 of 22

Approach

Three step approach for end-to-end neural entity linking:

  1. Mention detection using BERT (mentions are detected automatically rather than given as input)
  2. Candidate generation
  3. Entity disambiguation

14 of 22

Approach - Stage 1 and 2

  • A logistic-regression-based classifier on top of fine-tuned BERT output handles the mention detection stage. Similar to the regression stage in the main paper.
  • For the candidate generation stage, they use:
    • DCA candidates: prior probabilities for the candidate entities of each mention are calculated. In the probabilistic entity map, each entity mention has 30 potential entity candidates. DCA also provides the associated Wikipedia description of each entity
    • Falcon candidates: a local index of KG items built from Wikidata entities and expanded with entity aliases. The local KG index is used to generate entity candidates for each entity mention in the employed datasets. The index is queried with the BM25 algorithm and results are ranked by the BM25 score; 30 candidates are generated per mention (a small retrieval sketch follows below)
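
A rough sketch of Falcon-style candidate generation with BM25 over a local label index, using the rank_bm25 package; the labels and the mention are made up:

    from rank_bm25 import BM25Okapi

    # Local index of KB labels expanded with aliases (toy example).
    kb_labels = [
        "Thor (film)",
        "Thor (Marvel Comics)",
        "Thor Heyerdahl",
        "Asgard (comics)",
    ]
    tokenized = [label.lower().split() for label in kb_labels]
    bm25 = BM25Okapi(tokenized)

    # Query the index per mention and keep the top 30 hits as candidates
    # (fewer here, since the toy index only has 4 labels).
    mention = "thor"
    candidates = bm25.get_top_n(mention.split(), kb_labels, n=30)
    print(candidates)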

15 of 22

Approach - Stage 3

  • Token embedding: the embedding of the corresponding token. The entity-mention tokens are prepended to the local context sequence S1 and separated from the sentence context tokens by a single vertical-bar token |; likewise, for the entity context sequence S2, the entity title tokens from the KB are prepended before the description
  • Segment embedding: each of the two sequences receives a single representation; ELC => local context (S1), EEC => extended context (S2)
  • Position embedding: represents the position of the token
  • Negative sampling is used to turn disambiguation into a binary classification task over (mention context, candidate entity) pairs (a sketch of this pairing follows below)
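
A sketch of the stage-3 setup, using Hugging Face transformers with an off-the-shelf (not CHOLAN's fine-tuned) BERT checkpoint; the mention, sentence and candidate strings are illustrative:

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Hypothetical mention in its sentence context and one candidate entity with
    # its KB description; each of the 30 candidates would be scored this way and
    # the top-scoring one selected.
    mention = "Thor"
    sentence = "The first Thor was all about introducing Asgard."
    candidate_title = "Thor (film)"
    candidate_desc = "2011 superhero film based on the Marvel Comics character."

    # Local context S1: mention tokens prepended, separated by a vertical bar.
    s1 = f"{mention} | {sentence}"
    # Extended context S2: entity title prepended to its KB description.
    s2 = f"{candidate_title} {candidate_desc}"

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # binary: candidate matches or not

    # Encoding S1 and S2 as a sentence pair supplies the segment embeddings;
    # token and position embeddings are added by the model itself.
    inputs = tokenizer(s1, s2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.softmax(-1))                # [no-match, match] probabilities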

16 of 22

🏺 Archaeologist

Aiswarya

17 of 22

  • Background
    • End-to-end entity linking systems consist of 3 steps:
      • mention detection
      • candidate generation
      • entity disambiguation

    • The paper investigates the following:
      • Can all of those steps be learned jointly with a model for contextualized text representations?
      • How much entity knowledge is already contained in pretrained BERT?
      • Does additional entity knowledge improve BERT’s performance in downstream tasks?

18 of 22

Motivation and Model

  • The goal of entity linking is - given a knowledge base (KB) and unstructured data, detect mentions of the KB’s entities in the unstructured data and link them to the correct KB entry

  • The entity linking task is generally defined through the following steps

    • mention detection (MD) - text spans of potential entity mentions are identified
    • candidate generation (CG) - entity candidates for each mention are retrieved from the KB
    • entity disambiguation (ED) - a mix of useful coreference and coherence features together with a classifier determines the entity link

19 of 22

Motivation and Model

  • Can BERT’s architecture learn all entity linking steps jointly?
    • Per-token classification over the entire entity vocabulary, thus solving MD, CG and ED simultaneously (see the sketch after this list)
    • The entity vocabulary is based on the 700K most frequent entities in English Wikipedia texts
    • This worked surprisingly well for entity linking even without any supervision on mention spans
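
A minimal sketch of the joint formulation, assuming a BERT-style encoder is given; the layer names and the extra "no entity" class are my assumptions about how the classification head is wired:

    import torch.nn as nn

    class PerTokenEntityLinker(nn.Module):
        """Per-token classification over the entity vocabulary on top of a
        BERT-style encoder, covering MD, CG and ED in a single prediction."""

        def __init__(self, encoder: nn.Module, hidden_size: int, num_entities: int):
            super().__init__()
            self.encoder = encoder                        # e.g. a pretrained BERT
            # One extra class for tokens that are not part of any mention.
            self.classifier = nn.Linear(hidden_size, num_entities + 1)

        def forward(self, input_ids, attention_mask):
            hidden = self.encoder(input_ids, attention_mask=attention_mask)[0]
            # (batch, seq_len, |entity vocab| + 1): one entity distribution per token.
            return self.classifier(hidden)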

20 of 22

Training Data

  • The entity vocabulary and training data are derived from English Wikipedia texts
  • WikiExtractor is used to extract the text spans associated with an internal Wikipedia link and use them as annotations
    • The first Thor was all about introducing Asgard
      • The text span “Thor” links to the Wikipedia article for Thor
    • BERT is originally trained with sentences; for entity linking, however, a larger context can help disambiguate entity mentions, which is why text fragments long enough to span multiple sentences are selected
    • We collect (m, e) tuples of entities e and their mentions m
    • This yields a set M of potentially linkable strings and also lets us compute the conditional probability p(e|m) from the #(m, e) counts (see the sketch after this list)
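
A small sketch of how p(e|m) can be estimated from the collected (m, e) link counts; the example pairs are made up:

    from collections import Counter, defaultdict

    # (mention, entity) pairs harvested from Wikipedia link anchors (made up here).
    pairs = [
        ("Thor", "Thor (film)"),
        ("Thor", "Thor (film)"),
        ("Thor", "Thor (Marvel Comics)"),
        ("Asgard", "Asgard (comics)"),
    ]

    pair_counts = Counter(pairs)                    # #(m, e)
    mention_counts = Counter(m for m, _ in pairs)   # #(m)

    p_e_given_m = defaultdict(dict)
    for (m, e), c in pair_counts.items():
        p_e_given_m[m][e] = c / mention_counts[m]

    print(p_e_given_m["Thor"])   # {'Thor (film)': 0.67, 'Thor (Marvel Comics)': 0.33} (approx.)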

21 of 22

Experiments

  • Data
    • Wikipedia
      • Keeps the 700K most frequent entities out of the ~6M entities in Wikipedia
    • Training
      • Uses multi-class classification over the entity vocabulary - the label vector y for a token v_i is the entity that token is annotated with (or a no-entity class if the token is not part of a mention)

  • Computing the loss over the whole entity vocabulary is infeasible because the entity vocabulary is very large
  • Negative sampling is used to improve memory efficiency and increase convergence speed (a sketch follows below)
  • After sampling the text fragments for a batch b, the set N⁺_b of all true entities that occurred in those text fragments is collected as the positives
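
A rough sketch of this batch-level shortcut, assuming negatives are drawn uniformly at random from the entity vocabulary (the sampling scheme and sizes here are illustrative):

    import random

    entity_vocab = [f"entity_{i}" for i in range(700)]   # stand-in for the ~700K entities

    def batch_label_space(batch_true_entities, num_negatives=64, seed=0):
        """Positives N+_b are all true entities in the batch's text fragments;
        the loss is computed over these plus sampled negatives instead of the
        full entity vocabulary."""
        rng = random.Random(seed)
        positives = set(batch_true_entities)
        negatives = set()
        while len(negatives) < num_negatives:
            candidate = rng.choice(entity_vocab)
            if candidate not in positives:
                negatives.add(candidate)
        return sorted(positives) + sorted(negatives)

    labels = batch_label_space(["entity_3", "entity_42", "entity_3"])
    print(len(labels))   # |N+_b| + 64 = 66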

22 of 22

Results