Information Extraction and Named Entity Recognition
Introducing the tasks:
Getting simple structured information out of text
Christopher Manning
Information Extraction
Information Extraction (IE)
Low-level information extraction
Named Entity Recognition (NER)
Named Entity Recognition (NER)
Entity classes: Person, Date, Location, Organization
Evaluation of Named Entity Recognition
The extension of Precision, Recall, and the F measure to sequences
The Named Entity Recognition Task
Task: Predict entities in a text
Token | Tag |
Foreign | ORG |
Ministry | ORG |
spokesman | O |
Shen | PER |
Guofang | PER |
told | O |
Reuters | ORG |

Standard evaluation is per entity, not per token.
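The per-entity convention is easy to operationalize: treat gold and predicted entities as sets of (start, end, type) spans and compare them exactly. A minimal sketch (the span indices and helper name are illustrative, not a standard scorer such as conlleval):

```python
def entity_prf(gold, pred):
    """gold, pred: sets of (start, end, type) entity spans."""
    tp = len(gold & pred)                      # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Predicting "Shen" and "Guofang" as two separate PER entities yields
# 0 true positives for that name: 2 false positives and 1 false negative.
gold = {(3, 5, "PER"), (6, 7, "ORG")}
pred = {(3, 4, "PER"), (4, 5, "PER"), (6, 7, "ORG")}
print(entity_prf(gold, pred))
```

This is why boundary errors are doubly penalized under entity-level scoring: a mis-segmented entity costs both precision and recall.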
Precision/Recall/F1 for IE/NER
Sequence Models for Named Entity Recognition
The ML sequence model approach to NER
Training
Testing
Encoding classes for sequence labeling
Word | IO encoding | IOB encoding |
Fred | PER | B-PER |
showed | O | O |
Sue | PER | B-PER |
Mengqiu | PER | B-PER |
Huang | PER | I-PER |
's | O | O |
new | O | O |
painting | O | O |
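The IOB column can be generated mechanically from entity spans: B- marks the first token of each entity and I- the rest, which is what lets IOB keep the adjacent entities "Sue" and "Mengqiu Huang" apart where plain IO cannot. A minimal sketch (span indices are illustrative):

```python
def to_iob(tokens, entities):
    """entities: list of (start, end, type) spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype            # continuation tokens
    return tags

tokens = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
entities = [(0, 1, "PER"), (2, 3, "PER"), (3, 5, "PER")]
print(to_iob(tokens, entities))
# ['B-PER', 'O', 'B-PER', 'B-PER', 'I-PER', 'O', 'O', 'O']
```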
Features for sequence labeling
Features: Word substrings
Examples: Cotrimoxazole, Wethersfield, "Alien Fury: Countdown to Invasion"
Informative substrings: "oxa", ":", "field"
Features: Word shapes
Word | Shape |
Varicella-zoster | Xx-xxx |
mRNA | xXXX |
CPA1 | XXXd |
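A basic word-shape feature maps uppercase letters to X, lowercase to x, and digits to d, keeping other characters. This is a minimal sketch of that mapping; real systems typically also collapse long character runs, which is how "Varicella-zoster" becomes the short shape "Xx-xxx":

```python
def word_shape(word):
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)       # keep punctuation like '-' or '.'
    return "".join(out)

print(word_shape("mRNA"))   # xXXX
print(word_shape("CPA1"))   # XXXd
print(word_shape("22.6"))   # dd.d
```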
Maximum entropy sequence models
Maximum entropy Markov models (MEMMs) or Conditional Markov models
Sequence problems
POS tagging:
Chasing | opportunity | in | an | age | of | upheaval |
VBG | NN | IN | DT | NN | IN | NN |
Word segmentation (Chinese):
而 | 相 | 对 | 于 | 这 | 些 | 品 | 牌 | 的 | 价 |
B | B | I | I | B | I | B | I | B | B |
Named entity recognition:
Murdoch | discusses | future | of | News | Corp. |
PERS | O | O | O | ORG | ORG |
Text segmentation (sentence labels):
Q | A | Q | A | A | A | Q | A |
MEMM inference in systems
Local context (decision point at position 0):
-3 | -2 | -1 | 0 | +1 |
DT | NNP | VBD | ??? | ??? |
The | Dow | fell | 22.6 | % |
Features:
W0 | 22.6 |
W+1 | % |
W-1 | fell |
T-1 | VBD |
T-1-T-2 | NNP-VBD |
hasDigit? | true |
… | … |
(Ratnaparkhi 1996; Toutanova et al. 2003, etc.)
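Assembling such a local feature set at the decision point can be sketched as follows. This is an illustrative rendering, not the cited papers' exact templates; the `<s>`/`</s>` boundary symbols and the dict-of-indicators representation are assumptions:

```python
def local_features(words, tags, i):
    """Features for position i; tags holds labels already predicted to the left."""
    feats = {
        "W0=" + words[i]: 1,
        "W+1=" + (words[i + 1] if i + 1 < len(words) else "</s>"): 1,
        "W-1=" + (words[i - 1] if i > 0 else "<s>"): 1,
        "T-1=" + (tags[i - 1] if i > 0 else "<s>"): 1,
        "T-1-T-2=" + "-".join(tags[max(0, i - 2):i]): 1,
        "hasDigit=" + str(any(c.isdigit() for c in words[i])): 1,
    }
    return feats

words = ["The", "Dow", "fell", "22.6", "%"]
tags = ["DT", "NNP", "VBD"]          # tags predicted so far
print(local_features(words, tags, 3))
```

Because the tag features only look leftward, the classifier can be applied left to right during decoding, which is exactly what makes MEMM-style inference possible.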
Example: POS Tagging
(Same local context and features as above.)
Inference in Systems
(Diagram.) Local level: local data → feature extraction → features → label, with the classifier type (maximum entropy models), smoothing (quadratic penalties), and optimization (conjugate gradient) chosen at this level. Sequence level: sequence data is split into local data, and sequence-model inference assembles the local decisions into a label sequence.
Greedy Inference
Decode left to right: at each position, commit to the label the classifier scores highest given the earlier choices. Fast, but an early mistake cannot be undone.
Beam Inference
Keep the top k partial label sequences at each position and extend each one. Approximate, but usually close to the exact answer for modest k.
Viterbi Inference
Dynamic programming finds the globally best label sequence exactly, provided the features look only at a bounded history of previous labels.
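A minimal Viterbi sketch over an additive local scoring function. The toy `score` below is a stand-in for a trained local model's log-probabilities; the point is the contrast with greedy decoding, since Viterbi keeps the best path to every state and backtraces:

```python
def viterbi(obs, states, score):
    """score(prev_state, state, ob) -> additive local score (e.g. log-prob)."""
    best = {s: score(None, s, obs[0]) for s in states}
    back = []
    for ob in obs[1:]:
        ptr, new = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + score(p, s, ob))
            ptr[s] = prev                     # remember best predecessor
            new[s] = best[prev] + score(prev, s, ob)
        back.append(ptr)
        best = new
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):                # backtrace
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy local score: reward PER on capitalized words, O on lowercase ones.
def score(prev, cur, word):
    return 1.0 if (cur == "PER") == word[0].isupper() else -1.0

print(viterbi(["Mark", "Twain", "lived", "here"], ["PER", "O"], score))
# ['PER', 'PER', 'O', 'O']
```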
CRFs [Lafferty, Pereira, and McCallum 2001]
Relation Extraction
What is relation extraction?
Dan Jurafsky
Extracting relations from text
Company-Founding template:
Company | IBM |
Location | New York |
Date | June 16, 1911 |
Original-Name | Computing-Tabulating-Recording Co. |
Extracted relations:
Founding-year(IBM, 1911)
Founding-location(IBM, New York)
Extracting Relation Triples from Text
The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is an American private research university located in Stanford, California … near Palo Alto, California… Leland Stanford…founded the university in 1891
Stanford | EQ | Leland Stanford Junior University |
Stanford | LOC-IN | California |
Stanford | IS-A | research university |
Stanford | LOC-NEAR | Palo Alto |
Stanford | FOUNDED-IN | 1891 |
Stanford | FOUNDER | Leland Stanford |
Why Relation Extraction?
(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)
Automated Content Extraction (ACE)
17 sub-relations of 6 relations from the 2008 “Relation Extraction Task”
Automated Content Extraction (ACE)
He was in Tennessee
XYZ, the parent company of ABC
John’s wife Yoko
Steve Jobs, co-founder of Apple…
UMLS: Unified Medical Language System
Entity | Relation | Entity |
Injury | disrupts | Physiological Function |
Bodily Location | location-of | Biologic Function |
Anatomical Structure | part-of | Organism |
Pharmacologic Substance | causes | Pathologic Function |
Pharmacologic Substance | treats | Pathologic Function |
Extracting UMLS relations from a sentence
Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes
⇒
Echocardiography, Doppler DIAGNOSES Acquired stenosis
Databases of Wikipedia Relations
Relations extracted from Infobox
Stanford state California
Stanford motto “Die Luft der Freiheit weht”
…
Wikipedia Infobox
Relation databases that draw from Wikipedia
subject predicate object
Golden Gate Park location San Francisco
dbpedia:Golden_Gate_Park dbpedia-owl:location dbpedia:San_Francisco
people/person/nationality, location/location/contains
people/person/profession, people/person/place-of-birth
biology/organism_higher_classification film/film/genre
Ontological relations
Examples from the WordNet Thesaurus
How to build relation extractors
Relation Extraction
Using patterns to extract relations
Rules for extracting IS-A relation
Early intuition from Hearst (1992)
Hearst’s Patterns for extracting IS-A relations
(Hearst, 1992): Automatic Acquisition of Hyponyms
“Y such as X ((, X)* (, and|or) X)”
“such Y as X”
“X or other Y”
“X and other Y”
“Y including X”
“Y, especially X”
Hearst’s Patterns for extracting IS-A relations
Hearst pattern | Example occurrences |
X and other Y | ...temples, treasuries, and other important civic buildings. |
X or other Y | Bruises, wounds, broken bones or other injuries... |
Y such as X | The bow lute, such as the Bambara ndang... |
Such Y as X | ...such authors as Herrick, Goldsmith, and Shakespeare. |
Y including X | ...common-law countries, including Canada and England... |
Y , especially X | European countries, especially France, England, and Spain... |
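Hearst patterns translate naturally into regular expressions. This is a deliberately simplified sketch that matches single capitalized or lowercase words; a real system would match over NP chunks rather than raw tokens:

```python
import re

def hearst_is_a(text):
    """Return (hyponym, hypernym) pairs from two Hearst patterns."""
    pairs = []
    # Pattern "Y such as X", e.g. "authors such as Herrick"
    for m in re.finditer(r"(\w+) such as ([A-Z]\w+)", text):
        pairs.append((m.group(2), m.group(1)))
    # Pattern "X and/or other Y", e.g. "broken bones or other injuries"
    for m in re.finditer(r"(\w+),? (?:and|or) other (\w+)", text):
        pairs.append((m.group(1), m.group(2)))
    return pairs

print(hearst_is_a("We admire authors such as Herrick."))
# [('Herrick', 'authors')]
print(hearst_is_a("bruises, wounds, broken bones or other injuries"))
# [('bones', 'injuries')]
```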
Extracting Richer Relations Using Rules
Named Entities aren’t quite enough. Which relations hold between 2 entities?
Drug → Disease: Cure? Prevent? Cause?
What relations hold between 2 entities?
PERSON → ORGANIZATION: Founder? Investor? Member? Employee? President?
Extracting Richer Relations Using Rules and Named Entities
Who holds what office in what organization?
PERSON, POSITION of ORG
PERSON(named|appointed|chose|etc.) PERSON Prep? POSITION
PERSON [be]? (named|appointed|etc.) Prep? ORG POSITION
Hand-built patterns for relations
Relation Extraction
Supervised relation extraction
Supervised machine learning for relations
How to do classification in supervised relation extraction
Automated Content Extraction (ACE)
17 sub-relations of 6 relations from 2008 “Relation Extraction Task”
Relation Extraction
Classify the relation between two entities in a sentence
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
SUBSIDIARY
FAMILY
EMPLOYMENT
NIL
FOUNDER
CITIZEN
INVENTOR
…
Word Features for Relation Extraction
Sentence: American Airlines [Mention 1], a unit of AMR, immediately matched the move, spokesman Tim Wagner [Mention 2] said
Headwords of M1 and M2, and their combination: Airlines, Wagner, Airlines-Wagner
Bag of words in M1 and M2: {American, Airlines, Tim, Wagner, American Airlines, Tim Wagner}
Words around the mentions: M2 -1: spokesman; M2 +1: said
Bag of words between the mentions: {a, AMR, of, immediately, matched, move, spokesman, the, unit}
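These word features can be read directly off token spans. A sketch with illustrative feature names; the exact feature templates used in real ACE systems differ, and punctuation handling here is simplified:

```python
def relation_word_features(tokens, m1, m2):
    """m1, m2: (start, end) token spans, m1 before m2, end exclusive."""
    feats = set()
    feats.add("HEAD1=" + tokens[m1[1] - 1])
    feats.add("HEAD2=" + tokens[m2[1] - 1])
    feats.add("HEADS=" + tokens[m1[1] - 1] + "-" + tokens[m2[1] - 1])
    for w in tokens[m1[0]:m1[1]] + tokens[m2[0]:m2[1]]:
        feats.add("BAG=" + w)                  # words in either mention
    feats.add("M2-1=" + tokens[m2[0] - 1])     # word before mention 2
    if m2[1] < len(tokens):
        feats.add("M2+1=" + tokens[m2[1]])     # word after mention 2
    for w in tokens[m1[1]:m2[0]]:
        feats.add("BETWEEN=" + w)              # bag of words between mentions
    return feats

tokens = ("American Airlines , a unit of AMR , immediately matched "
          "the move , spokesman Tim Wagner said").split()
feats = relation_word_features(tokens, (0, 2), (14, 16))
print("HEADS=Airlines-Wagner" in feats)
```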
Named Entity Type and Mention Level Features for Relation Extraction
American Airlines [Mention 1], a unit of AMR, immediately matched the move, spokesman Tim Wagner [Mention 2] said
Parse Features for Relation Extraction
Base syntactic chunk sequence: NP NP PP VP NP NP
Constituency-tree path between mentions: NP ↑ NP ↑ S ↑ S ↓ NP
Words along the path: Airlines matched Wagner said
American Airlines [Mention 1], a unit of AMR, immediately matched the move, spokesman Tim Wagner [Mention 2] said
Gazetteer and trigger word features for relation extraction
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
Classifiers for supervised methods
Evaluation of Supervised Relation Extraction
Summary: Supervised Relation Extraction
+ Can achieve high accuracy with enough hand-labeled training data, if the test set is similar enough to the training set
- Labeling a large training set is expensive
- Supervised models are brittle and don’t generalize well to different genres
Relation Extraction
Semi-supervised and unsupervised relation extraction
Seed-based or bootstrapping approaches to relation extraction
Relation Bootstrapping (Hearst 1992)
Bootstrapping
“Mark Twain is buried in Elmira, NY.” → X is buried in Y
“The grave of Mark Twain is in Elmira” → The grave of X is in Y
“Elmira is Mark Twain’s final resting place” → Y is X’s final resting place
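One bootstrapping iteration can be sketched as: find sentences containing a seed pair, turn each into a pattern by replacing the entities with slots, then reapply the patterns to harvest new pairs. A toy illustration — the two-sentence corpus and the crude `[\w ]+` slot generalization are stand-ins for a real system with pattern scoring:

```python
import re

def induce_patterns(corpus, seed_x, seed_y):
    """Turn sentences containing both seeds into regex patterns."""
    patterns = []
    for sent in corpus:
        if seed_x in sent and seed_y in sent:
            patterns.append(re.escape(sent)
                            .replace(re.escape(seed_x), "(?P<X>[\\w ]+)")
                            .replace(re.escape(seed_y), "(?P<Y>[\\w ]+)"))
    return patterns

def apply_patterns(corpus, patterns):
    pairs = set()
    for sent in corpus:
        for pat in patterns:
            m = re.search(pat, sent)
            if m:
                pairs.add((m.group("X"), m.group("Y")))
    return pairs

corpus = [
    "Mark Twain is buried in Elmira",
    "Emily Dickinson is buried in Amherst",
]
patterns = induce_patterns(corpus, "Mark Twain", "Elmira")
pairs = apply_patterns(corpus, patterns)
print(sorted(pairs))   # both the seed pair and a new pair are recovered
```

In a full system the new pairs would seed the next round, with confidence scores to limit semantic drift.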
DIPRE: Extract <author, book> pairs
Occurrences of the seed pair found on the web:
The Comedy of Errors, by William Shakespeare, was
The Comedy of Errors, by William Shakespeare, is
The Comedy of Errors, one of William Shakespeare's earliest attempts
The Comedy of Errors, one of William Shakespeare's most
Induced patterns:
?x , by ?y ,
?x , one of ?y ‘s
Brin, Sergey. 1998. Extracting Patterns and Relations from the World Wide Web.
Author | Book |
Isaac Asimov | The Robots of Dawn |
David Brin | Startide Rising |
James Gleick | Chaos: Making a New Science |
Charles Dickens | Great Expectations |
William Shakespeare | The Comedy of Errors |
Snowball
Learned patterns pair entity tags with weighted term vectors:
ORGANIZATION {’s, in, headquarters} LOCATION
ORGANIZATION {in, based} LOCATION
(pattern confidences: .69, .75)
Seed tuples:
Organization | Location of Headquarters |
Microsoft | Redmond |
Exxon | Irving |
IBM | Armonk |
E. Agichtein and L. Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections. ICDL.
Distant Supervision
Snow, Jurafsky, and Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. NIPS 17.
Fei Wu and Daniel S. Weld. 2007. Autonomously Semantifying Wikipedia. CIKM 2007.
Mintz, Bills, Snow, and Jurafsky. 2009. Distant supervision for relation extraction without labeled data. ACL 2009.
Distant supervision paradigm
Distantly supervised learning of relation extraction patterns
1. For each relation
2. For each tuple in a big database
3. Find sentences in a large corpus with both entities
4. Extract frequent features (parse, words, etc.)
5. Train a supervised classifier using thousands of patterns

Example relation: Born-In
Seed tuples: <Edwin Hubble, Marshfield>, <Albert Einstein, Ulm>
Matched sentences:
Hubble was born in Marshfield
Einstein, born (1879), Ulm
Hubble’s birthplace in Marshfield
Extracted patterns:
PER was born in LOC
PER, born (XXXX), LOC
PER’s birthplace in LOC
Classifier: P(born-in | f1, f2, f3, …, f70000)
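Step 3, pairing database tuples with corpus sentences, is the heart of distant supervision: any sentence containing both entities of a known Born-In pair becomes a (noisy) positive training example. A minimal sketch with a toy corpus; real systems match entity mentions rather than raw substrings:

```python
def distant_label(corpus, kb_pairs, relation):
    """Label sentences containing both entities of a KB tuple as positives."""
    examples = []
    for sent in corpus:
        for e1, e2 in kb_pairs:
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, relation))
    return examples

kb = [("Edwin Hubble", "Marshfield"), ("Albert Einstein", "Ulm")]
corpus = [
    "Edwin Hubble was born in Marshfield",
    "Albert Einstein, born (1879), Ulm",
    "Edwin Hubble worked at Mount Wilson",   # no match: lacks Marshfield
]
examples = distant_label(corpus, kb, "Born-In")
print(len(examples))   # 2
```

The labels are noisy (a matched sentence need not actually express the relation), which is why the downstream classifier is trained on aggregated features across many sentences.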
Unsupervised relation extraction
(FCI, specializes in, software development)
(Tesla, invented, coil transformer)
M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the web. IJCAI.
Evaluation of Semi-supervised and Unsupervised Relation Extraction