Relation Extraction
What is relation extraction?
Dan Jurafsky
Extracting relations from text
Company-Founding
Company IBM
Location New York
Date June 16, 1911
Original-Name Computing-Tabulating-Recording Co.
Founding-year(IBM,1911)
Founding-location(IBM,New York)
Extracting Relation Triples from Text
The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is an American private research university located in Stanford, California … near Palo Alto, California… Leland Stanford…founded the university in 1891
Stanford EQ Leland Stanford Junior University
Stanford LOC-IN California
Stanford IS-A research university
Stanford LOC-NEAR Palo Alto
Stanford FOUNDED-IN 1891
Stanford FOUNDER Leland Stanford
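The triples above can be held in a very small data structure. The following is a minimal sketch, assuming a plain in-memory list; the names `Triple` and `objects_of` are illustrative and not part of any extraction toolkit.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One extracted relation: subject, relation label, object."""
    subject: str
    relation: str
    obj: str

# Triples copied from the Stanford example above.
triples = [
    Triple("Stanford", "EQ", "Leland Stanford Junior University"),
    Triple("Stanford", "LOC-IN", "California"),
    Triple("Stanford", "IS-A", "research university"),
    Triple("Stanford", "LOC-NEAR", "Palo Alto"),
    Triple("Stanford", "FOUNDED-IN", "1891"),
    Triple("Stanford", "FOUNDER", "Leland Stanford"),
]

def objects_of(subject, relation, store):
    """Return every object linked to `subject` by `relation`."""
    return [t.obj for t in store if t.subject == subject and t.relation == relation]

print(objects_of("Stanford", "FOUNDER", triples))
```

A real system would more likely keep such triples in a database or RDF store; the list is only meant to make the triple shape concrete.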
Why Relation Extraction?
(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)
Automated Content Extraction (ACE)
17 relations from 2008 “Relation Extraction Task”
Automated Content Extraction (ACE)
He was in Tennessee
XYZ, the parent company of ABC
John’s wife Yoko
Steve Jobs, co-founder of Apple…
UMLS: Unified Medical Language System
Injury disrupts Physiological Function
Bodily Location location-of Biologic Function
Anatomical Structure part-of Organism
Pharmacologic Substance causes Pathological Function
Pharmacologic Substance treats Pathologic Function
Extracting UMLS relations from a sentence
Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes
🡻
Echocardiography, Doppler DIAGNOSES Acquired stenosis
Databases of Wikipedia Relations
Relations extracted from Infobox
Stanford state California
Stanford motto “Die Luft der Freiheit weht”
…
Wikipedia Infobox
Relation databases that draw from Wikipedia
subject predicate object
Golden Gate Park location San Francisco
dbpedia:Golden_Gate_Park dbpedia-owl:location dbpedia:San_Francisco
people/person/nationality, location/location/contains
people/person/profession, people/person/place-of-birth
biology/organism_higher_classification film/film/genre
Ontological relations
Examples from the WordNet Thesaurus
How to build relation extractors
Relation Extraction
Using patterns to extract relations
Rules for extracting IS-A relation
Early intuition from Hearst (1992)
Hearst’s Patterns for extracting IS-A relations
(Hearst, 1992): Automatic Acquisition of Hyponyms
“Y such as X ((, X)* (, and|or) X)”
“such Y as X”
“X or other Y”
“X and other Y”
“Y including X”
“Y, especially X”
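The patterns above translate fairly directly into regular expressions. The sketch below assumes that noun phrases have already been marked with square brackets by some upstream chunker (a hypothetical preprocessing step, not something the patterns themselves provide); the names `HEARST` and `extract_isa` are illustrative.

```python
import re

NP = r"\[[^\]]+\]"                                        # one bracketed noun phrase
NP_LIST = rf"{NP}(?:(?:, and |, or |, | and | or ){NP})*"  # "X, X, and X"

# One regex per pattern listed above; Y is the hypernym, X the hyponym(s).
HEARST = [re.compile(p, re.IGNORECASE) for p in (
    rf"(?P<y>{NP}),? such as (?P<xs>{NP_LIST})",
    rf"such (?P<y>{NP}) as (?P<xs>{NP_LIST})",
    rf"(?P<xs>{NP_LIST}),? (?:and|or) other (?P<y>{NP})",
    rf"(?P<y>{NP}),? including (?P<xs>{NP_LIST})",
    rf"(?P<y>{NP}),? especially (?P<xs>{NP_LIST})",
)]

def extract_isa(text):
    """Return (hyponym, hypernym) pairs found by any of the patterns."""
    pairs = []
    for pattern in HEARST:
        for match in pattern.finditer(text):
            hypernym = match.group("y").strip("[]")
            for np in re.findall(NP, match.group("xs")):
                pairs.append((np.strip("[]"), hypernym))
    return pairs

print(extract_isa("[European countries], especially [France], [England], and [Spain]"))
```

Without real noun-phrase chunking the patterns tend to over- or under-match, which is why the bracketed input is assumed here rather than raw text.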
Hearst’s Patterns for extracting IS-A relations
Hearst pattern | Example occurrences |
X and other Y | ...temples, treasuries, and other important civic buildings. |
X or other Y | Bruises, wounds, broken bones or other injuries... |
Y such as X | The bow lute, such as the Bambara ndang... |
Such Y as X | ...such authors as Herrick, Goldsmith, and Shakespeare. |
Y including X | ...common-law countries, including Canada and England... |
Y , especially X | European countries, especially France, England, and Spain... |
Extracting Richer Relations Using Rules
Named Entities aren’t quite enough. Which relations hold between 2 entities?
Drug
Disease
Cure?
Prevent?
Cause?
What relations hold between 2 entities?
PERSON
ORGANIZATION
Founder?
Investor?
Member?
Employee?
President?
Extracting Richer Relations Using Rules and Named Entities
Who holds what office in what organization?
PERSON, POSITION of ORG
PERSON (named|appointed|chose|etc.) PERSON Prep? POSITION
PERSON [be]? (named|appointed|etc.) Prep? ORG POSITION
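As a rough illustration of the first pattern (PERSON, POSITION of ORG), the sketch below assumes a named entity tagger has already wrapped entities in inline tags such as `<PERSON>…</PERSON>`. That tag format, and the helper names `tag` and `extract_office`, are assumptions made for this example; the test sentence reuses "Steve Jobs, co-founder of Apple" from the ACE examples earlier.

```python
import re

def tag(label):
    """Regex for one inline-tagged entity, capturing its text in a group named after the label."""
    return rf"<{label}>(?P<{label.lower()}>[^<]+)</{label}>"

# First pattern from the list above: PERSON, POSITION of ORG
HOLDS_OFFICE = re.compile(rf"{tag('PERSON')}, {tag('POSITION')} of {tag('ORG')}")

def extract_office(text):
    """Return (person, position, organization) triples matched by the pattern."""
    return [(m["person"], m["position"], m["org"]) for m in HOLDS_OFFICE.finditer(text)]

tagged = "<PERSON>Steve Jobs</PERSON>, <POSITION>co-founder</POSITION> of <ORG>Apple</ORG>"
print(extract_office(tagged))
```

The other two patterns could be written the same way, with the verb alternation (named|appointed|…) as a non-capturing group between the tagged entities.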
Hand-built patterns for relations
Relation Extraction
Supervised relation extraction
Supervised machine learning for relations
How to do classification in supervised relation extraction
Automated Content Extraction (ACE)
17 sub-relations of 6 relations from 2008 “Relation Extraction Task”
Relation Extraction
Classify the relation between two entities in a sentence
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
SUBSIDIARY
FAMILY
EMPLOYMENT
NIL
FOUNDER
CITIZEN
INVENTOR
…
Word Features for Relation Extraction
Airlines Wagner Airlines-Wagner
{American, Airlines, Tim, Wagner, American Airlines, Tim Wagner}
M2: -1 spokesman
M2: +1 said
{a, AMR, of, immediately, matched, move, spokesman, the, unit}
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
Mention 1: American Airlines
Mention 2: Tim Wagner
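The word features listed above appear to fall into four groups: the head words of the two mentions (and their combination), the words and bigrams inside the mentions, the words immediately before and after Mention 2, and the bag of words between the mentions. A minimal sketch of computing them follows. It assumes whitespace tokenization with punctuation removed and takes the last word of a mention as its head, both simplifications; `word_features` and the feature names are illustrative.

```python
def bigrams(words):
    """Adjacent word pairs joined by a space."""
    return {" ".join(pair) for pair in zip(words, words[1:])}

def word_features(tokens, m1, m2):
    """tokens: list of words; m1, m2: (start, end) token spans, end exclusive."""
    m1_words = tokens[m1[0]:m1[1]]
    m2_words = tokens[m2[0]:m2[1]]
    head1, head2 = m1_words[-1], m2_words[-1]   # crude head: last word of the mention
    return {
        "head_m1": head1,
        "head_m2": head2,
        "head_pair": f"{head1}-{head2}",
        "bag_mentions": set(m1_words) | set(m2_words) | bigrams(m1_words) | bigrams(m2_words),
        "m2_prev": tokens[m2[0] - 1] if m2[0] > 0 else None,
        "m2_next": tokens[m2[1]] if m2[1] < len(tokens) else None,
        "bag_between": set(tokens[m1[1]:m2[0]]),
    }

tokens = ("American Airlines a unit of AMR immediately matched the move "
          "spokesman Tim Wagner said").split()
feats = word_features(tokens, (0, 2), (11, 13))
print(feats["head_pair"], feats["m2_prev"], feats["m2_next"])
```

In practice each of these values would be turned into indicator features (for example `head_pair=Airlines-Wagner`) before being handed to a classifier.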
Named Entity Type and Mention Level Features for Relation Extraction
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
Mention 1: American Airlines
Mention 2: Tim Wagner
Parse Features for Relation Extraction
NP NP PP VP NP NP
NP 🡹 NP 🡹 S 🡹 S 🡻 NP
Airlines matched Wagner said
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
Mention 1: American Airlines
Mention 2: Tim Wagner
Gazetteer and trigger word features for relation extraction
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
Classifiers for supervised methods
Evaluation of Supervised Relation Extraction
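Supervised relation extractors are commonly scored with precision, recall, and F1 over the extracted relation instances. The sketch below assumes that setup and compares a set of predicted triples against a gold set; the two example sets are made up for illustration and the function name `prf1` is hypothetical.

```python
def prf1(gold, predicted):
    """Precision, recall, and F1 of predicted relation triples against gold triples."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Illustrative (made-up) gold and system output for the American Airlines sentence.
gold = {
    ("American Airlines", "SUBSIDIARY", "AMR"),
    ("Tim Wagner", "EMPLOYMENT", "American Airlines"),
}
predicted = {
    ("American Airlines", "SUBSIDIARY", "AMR"),
    ("Tim Wagner", "FAMILY", "AMR"),
}
print(prf1(gold, predicted))
```

The same function can be applied per relation type by filtering both sets on the relation label first.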
Summary: Supervised Relation Extraction
+ Can get high accuracies with enough hand-labeled training data, if test similar enough to training
- Labeling a large training set is expensive
- Supervised models are brittle, don’t generalize well to different genres
Relation Extraction
Semi-supervised and unsupervised relation extraction
Seed-based or bootstrapping approaches to relation extraction
Relation Bootstrapping (Hearst 1992)
Bootstrapping
“Mark Twain is buried in Elmira, NY.”
X is buried in Y
“The grave of Mark Twain is in Elmira”
The grave of X is in Y
“Elmira is Mark Twain’s final resting place”
Y is X’s final resting place.
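The bootstrapping loop suggested by these examples can be sketched as follows: start from a seed pair, turn every sentence that mentions both entities into a pattern, then apply the patterns to find new pairs. This is a deliberately naive version that uses the whole sentence as the pattern and has no confidence scoring. The "Jane Doe" sentences are made up for illustration, and the function names are hypothetical.

```python
import re

def induce_pattern(pair, sentence):
    """Turn a sentence containing both entities into a regex with x and y slots."""
    x, y = pair
    if x not in sentence or y not in sentence:
        return None
    regex = re.escape(sentence)
    regex = regex.replace(re.escape(x), "(?P<x>.+?)", 1)
    regex = regex.replace(re.escape(y), "(?P<y>.+?)", 1)
    return regex

def bootstrap(seeds, corpus, rounds=2):
    """Alternate between inducing patterns from known pairs and harvesting new pairs."""
    pairs, patterns = set(seeds), set()
    for _ in range(rounds):
        for pair in list(pairs):
            for sentence in corpus:
                pattern = induce_pattern(pair, sentence)
                if pattern:
                    patterns.add(pattern)
        for pattern in patterns:
            for sentence in corpus:
                match = re.fullmatch(pattern, sentence)
                if match:
                    pairs.add((match["x"], match["y"]))
    return pairs, patterns

corpus = [
    "Mark Twain is buried in Elmira, NY.",
    "The grave of Mark Twain is in Elmira",
    "Elmira is Mark Twain’s final resting place",
    "The grave of Jane Doe is in Springfield",        # made-up sentence
    "Springfield is Jane Doe’s final resting place",  # made-up sentence
]
pairs, patterns = bootstrap({("Mark Twain", "Elmira")}, corpus)
print(sorted(pairs))
```

Because every matching pattern is trusted equally, one bad pattern can pull in wrong pairs that then generate more bad patterns; this drift is the usual weakness of unfiltered bootstrapping.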
DIPRE: Extract <author,book> pairs
The Comedy of Errors, by William Shakespeare, was
The Comedy of Errors, by William Shakespeare, is
The Comedy of Errors, one of William Shakespeare's earliest attempts
The Comedy of Errors, one of William Shakespeare's most
?x , by ?y ,
?x , one of ?y ’s
Brin, Sergey. 1998. Extracting Patterns and Relations from the World Wide Web.
Author | Book |
Isaac Asimov | The Robots of Dawn |
David Brin | Startide Rising |
James Gleick | Chaos: Making a New Science |
Charles Dickens | Great Expectations |
William Shakespeare | The Comedy of Errors |
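The step from seed occurrences to patterns can be sketched roughly as below: for each occurrence of a seed pair, record the text between book and author and the text after the author, then keep the contexts that recur. This is only an approximation of the idea under simplifying assumptions (book appears before author, exact string matching, no URL prefixes); `dipre_patterns` and `min_count` are illustrative names.

```python
import os
from collections import defaultdict

def dipre_patterns(author, book, occurrences, min_count=2):
    """Group occurrences by the text between book (?x) and author (?y)."""
    groups = defaultdict(list)          # middle string -> texts seen after the author
    for text in occurrences:
        b = text.find(book)
        a = text.find(author, b + len(book)) if b >= 0 else -1
        if a < 0:
            continue                    # this sketch only handles "book ... author" order
        middle = text[b + len(book):a]
        groups[middle].append(text[a + len(author):])
    patterns = []
    for middle, suffixes in groups.items():
        if len(suffixes) >= min_count:
            # Shared text right after the author becomes the pattern's suffix.
            suffix = os.path.commonprefix(suffixes).rstrip()
            patterns.append(f"?x{middle}?y{suffix}")
    return patterns

occurrences = [
    "The Comedy of Errors, by William Shakespeare, was",
    "The Comedy of Errors, by William Shakespeare, is",
    "The Comedy of Errors, one of William Shakespeare's earliest attempts",
    "The Comedy of Errors, one of William Shakespeare's most",
]
print(dipre_patterns("William Shakespeare", "The Comedy of Errors", occurrences))
```

The resulting patterns would then be matched against new text to propose further author and book pairs, which in turn become seeds for the next iteration.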
Snowball
{’s, in, headquarters}
{in, based}
ORGANIZATION
LOCATION
Organization | Location of Headquarters |
Microsoft | Redmond |
Exxon | Irving |
IBM | Armonk |
E. Agichtein and L. Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections. ICDL
ORGANIZATION
LOCATION
.69
.75
Distant Supervision
Snow, Jurafsky, Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. NIPS 17
Fei Wu and Daniel S. Weld. 2007. Autonomously Semantifying Wikipedia. CIKM 2007
Mintz, Bills, Snow, Jurafsky. 2009. Distant supervision for relation extraction without labeled data. ACL09
Distant supervision paradigm
Distantly supervised learning of relation extraction patterns
1. For each relation
   Born-In
2. For each tuple in big database
   <Edwin Hubble, Marshfield>
   <Albert Einstein, Ulm>
3. Find sentences in large corpus with both entities
   Hubble was born in Marshfield
   Einstein, born (1879), Ulm
   Hubble’s birthplace in Marshfield
4. Extract frequent features (parse, words, etc)
   PER was born in LOC
   PER, born (XXXX), LOC
   PER’s birthplace in LOC
5. Train supervised classifier using thousands of patterns
   P(born-in | f1,f2,f3,…,f70000)
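The first four steps above can be sketched in a few lines. The version below is a simplification under stated assumptions: people are matched by surname, entities are found by plain substring search, and the only feature is the sentence with entities replaced by PER and LOC and years masked. The function name `distant_features` and the distractor sentence are illustrative.

```python
import re
from collections import Counter

def distant_features(kb, corpus):
    """Collect lexical pattern counts per relation, using KB tuples as weak labels."""
    features = {}
    for relation, tuples in kb.items():                  # step 1: each relation
        counts = Counter()
        for person, place in tuples:                     # step 2: each tuple
            surname = person.split()[-1]                 # assumption: match on surname
            for sentence in corpus:                      # step 3: sentences with both entities
                if surname in sentence and place in sentence:
                    pattern = sentence.replace(surname, "PER").replace(place, "LOC")
                    pattern = re.sub(r"\d{4}", "XXXX", pattern)   # step 4: mask years
                    counts[pattern] += 1
        features[relation] = counts
    return features

kb = {"Born-In": [("Edwin Hubble", "Marshfield"), ("Albert Einstein", "Ulm")]}
corpus = [
    "Hubble was born in Marshfield",
    "Einstein, born (1879), Ulm",
    "Hubble’s birthplace in Marshfield",
    "Einstein played the violin",      # distractor: no KB location, so it is skipped
]
features = distant_features(kb, corpus)
print(features["Born-In"])
```

For step 5, these pattern counts would be converted into feature vectors and passed to any standard supervised classifier to estimate P(born-in | f1, f2, …); that training step is left out here to keep the sketch dependency-free.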
Unsupervised relation extraction
(FCI, specializes in, software development)
(Tesla, invented, coil transformer)
M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the web. IJCAI
Evaluation of Semi-supervised and Unsupervised Relation Extraction