CS 162: Natural Language Processing
Saadia Gabriel
Lecture 2:
Semantics & Pragmatics Part 1
Announcements
Lecture Outline
1. Pros and Cons of Word Representations
2. Introduce Concepts of Lexical and Vector Semantics
Slides Courtesy of Nanyun (Violet) Peng
How do we represent a word?
Representing words as discrete symbols
egg student talk university happy buy
***Discussion***
Is this a good representation?
Issues?
v_happy = [0 0 0 1 0 ... 0]
v_sad   = [0 0 1 0 0 ... 0]
v_milk  = [1 0 0 0 0 ... 0]
v_happy · v_sad = v_happy · v_milk = 0
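The orthogonality problem above can be checked directly. A minimal sketch (the vocabulary and index positions are illustrative assumptions):

```python
# One-hot word vectors over a toy vocabulary (positions are arbitrary).
vocab = ["milk", "egg", "sad", "happy", "student"]

def one_hot(word, vocab):
    """Return the one-hot vector for `word` over `vocab`."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v_happy = one_hot("happy", vocab)
v_sad = one_hot("sad", vocab)
v_milk = one_hot("milk", vocab)

# Every pair of distinct one-hot vectors is orthogonal: the dot product
# is 0 regardless of how related the words actually are.
print(dot(v_happy, v_sad))   # 0
print(dot(v_happy, v_milk))  # 0
```

"happy" is no closer to "sad" than to "milk" under this representation, which is exactly the issue raised above.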
What about unseen words/phrases?
What is Lexical Semantics?
Word meanings that can help decide:
Distributional Hypothesis (Harris, 1954):
A word is characterized by the company it keeps. In other terms, words that occur in the same contexts tend to have similar meanings.
Intuition of Semantic Similarity
Why Are Two Words Related?
Validity of Semantic Similarity
***Discussion***
What can we use semantic similarity for?
Word similarity for plagiarism detection
Word similarity for historical linguistics: semantic change over time
Kulkarni, Al-Rfou, Perozzi, Skiena 2015
Word similarity reflects gender stereotype
327 gender-neutral occupations, projected onto the she-he direction.
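The projection idea can be sketched with toy vectors. The real study uses pretrained word embeddings; the 3-dimensional vectors and occupation names below are made-up illustrations of the technique only:

```python
# Sketch: projecting word vectors onto the she-he direction.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

# Hypothetical toy embeddings (not real pretrained vectors).
v_she = [0.8, 0.1, 0.2]
v_he  = [0.1, 0.8, 0.2]
gender_dir = [a - b for a, b in zip(v_she, v_he)]  # she - he

def gender_score(v):
    """Scalar projection of v onto the she-he direction:
    positive -> closer to 'she', negative -> closer to 'he'."""
    return dot(v, gender_dir) / norm(gender_dir)

v_nurse = [0.7, 0.2, 0.3]     # hypothetical occupation vectors
v_engineer = [0.2, 0.7, 0.3]
print(gender_score(v_nurse) > 0)      # True: leans toward 'she'
print(gender_score(v_engineer) < 0)   # True: leans toward 'he'
```

A score near zero would indicate an occupation embedded neutrally between the two pronouns; the cited paper measures how far real occupation vectors deviate from that.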
Two classes of similarity algorithms
WordNet: Online thesaurus
Developed at Princeton University in 1980s
https://en-word.net/
A hierarchically organized lexical database of English
There are now multilingual Wordnets for 200+ languages: https://globalwordnet.org/resources/wordnets-in-the-world
Terminology: lemma and wordform
Lemmas have senses
Sense 1:
…a bank can hold the investments in a custodial account…
Sense 2:
“…as agriculture burgeons on the east bank the river will shrink even more”
A sense is a discrete representation of an aspect of a word’s meaning.
Homonymy:
multi-sense as an artifact
Homonyms: words that share a form (spelling or pronunciation) but have unrelated, distinct meanings:
bat1: club for hitting a ball,
bat2: nocturnal flying mammal
Homonymy:
multi-sense as an artifact
A related multilingual concept is “false friends”: words with identical or similar forms in two languages but different meanings across those languages.
Think “pain” in French (bread) vs. “pain” in English (physical suffering).
***Discussion***
Why might homonymy be problematic in real-world applications?
Homonymy causes problems for NLP applications
“bat care”
bat: murciélago (animal) or bate (for baseball)
bass (stringed instrument) vs. bass (fish)
Polysemy: related multi-sense
Sense 1: “The building belonging to a financial institution”
Sense 2: “A financial institution”
How do we know when a word has more than one sense?
Synonyms
Antonyms
dark/light, short/long, hot/cold, fast/slow
Antonyms can also be reversives, describing change or movement in opposite directions:
rise/fall, up/down
Hyponymy and Hypernymy
One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
mango is a hyponym of fruit
Conversely hypernym/superordinate (“hyper is super”)
vehicle is a hypernym of car
fruit is a hypernym of mango
Hyponymy more formally
A sense A is a hyponym of sense B if being an A entails being a B
B subsumes A
Meronymy: the part-whole relation
A leg is part of a chair; a wheel is part of a car.
How is “sense” defined in WordNet?
“a person who is gullible and easy to take advantage of”
This sense of “chump” is shared by 9 words:
chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
Senses of “bass” in Wordnet
WordNet Hypernym Hierarchy for “bass”
Word Similarity
But we can compute similarity over both words and senses!
Two classes of similarity algorithms
Path based similarity
Refinements to path-based similarity
Example: path-based similarity
sim_path(c1, c2) = 1 / pathlen(c1, c2)
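The formula above can be sketched over a toy hypernym hierarchy. The node names and edges below are made up for illustration (real systems use WordNet), and pathlen is taken as the number of nodes on the shortest path, so a concept's similarity to itself is 1:

```python
from collections import deque

# Toy hypernym hierarchy: child -> parent (illustrative, not WordNet).
hypernym_of = {
    "mango": "fruit", "apple": "fruit",
    "fruit": "food", "bread": "food",
    "car": "vehicle", "truck": "vehicle",
    "food": "entity", "vehicle": "entity",
}

def neighbors(node):
    """Parent plus children of `node` in the hierarchy."""
    ns = []
    if node in hypernym_of:
        ns.append(hypernym_of[node])
    ns.extend(c for c, p in hypernym_of.items() if p == node)
    return ns

def pathlen(c1, c2):
    """Number of nodes on the shortest path from c1 to c2 (BFS),
    so pathlen(c, c) == 1 and sim_path(c, c) == 1."""
    seen = {c1}
    queue = deque([(c1, 1)])
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist
        for n in neighbors(node):
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    return None

def sim_path(c1, c2):
    return 1 / pathlen(c1, c2)

print(sim_path("mango", "apple"))  # 1/3: mango -> fruit -> apple
print(sim_path("mango", "car"))    # 1/6: path runs up through entity
```

Nearby concepts (siblings under "fruit") score higher than concepts whose shortest path climbs to the root, which is the intended behavior of path-based similarity.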
Thesaurus Methods: Limitations
Next...
Theoretical foundation of distributional semantics
Intuitions: Zellig Harris (1954):
Intuition for distributional word similarity
More intuition for distributional word similarity
A bottle of tesgüino is on the table
Everybody likes tesgüino
Tesgüino makes you drunk
We make tesgüino out of corn.
Let’s use this observation to define a new similarity algorithm.
Modeling words with vectors
Vector models are also called “embeddings”
Two classes of vector representation
Term-document matrix
The words in a term-document matrix
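A term-document matrix is simple to build by counting. A minimal sketch with made-up documents (real matrices use large corpora and tools like scikit-learn's CountVectorizer):

```python
# Toy term-document matrix: rows are terms, columns are documents,
# each cell holds the raw count of the term in that document.
docs = {
    "d1": "the bank approved the loan",
    "d2": "the river bank flooded",
    "d3": "the loan had high interest",
}
vocab = sorted({w for text in docs.values() for w in text.split()})

matrix = {
    term: [text.split().count(term) for text in docs.values()]
    for term in vocab
}
print(matrix["bank"])  # [1, 1, 0]: one count per document d1, d2, d3
print(matrix["loan"])  # [1, 0, 1]
```

Each row is that term's vector over documents; each column is a document's vector over the vocabulary, and either view can be compared by similarity.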
Issues about the term-document matrix
The word-word or word-context matrix
What dimensions do we have now?
The word-word or word-context matrix
Note: Very sparse! (~ 50,000 x 50,000)
We know the meanings are similar because of similar contexts
Word-word matrix
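The word-word matrix and the "similar contexts" observation can be sketched together. The mini-corpus below is a made-up variant of the tesgüino example, using a +/-2 word context window and cosine similarity between count rows:

```python
import math
from collections import Counter

# Toy corpus (illustrative): 'tesguino' and 'wine' share contexts.
corpus = [
    "a bottle of tesguino is on the table",
    "a bottle of wine is on the table",
    "everybody likes tesguino",
    "everybody likes wine",
]
window = 2
cooc = Counter()  # keys: (target word, context word) pairs
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[(w, words[j])] += 1

vocab = sorted({w for s in corpus for w in s.split()})

def row(word):
    """Co-occurrence count vector for `word` over the vocabulary."""
    return [cooc[(word, c)] for c in vocab]

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# Identical contexts give maximal similarity for tesguino/wine.
print(cosine(row("tesguino"), row("wine")))
print(cosine(row("tesguino"), row("table")))
```

Because "tesguino" and "wine" occur with exactly the same neighbors in this toy corpus, their rows are identical and their cosine is 1, illustrating how context counts capture similarity without any dictionary.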
Problem with raw counts
Next Monday…
Mutual Information I(X; Y): measures how much knowing one variable reduces our uncertainty about the other.
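As a preview, the pointwise variant (PMI), commonly used to reweight raw co-occurrence counts, can be sketched on toy counts. The numbers below are made up for illustration:

```python
import math

# Toy co-occurrence counts: counts[(w, c)] = times word w appears
# with context c (illustrative numbers only).
counts = {
    ("ice", "cold"): 8, ("ice", "hot"): 1,
    ("fire", "cold"): 1, ("fire", "hot"): 8,
}
total = sum(counts.values())

def p_word(w):
    return sum(v for (x, _), v in counts.items() if x == w) / total

def p_ctx(c):
    return sum(v for (_, y), v in counts.items() if y == c) / total

def pmi(w, c):
    """PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ):
    positive when w and c co-occur more often than chance predicts."""
    return math.log2((counts[(w, c)] / total) / (p_word(w) * p_ctx(c)))

print(pmi("ice", "cold") > 0)  # True: more frequent than chance
print(pmi("ice", "hot") < 0)   # True: less frequent than chance
```

Raw counts overweight frequent but uninformative contexts (like "the"); PMI-style weighting corrects for that by dividing out what chance co-occurrence would predict.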