1 of 60

CS 162: Natural Language Processing

Saadia Gabriel

Lecture 2:

Semantics & Pragmatics Part 1

2 of 60

Announcements

  • The Bruin Learn course website is visible
  • Everyone enrolled should be added to Piazza
  • We may need to shift the timing of Discussion B; an update is coming soon (take the poll linked under Week 1 on Bruin Learn)

3 of 60

Lecture Outline

1. Pros and Cons of Word Representations

2. Introduce Concepts of Lexical and Vector Semantics

Slides Courtesy of Nanyun (Violet) Peng

4 of 60

How do we represent a word?

  • How do we “understand” a word?

  • How can we know the relation/distance/similarity between words computationally?

5 of 60

Representing words as discrete symbols

  • Naïve way: represent words as atomic symbols: student, talk, university (bag of words, BoW)

  • Represent a word as a “one-hot” vector, e.g. “university”:

         egg   student   talk   university   happy   buy
        [  0      0       0         1          0      0   …  0 ]

  • How large is (what’s the dimension of) this vector?
    • Vector dimension = number of words in vocabulary 
      • PTB data: ~50k
      • Google 1T data: 13M
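As a sketch, the one-hot scheme is easy to implement; the toy vocabulary below is illustrative, not from a real corpus:

```python
# Minimal one-hot word vectors over a toy vocabulary (illustrative only).
vocab = ["egg", "student", "talk", "university", "happy", "buy"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """All zeros except a 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot("university"))  # [0, 0, 0, 1, 0, 0]
```

With a real vocabulary (~50k words for PTB, 13M for Google 1T), each vector has that many dimensions and still contains a single 1.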


6 of 60

***Discussion***

Is this a good representation?

7 of 60

Issues?

  • Dimensionality is large; vector is sparse

  • No similarity

   Vhappy = [0  0  0  1  0 ... 0]

   Vsad   = [0  0  1  0  0 ... 0]

   Vmilk  = [1  0  0  0  0 ... 0]

   Vhappy · Vsad  =  Vhappy · Vmilk  =  0

  • Cannot represent new words
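The “no similarity” point can be checked directly: the dot product of the one-hot vectors for any two distinct words is 0, however related the words are. A minimal sketch:

```python
# One-hot vectors assign zero similarity to every pair of distinct words.
vocab = ["milk", "egg", "sad", "happy", "talk"]

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v_happy, v_sad, v_milk = one_hot("happy"), one_hot("sad"), one_hot("milk")
print(dot(v_happy, v_sad))   # 0 -- "happy" looks as unrelated to "sad"...
print(dot(v_happy, v_milk))  # 0 -- ...as it does to "milk"
```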


8 of 60

What about unseen words/phrases?

9 of 60

What is Lexical Semantics?

Word meanings that can help decide:

  • Word Similarity 
    • Distributional (Vector) Models of Meaning

  • Word Relations

  • Word Sense Disambiguation

  • Semantic Roles

Distributional Hypothesis (Harris, 1954):

A word is characterized by the company it keeps (Firth, 1957). In other words, words that occur in the same contexts tend to have similar meanings.

10 of 60

Intuition of Semantic Similarity

  • Semantically Close
    • bank-money
    • apple-fruit
    • tree-forest
    • bank-river
    • pen-paper
    • run-walk
    • mistake-error
    • car-wheel
  • Semantically Distant
    • doctor-mall
    • painting-January
    • math-river
    • apple-penguin
    • nurse-fruit
    • pen-river
    • clown-rocket
    • car-algebra

11 of 60

Why Are Two Words Related?

  • Meaning
    • Two concepts are close in terms of meaning (want-desire)

  • World knowledge
    • Two concepts have similar properties, often occur together, or occur in similar contexts (pencil-pen, pen-ink, dog-cat)

  • Psychology
    • We often think of the two concepts together (voting-home address, red-luck[in some culture])

12 of 60

Validity of Semantic Similarity

  • Is semantic distance a valid linguistic phenomenon? How would you test this?
  • Experiment (Rubenstein and Goodenough, 1965)
    • Compiled a list of word pairs
    • Subjects asked to judge semantic distance (from 0 to 4) for each pair
  • Results
    • Rank correlation between subjects is ~ 0.9
    • People are consistent!
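The consistency check is a rank correlation. Below is a plain-Python sketch of Spearman's rank correlation; the two subjects' 0–4 judgments are invented for illustration, not the actual Rubenstein and Goodenough data:

```python
# Spearman rank correlation between two subjects' similarity judgments.
# Judgment scores (0-4 scale) below are made up for illustration.

def ranks(xs):
    """Rank values 1..n, averaging ranks over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

subject_a = [3.9, 3.5, 0.5, 1.0, 2.8]  # five hypothetical word pairs
subject_b = [4.0, 3.0, 0.2, 1.5, 3.0]
print(round(spearman(subject_a, subject_b), 2))  # high agreement
```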

13 of 60

***Discussion***

What can we use semantic similarity for?

14 of 60

Word similarity for plagiarism detection

15 of 60

Word similarity for historical linguistics: semantic change over time

Kulkarni, Al-Rfou, Perozzi, Skiena 2015

16 of 60

Word similarity reflects gender stereotype

327 gender-neutral occupations, projected onto the she-he direction.

17 of 60

Two classes of similarity algorithms

  • Thesaurus-based algorithms
    • Are words “nearby” in a thesaurus hierarchy?
    • Do words have similar glosses (definitions)?

18 of 60

  • Distributional algorithms
    • Do words behave similarly in real-world usage?
  • Thesaurus-based algorithms
    • Are words “nearby” in a thesaurus hierarchy?
    • Do words have similar glosses (definitions)?

Two classes of similarity algorithms

19 of 60

WordNet: Online thesaurus

Developed at Princeton University in the 1980s

20 of 60

https://en-word.net/

A hierarchically organized lexical database of English

There are now multilingual Wordnets for 200+ languages: https://globalwordnet.org/resources/wordnets-in-the-world

21 of 60

22 of 60

Terminology: lemma and wordform

  • A lemma or citation form
    • Representation of all forms with the same stem, part of speech, rough semantics
  • A wordform
    • The inflected word as it appears in text

23 of 60

Lemmas have senses

  • One lemma “bank” can have many meanings:

Sense 1:

…a bank can hold the investments in a custodial account…

Sense 2:

…as agriculture burgeons on the east bank, the river will shrink even more…

  • Sense (or word sense)
    • A discrete representation of an aspect of a word’s meaning.

24 of 60

Homonymy:

multi-sense as an artifact

Homonyms: words that share a form (spelling or pronunciation) but have unrelated, distinct meanings:

  • bank1: financial institution
  • bank2: sloping land

  • bat1: club for hitting a ball
  • bat2: nocturnal flying mammal

25 of 60

Homonymy:

multi-sense as an artifact

A related multilingual concept is “false friends”: words with identical or similar forms in two languages but different meanings across them.

Think “pain” in French (bread) vs. “pain” in English.

26 of 60

***Discussion***

Why might homonymy be problematic in real-world applications?

27 of 60

Homonymy causes problems for NLP applications

  • Information retrieval

“bat care”

  • Machine Translation

bat → murciélago (animal) or bate (for baseball)

  • Text-to-Speech

bass (stringed instrument) vs. bass (fish)

28 of 60

Polysemy: related multi-sense

  • 1. The bank was constructed in 1875 out of local red brick.

  • 2. I withdrew the money from the bank 

  • Are those the same sense?

Sense 1: “The building belonging to a financial institution”

Sense 2: “A financial institution”

  • A polysemous word has related meanings. Most non-rare words have multiple meanings.

29 of 60

How do we know when a word has more than one sense?

30 of 60

Synonyms

  • Words (different forms) that have the same meaning in some or all contexts.

    • couch / sofa
    • big / large
    • automobile / car
    • vomit / throw up
    • water / H2O

31 of 60

32 of 60

Antonyms

  • Senses that are opposites with respect to one feature of meaning
  • Otherwise, they are very similar!
  • More formally, antonyms can:
    • define a binary opposition:

      dark/light   short/long   hot/cold   fast/slow

    • be reversives:

      rise/fall   up/down

33 of 60

Hyponymy and Hypernymy

One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other:

  • car is a hyponym of vehicle
  • mango is a hyponym of fruit

Conversely, hypernym/superordinate (“hyper is super”):

  • vehicle is a hypernym of car
  • fruit is a hypernym of mango

34 of 60

Hyponymy more formally

  • Entailment:

A sense A is a hyponym of sense B if being an A entails being a B

  • Hyponymy is usually transitive 
    • (A hypo B and B hypo C entails A hypo C)
  • Another name: the IS-A hierarchy
    • A IS-A B      (or A ISA B)

B subsumes A

35 of 60

Meronymy

  • The part-whole relation

A leg is part of a chair; a wheel is part of a car.

  • Wheel is a meronym of car, and car is a holonym of wheel

36 of 60

How is “sense” defined in WordNet?

  • The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
  • Example: chump as a noun with the gloss:

“a person who is gullible and easy to take advantage of”

This sense of “chump” is shared by 9 words:

chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2

37 of 60

Senses of “bass” in Wordnet

38 of 60

WordNet Hypernym Hierarchy for “bass”

39 of 60

Word Similarity

  • Synonymy: a binary relation
    • Two words are either synonymous or not
  • Similarity (or distance): a looser metric
    • Two words are more similar if they share more features of meaning
  • Similarity is properly a relation between senses
    • It’s not the word “bank” that is similar to the word “slope”
    • Rather, bank1 is similar to fund3
    • bank2 is similar to slope5

But we can compute similarity over both words and senses!

40 of 60

Two classes of similarity algorithms

41 of 60

Path based similarity

  • Two concepts (senses/synsets) are similar if they are near each other in the thesaurus hierarchy
    • i.e., they have a short path between them
    • a concept has a path of length 1 to itself


42 of 60

Refinements to path-based similarity

43 of 60

Example: path-based similarity: simpath(c1,c2) = 1/pathlen(c1,c2)
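A sketch of simpath on a toy IS-A hierarchy (the hierarchy below is invented for illustration; it is not actual WordNet data). Path length counts nodes, so a concept has path length 1 to itself:

```python
# Path-based similarity over a toy IS-A hierarchy (not real WordNet data).
hypernym = {            # child -> parent (IS-A)
    "car": "vehicle", "truck": "vehicle", "vehicle": "artifact",
    "mango": "fruit", "fruit": "food", "food": "entity",
    "artifact": "entity",
}

def path_to_root(c):
    """Chain of hypernyms from a concept up to the root."""
    path = [c]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path

def pathlen(c1, c2):
    """Number of nodes on the shortest path through the hierarchy."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    common = set(p1) & set(p2)
    # lowest common ancestor = first shared node on c1's upward path
    lca = next(c for c in p1 if c in common)
    return p1.index(lca) + p2.index(lca) + 1

def simpath(c1, c2):
    return 1 / pathlen(c1, c2)

print(simpath("car", "car"))    # 1.0: path length 1 to itself
print(simpath("car", "truck"))  # 1/3: car -> vehicle -> truck
print(simpath("car", "mango"))  # 1/7: only meet at the root
```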

44 of 60

Thesaurus Methods: Limitations

  • Measure is only as good as the resource
    • Missing nuances (e.g., good vs. proficient)
    • Missing new concepts/new meanings of words
  • Limited in scope
    • Assumes IS-A relations
    • Works mostly for nouns
  • Role of context not accounted for
  • Not easily domain-adaptable
  • Resources not available in many languages

45 of 60

Next...

46 of 60

Theoretical foundation of distributional semantics

Intuitions:  Zellig Harris (1954):

  • “oculist and eye-doctor … occur in almost the same environments”

  • “If A and B have almost identical environments we say that they are synonyms.”

47 of 60

Intuition for distributional word similarity

  • Words that occur in the same contexts tend to have similar meanings


48 of 60

More intuition for distributional word similarity

A bottle of tesgüino is on the table

Everybody likes tesgüino

Tesgüino makes you drunk

We make tesgüino out of corn.

  • From the context words, humans can guess that tesgüino means an alcoholic beverage like beer

Let’s use this observation to define a new similarity algorithm.

49 of 60

Modeling words with vectors

  • Model the meaning of a word by “embedding” it in a vector space.
  • The meaning of a word is a vector of numbers.

Vector models are also called “embeddings”

  • Contrast: previously, word meaning was represented by a vocabulary index (“word number 545” → one-hot vector)

50 of 60

Two classes of vector representation

51 of 60

Term-document matrix

52 of 60

 Term-document matrix
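A sketch of the idea with toy documents (invented for illustration): rows are terms, columns are documents, cells are counts.

```python
# Term-document matrix: word counts per document (toy data, illustrative).
from collections import Counter

docs = {
    "doc1": "the bank approved the loan",
    "doc2": "the river bank flooded",
}
counts = {name: Counter(text.split()) for name, text in docs.items()}
terms = sorted({w for c in counts.values() for w in c})

# Each row is a term's count vector across the documents.
matrix = {t: [counts[d][t] for d in docs] for t in terms}
print(matrix["bank"])   # [1, 1]: once in each document
print(matrix["river"])  # [0, 1]: only in doc2
```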

53 of 60

The words in a term-document matrix

54 of 60

The words in a term-document matrix

55 of 60

Issues about the term-document matrix

56 of 60

The word-word or word-context matrix

What dimensions do we have now?

57 of 60

The word-word or word-context matrix

Note: Very sparse! (~ 50,000 x 50,000)

We know the meanings are similar because of similar contexts
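A sketch of the construction on a toy corpus (corpus and window size invented for illustration): count context words within a small window, then compare rows with cosine similarity.

```python
# Word-word co-occurrence matrix from a toy corpus, compared by cosine.
import math
from collections import defaultdict

corpus = [
    "i like deep learning",
    "i like nlp",
    "i enjoy flying",
]
window = 1  # count words within +/-1 positions as context

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                counts[w][toks[j]] += 1

vocab = sorted({w for s in corpus for w in s.split()})

def row(w):
    """Context-count vector for word w over the vocabulary."""
    return [counts[w][c] for c in vocab]

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# "like" and "enjoy" share the context "i", so they come out similar
print(round(cosine(row("like"), row("enjoy")), 2))
print(round(cosine(row("like"), row("flying")), 2))
```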

58 of 60

Word-word matrix

59 of 60

Problem with raw counts

60 of 60

Next Monday…

Mutual Information I(X; Y): measures how much knowing a variable X reduces our uncertainty about a variable Y.
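As a preview, mutual information is I(X; Y) = Σ_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]. A toy computation (the joint distribution below is invented for illustration):

```python
# Mutual information for a toy joint distribution over binary X and Y.
import math

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # p(x, y)

# Marginals p(x) and p(y) by summing out the other variable.
px = {x: sum(v for (a, b), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (a, b), v in p.items() if b == y) for y in (0, 1)}

mi = sum(v * math.log2(v / (px[x] * py[y]))
         for (x, y), v in p.items() if v > 0)
print(round(mi, 3))  # positive: knowing X reduces uncertainty about Y
```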