1 of 39

Word Embeddings

Unit 2, Module 2.4

2 of 39

First 10 minutes for studying~

Quiz on Word Embeddings :)

3 of 39

Warm Up:

What is an “apple”?

  • What do you get when you google “apple”?

What is a “monarch”?

  • What do you get when you google “monarch”?

What’s going on here?

4 of 39

Play Semantris

How does the computer know which words are related?

Play Blocks version

5 of 39

How does the computer know which words are related?

Discussion

6 of 39

          Gender  Age
man          1     7
woman       __    __
boy         __    __
girl         9     2

Fill in the missing Feature values

7 of 39

               Gender  Age
man               1     7
woman             9     7
boy               1     2
girl              9     2
grandfather      __    __
adult            __    __
child            __    __
infant           __    __

Fill in the missing values

8 of 39

               Gender  Age
man               1     7
woman             9     7
boy               1     2
girl              9     2
grandfather       1     9
adult             5     7
child             5     2
infant            5     1

9 of 39

How would you represent:

  • grandmother
  • grandparent
  • teenager
  • octogenarian

10 of 39

More Semantic Dimensions

How would you represent king, queen, prince, and princess?

We need another semantic dimension.

What would you call it?

11 of 39

            Gender  Age  Royalty
man            1     7      1
woman          9     7      1
boy            1     2      1
girl           9     2      1
king           1     8      8
queen          9     7      8
prince         1     2      8
princess       9     2      8
monarch       __    __     __

12 of 39

            Gender  Age  Royalty
man            1     7      1
woman          9     7      1
boy            1     2      1
girl           9     2      1
king           1     8      8
queen          9     7      8
prince         1     2      8
princess       9     2      8
monarch        5     7      8

13 of 39

Semantic Distance

Is “boy” closer to “girl” or to “queen”?

How can we measure the distance?

  • Count the number of features where they differ:
    • boy:girl - 1 feature
    • boy:queen - 3 features
  • Measure distance between points using the Pythagorean theorem:
    • boy:girl distance 8
    • boy:queen distance 11.75

            Gender  Age  Royalty
man            1     7      1
woman          9     7      1
boy            1     2      1
girl           9     2      1
king           1     8      8
queen          9     7      8
prince         1     2      8
princess       9     2      8
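Both ways of measuring distance on this slide can be checked with a few lines of Python. This is only a sketch: the feature values are copied from this slide's table, and the names `WORDS`, `features_that_differ`, and `distance` are made up for illustration.

```python
import math

# Feature values (Gender, Age, Royalty) copied from this slide's table.
WORDS = {
    "man": (1, 7, 1),
    "woman": (9, 7, 1),
    "boy": (1, 2, 1),
    "girl": (9, 2, 1),
    "king": (1, 8, 8),
    "queen": (9, 7, 8),
    "prince": (1, 2, 8),
    "princess": (9, 2, 8),
}


def features_that_differ(a, b):
    """Count the features where the two words have different values."""
    return sum(1 for x, y in zip(WORDS[a], WORDS[b]) if x != y)


def distance(a, b):
    """Straight-line (Pythagorean) distance between two words."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(WORDS[a], WORDS[b])))


for other in ("girl", "queen"):
    print(f"boy:{other}", features_that_differ("boy", other),
          round(distance("boy", other), 2))
# boy:girl 1 8.0
# boy:queen 3 11.75
```

Either way of measuring, "boy" comes out closer to "girl" than to "queen".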

14 of 39

Measure distance between points using the Pythagorean theorem:

          Gender  Age
man          1     7
woman        9     7
boy          1     2
girl         9     2

Pythagorean theorem (sides of the triangle: 8 and 5)

√(8² + 5²) = 9.43

The distance from boy to woman is 9.43

15 of 39

Remember Semantris

To do what Semantris does, we can select a clue word and measure the distances from our clue to the target words in the blocks.

The target word with the smallest distance should be selected.

Our chosen target: Pasta (red block)

Our clue word: Macaroni

Note another target word might have a smaller distance and be selected instead, e.g., Cheese (blue block) might be chosen because “macaroni and cheese” is a common phrase.
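The "pick the target with the smallest distance" idea can be sketched in Python. The deck gives no feature values for Pasta or Macaroni, so this sketch reuses Gender, Age, Royalty values from the earlier feature tables instead. The names `FEATURES` and `closest_target` are hypothetical, and this is not how Semantris itself is implemented.

```python
import math

# (Gender, Age, Royalty) values taken from the earlier feature tables.
FEATURES = {
    "boy": (1, 2, 1),
    "girl": (9, 2, 1),
    "king": (1, 8, 8),
    "queen": (9, 7, 8),
}


def closest_target(clue, targets):
    """Return the target word with the smallest distance to the clue word."""
    return min(targets, key=lambda t: math.dist(FEATURES[clue], FEATURES[t]))


print(closest_target("boy", ["girl", "king", "queen"]))  # girl
```

As with Macaroni and Cheese, the winner is whichever target happens to be nearest, which may not be the one you had in mind.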

16 of 39

Remember Semantris

Using multiple words as clues: We can average the clue words together and measure the distance from the average to each of the target words.

Our chosen target: Pasta

Clue words: Macaroni, Spaghetti, Linguini, Italian.
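Averaging several clue words can be sketched the same way. Since the deck has no numbers for the pasta words, this example averages "king" and "queen" from the earlier feature tables; `FEATURES`, `average`, and `closest_target` are illustrative names, not part of Semantris.

```python
import math

# (Gender, Age, Royalty) values taken from the earlier feature tables.
FEATURES = {
    "man": (1, 7, 1),
    "girl": (9, 2, 1),
    "king": (1, 8, 8),
    "queen": (9, 7, 8),
    "monarch": (5, 7, 8),
}


def average(words):
    """Average the feature values of several words, one feature at a time."""
    vectors = [FEATURES[w] for w in words]
    return tuple(sum(values) / len(vectors) for values in zip(*vectors))


def closest_target(clues, targets):
    """Return the target closest to the average of the clue words."""
    center = average(clues)
    return min(targets, key=lambda t: math.dist(center, FEATURES[t]))


print(average(["king", "queen"]))                                     # (5.0, 7.5, 8.0)
print(closest_target(["king", "queen"], ["man", "girl", "monarch"]))  # monarch
```

The average of "king" and "queen" lands almost exactly on "monarch", so that is the target selected.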

17 of 39

What other types of reasoning could a computer do using semantic feature space?

18 of 39

Lots of stuff!

How Google understands the questions you ask it.

Translating between languages

Summarizing an online article

Writing a fictional story

Solving analogy problems

19 of 39

Analogies

“Man” is to “king” as “woman” is to ______.

Analogy - A comparison between two things based on their relationships.

20 of 39

Analogies

“Man” is to “king” as “woman” is to queen.

Can we use semantic features to solve this problem?

21 of 39

This is what you were doing inside your head:

Woman is to queen as…

Man is to ___________

Girl is to ___________

Boy is to ___________

22 of 39

What is “king” - “man” ?

             Gender  Age  Royalty
king            1     8      8
man             1     7      1
king - man      0     1      7

red arrow

23 of 39

What is “king” - “man” + “woman” ?

             Gender  Age  Royalty
king            1     8      8
man             1     7      1
king - man      0     1      7
woman           9     7      1
+ woman         9     8      8
queen           9     7      8

red arrow

green arrow
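The same arithmetic can be written as a short Python sketch. It uses the feature values from the earlier tables. The function name `solve_analogy` is made up here, and leaving the three input words out of the search is an assumption of this sketch, not something the slides specify.

```python
import math

# (Gender, Age, Royalty) values taken from the earlier feature tables.
FEATURES = {
    "man": (1, 7, 1),
    "woman": (9, 7, 1),
    "boy": (1, 2, 1),
    "girl": (9, 2, 1),
    "king": (1, 8, 8),
    "queen": (9, 7, 8),
    "prince": (1, 2, 8),
    "princess": (9, 2, 8),
}


def solve_analogy(a, b, c):
    """Solve "a is to b as c is to ?" by computing b - a + c.

    Returns the computed point and the closest word in the table.
    Assumption: the three input words are not allowed as the answer.
    """
    point = tuple(fb - fa + fc for fa, fb, fc
                  in zip(FEATURES[a], FEATURES[b], FEATURES[c]))
    candidates = [w for w in FEATURES if w not in (a, b, c)]
    best = min(candidates, key=lambda w: math.dist(point, FEATURES[w]))
    return point, best


print(solve_analogy("man", "king", "woman"))  # ((9, 8, 8), 'queen')
```

The computed point (9, 8, 8) does not match any word exactly, so the answer is the nearest word, "queen" at (9, 7, 8).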

24 of 39

Try it:

“woman” is to “girl” as “king” is to ______.

Draw the red and green arrows.

25 of 39

Try it:

“woman” is to “girl” as “king” is to prince.

26 of 39

Try it #2:

“boy” is to “king” as “prince” is to ______.

Draw the red and green arrows.

27 of 39

“woman” is to “girl” as “king” is to ______.

               Gender  Age  Royalty
girl              9     2      1
woman             9     7      1
girl - woman     __    __     __
king              1     8      8
+ king           __    __     __

Answer: ____________

Directions: Use the table to look up the feature values and compute the answer.

green arrow

red arrow

28 of 39

How many dimensions do we need?

How would you represent “father”, “uncle”, “sister”, “cousin”?

What dimension would you add to help you represent “cucumber”?

Your Answer: __________________________

What dimension would you add to help you represent “smile”? Or “honesty”?

Your Answer: __________________________

We need many more dimensions. Can we get the computer to help us?

Word embeddings: a type of word representation that allows words with similar meanings to have similar representations.

29 of 39

How are word embeddings created?

Start with lots of text. For each word, pay attention to which words occur before it and after it. Create features that capture these statistics.

  • “king” and “queen” should be similar because both occur with words like “the”, “palace”, “crown”, “castle”, etc.
  • But they are also different because “king” occurs more with “he” and “his” while “queen” occurs more with “she” and “her”.

Example contexts:

  ... were visiting the king in his palace ...
  ... reported to the queen in her palace ...
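The "pay attention to which words occur before it and after it" step can be illustrated with a tiny Python sketch. It only counts neighboring words in the two example phrases from this slide. Real word embeddings are built from far more text and with more sophisticated methods, and the window size of 3 and the names `corpus` and `context_counts` are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Tiny "corpus" built from the two example phrases on this slide.
corpus = [
    "were visiting the king in his palace",
    "reported to the queen in her palace",
]

WINDOW = 3  # assumed: how many words to look at on each side


def context_counts(sentences, window=WINDOW):
    """For each word, count the words that appear within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            counts[word].update(before + after)
    return counts


counts = context_counts(corpus)
king, queen = set(counts["king"]), set(counts["queen"])

print(sorted(king & queen))  # ['in', 'palace', 'the']
print(sorted(king - queen))  # ['his', 'visiting', 'were']
print(sorted(queen - king))  # ['her', 'reported', 'to']
```

Even in this tiny example, "king" and "queen" share neighbors such as "the" and "palace" but differ on "his" versus "her".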

30 of 39

A 300-dimensional word embedding used in the online “WordEmbeddingDemo”.

31 of 39

Introduction to Dave’s Word Embedding Demos & Student Activities

32 of 39

Introduction to the 3D Semantic Space

Try It: 3D Semantic Space

  1. Get comfortable interacting with the demo
    1. Rotate the word display
    2. Zoom in and out
  2. Hover over a word to see which words are closest to it
  3. Add new words
    • Does the demo know that word?
    • The demo has a 50,000-word vocabulary, so it doesn’t know all words
  4. Add related words (e.g., apple, cherry, orange, lemon, and lime) and see how they cluster.

Watch It

Word Embedding Demo, Part 1 (3:37)

33 of 39

Introduction to the Feature Vector Display

Try it: Vector Visualization

  1. Drag the mouse across a word and look at its feature values. Observe the relationship between the numerical value and the colors
    1. The more positive the number, the redder it should be
    2. The more negative the number, the bluer it should be
    3. Zero is gray
  2. Compare the feature vectors of two words that have similar meanings
    • Try boy and girl - What do you notice?
    • Try another pair of words
  3. Replace a word in the feature vector display
    • Look at the features of the word
    • Compare it to another word. What do you notice?
  4. Choose a specific feature (gender is 126) and see how its value differs for different words. Are the values of the features marked correctly?
    • Check out 121. What do you think that feature might be related to?

Watch It

Word Embedding Demo, Part 2 (2:56)

34 of 39

Introduction to Analogies

Try it: Analogies

  1. Start by trying the analogies in the video. Rotate and zoom so you can see what the 3D positioning of these words looks like
    • man is to king as woman is to ______. What do you notice?
    • cat is to cats as goose is to ________. What do you notice?
  2. Change the Age semantic dimension to Number
    • What do you notice now about the positions of singular versus plural?
    • Add additional words to the semantic space. What do you notice?
  3. Create your own analogies and see if the demo finds the answer that you expect.
    • ________ is to ______ as ______ is to ______. What do you notice?
    • ________ is to ______ as ______ is to ______. What do you notice?

Watch It

35 of 39

Take away

Real AI systems use word embeddings

  • for handling search queries
  • for machine translation
  • for intent recognition

36 of 39

37 of 39

Exit Ticket

Explain: How does the computer know which words are related?

38 of 39

Limitations

  • Limitations
    • Homonyms - words with the same spelling but different meanings. Their statistics will be glommed together.
    • Word embeddings are an imperfect encoding
      • Ex. apple - near chair and computer.
      • What are the closest things to apple? Expect orange
      • But instead it is ipad, iphone, cupertino
      • This is because of the training data. This word embedding relates apple to the tech company
      • Ex. man
      • Man mankind
      • Man adult man
      • Man - verb
      • Very different than woman
      • Cherry - new unspoiled, flavor, fruit, color
      • Peach - peachy
      • Bananas - crazy
      • Orange - neutral
      • Plum - sweet job
      • Prune - age
      • Idiomatic usages, fun fact about language that we didn’t think about before.
    • Biased toward cisgender.
      • Gender non-binary will fit into the middle (neutral)
      • They, Them, their (neutral)
  • Ethical issues
    • Analogies -> gender and professions
      • Man : doctor , woman : nurse
      • Look for some of these biases.
      • This comes up in Google Translate.
      • Google image -> is something different.

Christina - turn into pictures - These were some other notes we had in the Unit 2 Overview Table

39 of 39

