Assignment 3 and Vector Embeddings
CSE 447 / 517
February 6, 2025 (Week 5)
Logistics
Agenda
Assignment 3 Preview
Geometry of Word Embeddings
Analogies
Bias in Word Embeddings
From Word Embeddings to Sentence Level Embeddings
KNN Classifier Using GloVe-Based Sentence Embeddings (see the sketch below)
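To make that last agenda item concrete, here is a minimal sketch of the GloVe-based sentence embedding + KNN pipeline. It assumes pretrained GloVe vectors are available through gensim's downloader; the 50-dimensional model name, toy sentences, and labels are placeholders, not the assignment's data.

```python
# Minimal sketch: sentence embeddings by averaging GloVe vectors, then a KNN classifier.
# Assumes gensim and scikit-learn are installed; the toy data below is illustrative only.
import numpy as np
import gensim.downloader as api
from sklearn.neighbors import KNeighborsClassifier

glove = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors

def sentence_embedding(sentence, dim=50):
    """Average the GloVe vectors of in-vocabulary tokens (zeros if none are found)."""
    tokens = sentence.lower().split()
    vecs = [glove[t] for t in tokens if t in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy training data (placeholder sentences and labels).
train_sentences = ["the movie was great", "terrible plot and acting", "i loved this film"]
train_labels = ["pos", "neg", "pos"]

X_train = np.stack([sentence_embedding(s) for s in train_sentences])
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, train_labels)

print(knn.predict([sentence_embedding("what a great film")]))  # expected: ['pos']
```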
Word Embeddings: A Quick Review
Man-woman relations in embeddings
Comparative-superlative relations in embeddings
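These relational regularities are usually demonstrated with vector arithmetic. A hedged sketch, assuming pretrained GloVe vectors via gensim (the model name is just an example):

```python
# Sketch: the classic analogy test via vector arithmetic (king - man + woman ≈ queen).
# Assumes pretrained vectors loaded with gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# most_similar returns the word whose vector is closest (by cosine) to king - man + woman.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Comparative-superlative relations work the same way: biggest - big + small ≈ smallest (ideally).
print(vectors.most_similar(positive=["biggest", "small"], negative=["big"], topn=1))
```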
Distributional Hypothesis, again
These context words define banking.
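As a toy illustration of the distributional hypothesis, the sketch below counts the context words that co-occur with "banking" inside a fixed window; the mini-corpus and window size are made up.

```python
# Tiny illustration of the distributional hypothesis: collect the context words
# that co-occur with a target word within a fixed window. The corpus is made up.
from collections import Counter

corpus = "government debt problems turning into banking crises as regulators tighten banking rules".split()
target, window = "banking", 2

contexts = Counter()
for i, word in enumerate(corpus):
    if word == target:
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j != i:
                contexts[corpus[j]] += 1

print(contexts.most_common())  # these context words "define" banking
```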
Dense Word Vectors
U.S. = [0.281, 0.129, 0.312, -1.29, -0.21]
Washington = [0.271, 0.110, 0.311, -1.33, -0.11]
grass = [-0.121, 0.930, 0.121, 1.53, -0.51]
If words appear in similar contexts, they have similar vectors!
“U.S.” and “Washington” occur in similar contexts!
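A quick numpy check of that claim, using the toy vectors above (the numbers are just the slide's example values): cosine similarity between "U.S." and "Washington" comes out high, while "U.S." vs. "grass" comes out low.

```python
# Cosine similarity between the toy vectors above: "U.S." and "Washington" should be
# much closer to each other than either is to "grass".
import numpy as np

us = np.array([0.281, 0.129, 0.312, -1.29, -0.21])
washington = np.array([0.271, 0.110, 0.311, -1.33, -0.11])
grass = np.array([-0.121, 0.930, 0.121, 1.53, -0.51])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(us, washington))  # high (similar contexts)
print(cosine(us, grass))       # low (negative here)
```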
"Static" Word Embeddings
Each word maps to a single vector, based on its co-occurrence with other words in a large corpus.
Connects to LSA/LSI; parallels to LMs
Examples of popular pretrained word embeddings:
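As a hedged sketch, popular pretrained static embeddings can be loaded through gensim's downloader; the identifiers below are gensim-data model names, and the Word2Vec download in particular is large.

```python
# Sketch: loading popular pretrained static embeddings via gensim's downloader.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")             # GloVe (Pennington et al., 2014)
word2vec = api.load("word2vec-google-news-300")         # Word2Vec (Mikolov et al., 2013), ~1.6 GB
fasttext = api.load("fasttext-wiki-news-subwords-300")  # fastText (Bojanowski et al., 2017)

print(glove.most_similar("washington", topn=3))
print(word2vec.similarity("car", "automobile"))
```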
Word2Vec: Overview
Word2Vec: Loss Function
For each position in the text,
for each word within the window,
compute the probability of the context word given the center word.
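Written out, those nested loops are the standard skip-gram objective (corpus length T, window size m, θ collecting all word vectors); this is the textbook form rather than a verbatim copy of the slide:

```latex
% Skip-gram likelihood and (average negative log-)loss.
L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \theta)
\qquad
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)
```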
Word2Vec: Now with Vectors!
Word2Vec: Why this prediction function?
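The prediction function in question is the softmax over dot products of center and context ("outside") vectors: the dot product scores similarity, and the softmax normalizes those scores into a distribution over the vocabulary. In standard notation:

```latex
% Word2Vec skip-gram prediction function: probability of outside word o given center word c,
% with u_o the context ("outside") vector and v_c the center vector, V the vocabulary.
P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}
```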
Clusters of dense word vectors
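One way to see these clusters is to project a handful of word vectors to 2D. A sketch with PCA, assuming gensim pretrained vectors (the word list and model name are just examples):

```python
# Sketch: project a handful of word vectors to 2D with PCA to see rough clusters.
import gensim.downloader as api
from sklearn.decomposition import PCA

vectors = api.load("glove-wiki-gigaword-50")
words = ["king", "queen", "prince", "princess", "dog", "cat", "horse", "fish"]
points = PCA(n_components=2).fit_transform([vectors[w] for w in words])

for word, (x, y) in zip(words, points):
    print(f"{word:>8s}  {x:+.2f}  {y:+.2f}")  # royalty vs. animal words should roughly separate
```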
Why separate center and context vectors?
Two Variants of Word2Vec
CBOW in practice
Skip-gram is like the reverse of CBOW?
Okay, okay, just kidding, here's the real skip-gram diagram:
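A minimal PyTorch sketch of skip-gram with two separate embedding tables, which is also where the center/context question above comes from. The hyperparameters and indices are placeholders, and negative sampling is omitted in favor of a full softmax; this is an illustration, not the assignment's reference implementation.

```python
# Minimal PyTorch sketch of skip-gram with separate center and context embedding tables.
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.center = nn.Embedding(vocab_size, dim)   # v_c: used when the word is the center
        self.context = nn.Embedding(vocab_size, dim)  # u_o: used when the word is in the window

    def forward(self, center_ids):
        # Score every vocabulary word as a context of each center word: u_w^T v_c.
        v = self.center(center_ids)         # (batch, dim)
        return v @ self.context.weight.T    # (batch, vocab_size)

vocab_size, dim = 1000, 64
model = SkipGram(vocab_size, dim)
loss_fn = nn.CrossEntropyLoss()  # cross-entropy over the full softmax (negative sampling omitted)

center_ids = torch.tensor([3, 17, 42])   # placeholder center-word indices
context_ids = torch.tensor([8, 2, 99])   # placeholder observed context-word indices
loss = loss_fn(model(center_ids), context_ids)
loss.backward()
print(loss.item())
```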
Contextualized Word Embeddings
Premise: define a vector for each token based on its context in the data
ELMo (Peters et al., 2018)
ELMo, visually
BERT (Devlin et al., 2019):
Pretrain + finetune like we discussed!
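Even before fine-tuning, the pretrained model gives contextualized token vectors. A hedged sketch using the Hugging Face transformers library (the model name and sentences are examples): the same surface word "bank" gets a different vector in each sentence.

```python
# Sketch: contextual token embeddings from pretrained BERT via Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, word):
    """Return the last-layer hidden state at the position of `word` (assumed to be a single wordpiece)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

river = token_vector("she sat on the bank of the river", "bank")
money = token_vector("he deposited cash at the bank", "bank")
print(torch.cosine_similarity(river, money, dim=0).item())  # < 1: the context changes the vector
```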
BERT’s Performance on GLUE tasks (Devlin et al., 2019)
BERTology
Interactive Word Embeddings
Questions?