CMSC 470 (Spring 2024) 🍀
Reminders
(Credit to the Fall 2021 TA: Neha Srikanth. Some slides are borrowed from her.)
Topics we have learned
Topics covered today (from Piazza vote)
Logistic Regression
Logistic regression is an example of classification: instead of predicting a real number (e.g., house price, a child's age), we predict probabilities over a set of outcomes.
β = (β1, …, βn): weight vector
X = (X1, …, Xn): input observations
β0: bias term
Logistic / sigmoid function: σ(z) = 1 / (1 + e^(−z)), applied to z = β0 + Σi βi Xi
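As a minimal sketch of the definitions above (the function and variable names here are mine, not from the slides), the sigmoid squashes the linear score β0 + Σi βi Xi into a probability:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(beta, x):
    """P(Y=1 | x) for logistic regression.

    Assumes the bias is folded into the weights: beta[0] is the bias
    term and x[0] = 1 is a constant feature.
    """
    z = sum(b * xi for b, xi in zip(beta, x))
    return sigmoid(z)

print(sigmoid(0.0))   # 0.5: a score of zero means both classes are equally likely
```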
Logistic Regression: Practice Problems
Given the document X = {Mother, Work, Viagra, Mother}, how can we calculate P(Y = 0 | X)?
Step 1:
X1 = 1
X2 = 2
X3 = 1
X4 = 0
Step 2:
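The two steps above can be sketched in code. Note the vocabulary order and the weights below are made up for illustration; the slide only gives the resulting counts X1 = 1, X2 = 2, X3 = 1, X4 = 0, so the true feature order may differ:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical vocabulary chosen so the counts match the slide;
# the fourth word is a placeholder for a feature absent from the document.
vocab = ["work", "mother", "viagra", "nigeria"]
doc = ["mother", "work", "viagra", "mother"]

# Step 1: bag-of-words counts X_i = number of times word i appears in the doc.
x = [doc.count(w) for w in vocab]   # [1, 2, 1, 0]

# Step 2: with some weights beta (made up here for illustration),
# P(Y=0 | X) = 1 - sigmoid(beta0 + sum_i beta_i * X_i).
beta0, beta = 0.1, [0.5, -1.0, 2.0, 0.3]
z = beta0 + sum(b * xi for b, xi in zip(beta, x))
p_y0 = 1.0 - sigmoid(z)
```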
Logistic Regression
How do we obtain the parameters (weights) of our logistic regression model from empirically observed data?
Goal: Find parameters that give you the highest probability of the data
https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf
x0 = 1
Logistic Regression: Training
https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf
Logistic Regression: Algorithm
An algorithm that optimizes our objective function (here, the log likelihood): stochastic gradient descent.
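A minimal sketch of one stochastic-gradient-descent step on the log likelihood, assuming the standard update rule βj ← βj + η (y − σ(β·x)) xj from the linked notes; the names are mine:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(beta, x, y, lr=0.1):
    """One SGD step on the log likelihood of a single example (x, y).

    The gradient of log P(y | x) w.r.t. beta_j is (y - sigmoid(beta . x)) * x_j,
    so we move beta in that direction (gradient ascent on the likelihood).
    Assumes the bias is folded in as beta[0] with x[0] = 1.
    """
    p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
    return [b + lr * (y - p) * xi for b, xi in zip(beta, x)]
```

Updating on an example with y = 1 raises the model's probability for that example; repeating over shuffled training examples is the full algorithm.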
Logistic Regression: Practice Problems
Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.
Suppose our current parameter vector is β = [−1, 2, −1]. (For now, assume e = 2 for easy calculation.)
Q1. Which class will the logistic regression classifier predict at this stage?
Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)
Logistic Regression: Practice Problems
Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.
Suppose our current parameter vector is β = [−1, 2, −1].
(For now, assume e = 2 for easy calculation.)
Q1. Which class will the logistic regression classifier predict at this stage?
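A worked sketch of Q1, assuming the bias is folded in as x0 = 1 (per the earlier slide), so the full input is x = [1, 1, 2], and using e ≈ 2 as the slide suggests:

```python
# beta = [beta0, beta1, beta2]; x = [x0, x1, x2] with bias feature x0 = 1.
beta = [-1, 2, -1]
x = [1, 1, 2]

z = sum(b * xi for b, xi in zip(beta, x))   # -1 + 2 - 2 = -1
p_y1 = 1 / (1 + 2 ** (-z))                  # with e ~= 2: 1 / (1 + 2^1) = 1/3
pred = 1 if p_y1 > 0.5 else 0               # 1/3 < 0.5, so predict class 0
```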
Logistic Regression: Practice Problems
Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.
Suppose our current parameter vector is β = [−1, 2, −1].
(For now, assume e = 2 for easy calculation.)
Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)
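A worked sketch of Q2 under the same assumptions (bias folded in as x0 = 1, e ≈ 2), applying one SGD update β ← β + η (y − p) x with η = 1.0 and then predicting again:

```python
beta = [-1, 2, -1]
x = [1, 1, 2]   # x0 = 1 bias feature, then xi = [1, 2]
y, lr = 1, 1.0

# Prediction before the update (from Q1): z = -1, p = 1 / (1 + 2^1) = 1/3.
z = sum(b * xi for b, xi in zip(beta, x))
p = 1 / (1 + 2 ** (-z))

# One SGD update: beta_j <- beta_j + lr * (y - p) * x_j.
beta = [b + lr * (y - p) * xi for b, xi in zip(beta, x)]
# beta is now [-1/3, 8/3, 1/3]

z_new = sum(b * xi for b, xi in zip(beta, x))   # (-1 + 8 + 2) / 3 = 3
p_new = 1 / (1 + 2 ** (-z_new))                 # 1 / (1 + 1/8) = 8/9
pred = 1 if p_new > 0.5 else 0                  # 8/9 > 0.5, so predict class 1
```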
Word2Vec
https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf
Word2Vec
https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf
How to measure similarity?
cosine similarity!
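A minimal cosine-similarity sketch (names are mine): the similarity of two word vectors is the cosine of the angle between them, so it depends on direction rather than magnitude.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|): 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [0, 1]))   # 0.0: orthogonal vectors
print(cosine_similarity([1, 2], [2, 4]))   # ~1.0: same direction, different length
```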
Word2Vec
Continuous Bag of Words (CBOW); Skip-grams
Word2Vec
Skip-grams: Predict context word(s) from focus word
https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf
Word2Vec
We want to learn parameters so that the probability of (focus, context) word pairs actually observed together is high, and the probability of pairs that never occur together is low.
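One common way to write the skip-gram probability is a softmax over context vectors; the sketch below uses made-up toy embeddings purely to show the form P(c | w) = exp(u_c · v_w) / Σ_c' exp(u_c' · v_w):

```python
import math

# Toy embeddings (made up): v[w] = focus vector, u[w] = context vector for word w.
v = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0]}
u = {"dog": [0.8, 0.2], "cat": [1.0, 0.0], "car": [0.1, 0.9]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def p_context_given_focus(context, focus):
    """Skip-gram softmax: P(c | w) = exp(u_c . v_w) / sum_c' exp(u_c' . v_w)."""
    scores = {c: math.exp(dot(u_c, v[focus])) for c, u_c in u.items()}
    return scores[context] / sum(scores.values())
```

Training adjusts u and v so this probability is high for observed pairs and low elsewhere; in practice the full softmax is usually approximated, e.g. with negative sampling.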
TF-IDF
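A minimal sketch of one common tf-idf variant (raw term frequency times log inverse document frequency; other weightings add smoothing or sublinear tf), with a made-up toy corpus:

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf(t, d) = tf(t, d) * log(N / df(t)): raw count in the document
    times log inverse document frequency over the N documents in the corpus."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [["the", "dog", "barks"], ["the", "cat", "meows"], ["the", "dog", "runs"]]
print(tf_idf("the", corpus[0], corpus))   # 0.0: appears in every document
print(tf_idf("dog", corpus[0], corpus))   # log(3/2) > 0: more discriminative
```

Words that occur in every document get weight zero, while rarer words get weights that grow with how concentrated they are in a document.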
LSTM
We will refer to the workshop slides. See Piazza post @194
Additional MC Questions
You have two distinct types w1 and w2. Their word2vec representations are very similar. Somebody tells you the part of speech of w1. What have you learned about the part of speech of w2?
What does the “hidden” in hidden Markov model refer to?
What can you not get from the computation graph?
What is the best analogy between types and tokens and OOP?