1 of 26

CMSC 470 (Spring 2024) 🍀


2 of 26

Reminders

  • Make sure you spend some time understanding the algorithms and concepts we’ve covered from a foundational perspective
  • The exam is really not written to trip you up!
    • Just probing your understanding of what we’ve covered so far
  • The exam includes multiple choice and free response format questions.
  • This exam is open notes. You may use a calculator (highly recommended) and one sheet of notes (front and back) on US letter or A4 paper.
  • (Check Piazza post @192 for more exam information)


(Credit to the Fall 2021 TA: Neha Srikanth. Some slides are borrowed from her.)

3 of 26

Topics we have learned

  • Linguistic Concepts
  • Information Retrieval / TF-IDF: TF-IDF, Intrinsic Evaluation …
  • Regression: Logistic Regression, Stochastic Gradient Descent …
  • Distributional Semantics: Word2Vec …
  • Syntax: POS, Hidden Markov Model, Dependency Parsing …
  • Deep Learning: Multi-layer networks, Backpropagation, Computation Graphs, RNN, LSTM …


4 of 26

Topics covered today (from Piazza vote)

  • Linguistic Concepts
  • Information Retrieval / TF-IDF: TF-IDF, Intrinsic Evaluation …
  • Regression: Logistic Regression, Stochastic Gradient Descent …
  • Distributional Semantics: Word2Vec
  • Syntax: POS, Hidden Markov Model, Dependency Parsing …
  • Deep Learning: Multi-layer networks, Backpropagation, Computation Graphs, RNN, LSTM


5 of 26

Logistic Regression

Logistic regression is an example of classification: instead of predicting a real number (e.g., a house price or a child’s age), we predict probabilities over a set of outcomes.


P(Y = 1 | X) = σ(β · X + b), where β is the weight vector, X the input observations, and b the bias term.

Logistic / sigmoid function: σ(z) = 1 / (1 + e^(−z))
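A minimal sketch of this prediction rule in Python; the weights, input, and bias below are illustrative placeholders, not values from the slides:

    import math

    def sigmoid(z):
        # Logistic / sigmoid function: maps any real number into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

    def predict_proba(beta, x, bias):
        # P(Y = 1 | X) = sigmoid(beta . x + bias)
        z = sum(b * xi for b, xi in zip(beta, x)) + bias
        return sigmoid(z)

    # Placeholder weights and input observations:
    beta = [0.5, -1.2, 2.0]
    x = [1.0, 0.0, 3.0]
    print(predict_proba(beta, x, bias=0.1))   # probability that Y = 1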

6 of 26

Logistic Regression: Practice Problems

Given the document X = {Mother, Work, Viagra, Mother}, how could we calculate P(Y = 0 | X)?


Step 1: Count how many times each feature word occurs in X:

X1 = 1
X2 = 2
X3 = 1
X4 = 0

Step 2: Plug the counts into the model: P(Y = 1 | X) = σ(β · X), so P(Y = 0 | X) = 1 − P(Y = 1 | X).
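A sketch of both steps in Python; the weight vector and bias here are placeholders, since the actual parameter values appear on the slide rather than in this text:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Step 1: feature counts for X = {Mother, Work, Viagra, Mother}
    x = [1, 2, 1, 0]                      # X1..X4 as above

    # Step 2: plug the counts into the model (beta and bias are placeholders)
    beta = [0.1, 1.0, -3.0, -0.5]
    bias = 0.0
    p_y1 = sigmoid(sum(b * xi for b, xi in zip(beta, x)) + bias)
    p_y0 = 1.0 - p_y1
    print(p_y0)                           # P(Y = 0 | X)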

7 of 26

Logistic Regression

How do we obtain the parameters (weights) of our logistic regression model from empirically observed data?

Goal: Find the parameters that assign the highest probability to the data

  • Maximize the log likelihood of the training data with respect to the model parameters β
    • Maximizing the product of probabilities over the training set is equivalent to maximizing the sum of log probabilities (see the sketch below)
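A minimal sketch of the quantity being maximized, assuming the standard binary logistic regression model (each x is assumed to already include the constant bias feature x0 = 1):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def log_likelihood(beta, xs, ys):
        # Sum over the training set of log P(y | x), where
        # log P(y | x) = y * log(sigma(beta . x)) + (1 - y) * log(1 - sigma(beta . x))
        total = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total

    # Toy usage with placeholder data:
    print(log_likelihood([0.5, -1.0], [[1.0, 2.0], [1.0, 0.0]], [1, 0]))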


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf

(Convention: x0 = 1, so the bias term is folded into the weight vector.)

8 of 26

Logistic Regression: Training


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf

9 of 26

Logistic Regression: Algorithm

Stochastic gradient descent: an algorithm to help us optimize the objective function (the log likelihood in this case)
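A minimal sketch of one stochastic gradient step for this objective, assuming the standard maximum-likelihood update β ← β + η · (y − P(Y = 1 | x)) · x with the bias folded in via x0 = 1:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(beta, x, y, lr=1.0):
        # One stochastic gradient ascent step on the log likelihood for example (x, y).
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))   # current P(Y = 1 | x)
        return [b + lr * (y - p) * xi for b, xi in zip(beta, x)]

Looping this update over shuffled training examples for several passes, with a suitable learning rate, is the whole training procedure.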


10 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1]. (For now, assume e = 2 for easy calculation.)

Q1. Which class will the logistic regression classifier predict at this stage?

Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)


11 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1].

(For now, assume e = 2 for easy calculation.)

Q1. Which class will the logistic regression classifier predict at this stage?
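A sketch of the computation for Q1, assuming the bias convention x0 = 1 (so the augmented input is [1, 1, 2]) and taking e = 2 as the problem suggests:

    beta = [-1.0, 2.0, -1.0]
    x = [1.0, 1.0, 2.0]                          # x0 = 1, then xi = [1, 2]
    z = sum(b * xi for b, xi in zip(beta, x))    # -1 + 2 - 2 = -1
    p = 1.0 / (1.0 + 2.0 ** (-z))                # 1 / (1 + 2^1) = 1/3
    print("class 1" if p >= 0.5 else "class 0")  # 1/3 < 0.5, so predict class 0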


12 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1].

(For now, assume e = 2 for easy calculation.)

Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)
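A sketch for Q2, assuming the standard SGD update β ← β + η · (y − P(Y = 1 | x)) · x, the x0 = 1 bias convention, and e = 2:

    beta = [-1.0, 2.0, -1.0]
    x = [1.0, 1.0, 2.0]
    y, lr = 1.0, 1.0

    p = 1.0 / (1.0 + 2.0 ** (-sum(b * xi for b, xi in zip(beta, x))))   # 1/3, as in Q1
    beta = [b + lr * (y - p) * xi for b, xi in zip(beta, x)]            # [-1/3, 8/3, 1/3]

    z_new = sum(b * xi for b, xi in zip(beta, x))                       # -1/3 + 8/3 + 2/3 = 3
    p_new = 1.0 / (1.0 + 2.0 ** (-z_new))                               # 1 / (1 + 2^-3) = 8/9
    print("class 1" if p_new >= 0.5 else "class 0")                     # 8/9 > 0.5, so predict class 1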


13 of 26

Word2Vec

  • Represent words with their meaning (semantics)


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf

14 of 26

Word2Vec

  • Distributional hypothesis: learn something about the meaning of a word from the other words it appears with
  • Encode words with similar contexts to be close together in some vector space


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf

How to measure similarity?

cosine similarity!
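A minimal sketch of cosine similarity between two word vectors (the vectors here are placeholders):

    import math

    def cosine_similarity(u, v):
        # cos(u, v) = (u . v) / (||u|| * ||v||); close to 1 for vectors pointing the same way.
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    print(cosine_similarity([1.0, 2.0, 0.5], [0.9, 2.1, 0.4]))   # similar directions -> near 1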

15 of 26

Word2Vec

Continuous Bag of Words (CBOW); Skip-grams


16 of 26

Word2Vec

Skip-grams: Predict context word(s) from focus word


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf
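A sketch of how (focus, context) training pairs could be extracted for skip-gram; the window size and the toy sentence are illustrative assumptions:

    def skipgram_pairs(tokens, window=2):
        # For each focus word, pair it with every context word within the window.
        pairs = []
        for i, focus in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((focus, tokens[j]))
        return pairs

    print(skipgram_pairs("the cat sat on the mat".split()))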

17 of 26

Word2Vec

  1. Extract a word window: take a focus word w and a context word c that actually co-occur.

     We want to learn parameters so that the value below is high:

     σ(v_c · v_w)

  2. Create a corrupt / negative example: pair the focus word with a randomly sampled word c′.

     So that the value below is low:

     σ(v_c′ · v_w)
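A minimal sketch of these two scores; the embeddings and the negative word here are placeholders, and in real word2vec the vectors are learned by nudging exactly these quantities up or down:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def pair_score(v_focus, v_context):
        # sigma(v_context . v_focus): should be high for observed (focus, context)
        # pairs and low for corrupted (focus, random word) pairs.
        return sigmoid(sum(a * b for a, b in zip(v_focus, v_context)))

    v_focus   = [0.2, -0.1, 0.7]    # placeholder embedding: focus word
    v_context = [0.3, -0.2, 0.8]    # placeholder embedding: true context word
    v_random  = [-0.6, 0.9, -0.4]   # placeholder embedding: sampled negative word

    print(pair_score(v_focus, v_context))   # want this high
    print(pair_score(v_focus, v_random))    # want this low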


18 of 26

TF-IDF
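As a reminder of the standard weighting, a minimal TF-IDF sketch (using raw term counts for tf and log(N / df) for idf, which is one common variant among several):

    import math

    def tf_idf(term, doc, corpus):
        tf = doc.count(term)                             # raw count in this document
        df = sum(1 for d in corpus if term in d)         # documents containing the term
        idf = math.log(len(corpus) / df) if df else 0.0  # rarer terms get larger idf
        return tf * idf

    corpus = [["mother", "work", "mother"], ["viagra", "work"], ["mother", "viagra"]]
    print(tf_idf("viagra", corpus[1], corpus))           # tf = 1, idf = log(3/2)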


19 of 26

TF-IDF


20 of 26

TF-IDF


21 of 26

LSTM

We will refer to the workshop slides. See Piazza post @194


22 of 26

Additional MC Questions


23 of 26

You have two distinct types w1 and w2. Their word2vec representations are very similar. Somebody tells you the part of speech of w1. What have you learned about the part of speech of w2?

  1. Nothing, as syntax and semantics are distinct
  2. w2 must have a different part of speech, otherwise w1 must be the same as w2, which we know is not the case
  3. w2 probably has the same part of speech, as they appear in similar contexts
  4. w2 is probably a modifier (adjective or adverb); since it appears in similar contexts as w1, it must somehow affect w1's meaning


24 of 26

What does the “hidden” in hidden Markov model refer to?

  1. The depth of the corresponding computation graph
  2. A continuous vector that is fed into a softmax to generate the observed word
  3. A discrete latent variable that explains the multinomial distribution that generates the observed word
  4. Which tagset was used to annotate parts of speech


25 of 26

What can you not get from the computation graph?

  1. Gradient
  2. Forward function
  3. Input dimension
  4. Learning rate


26 of 26

What is the best analogy between types and tokens and OOP?

  1. Tokens are classes, types are polymorphisms
  2. Tokens are instances, types are classes
  3. Tokens are classes, types are methods
  4. Tokens are methods, types are classes
