1 of 26

CMSC 470 (Spring 2024) 🍀


2 of 26

Reminders

  • Make sure you spend some time understanding the algorithms and concepts we’ve covered from a foundational perspective
  • The exam is really not written to trip you up!
    • Just probing your understanding of what we’ve covered so far
  • The exam includes multiple choice and free response format questions.
  • This exam is open notes. You may use a calculator (highly recommended) and one sheet of notes (front and back) on US letter or A4 paper.
  • (Check Piazza post @192 for more exam information)


(Credit to the Fall 2021 TA: Neha Srikanth. Some slides are borrowed from her.)

3 of 26

Topics we have learned

  • Linguistic Concepts
  • Information Retrieval / TF-IDF: TF-IDF, Intrinsic Evaluation …
  • Regression: Logistic Regression, Stochastic Gradient Descent …
  • Distributional Semantics: Word2Vec …
  • Syntax: POS, Hidden Markov Model, Dependency Parsing …
  • Deep Learning: Multi-layer networks, Backpropagation, Computation Graphs, RNN, LSTM …


4 of 26

Topics covered today (from Piazza vote)

  • Linguistic Concepts
  • Information Retrieval / TF-IDF: TF-IDF, Intrinsic Evaluation …
  • Regression: Logistic Regression, Stochastic Gradient Descent …
  • Distributional Semantics: Word2Vec
  • Syntax: POS, Hidden Markov Model, Dependency Parsing …
  • Deep Learning: Multi-layer networks, Backpropagation, Computation Graphs, RNN, LSTM


5 of 26

Logistic Regression

Logistic regression is an example of classification: instead of predicting a real number (e.g., a house price or a child’s age), we predict probabilities over a set of outcomes.


P(Y = 1 | X) = σ(β · X + b), where β is the weight vector, X the input observations, and b the bias term.

Logistic / sigmoid function: σ(z) = 1 / (1 + e^(−z))
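A minimal sketch of this prediction rule in Python; the weights, input, and bias below are illustrative placeholders, not values from the slides:

    import math

    def sigmoid(z):
        # Logistic / sigmoid function: maps any real number into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

    def predict_proba(beta, x, bias):
        # P(Y = 1 | X) = sigmoid(beta . x + bias)
        z = sum(b * xi for b, xi in zip(beta, x)) + bias
        return sigmoid(z)

    # Placeholder weights and input observations:
    beta = [0.5, -1.2, 2.0]
    x = [1.0, 0.0, 3.0]
    print(predict_proba(beta, x, bias=0.1))   # probability that Y = 1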

6 of 26

Logistic Regression: Practice Problems

Given the document X = {Mother, Work, Viagra, Mother}, how could we calculate P(Y = 0 | X)?


Step 1: Count how many times each feature word occurs in X:

X1 = 1
X2 = 2
X3 = 1
X4 = 0

Step 2: Plug the counts into the model: P(Y = 1 | X) = σ(β · X), so P(Y = 0 | X) = 1 − P(Y = 1 | X).
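A sketch of both steps in Python; the weight vector and bias here are placeholders, since the actual parameter values appear on the slide rather than in this text:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Step 1: feature counts for X = {Mother, Work, Viagra, Mother}
    x = [1, 2, 1, 0]                      # X1..X4 as above

    # Step 2: plug the counts into the model (beta and bias are placeholders)
    beta = [0.1, 1.0, -3.0, -0.5]
    bias = 0.0
    p_y1 = sigmoid(sum(b * xi for b, xi in zip(beta, x)) + bias)
    p_y0 = 1.0 - p_y1
    print(p_y0)                           # P(Y = 0 | X)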

7 of 26

Logistic Regression

How do we obtain the parameters (weights) of our logistic regression model from empirically observed data?

Goal: Find the parameters that assign the highest probability to the data

  • Maximize the log likelihood of the training data with respect to the model parameters β
    • Maximizing the product of probabilities over the training set is equivalent to maximizing the sum of log probabilities (see the sketch below)
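A minimal sketch of the quantity being maximized, assuming the standard binary logistic regression model (each x is assumed to already include the constant bias feature x0 = 1):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def log_likelihood(beta, xs, ys):
        # Sum over the training set of log P(y | x), where
        # log P(y | x) = y * log(sigma(beta . x)) + (1 - y) * log(1 - sigma(beta . x))
        total = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total

    # Toy usage with placeholder data:
    print(log_likelihood([0.5, -1.0], [[1.0, 2.0], [1.0, 0.0]], [1, 0]))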


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf

(Convention: x0 = 1, so the bias term is folded into the weight vector.)

8 of 26

Logistic Regression: Training


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/lr_sgd.pdf

9 of 26

Logistic Regression: Algorithm

Stochastic gradient descent: an algorithm to help us optimize the objective function (the log likelihood in this case)
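A minimal sketch of one stochastic gradient step for this objective, assuming the standard maximum-likelihood update β ← β + η · (y − P(Y = 1 | x)) · x with the bias folded in via x0 = 1:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(beta, x, y, lr=1.0):
        # One stochastic gradient ascent step on the log likelihood for example (x, y).
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))   # current P(Y = 1 | x)
        return [b + lr * (y - p) * xi for b, xi in zip(beta, x)]

Looping this update over shuffled training examples for several passes, with a suitable learning rate, is the whole training procedure.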


10 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1]. (For now, assume e = 2 for easy calculation.)

Q1. Which class will the logistic regression classifier predict at this stage?

Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)


11 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1].

(For now, assume e = 2 for easy calculation.)

Q1. Which class will the logistic regression classifier predict at this stage?
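A sketch of the computation for Q1, assuming the bias convention x0 = 1 (so the augmented input is [1, 1, 2]) and taking e = 2 as the problem suggests:

    beta = [-1.0, 2.0, -1.0]
    x = [1.0, 1.0, 2.0]                          # x0 = 1, then xi = [1, 2]
    z = sum(b * xi for b, xi in zip(beta, x))    # -1 + 2 - 2 = -1
    p = 1.0 / (1.0 + 2.0 ** (-z))                # 1 / (1 + 2^1) = 1/3
    print("class 1" if p >= 0.5 else "class 0")  # 1/3 < 0.5, so predict class 0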


12 of 26

Logistic Regression: Practice Problems

Imagine we have feature vector xi = [1, 2] and corresponding actual label yi = 1 for the ith example in our training set.

Suppose our current parameter vector is β = [−1, 2, −1].

(For now, assume e = 2 for easy calculation.)

Q2. Which class will the logistic regression classifier predict for this example after one update has been done? (learning rate = 1.0)
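A sketch for Q2, assuming the standard SGD update β ← β + η · (y − P(Y = 1 | x)) · x, the x0 = 1 bias convention, and e = 2:

    beta = [-1.0, 2.0, -1.0]
    x = [1.0, 1.0, 2.0]
    y, lr = 1.0, 1.0

    p = 1.0 / (1.0 + 2.0 ** (-sum(b * xi for b, xi in zip(beta, x))))   # 1/3, as in Q1
    beta = [b + lr * (y - p) * xi for b, xi in zip(beta, x)]            # [-1/3, 8/3, 1/3]

    z_new = sum(b * xi for b, xi in zip(beta, x))                       # -1/3 + 8/3 + 2/3 = 3
    p_new = 1.0 / (1.0 + 2.0 ** (-z_new))                               # 1 / (1 + 2^-3) = 8/9
    print("class 1" if p_new >= 0.5 else "class 0")                     # 8/9 > 0.5, so predict class 1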


13 of 26

Word2Vec

  • Represent words with their meaning (semantics)


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf

14 of 26

Word2Vec

  • Distributional hypothesis: learn something about the meaning of a word from the other words it appears with
  • Encode words with similar contexts to be close together in some vector space


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf

How to measure similarity?

cosine similarity!
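A minimal sketch of cosine similarity between two word vectors (the vectors here are placeholders):

    import math

    def cosine_similarity(u, v):
        # cos(u, v) = (u . v) / (||u|| * ||v||); close to 1 for vectors pointing the same way.
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    print(cosine_similarity([1.0, 2.0, 0.5], [0.9, 2.1, 0.4]))   # similar directions -> near 1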

15 of 26

Word2Vec

Continuous Bag of Words (CBOW); Skip-grams


16 of 26

Word2Vec

Skip-grams: Predict context word(s) from focus word


https://users.umiacs.umd.edu/~jbg/teaching/CMSC_470/06b_word2vec.pdf
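A sketch of how (focus, context) training pairs could be extracted for skip-gram; the window size and the toy sentence are illustrative assumptions:

    def skipgram_pairs(tokens, window=2):
        # For each focus word, pair it with every context word within the window.
        pairs = []
        for i, focus in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((focus, tokens[j]))
        return pairs

    print(skipgram_pairs("the cat sat on the mat".split()))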

17 of 26

Word2Vec

  1. Extract a word window: take a focus word w and a context word c that actually co-occur.

     We want to learn parameters so that the value below is high:

     σ(v_c · v_w)

  2. Create a corrupt / negative example: pair the focus word with a randomly sampled word c′.

     So that the value below is low:

     σ(v_c′ · v_w)
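A minimal sketch of these two scores; the embeddings and the negative word here are placeholders, and in real word2vec the vectors are learned by nudging exactly these quantities up or down:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def pair_score(v_focus, v_context):
        # sigma(v_context . v_focus): should be high for observed (focus, context)
        # pairs and low for corrupted (focus, random word) pairs.
        return sigmoid(sum(a * b for a, b in zip(v_focus, v_context)))

    v_focus   = [0.2, -0.1, 0.7]    # placeholder embedding: focus word
    v_context = [0.3, -0.2, 0.8]    # placeholder embedding: true context word
    v_random  = [-0.6, 0.9, -0.4]   # placeholder embedding: sampled negative word

    print(pair_score(v_focus, v_context))   # want this high
    print(pair_score(v_focus, v_random))    # want this low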


18 of 26

TF-IDF
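As a reminder of the standard weighting, a minimal TF-IDF sketch (using raw term counts for tf and log(N / df) for idf, which is one common variant among several):

    import math

    def tf_idf(term, doc, corpus):
        tf = doc.count(term)                             # raw count in this document
        df = sum(1 for d in corpus if term in d)         # documents containing the term
        idf = math.log(len(corpus) / df) if df else 0.0  # rarer terms get larger idf
        return tf * idf

    corpus = [["mother", "work", "mother"], ["viagra", "work"], ["mother", "viagra"]]
    print(tf_idf("viagra", corpus[1], corpus))           # tf = 1, idf = log(3/2)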


19 of 26

TF-IDF


20 of 26

TF-IDF


21 of 26

LSTM

We will refer to the workshop slides. See Piazza post @194


22 of 26

Additional MC Questions


23 of 26

You have two distinct types w1 and w2. Their word2vec representations are very similar. Somebody tells you the part of speech of w1. What have you learned about the part of speech of w2?

  1. Nothing, as syntax and semantics are distinct
  2. w2 must have a different part of speech, otherwise w1 must be the same as w2, which we know is not the case
  3. w2 probably has the same part of speech, as they appear in similar contexts
  4. w2 is probably a modifier (adjective or adverb); since it appears in similar contexts as w1, it must somehow affect w1's meaning


24 of 26

What does the “hidden” in hidden Markov model refer to?

  1. The depth of the corresponding computation graph
  2. A continuous vector that is fed into a softmax to generate the observed word
  3. A discrete latent variable that explains the multinomial distribution that generates the observed word
  4. Which tagset was used to annotate parts of speech


25 of 26

What can you not get from the computation graph?

  1. Gradient
  2. Forward function
  3. Input dimension
  4. Learning rate


26 of 26

What is the best analogy between types and tokens and OOP?

  1. Tokens are classes, types are polymorphisms
  2. Tokens are instances, types are classes
  3. Tokens are classes, types are methods
  4. Tokens are methods, types are classes
