Smoothing and Classifiers
COSC 426A: Natural Language Processing (Fall 2024)
Prof. Forrest Davis
Warm-up 🔥
Logistics
Learning Objectives
Smoothing Techniques
Katz back-off
Katz back-off (Katz, 1987)
Idea:
If we have reliable estimates of an n-gram, great!
If not, use a shorter context
Katz back-off (Katz, 1987)
Pros
Cons
P(eats | ants usually) ≟ P(eats | usually)
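A minimal sketch of the back-off idea in Python. The names (`backoff_prob`, the count `Counter`s) and the fixed back-off weight `alpha` are illustrative assumptions, not from the slides; Katz's actual method also discounts the higher-order counts and chooses the back-off weight so the distribution still normalizes.

```python
from collections import Counter

def backoff_prob(w1, w2, w3, trigram_counts, bigram_counts, unigram_counts,
                 threshold=1, alpha=0.4):
    """Simplified back-off estimate of P(w3 | w1 w2).

    Use the trigram MLE when the trigram was seen at least `threshold`
    times; otherwise back off to the bigram, and then the unigram,
    estimate scaled by a constant weight `alpha`. (Katz, 1987 also
    discounts the higher-order counts and picks the back-off weight so
    everything still sums to 1; that bookkeeping is omitted here.)
    """
    if trigram_counts[(w1, w2, w3)] >= threshold:
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]
    if bigram_counts[(w2, w3)] >= threshold:
        return alpha * bigram_counts[(w2, w3)] / unigram_counts[w2]
    return alpha * alpha * unigram_counts[w3] / sum(unigram_counts.values())

# Toy counts (Counters return 0 for unseen n-grams):
trigrams = Counter({("ants", "usually", "eat"): 2})
bigrams = Counter({("ants", "usually"): 2, ("usually", "eats"): 5, ("usually", "eat"): 2})
unigrams = Counter({"ants": 3, "usually": 9, "eats": 6, "eat": 4})

# "ants usually eats" is unseen, so this backs off to alpha * P(eats | usually)
p = backoff_prob("ants", "usually", "eats", trigrams, bigrams, unigrams)
```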
Absolute Discounting
zooming out to the common core
Discounting - Stealing Probability
Absolute Discounting: Magic Number
Church and Gale (1991)
| Bigram count in training data | Average bigram count in held-out data |
| --- | --- |
| 0 | 0.000270 |
| 1 | 0.448 |
| 2 | 1.25 |
| 3 | 2.24 |
| 4 | 3.23 |
| 5 | 4.21 |
| 6 | 5.23 |
| 7 | 6.21 |
| 8 | 7.21 |
| 9 | 8.26 |
0.75!
Absolute Discounting
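A hedged sketch of interpolated absolute discounting, using the d = 0.75 suggested by the Church and Gale held-out counts above. The function name and count dictionaries are illustrative assumptions; the idea is to subtract d from every observed bigram count and hand the stolen mass to the unigram distribution.

```python
from collections import Counter

def absolute_discount_prob(w1, w2, bigram_counts, unigram_counts, d=0.75):
    """Interpolated absolute discounting estimate of P(w2 | w1)."""
    context_count = unigram_counts[w1]  # approximate count of w1 as a context
    total = sum(unigram_counts.values())
    unigram_prob = unigram_counts[w2] / total
    if context_count == 0:
        # Unseen context: fall back entirely to the unigram distribution.
        return unigram_prob
    # Discounted bigram term: steal d from every observed count.
    bigram_term = max(bigram_counts[(w1, w2)] - d, 0) / context_count
    # Stolen mass = d * (number of distinct word types observed after w1),
    # redistributed according to the unigram distribution.
    num_followers = sum(1 for (a, _b) in bigram_counts if a == w1)
    lam = d * num_followers / context_count
    return bigram_term + lam * unigram_prob
```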
Generation
Generation
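One way to generate text from a bigram model: start at a sentence-boundary token and repeatedly sample the next word in proportion to its count given the previous word. This is a minimal sketch; `bigram_counts`, the `<s>`/`</s>` markers, and the length cap are assumptions, and a smoothed model would let generation continue past unseen contexts.

```python
import random
from collections import Counter, defaultdict

def generate(bigram_counts, start="<s>", end="</s>", max_len=20):
    """Sample a sentence from raw bigram counts."""
    # Group counts by context word for easy sampling.
    next_words = defaultdict(Counter)
    for (w1, w2), c in bigram_counts.items():
        next_words[w1][w2] += c
    words = [start]
    while words[-1] != end and len(words) < max_len:
        candidates = next_words[words[-1]]
        if not candidates:
            break  # unseen context; a smoothed model would back off instead
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights, k=1)[0])
    return " ".join(words[1:-1] if words[-1] == end else words[1:])
```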
Naive Bayes
Classification
Applications?? 👷‍♀️
Text Classification
Text Classification
Text Classification
P(class | text) - “the class is that which is most likely for the text”
Bayes’ rule
Bayes’ rule for text classification
Prior
Evidence
Likelihood
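Putting these three pieces together (the notation here, c for a class and d for the text, is an assumption for illustration):

```latex
% Bayes' rule: posterior = likelihood * prior / evidence
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}

% The evidence P(d) is the same for every class, so it can be dropped
% when choosing the most likely class:
\hat{c} = \operatorname*{argmax}_{c \in C} \; P(d \mid c)\, P(c)
```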
Bayes’ rule for text classification
Bayes’ rule for text classification
Compute the prior probability
How could you compute the probability of a class (i.e. P(c))?
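One simple answer, as a sketch: the maximum-likelihood estimate of P(c) is just the fraction of training documents labeled c (the function name here is illustrative).

```python
from collections import Counter

def class_priors(labels):
    """MLE estimate of P(c): fraction of training documents with label c."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}

# e.g. class_priors(["pos", "neg", "pos", "pos"]) -> {"pos": 0.75, "neg": 0.25}
```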
Compute the likelihood
How could you compute the probability that a particular text belongs to some class (i.e. P(text | class))?
Naive Bayes
“Bag of words assumption”: each word is independent (i.e. position does not matter)
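Under the bag-of-words assumption, the likelihood of a text is just the product of per-word probabilities estimated from word counts in each class. A minimal sketch of training and classification (the function names, add-one smoothing, and log-space scoring are implementation choices for illustration, not prescribed by the slides):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(word | c) with add-alpha smoothing.
    Each doc is a list of tokens treated as a bag of words: only counts
    matter, not positions."""
    priors = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
        vocab.update(doc)
    log_priors = {c: math.log(n / len(labels)) for c, n in priors.items()}
    log_likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_priors, log_likelihoods

def classify(doc, log_priors, log_likelihoods):
    """Return argmax_c of log P(c) + sum over in-vocabulary words of log P(w | c)."""
    scores = {}
    for c in log_priors:
        scores[c] = log_priors[c] + sum(
            log_likelihoods[c][w] for w in doc if w in log_likelihoods[c])
    return max(scores, key=scores.get)

# docs = [["great", "fun", "great"], ["boring", "plot"]]; labels = ["pos", "neg"]
# lp, ll = train_naive_bayes(docs, labels); classify(["great", "plot"], lp, ll)
```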
Before next class