CS 188: Artificial Intelligence�
Machine Learning
Instructor: Nicholas Tomlin --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Machine Learning
Classification
Example: Spam Filter
Dear Sir.
First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …
TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.
99 MILLION EMAIL ADDRESSES
FOR ONLY $99
Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.
Example: Digit Recognition
0
1
2
1
??
Other Classification Tasks
classes: diseases)
classes: fraud / no fraud)
classes: grades)
Model-Based Classification
Model-Based Classification
Naïve Bayes for Digits
is more or less than 0.5 in underlying image
Y
F1
Fn
F2
General Naïve Bayes
Y
F1
Fn
F2
|Y| parameters
n x |F| x |Y| parameters
|Y| x |F|n values
Inference for Naïve Bayes
+
General Naïve Bayes
Example: Conditional Probabilities
1 | 0.1 |
2 | 0.1 |
3 | 0.1 |
4 | 0.1 |
5 | 0.1 |
6 | 0.1 |
7 | 0.1 |
8 | 0.1 |
9 | 0.1 |
0 | 0.1 |
1 | 0.01 |
2 | 0.05 |
3 | 0.05 |
4 | 0.30 |
5 | 0.80 |
6 | 0.90 |
7 | 0.05 |
8 | 0.60 |
9 | 0.50 |
0 | 0.80 |
1 | 0.05 |
2 | 0.01 |
3 | 0.90 |
4 | 0.80 |
5 | 0.90 |
6 | 0.90 |
7 | 0.25 |
8 | 0.85 |
9 | 0.60 |
0 | 0.80 |
A Spam Filter
Dear Sir.
First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …
TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.
99 MILLION EMAIL ADDRESSES
FOR ONLY $99
Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.
Naïve Bayes for Text
Word at position i, not ith word in the dictionary!
Example: Spam Filtering
the : 0.0156
to : 0.0153
and : 0.0115
of : 0.0095
you : 0.0093
a : 0.0086
with: 0.0080
from: 0.0075
...
the : 0.0210
to : 0.0133
of : 0.0119
2002: 0.0110
with: 0.0108
from: 0.0107
and : 0.0105
a : 0.0100
...
ham : 0.66
spam: 0.33
Spam Example
Word | P(w|spam) | P(w|ham) | Tot Spam | Tot Ham |
(prior) | 0.33333 | 0.66666 | -1.1 | -0.4 |
Gary | 0.00002 | 0.00021 | -11.8 | -8.9 |
would | 0.00069 | 0.00084 | -19.1 | -16.0 |
you | 0.00881 | 0.00304 | -23.8 | -21.8 |
like | 0.00086 | 0.00083 | -30.9 | -28.9 |
to | 0.01517 | 0.01339 | -35.1 | -33.2 |
lose | 0.00008 | 0.00002 | -44.5 | -44.0 |
weight | 0.00016 | 0.00002 | -53.3 | -55.0 |
while | 0.00027 | 0.00027 | -61.5 | -63.2 |
you | 0.00881 | 0.00304 | -66.2 | -69.0 |
sleep | 0.00006 | 0.00001 | -76.0 | -80.5 |
P(spam | w) = 98.9
Training and Testing
Empirical Risk Minimization
Important Concepts
Training
Data
Held-Out
Data
Test
Data
Generalization and Overfitting
Overfitting
0
2
4
6
8
10
12
14
16
18
20
-15
-10
-5
0
5
10
15
20
25
30
Degree 15 polynomial
Example: Overfitting
2 wins!!
Example: Overfitting
south-west : inf
nation : inf
morally : inf
nicely : inf
extent : inf
seriously : inf
...
What went wrong here?
screens : inf
minute : inf
guaranteed : inf
$205.00 : inf
delivery : inf
signature : inf
...
Generalization and Overfitting
Parameter Estimation
Parameter Estimation
r
r
b
r
b
b
r
b
b
r
b
b
r
b
b
r
b
b
Smoothing
Maximum Likelihood?
????
Unseen Events
Laplace Smoothing
r
r
b
Laplace Smoothing
r
r
b
Estimation: Linear Interpolation*
Real NB: Smoothing
helvetica : 11.4
seems : 10.8
group : 10.2
ago : 8.4
areas : 8.3
...
verdana : 28.8
Credit : 28.4
ORDER : 27.2
<FONT> : 26.9
money : 26.5
...
Do these make more sense?
Tuning
Tuning on Held-Out Data
Features
Errors, and What to Do
Dear GlobalSCAPE Customer,
GlobalSCAPE has partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question we've received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . . .
. . . To receive your $30 Amazon.com promotional certificate, click through to
http://www.amazon.com/apparel
and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message. However, if you'd rather not receive future e-mails announcing new store launches, please click . . .
What to Do About Errors?
Baselines
Confidences from a Classifier
Summary
Linear Classifiers
Feature Vectors
Hello,
Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just
# free : 2
YOUR_NAME : 0
MISSPELLED : 2
FROM_FRIEND : 0
...
SPAM
or
+
PIXEL-7,12 : 1
PIXEL-7,13 : 0
...
NUM_LOOPS : 1
...
“2”
Some (Simplified) Biology
Linear Classifiers
Σ
f1
f2
f3
w1
w2
w3
>0?
Weights
# free : 2
YOUR_NAME : 0
MISSPELLED : 2
FROM_FRIEND : 0
...
# free : 4
YOUR_NAME :-1
MISSPELLED : 1
FROM_FRIEND :-3
...
# free : 0
YOUR_NAME : 1
MISSPELLED : 1
FROM_FRIEND : 1
...
Dot product positive means the positive class
Decision Rules
Binary Decision Rule
BIAS : -3
free : 4
money : 2
...
0
1
0
1
2
free
money
+1 = SPAM
-1 = HAM
Weight Updates
Learning: Binary Perceptron
Learning: Binary Perceptron
Examples: Perceptron
Multiclass Decision Rule
Binary = multiclass where the negative class has weight zero
Learning: Multiclass Perceptron
Example: Multiclass Perceptron
BIAS : 1
win : 0
game : 0
vote : 0
the : 0
...
BIAS : 0
win : 0
game : 0
vote : 0
the : 0
...
BIAS : 0
win : 0
game : 0
vote : 0
the : 0
...
“win the vote”
“win the election”
“win the game”
Properties of Perceptrons
Separable
Non-Separable
Problems with the Perceptron
Improving the Perceptron
Non-Separable Case: Deterministic Decision
Even the best linear boundary makes at least one mistake
Non-Separable Case: Probabilistic Decision
0.5 | 0.5
0.3 | 0.7
0.1 | 0.9
0.7 | 0.3
0.9 | 0.1
How to get probabilistic decisions?
Best w?
with:
= Logistic Regression
Separable Case: Deterministic Decision – Many Options
Separable Case: Probabilistic Decision – Clear Preference
0.5 | 0.5
0.3 | 0.7
0.7 | 0.3
0.5 | 0.5
0.3 | 0.7
0.7 | 0.3
Multiclass Logistic Regression
original activations
softmax activations
Best w?
with:
= Multi-Class Logistic Regression
Maximum Likelihood Estimation
Parameter Estimation with Maximum Likelihood
r
r
b
X | red | blue |
| | |
Parameter Estimation with Maximum Likelihood
r
r
b
X | red | blue |
| | |
Parameter Estimation (General Case)
r
r
b
X | red | blue |
| | |
Parameter Estimation (General Case)
Example from Discussion 6B
Regularization
Recall: Overfitting
0
2
4
6
8
10
12
14
16
18
20
-15
-10
-5
0
5
10
15
20
25
30
Degree 15 polynomial
Example: Overfitting
2 wins!!
Recall: Overfitting
L1 and L2 Regularization
L1
(aka lasso regression)
L2
(aka ridge regression)