Lecture 2: Nearest Neighbor and Linear Classification
Erik Learned-Miller and Hang Su
Some slides kindly provided by Fei-Fei Li, Andrej Karpathy, Justin Johnson
12 Sep. 2016
Course web page
Moodle
Piazza
Python: Up and running
Hang’s Office Hours
Readings
Today’s Fun: WaveNet
Back to Nearest Neighbor...
Challenges: Intraclass variation
An image classifier
Unlike, e.g., sorting a list of numbers, there is
no obvious way to hard-code an algorithm for recognizing a cat, or any other class.
Data-driven approach:
Example training set
First classifier: Nearest Neighbor Classifier
Remember all training images and their labels
Predict the label of the most similar training image
Example dataset: CIFAR-10
10 labels
50,000 training images, each image is tiny: 32x32
10,000 test images.
For every test image (first column), examples of nearest neighbors in rows
Example dataset: CIFAR-10
10 labels
50,000 training images
10,000 test images.
How do we compare the images? What is the distance metric?
L1 distance: d1(I1, I2) = Σ_p |I1^p − I2^p|  (add up the absolute differences over all pixels p)
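As a tiny illustration (the pixel values below are made up, not from the slides), the L1 distance between two small images in numpy:

```python
import numpy as np

# Two tiny 2x2 single-channel "images" as integer arrays.
I1 = np.array([[56, 32],
               [10, 25]])
I2 = np.array([[10, 25],
               [24, 30]])

# L1 distance: sum of absolute pixel-wise differences.
d1 = np.sum(np.abs(I1 - I2))
print(d1)  # 46 + 7 + 14 + 5 = 72
```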
Nearest Neighbor classifier
Nearest Neighbor classifier
remember the training data
Nearest Neighbor classifier
for every test image:
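These two steps can be sketched in numpy (a minimal illustration under L1 distance; the class and variable names are my own, not necessarily the course code):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # X: N x D array, one flattened training image per row; y: N labels.
        # The Nearest Neighbor classifier simply memorizes all training data.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # X: M x D array of test images; returns M predicted labels.
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test image i to every training image.
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # Label of the single closest training image.
            y_pred[i] = self.ytr[np.argmin(distances)]
        return y_pred
```

Note that `train` is O(1) while `predict` scans every training image, which is the speed problem discussed below.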
Nearest Neighbor classifier
Q: how does the classification speed depend on the size of the training data?
Nearest Neighbor classifier
Q: how does the classification speed depend on the size of the training data? linearly :(
This is backwards:
- test time performance is usually much more important in practice.
- CNNs flip this: expensive training, cheap test evaluation
The choice of distance is a hyperparameter.
Common choices:
L1 (Manhattan) distance: d1(I1, I2) = Σ_p |I1^p − I2^p|
L2 (Euclidean) distance: d2(I1, I2) = sqrt(Σ_p (I1^p − I2^p)^2)
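Both choices can be sketched as one hypothetical helper (the `distance` function and its `metric` argument are illustrative, not from the slides):

```python
import numpy as np

def distance(a, b, metric="L1"):
    """Distance between two flattened images; the metric is a hyperparameter."""
    if metric == "L1":    # Manhattan: sum of absolute differences
        return np.sum(np.abs(a - b))
    elif metric == "L2":  # Euclidean: square root of the sum of squared differences
        return np.sqrt(np.sum((a - b) ** 2))
    raise ValueError(f"unknown metric: {metric}")

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])
print(distance(a, b, "L1"))  # 4 + 3 = 7.0
print(distance(a, b, "L2"))  # sqrt(16 + 9) = 5.0
```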
k-Nearest Neighbor
find the k nearest images, have them vote on the label
the data
NN classifier
5-NN classifier
http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
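The voting step can be sketched as follows (an illustrative `knn_predict` helper of my own, assuming L2 distance):

```python
import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x, k=5):
    """Predict the label of one test point x by majority vote
    of its k nearest training points under L2 distance."""
    distances = np.sqrt(np.sum((Xtr - x) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]        # indices of the k closest points
    votes = Counter(ytr[i] for i in nearest)   # count the labels among them
    return votes.most_common(1)[0][0]          # most frequent label wins
```

With k = 1 this reduces to the plain Nearest Neighbor classifier; larger k smooths the decision boundary, as the figure shows.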
For every test image (first column), examples of nearest neighbors in rows
Example dataset: CIFAR-10
10 labels
50,000 training images
10,000 test images.
Q: what is the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean distance?
the data
NN classifier
5-NN classifier
Q2: what is the accuracy of the k-nearest neighbor classifier on the training data?
the data
NN classifier
5-NN classifier
What is the best distance to use?
What is the best value of k to use?
i.e. how do we set the hyperparameters?
What is the best distance to use?
What is the best value of k to use?
i.e. how do we set the hyperparameters?
Very problem-dependent.
Must try them all out and see what works best.
Try out what hyperparameters work best on test set.
Trying out what hyperparameters work best on test set:
Very bad idea. The test set is a proxy for the generalization performance!
Use only VERY SPARINGLY, at the end.
Validation data
use to tune hyperparameters
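One common arrangement can be sketched as follows (the proportions mirror the usual 49,000/1,000 CIFAR-10 split, scaled down here; all data is a random stand-in, not the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 32 * 32 * 3          # small stand-in for the 50,000 CIFAR-10 images
X = rng.normal(size=(N, D))      # fake "images"
y = rng.integers(0, 10, size=N)  # fake labels, 10 classes

# Hold out the last fifth of the training set as a validation set.
Xtr, ytr = X[:400], y[:400]
Xval, yval = X[400:], y[400:]

# Tune hyperparameters (k, the distance metric) by accuracy on (Xval, yval);
# touch the test set only once, at the very end.
```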
Cross-validation
cycle through the choice of which fold is the validation fold, average results.
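That procedure can be sketched with a k-NN scorer inside the fold loop (a simplified illustration; the function names are my own):

```python
import numpy as np
from collections import Counter

def knn_accuracy(Xtr, ytr, Xte, yte, k):
    """Fraction of test points whose k-NN (L1) majority vote matches the label."""
    correct = 0
    for x, label in zip(Xte, yte):
        d = np.sum(np.abs(Xtr - x), axis=1)
        nearest = np.argsort(d)[:k]
        vote = Counter(ytr[i] for i in nearest).most_common(1)[0][0]
        correct += (vote == label)
    return correct / len(yte)

def cross_validate(X, y, k, num_folds=5):
    """Average validation accuracy of k-NN, cycling the validation fold."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        Xval, yval = X_folds[i], y_folds[i]              # fold i validates
        Xtr = np.concatenate(X_folds[:i] + X_folds[i+1:])  # the rest train
        ytr = np.concatenate(y_folds[:i] + y_folds[i+1:])
        accs.append(knn_accuracy(Xtr, ytr, Xval, yval, k))
    return float(np.mean(accs))
```

Running `cross_validate` for each candidate k and keeping the best average accuracy is exactly the selection shown in the next figure.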
Example of 5-fold cross-validation for the value of k.
Each point: a single outcome.
The line goes through the mean; bars indicate standard deviation.
(Seems that k ≈ 7 works best for this data.)
k-Nearest Neighbor on raw image pixels is never used in practice.
(All 3 images have the same L2 distance to the one on the left.)
Summary
Before moving on
Linear Classification
Example dataset: CIFAR-10
10 labels
50,000 training images
each image is 32x32x3
10,000 test images.
Parametric approach
f(x, W) → 10 numbers, indicating class scores
x: the image, a [32x32x3] array of numbers 0...1 (3072 numbers total)
W: the parameters
Parametric approach: Linear classifier
f(x, W) = Wx
x: a [32x32x3] array of numbers 0...1 → 10 numbers, indicating class scores
Parametric approach: Linear classifier
f(x, W) = Wx
x: a [32x32x3] array of numbers 0...1, stretched into a 3072x1 column
W: 10x3072 parameters, or “weights”
f(x, W): 10x1 — 10 numbers, indicating class scores
Parametric approach: Linear classifier
f(x, W) = Wx + b
x: a [32x32x3] array of numbers 0...1, stretched into a 3072x1 column
W: 10x3072 parameters, or “weights”
b: a 10x1 bias vector
f(x, W): 10x1 — 10 numbers, indicating class scores
Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
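A numerical version of this setup (the weight and bias values below are illustrative choices, not necessarily the ones on the slide):

```python
import numpy as np

# An "image" with 4 pixel values, stretched into a column.
x = np.array([56, 231, 24, 2])

# W: one row of 4 weights per class (cat, dog, ship); b: one bias per class.
W = np.array([[ 0.2, -0.5,  0.1,  2.0],
              [ 1.5,  1.3,  2.1,  0.0],
              [ 0.0,  0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])

# One matrix-vector multiply produces all 3 class scores at once.
scores = W @ x + b
print(scores)  # [cat score, dog score, ship score]
```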
Interpreting a Linear Classifier
Q: what does the linear classifier do, in English?
Interpreting a Linear Classifier
Example trained weights of a linear classifier trained on CIFAR-10:
Interpreting a Linear Classifier
[32x32x3]
array of numbers 0...1
(3072 numbers total)
Interpreting a Linear Classifier
Q2: what would be a very hard set of classes for a linear classifier to distinguish?
So far:
We defined a (linear) score function: f(x, W) = Wx + b
Example class scores for 3 images, with a random W (10 scores per image):
Image 1: -3.45, -8.87, 0.09, 2.9, 4.48, 8.02, 3.78, 1.06, -0.36, -0.72
Image 2: -0.51, 6.04, 5.31, -4.22, -4.19, 3.58, 4.49, -4.37, -2.09, -2.93
Image 3: 3.42, 4.64, 2.65, 5.1, 2.64, 5.55, -4.34, -1.5, -4.79, 6.14
Coming up:
- A loss function (quantifying what it means to have a “good” W)
- Optimization (start with random W and find a W that minimizes the loss)
- ConvNets! (tweak the functional form of f)