Lecture 2: Nearest Neighbor and Linear Classification

Erik Learned-Miller and Hang Su
Some slides kindly provided by Fei-Fei Li, Andrej Karpathy, Justin Johnson

12 Sep. 2016

Course web page

  • Go to my web page (Google “Learned-Miller”)
  • Go to my teaching page
  • Click on top link

Moodle

  • We will be using Moodle, but mostly just to turn in assignments.

Piazza

  • Good discussions are already happening there. Check it out!
  • Please post most questions there rather than sending email to me or Hang.

Python: Up and running

  • We’re using Python 2.7, not 3.4.
  • If you have not installed Python and tried a simple program yet, please do so as soon as possible.
  • Try loading the first Python Notebook for Homework 1. Even if you don’t have time to start the assignment yet, at least make sure the notebook runs properly, so you don’t incur a major delay later.

Hang’s Office Hours

  • Hang has office hours:
    • Monday: 4-5
    • Tuesday: 4-5
  • Please get the Python Notebooks running by the end of the day Tuesday. Go to office hours for help if necessary.

Readings

  • On the Syllabus page.

Today’s Fun: WaveNet

Back to Nearest Neighbor...

Challenges: Intraclass variation

An image classifier

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other classes.

Data-driven approach:

  1. Collect a dataset of images and labels
  2. Use Machine Learning to train an image classifier
  3. Evaluate the classifier on a withheld set of test images
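
Schematically, the whole pipeline boils down to two functions. A minimal sketch in the spirit of the slides (the “model” and the prediction rule here are deliberately trivial placeholders):

    def train(images, labels):
        # step 2: machine learning builds a model from the labeled training data
        model = {'images': images, 'labels': labels}  # simplest possible "model": memorize everything
        return model

    def predict(model, test_images):
        # step 3: the model assigns a label to each withheld test image
        # (trivial placeholder rule: always answer with the first training label)
        return [model['labels'][0] for _ in test_images]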

Example training set

First classifier: Nearest Neighbor Classifier

Remember all training images and their labels

Predict the label of the most similar training image

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images; each image is tiny: 32x32
  • 10,000 test images

For every test image (first column), examples of nearest neighbors in rows

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images
  • 10,000 test images

How do we compare the images? What is the distance metric?

L1 distance: d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|

(take the pixelwise absolute differences and add them all up)
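
A minimal numpy sketch of this computation, assuming both images are arrays of the same shape:

    import numpy as np

    # L1 distance: sum of absolute pixelwise differences.
    def l1_distance(img1, img2):
        # cast to a signed type so uint8 subtraction cannot wrap around
        return np.sum(np.abs(img1.astype(np.int64) - img2.astype(np.int64)))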

Nearest Neighbor classifier

Nearest Neighbor classifier

remember the training data

Nearest Neighbor classifier

for every test image:

  • find the nearest training image under the L1 distance
  • predict the label of that nearest training image
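
Putting these three slides together, a minimal numpy reconstruction of the classifier (a sketch; variable names are illustrative):

    import numpy as np

    class NearestNeighbor(object):
        def train(self, X, y):
            # remember the training data: X is N x D (one flattened image per row),
            # y is the length-N vector of labels
            self.Xtr = X
            self.ytr = y

        def predict(self, X):
            # for every test image, find the nearest training image under the
            # L1 distance and predict its label
            num_test = X.shape[0]
            y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
            for i in range(num_test):
                distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
                y_pred[i] = self.ytr[np.argmin(distances)]
            return y_pred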

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data?

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data? Linearly :(

This is backwards:

  • Test-time performance is usually much more important in practice.
  • CNNs flip this: expensive training, cheap test-time evaluation.

The choice of distance is a hyperparameter

common choices:

  • L1 (Manhattan) distance
  • L2 (Euclidean) distance
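
For reference, with the sum running over all pixels p:

    d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|

    d_2(I_1, I_2) = \sqrt{ \sum_p \left( I_1^p - I_2^p \right)^2 }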

k-Nearest Neighbor

find the k nearest images, have them vote on the label

[Figure: a toy 2-D dataset (“the data”), with the decision regions of the NN classifier and of a 5-NN classifier]

http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
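
A minimal sketch of the voting rule (illustrative, reusing the L1 distance from before):

    import numpy as np
    from collections import Counter

    def knn_predict(Xtr, ytr, x, k=5):
        # L1 distances from the test image x to every training image
        distances = np.sum(np.abs(Xtr - x), axis=1)
        # labels of the k nearest training images
        nearest_labels = ytr[np.argsort(distances)[:k]]
        # majority vote among the k neighbors
        return Counter(nearest_labels).most_common(1)[0][0]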

For every test image (first column), examples of nearest neighbors in rows

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images
  • 10,000 test images

Q: what is the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean distance?

[Figure: the same toy dataset, with the NN classifier and 5-NN classifier decision regions]

Q2: what is the accuracy of the k-nearest neighbor classifier on the training data?

[Figure: the same toy dataset, with the NN classifier and 5-NN classifier decision regions]

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

Very problem-dependent.

Must try them all out and see what works best.

Try out what hyperparameters work best on the test set.

Trying out what hyperparameters work best on the test set:

Very bad idea. The test set is a proxy for generalization performance!

Use it only VERY SPARINGLY, at the end.

Validation data

use to tune hyperparameters
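
For example, a minimal sketch of tuning k on a held-out validation split (array names and split sizes are illustrative; knn_predict is the sketch from earlier):

    import numpy as np

    # hold out part of the training set as validation data
    Xval, yval = Xtr[:1000], ytr[:1000]        # first 1,000 images for validation
    Xtrain, ytrain = Xtr[1000:], ytr[1000:]    # the rest for training

    # try different hyperparameter values; keep whatever does best on validation
    results = []
    for k in [1, 3, 5, 10, 20, 50, 100]:
        preds = np.array([knn_predict(Xtrain, ytrain, x, k) for x in Xval])
        results.append((k, np.mean(preds == yval)))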

Cross-validation

cycle through the choice of which fold is the validation fold, average results.
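
A minimal 5-fold sketch of that procedure (again reusing the illustrative knn_predict):

    import numpy as np

    def cross_validate_k(Xtr, ytr, k, num_folds=5):
        X_folds = np.array_split(Xtr, num_folds)
        y_folds = np.array_split(ytr, num_folds)
        accuracies = []
        for i in range(num_folds):
            # fold i is the validation fold; the remaining folds are training data
            Xtrain = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            ytrain = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([knn_predict(Xtrain, ytrain, x, k) for x in X_folds[i]])
            accuracies.append(np.mean(preds == y_folds[i]))
        return np.mean(accuracies)  # average validation accuracy across folds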

Example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation. (It seems that k ≈ 7 works best for this data.)

k-Nearest Neighbor on images is never used in practice:

  • terrible performance at test time
  • distance metrics at the level of whole images can be very unintuitive

(all 3 images have the same L2 distance to the one on the left)

Summary

  • Image Classification: we are given a Training Set of labeled images and asked to predict labels on a Test Set. It is common to report the Accuracy of the predictions (the fraction of correctly predicted images).
  • We introduced the k-Nearest Neighbor Classifier, which predicts labels based on the nearest images in the training set.
  • We saw that the choice of distance and the value of k are hyperparameters, tuned using a validation set, or through cross-validation if the data is small.
  • Once the best set of hyperparameters is chosen, the classifier is evaluated once on the test set, and that result is reported as the performance of kNN on the data.

Before moving on

  • K-NN: the Rodney Dangerfield of classifiers

  • Convergence of K-NN to the Bayes error rate.
  • Universality of K-NN.

Linear Classification

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images; each image is 32x32x3
  • 10,000 test images

Parametric approach

An image x is a [32x32x3] array of numbers 0...1 (3072 numbers total). A function f(x, W), taking the image x and parameters W, produces 10 numbers indicating class scores.

Parametric approach: Linear classifier

f(x, W) = Wx

The [32x32x3] image (an array of numbers 0...1) is stretched into a single column, and f maps it to 10 numbers indicating class scores.

Parametric approach: Linear classifier

f(x, W) = Wx

Here x is the image stretched into a 3072x1 column, W is a 10x3072 matrix of parameters, or “weights”, and the output is a 10x1 vector of class scores.

Parametric approach: Linear classifier

f(x, W) = Wx + b

As before, x is 3072x1 and W is 10x3072; b is a 10x1 vector of biases, so the output is a 10x1 vector of class scores.
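
A minimal numpy sketch of this score function (the random values are purely illustrative):

    import numpy as np

    D, C = 32 * 32 * 3, 10            # input dimension (3072) and number of classes
    W = 0.01 * np.random.randn(C, D)  # 10x3072 weight matrix
    b = np.zeros(C)                   # 10 biases, one per class
    x = np.random.rand(D)             # a flattened image, values in 0...1

    scores = W.dot(x) + b             # 10 class scores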

Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
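
A worked version of that setup in numpy (these particular pixel values, weights, and biases are illustrative, not necessarily the slide's):

    import numpy as np

    x = np.array([56, 231, 24, 2])           # the 4 pixel values, stretched into a column
    W = np.array([[0.2, -0.5, 0.1, 2.0],     # one row of weights per class: cat
                  [1.5, 1.3, 2.1, 0.0],      # dog
                  [0.0, 0.25, 0.2, -0.3]])   # ship
    b = np.array([1.1, 3.2, -1.2])           # one bias per class

    scores = W.dot(x) + b                    # [cat, dog, ship] = [-96.8, 437.9, 60.75]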

Interpreting a Linear Classifier

Q: what does the linear classifier do, in English?

Interpreting a Linear Classifier

Example trained weights of a linear classifier trained on CIFAR-10:

Interpreting a Linear Classifier

[32x32x3] array of numbers 0...1 (3072 numbers total)

Interpreting a Linear Classifier

Q2: what would be a very hard set of classes for a linear classifier to distinguish?

So far:

We defined a (linear) score function: f(x, W) = Wx + b

Example class scores for 3 images, with a random W (one column of 10 scores per image, one score per class):

  image 1: -3.45, -8.87,  0.09,  2.9,   4.48,  8.02,  3.78,  1.06, -0.36, -0.72
  image 2: -0.51,  6.04,  5.31, -4.22, -4.19,  3.58,  4.49, -4.37, -2.09, -2.93
  image 3:  3.42,  4.64,  2.65,  5.1,   2.64,  5.55, -4.34, -1.5,  -4.79,  6.14

Coming up:

  • Loss function (quantifying what it means to have a “good” W)
  • Optimization (start with a random W and find a W that minimizes the loss)
  • ConvNets! (tweak the functional form of f)
