Lecture 2: Nearest Neighbor and Linear Classification

Erik Learned-Miller and Hang Su
Some slides kindly provided by Fei-Fei Li, Andrej Karpathy, Justin Johnson

12 Sep. 2016

Course web page

  • Go to my web page (Google “Learned-Miller”)
  • Go to my teaching page
  • Click on top link

Moodle

  • We will be using Moodle, but mostly just to turn in assignments.

Piazza

  • Good discussions are already happening there. Check it out!
  • Please post most questions there rather than sending email to me or Hang.

Python: Up and running

  • We’re using Python 2.7, not 3.4.
  • If you have not installed Python and tried a simple program yet, please do so as soon as possible.
  • Try loading the first Python Notebook for Homework 1. Even if you don’t have time to start the assignment yet, at least make sure the notebook runs properly, so you don’t incur a major delay later.

Hang’s Office Hours

  • Hang has office hours:
    • Monday: 4-5
    • Tuesday: 4-5
  • Please get the Python Notebooks running by the end of the day Tuesday. Go to office hours for help if necessary.

Readings

  • On the Syllabus page.

Today’s Fun: WaveNet

Back to Nearest Neighbor...

Challenges: Intraclass variation

An image classifier

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other classes.

Data-driven approach:

  1. Collect a dataset of images and labels
  2. Use Machine Learning to train an image classifier
  3. Evaluate the classifier on a withheld set of test images
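
Schematically, the whole pipeline boils down to two functions. A minimal sketch in the spirit of the slides (the “model” and the prediction rule here are deliberately trivial placeholders):

    def train(images, labels):
        # step 2: machine learning builds a model from the labeled training data
        model = {'images': images, 'labels': labels}  # simplest possible "model": memorize everything
        return model

    def predict(model, test_images):
        # step 3: the model assigns a label to each withheld test image
        # (trivial placeholder rule: always answer with the first training label)
        return [model['labels'][0] for _ in test_images]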

Example training set

First classifier: Nearest Neighbor Classifier

Remember all training images and their labels

Predict the label of the most similar training image

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images; each image is tiny: 32x32
  • 10,000 test images

For every test image (first column), examples of nearest neighbors in rows

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images
  • 10,000 test images

How do we compare the images? What is the distance metric?

L1 distance: d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|

(take the pixelwise absolute differences and add them all up)
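
A minimal numpy sketch of this computation, assuming both images are arrays of the same shape:

    import numpy as np

    # L1 distance: sum of absolute pixelwise differences.
    def l1_distance(img1, img2):
        # cast to a signed type so uint8 subtraction cannot wrap around
        return np.sum(np.abs(img1.astype(np.int64) - img2.astype(np.int64)))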

Nearest Neighbor classifier

Nearest Neighbor classifier

remember the training data

Nearest Neighbor classifier

for every test image:

  • find the nearest training image under the L1 distance
  • predict the label of that nearest training image
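
Putting these three slides together, a minimal numpy reconstruction of the classifier (a sketch; variable names are illustrative):

    import numpy as np

    class NearestNeighbor(object):
        def train(self, X, y):
            # remember the training data: X is N x D (one flattened image per row),
            # y is the length-N vector of labels
            self.Xtr = X
            self.ytr = y

        def predict(self, X):
            # for every test image, find the nearest training image under the
            # L1 distance and predict its label
            num_test = X.shape[0]
            y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
            for i in range(num_test):
                distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
                y_pred[i] = self.ytr[np.argmin(distances)]
            return y_pred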

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data?

Nearest Neighbor classifier

Q: how does the classification speed depend on the size of the training data? Linearly :(

This is backwards:

  • Test-time performance is usually much more important in practice.
  • CNNs flip this: expensive training, cheap test-time evaluation.

The choice of distance is a hyperparameter

common choices:

  • L1 (Manhattan) distance
  • L2 (Euclidean) distance
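
For reference, with the sum running over all pixels p:

    d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|

    d_2(I_1, I_2) = \sqrt{ \sum_p \left( I_1^p - I_2^p \right)^2 }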

k-Nearest Neighbor

find the k nearest images, have them vote on the label

[Figure: a toy 2-D dataset (“the data”), with the decision regions of the NN classifier and of a 5-NN classifier]

http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
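
A minimal sketch of the voting rule (illustrative, reusing the L1 distance from before):

    import numpy as np
    from collections import Counter

    def knn_predict(Xtr, ytr, x, k=5):
        # L1 distances from the test image x to every training image
        distances = np.sum(np.abs(Xtr - x), axis=1)
        # labels of the k nearest training images
        nearest_labels = ytr[np.argsort(distances)[:k]]
        # majority vote among the k neighbors
        return Counter(nearest_labels).most_common(1)[0][0]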

For every test image (first column), examples of nearest neighbors in rows

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images
  • 10,000 test images

Q: what is the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean distance?

[Figure: the same toy dataset, with the NN classifier and 5-NN classifier decision regions]

Q2: what is the accuracy of the k-nearest neighbor classifier on the training data?

[Figure: the same toy dataset, with the NN classifier and 5-NN classifier decision regions]

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

Very problem-dependent.

Must try them all out and see what works best.

Try out what hyperparameters work best on the test set.

Trying out what hyperparameters work best on the test set:

Very bad idea. The test set is a proxy for generalization performance!

Use it only VERY SPARINGLY, at the end.

Validation data

use to tune hyperparameters
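
For example, a minimal sketch of tuning k on a held-out validation split (array names and split sizes are illustrative; knn_predict is the sketch from earlier):

    import numpy as np

    # hold out part of the training set as validation data
    Xval, yval = Xtr[:1000], ytr[:1000]        # first 1,000 images for validation
    Xtrain, ytrain = Xtr[1000:], ytr[1000:]    # the rest for training

    # try different hyperparameter values; keep whatever does best on validation
    results = []
    for k in [1, 3, 5, 10, 20, 50, 100]:
        preds = np.array([knn_predict(Xtrain, ytrain, x, k) for x in Xval])
        results.append((k, np.mean(preds == yval)))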

Cross-validation

cycle through the choice of which fold is the validation fold, average results.
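
A minimal 5-fold sketch of that procedure (again reusing the illustrative knn_predict):

    import numpy as np

    def cross_validate_k(Xtr, ytr, k, num_folds=5):
        X_folds = np.array_split(Xtr, num_folds)
        y_folds = np.array_split(ytr, num_folds)
        accuracies = []
        for i in range(num_folds):
            # fold i is the validation fold; the remaining folds are training data
            Xtrain = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            ytrain = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([knn_predict(Xtrain, ytrain, x, k) for x in X_folds[i]])
            accuracies.append(np.mean(preds == y_folds[i]))
        return np.mean(accuracies)  # average validation accuracy across folds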

Example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation. (It seems that k ≈ 7 works best for this data.)

k-Nearest Neighbor on images is never used in practice:

  • terrible performance at test time
  • distance metrics at the level of whole images can be very unintuitive

(all 3 images have the same L2 distance to the one on the left)

Summary

  • Image Classification: we are given a Training Set of labeled images and asked to predict labels on a Test Set. It is common to report the Accuracy of the predictions (the fraction of correctly predicted images).
  • We introduced the k-Nearest Neighbor Classifier, which predicts labels based on the nearest images in the training set.
  • We saw that the choice of distance and the value of k are hyperparameters, tuned using a validation set, or through cross-validation if the data is small.
  • Once the best set of hyperparameters is chosen, the classifier is evaluated once on the test set, and that result is reported as the performance of kNN on the data.

Before moving on

  • K-NN: the Rodney Dangerfield of classifiers

  • Convergence of K-NN to the Bayes error rate.
  • Universality of K-NN.

Linear Classification

Example dataset: CIFAR-10

  • 10 labels
  • 50,000 training images; each image is 32x32x3
  • 10,000 test images

Parametric approach

An image x is a [32x32x3] array of numbers 0...1 (3072 numbers total). A function f(x, W), taking the image x and parameters W, produces 10 numbers indicating class scores.

Parametric approach: Linear classifier

f(x, W) = Wx

The [32x32x3] image (an array of numbers 0...1) is stretched into a single column, and f maps it to 10 numbers indicating class scores.

Parametric approach: Linear classifier

f(x, W) = Wx

Here x is the image stretched into a 3072x1 column, W is a 10x3072 matrix of parameters, or “weights”, and the output is a 10x1 vector of class scores.

Parametric approach: Linear classifier

f(x, W) = Wx + b

As before, x is 3072x1 and W is 10x3072; b is a 10x1 vector of biases, so the output is a 10x1 vector of class scores.
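
A minimal numpy sketch of this score function (the random values are purely illustrative):

    import numpy as np

    D, C = 32 * 32 * 3, 10            # input dimension (3072) and number of classes
    W = 0.01 * np.random.randn(C, D)  # 10x3072 weight matrix
    b = np.zeros(C)                   # 10 biases, one per class
    x = np.random.rand(D)             # a flattened image, values in 0...1

    scores = W.dot(x) + b             # 10 class scores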

Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
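
A worked version of that setup in numpy (these particular pixel values, weights, and biases are illustrative, not necessarily the slide's):

    import numpy as np

    x = np.array([56, 231, 24, 2])           # the 4 pixel values, stretched into a column
    W = np.array([[0.2, -0.5, 0.1, 2.0],     # one row of weights per class: cat
                  [1.5, 1.3, 2.1, 0.0],      # dog
                  [0.0, 0.25, 0.2, -0.3]])   # ship
    b = np.array([1.1, 3.2, -1.2])           # one bias per class

    scores = W.dot(x) + b                    # [cat, dog, ship] = [-96.8, 437.9, 60.75]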

Interpreting a Linear Classifier

Q: what does the linear classifier do, in English?

Interpreting a Linear Classifier

Example trained weights of a linear classifier trained on CIFAR-10:

Interpreting a Linear Classifier

[32x32x3] array of numbers 0...1 (3072 numbers total)

Interpreting a Linear Classifier

Q2: what would be a very hard set of classes for a linear classifier to distinguish?

So far:

We defined a (linear) score function: f(x, W) = Wx + b

Example class scores for 3 images, with a random W (one column of 10 scores per image, one score per class):

  image 1: -3.45, -8.87,  0.09,  2.9,   4.48,  8.02,  3.78,  1.06, -0.36, -0.72
  image 2: -0.51,  6.04,  5.31, -4.22, -4.19,  3.58,  4.49, -4.37, -2.09, -2.93
  image 3:  3.42,  4.64,  2.65,  5.1,   2.64,  5.55, -4.34, -1.5,  -4.79,  6.14

Coming up:

  • Loss function (quantifying what it means to have a “good” W)
  • Optimization (start with a random W and find a W that minimizes the loss)
  • ConvNets! (tweak the functional form of f)
