
Lecture 6:

Neural Networks part 2

Erik Learned-Miller and TAs
Adapted from slides of Fei-Fei Li & Andrej Karpathy & Justin Johnson
September 19, 2019


Optional help session this Friday

  • Vector, Matrix, and Tensor Derivatives + the Chain Rule
  • 9am CS building


[Figure: a single node f in a computational graph. Activations flow forward through f; gradients flow backward. Each node combines the upstream gradient with its "local gradient" (the derivative of its output with respect to each input) via the chain rule.]


Implementation: forward/backward API

Graph (or Net) object (rough pseudocode):
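A minimal sketch of what such a Graph object's forward/backward API might look like (the class name ComputationalGraph and the gate interface here are illustrative, not from any particular library):

```python
class ComputationalGraph:
    def __init__(self, gates):
        # gates are assumed to already be in topological order
        self.gates = gates

    def forward(self):
        out = None
        for gate in self.gates:            # forward pass through the whole graph
            out = gate.forward()
        return out                         # the last gate outputs the loss

    def backward(self):
        for gate in reversed(self.gates):  # reverse order: backprop
            gate.backward()                # each gate applies the chain rule locally
```

Each gate only needs to know how to compute its output in forward() and how to chain the upstream gradient through its local gradient in backward().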


Implementation: forward/backward API

[Figure: a single multiply gate (*) with inputs x, y and output z = x*y; x, y, z are scalars.]
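A minimal sketch of the multiply gate under this API (the class name MultiplyGate is illustrative; the key point is that forward caches its inputs so backward can apply the chain rule):

```python
class MultiplyGate:
    def forward(self, x, y):
        z = x * y
        self.x, self.y = x, y          # cache inputs for use in backward
        return z

    def backward(self, dz):
        # chain rule: dL/dx = (dz/dx) * dL/dz, with local gradient dz/dx = y
        dx = self.y * dz
        dy = self.x * dz
        return [dx, dy]

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)            # z = -12.0
dx, dy = gate.backward(2.0)            # dx = -8.0, dy = 6.0
```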


Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector


Q: what is the size of the Jacobian matrix?



Gradients for vectorized code

[Figure: node f with vector inputs x, y and vector output z (x, y, z are now vectors). The "local gradient" ∂z/∂x is now a Jacobian matrix (the derivative of each element of z w.r.t. each element of x); the gradients flowing backward are vectors.]


Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix? [4096 x 4096!]

Q2: what does it look like? Because the function is elementwise, each output depends only on the corresponding input, so the Jacobian is diagonal: 1 where the input is positive, 0 elsewhere. In practice we never form this huge matrix explicitly.
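A short NumPy sketch of that observation (an illustration, not code from the slide): the backward pass of elementwise ReLU never materializes the 4096x4096 Jacobian, because multiplying by a diagonal matrix collapses to an elementwise mask.

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0, x)

def relu_backward(dz, x):
    # dz: upstream gradient dL/dz; x: input cached from the forward pass.
    # The Jacobian is diagonal (1 where x > 0, 0 elsewhere), so the
    # Jacobian-vector product reduces to an elementwise mask.
    return dz * (x > 0)

x = np.random.randn(4096)
dz = np.random.randn(4096)
dx = relu_backward(dz, x)   # same shape as x, no 4096x4096 matrix needed
```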


Aside: Image Features


Example: Color (Hue) Histogram

[Figure: each pixel votes (+1) into one of the hue bins, giving a histogram over hue as the image's feature vector.]
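A minimal NumPy sketch of the idea (illustrative only; it assumes the image has already been converted to HSV, e.g. with skimage.color.rgb2hsv):

```python
import numpy as np

def hue_histogram(hsv_image, num_bins=16):
    # hsv_image: H x W x 3 array with the hue channel in [0, 1]
    hue = hsv_image[..., 0].ravel()
    hist, _ = np.histogram(hue, bins=num_bins, range=(0.0, 1.0))
    return hist / hist.sum()           # normalized histogram = feature vector

feature = hue_histogram(np.random.rand(32, 32, 3))   # 16-d color feature
```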


Example: HOG/SIFT features

Divide the image into 8x8 pixel regions and, within each region, quantize the edge orientation into 9 bins. (image from vlfeat.org)


Many more: GIST, LBP, Texton, SSIM, ...


Jointly learning about edges and colors


Example: Bag of Words

[Figure: extract visual word vectors from training images and learn k-means centroids, forming a "vocabulary" of visual words (e.g. 1000 centroids); each image is then encoded as a histogram of visual words, a 1000-d vector.]
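A rough scikit-learn sketch of that pipeline (descriptor extraction is replaced with random placeholders; names and sizes are illustrative, and the slide's example uses 1000 centroids):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-ins for descriptors extracted from image patches
train_descriptors = np.random.randn(5000, 128)    # from many training images
image_descriptors = np.random.randn(200, 128)     # from one image

# Step 1: learn the visual "vocabulary" with k-means
vocab = KMeans(n_clusters=100, n_init=10).fit(train_descriptors)

# Step 2: encode the image as a histogram of visual words
words = vocab.predict(image_descriptors)
hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
hist /= hist.sum()                                # normalized feature vector
```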


[Figure: two pipelines from a [32x32x3] image to 10 numbers indicating class scores. One trains a function f directly on the raw pixels; the other first runs Feature Extraction to produce a vector describing various image statistics, and training happens on the function f applied to that feature vector.]


Neural Network:

(Before) Linear score function: f = Wx


(Now) 2-layer Neural Network: f = W2 max(0, W1 x)


x (3072) → W1 → h (100) → W2 → s (10)


or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
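A minimal NumPy sketch of the 2-layer score function with the dimensions shown above (the random weights are placeholders; biases are omitted as on the slide):

```python
import numpy as np

x = np.random.randn(3072)          # flattened 32x32x3 image
W1 = np.random.randn(100, 3072)    # first-layer weights
W2 = np.random.randn(10, 100)      # second-layer weights

h = np.maximum(0, W1.dot(x))       # hidden layer, 100-d
s = W2.dot(h)                      # class scores, 10-d
# a 3rd layer would simply repeat the pattern: W3.dot(np.maximum(0, s))
```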


Full implementation of training a 2-layer Neural Network needs ~11 lines:

from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
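A sketch in the spirit of that post (Python 3, toy data, sigmoid activations throughout; not a verbatim copy of the linked code):

```python
import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])   # 4 training examples, 3 inputs
y = np.array([[0],[1],[1],[0]])                    # targets
syn0 = 2 * np.random.random((3, 4)) - 1            # first-layer weights
syn1 = 2 * np.random.random((4, 1)) - 1            # second-layer weights

for j in range(60000):
    l1 = 1 / (1 + np.exp(-X.dot(syn0)))            # hidden layer (sigmoid)
    l2 = 1 / (1 + np.exp(-l1.dot(syn1)))           # output layer (sigmoid)
    l2_delta = (y - l2) * (l2 * (1 - l2))          # backprop through the output
    l1_delta = l2_delta.dot(syn1.T) * (l1 * (1 - l1))
    syn1 += l1.T.dot(l2_delta)                     # weight updates
    syn0 += X.T.dot(l1_delta)
```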


Assignment: Writing a 2-layer Net

Stage your forward/backward computation!
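For example (a hedged sketch, not assignment starter code): break the forward pass into named intermediates so that each backward step is a one-line application of the chain rule.

```python
import numpy as np

D, H, C = 3072, 100, 10
x = np.random.randn(D)
W1 = np.random.randn(H, D) * 0.01
W2 = np.random.randn(C, H) * 0.01

# Forward pass, staged into named intermediates
a = W1.dot(x)              # stage 1: affine
h = np.maximum(0, a)       # stage 2: ReLU
s = W2.dot(h)              # stage 3: class scores

# Backward pass reuses the cached intermediates, one stage at a time
ds = np.random.randn(C)    # placeholder upstream gradient dL/ds
dW2 = np.outer(ds, h)
dh = W2.T.dot(ds)
da = dh * (a > 0)
dW1 = np.outer(da, x)
dx = W1.T.dot(da)
```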


sigmoid activation function: σ(x) = 1/(1 + e^(-x))


Hubel and Wiesel demo.


Be very careful with your brain analogies:

Biological Neurons:

  • Many different types
  • Dendrites can perform complex non-linear computations
  • Synapses are not a single weight but a complex non-linear dynamical system

[Dendritic Computation. London and Häusser]


Activation Functions

  • Sigmoid: σ(x) = 1/(1 + e^(-x))
  • tanh: tanh(x)
  • ReLU: max(0, x)
  • Leaky ReLU: max(0.1x, x)
  • Maxout
  • ELU
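A brief NumPy sketch of the elementwise activations listed above (Maxout is omitted because it takes several linear inputs rather than a single x; the ELU alpha is an assumed default, not from the slide):

```python
import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0, x)
def leaky_relu(x):  return np.maximum(0.1 * x, x)
def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))
```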


Neural Networks: Architectures

“Fully-connected” layers

“2-layer Neural Net”, or “1-hidden-layer Neural Net”

“3-layer Neural Net”, or “2-hidden-layer Neural Net”


Example Feed-forward computation of a Neural Network

We can efficiently evaluate an entire layer of neurons.
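A minimal NumPy sketch of evaluating a small fully-connected network layer by layer (a 3-4-4-1 shape with sigmoid hidden units; the random weights are placeholders for trained parameters):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))         # activation function (sigmoid)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

x = np.random.randn(3, 1)                      # random input vector (3x1)
h1 = f(np.dot(W1, x) + b1)                     # first hidden layer (4x1)
h2 = f(np.dot(W2, h1) + b2)                    # second hidden layer (4x1)
out = np.dot(W3, h2) + b3                      # output (1x1)
```

Each layer is a single matrix multiply plus an elementwise nonlinearity, which is why arranging neurons into layers makes the whole forward pass so efficient.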


Setting the number of layers and their sizes

more neurons = more capacity


(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

Do not use the size of the neural network as a regularizer. Use stronger regularization instead.
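One common form of "stronger regularization" (an illustrative sketch, not taken from the slide) is to increase the L2 penalty on the weights rather than shrinking the network:

```python
import numpy as np

reg = 0.1                                  # L2 regularization strength; increase to regularize harder
W1 = np.random.randn(100, 3072) * 0.01
W2 = np.random.randn(10, 100) * 0.01

data_loss = 1.23                           # placeholder: classifier loss on a batch
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss                # total objective that gets minimized
```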


Summary

  • we arrange neurons into fully-connected layers
  • the abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
  • neural networks are not really neural
  • neural networks: bigger = better (but might have to regularize more strongly)


reverse-mode differentiation (if you want the effect of many things on one thing): one backward sweep gives the derivative of a single output with respect to many different x

forward-mode differentiation (if you want the effect of one thing on many things): one forward sweep gives the derivatives of many different y with respect to a single input

[Figure: a complex graph with inputs x and outputs y]
