
Lecture 6:

Neural Networks part 2

Erik Learned-Miller and TAs
Adapted from slides of Fei-Fei Li & Andrej Karpathy & Justin Johnson
September 19, 2019


Optional help session this Friday

  • Vector, Matrix, and Tensor Derivatives + the Chain Rule
  • 9am CS building


[Figure: a single node f in a computational graph. Activations flow forward through f; gradients flow backward. Each node combines the upstream gradient with its "local gradient" (the derivative of its output with respect to each input) via the chain rule.]


Implementation: forward/backward API

Graph (or Net) object (rough pseudocode):
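A minimal sketch of what such a Graph object's forward/backward API might look like (the class name ComputationalGraph and the gate interface here are illustrative, not from any particular library):

```python
class ComputationalGraph:
    def __init__(self, gates):
        # gates are assumed to already be in topological order
        self.gates = gates

    def forward(self):
        out = None
        for gate in self.gates:            # forward pass through the whole graph
            out = gate.forward()
        return out                         # the last gate outputs the loss

    def backward(self):
        for gate in reversed(self.gates):  # reverse order: backprop
            gate.backward()                # each gate applies the chain rule locally
```

Each gate only needs to know how to compute its output in forward() and how to chain the upstream gradient through its local gradient in backward().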


Implementation: forward/backward API

[Figure: a single multiply gate (*) with inputs x, y and output z = x*y; x, y, z are scalars.]
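A minimal sketch of the multiply gate under this API (the class name MultiplyGate is illustrative; the key point is that forward caches its inputs so backward can apply the chain rule):

```python
class MultiplyGate:
    def forward(self, x, y):
        z = x * y
        self.x, self.y = x, y          # cache inputs for use in backward
        return z

    def backward(self, dz):
        # chain rule: dL/dx = (dz/dx) * dL/dz, with local gradient dz/dx = y
        dx = self.y * dz
        dy = self.x * dz
        return [dx, dy]

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)            # z = -12.0
dx, dy = gate.backward(2.0)            # dx = -8.0, dy = 6.0
```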


Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector


Q: what is the size of the Jacobian matrix?



Gradients for vectorized code

[Figure: node f with vector inputs x, y and vector output z (x, y, z are now vectors). The "local gradient" ∂z/∂x is now a Jacobian matrix (the derivative of each element of z w.r.t. each element of x); the gradients flowing backward are vectors.]


Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix? [4096 x 4096!]

Q2: what does it look like? Because the function is elementwise, each output depends only on the corresponding input, so the Jacobian is diagonal: 1 where the input is positive, 0 elsewhere. In practice we never form this huge matrix explicitly.
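A short NumPy sketch of that observation (an illustration, not code from the slide): the backward pass of elementwise ReLU never materializes the 4096x4096 Jacobian, because multiplying by a diagonal matrix collapses to an elementwise mask.

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0, x)

def relu_backward(dz, x):
    # dz: upstream gradient dL/dz; x: input cached from the forward pass.
    # The Jacobian is diagonal (1 where x > 0, 0 elsewhere), so the
    # Jacobian-vector product reduces to an elementwise mask.
    return dz * (x > 0)

x = np.random.randn(4096)
dz = np.random.randn(4096)
dx = relu_backward(dz, x)   # same shape as x, no 4096x4096 matrix needed
```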


Aside: Image Features


Example: Color (Hue) Histogram

[Figure: each pixel votes (+1) into one of the hue bins, giving a histogram over hue as the image's feature vector.]
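A minimal NumPy sketch of the idea (illustrative only; it assumes the image has already been converted to HSV, e.g. with skimage.color.rgb2hsv):

```python
import numpy as np

def hue_histogram(hsv_image, num_bins=16):
    # hsv_image: H x W x 3 array with the hue channel in [0, 1]
    hue = hsv_image[..., 0].ravel()
    hist, _ = np.histogram(hue, bins=num_bins, range=(0.0, 1.0))
    return hist / hist.sum()           # normalized histogram = feature vector

feature = hue_histogram(np.random.rand(32, 32, 3))   # 16-d color feature
```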


Example: HOG/SIFT features

Divide the image into 8x8 pixel regions and, within each region, quantize the edge orientation into 9 bins. (image from vlfeat.org)


Many more: GIST, LBP, Texton, SSIM, ...


Jointly learning about edges and colors


Example: Bag of Words

[Figure: extract visual word vectors from training images and learn k-means centroids, forming a "vocabulary" of visual words (e.g. 1000 centroids); each image is then encoded as a histogram of visual words, a 1000-d vector.]
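A rough scikit-learn sketch of that pipeline (descriptor extraction is replaced with random placeholders; names and sizes are illustrative, and the slide's example uses 1000 centroids):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-ins for descriptors extracted from image patches
train_descriptors = np.random.randn(5000, 128)    # from many training images
image_descriptors = np.random.randn(200, 128)     # from one image

# Step 1: learn the visual "vocabulary" with k-means
vocab = KMeans(n_clusters=100, n_init=10).fit(train_descriptors)

# Step 2: encode the image as a histogram of visual words
words = vocab.predict(image_descriptors)
hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
hist /= hist.sum()                                # normalized feature vector
```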


[Figure: two pipelines from a [32x32x3] image to 10 numbers indicating class scores. One trains a function f directly on the raw pixels; the other first runs Feature Extraction to produce a vector describing various image statistics, and training happens on the function f applied to that feature vector.]


Neural Network:

(Before) Linear score function: f = Wx


(Now) 2-layer Neural Network: f = W2 max(0, W1 x)


x (3072) → W1 → h (100) → W2 → s (10)


or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
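A minimal NumPy sketch of the 2-layer score function with the dimensions shown above (the random weights are placeholders; biases are omitted as on the slide):

```python
import numpy as np

x = np.random.randn(3072)          # flattened 32x32x3 image
W1 = np.random.randn(100, 3072)    # first-layer weights
W2 = np.random.randn(10, 100)      # second-layer weights

h = np.maximum(0, W1.dot(x))       # hidden layer, 100-d
s = W2.dot(h)                      # class scores, 10-d
# a 3rd layer would simply repeat the pattern: W3.dot(np.maximum(0, s))
```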


Full implementation of training a 2-layer Neural Network needs ~11 lines:

from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
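A sketch in the spirit of that post (Python 3, toy data, sigmoid activations throughout; not a verbatim copy of the linked code):

```python
import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])   # 4 training examples, 3 inputs
y = np.array([[0],[1],[1],[0]])                    # targets
syn0 = 2 * np.random.random((3, 4)) - 1            # first-layer weights
syn1 = 2 * np.random.random((4, 1)) - 1            # second-layer weights

for j in range(60000):
    l1 = 1 / (1 + np.exp(-X.dot(syn0)))            # hidden layer (sigmoid)
    l2 = 1 / (1 + np.exp(-l1.dot(syn1)))           # output layer (sigmoid)
    l2_delta = (y - l2) * (l2 * (1 - l2))          # backprop through the output
    l1_delta = l2_delta.dot(syn1.T) * (l1 * (1 - l1))
    syn1 += l1.T.dot(l2_delta)                     # weight updates
    syn0 += X.T.dot(l1_delta)
```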


Assignment: Writing a 2-layer Net

Stage your forward/backward computation!
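For example (a hedged sketch, not assignment starter code): break the forward pass into named intermediates so that each backward step is a one-line application of the chain rule.

```python
import numpy as np

D, H, C = 3072, 100, 10
x = np.random.randn(D)
W1 = np.random.randn(H, D) * 0.01
W2 = np.random.randn(C, H) * 0.01

# Forward pass, staged into named intermediates
a = W1.dot(x)              # stage 1: affine
h = np.maximum(0, a)       # stage 2: ReLU
s = W2.dot(h)              # stage 3: class scores

# Backward pass reuses the cached intermediates, one stage at a time
ds = np.random.randn(C)    # placeholder upstream gradient dL/ds
dW2 = np.outer(ds, h)
dh = W2.T.dot(ds)
da = dh * (a > 0)
dW1 = np.outer(da, x)
dx = W1.T.dot(da)
```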


sigmoid activation function: σ(x) = 1/(1 + e^(-x))


Hubel and Wiesel demo.


Be very careful with your brain analogies:

Biological Neurons:

  • Many different types
  • Dendrites can perform complex non-linear computations
  • Synapses are not a single weight but a complex non-linear dynamical system

[Dendritic Computation. London and Häusser]


Activation Functions

  • Sigmoid: σ(x) = 1/(1 + e^(-x))
  • tanh: tanh(x)
  • ReLU: max(0, x)
  • Leaky ReLU: max(0.1x, x)
  • Maxout
  • ELU
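A brief NumPy sketch of the elementwise activations listed above (Maxout is omitted because it takes several linear inputs rather than a single x; the ELU alpha is an assumed default, not from the slide):

```python
import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0, x)
def leaky_relu(x):  return np.maximum(0.1 * x, x)
def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))
```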


Neural Networks: Architectures

“Fully-connected” layers

“2-layer Neural Net”, or “1-hidden-layer Neural Net”

“3-layer Neural Net”, or “2-hidden-layer Neural Net”


Example Feed-forward computation of a Neural Network

We can efficiently evaluate an entire layer of neurons.
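A minimal NumPy sketch of evaluating a small fully-connected network layer by layer (a 3-4-4-1 shape with sigmoid hidden units; the random weights are placeholders for trained parameters):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))         # activation function (sigmoid)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

x = np.random.randn(3, 1)                      # random input vector (3x1)
h1 = f(np.dot(W1, x) + b1)                     # first hidden layer (4x1)
h2 = f(np.dot(W2, h1) + b2)                    # second hidden layer (4x1)
out = np.dot(W3, h2) + b3                      # output (1x1)
```

Each layer is a single matrix multiply plus an elementwise nonlinearity, which is why arranging neurons into layers makes the whole forward pass so efficient.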


Setting the number of layers and their sizes

more neurons = more capacity


(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

Do not use the size of the neural network as a regularizer. Use stronger regularization instead.
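One common form of "stronger regularization" (an illustrative sketch, not taken from the slide) is to increase the L2 penalty on the weights rather than shrinking the network:

```python
import numpy as np

reg = 0.1                                  # L2 regularization strength; increase to regularize harder
W1 = np.random.randn(100, 3072) * 0.01
W2 = np.random.randn(10, 100) * 0.01

data_loss = 1.23                           # placeholder: classifier loss on a batch
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss                # total objective that gets minimized
```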


Summary

  • we arrange neurons into fully-connected layers
  • the abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
  • neural networks are not really neural
  • neural networks: bigger = better (but might have to regularize more strongly)


reverse-mode differentiation (if you want the effect of many things on one thing): one backward sweep gives the derivative of a single output with respect to many different x

forward-mode differentiation (if you want the effect of one thing on many things): one forward sweep gives the derivatives of many different y with respect to a single input

[Figure: a complex graph with inputs x and outputs y]
