Lecture 6: Neural Networks part 2
19 Sep 2019
Erik Learned-Miller and TAs. Adapted from slides of Fei-Fei Li & Andrej Karpathy & Justin Johnson
Optional help session this Friday
[Figure: a gate f inside the computational graph. Activations flow forward through f; gradients flow backward. The "local gradient" is the derivative of f's output with respect to its inputs.]
Implementation: forward/backward API
Graph (or Net) object (rough pseudocode):
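A minimal runnable sketch of that idea (illustrative names like `ComputationalGraph`, not the course's actual starter code): the graph keeps its gates in topological order, runs them forward left-to-right, and backward right-to-left.

```python
class ComputationalGraph:
    def __init__(self, gates):
        # gates: list of gate objects, assumed already topologically sorted
        self.gates = gates

    def forward(self):
        for gate in self.gates:
            gate.forward()          # each gate computes its output (stored on the gate)
        return self.gates[-1].out   # the final gate outputs the loss

    def backward(self):
        for gate in reversed(self.gates):
            gate.backward()         # each gate applies the chain rule locally
```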
Implementation: forward/backward API
Example: a multiply gate * with inputs x, y and output z (x, y, z are scalars).
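A sketch of what that gate might look like (a hypothetical class, in the spirit of the slide): forward computes z = x*y and caches the inputs; backward receives the upstream gradient dz = dL/dz and returns the chained gradients.

```python
class MultiplyGate:
    def forward(self, x, y):
        z = x * y
        self.x, self.y = x, y   # cache the inputs: needed to compute gradients
        return z

    def backward(self, dz):
        # chain rule: [dL/dx, dL/dy] = [dz/dx, dz/dy] * dL/dz = [y, x] * dz
        dx = self.y * dz
        dy = self.x * dz
        return dx, dy
```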
Vectorized operations
f(x) = max(0, x) (elementwise), mapping a 4096-d input vector to a 4096-d output vector.
Gradients for vectorized code
(x, y, z are now vectors.) The "local gradient" of a gate f is now the Jacobian matrix: the derivative of each element of z with respect to each element of x.
Vectorized operations
f(x) = max(0, x) (elementwise), 4096-d input vector → 4096-d output vector.
Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like? (Diagonal: since the operation is elementwise, each output element depends only on the corresponding input element, so we never need to form the full matrix.)
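A sketch of how that structure is exploited in practice (assuming NumPy): the backward pass multiplies the upstream gradient by the Jacobian's diagonal, which is just a mask, instead of materializing a 4096 x 4096 matrix.

```python
import numpy as np

def relu_forward(x):
    out = np.maximum(0, x)   # elementwise max(0, x)
    return out, x            # cache the input for the backward pass

def relu_backward(dout, x):
    # The Jacobian is diagonal, with entries 1 where x > 0 and 0 elsewhere,
    # so multiplying by it is just masking the upstream gradient.
    return dout * (x > 0)

x = np.random.randn(4096)
out, cache = relu_forward(x)
dout = np.random.randn(4096)      # upstream gradient dL/dout
dx = relu_backward(dout, cache)   # dL/dx, without ever forming the 4096x4096 Jacobian
```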
Aside: Image Features
Example: Color (Hue) Histogram
Divide the hue axis into bins; each pixel adds +1 to the bin its hue falls in.
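A minimal sketch of that feature (assuming NumPy; the choice of 36 bins is illustrative, the slide does not specify a count):

```python
import numpy as np

def hue_histogram(hues, n_bins=36):
    # hues: per-pixel hue values in [0, 1); n_bins is a hypothetical choice
    bins = np.minimum((hues * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, 1)   # each pixel adds +1 to its hue bin
    return hist
```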
Example: HOG/SIFT features
Within each 8x8 pixel region, quantize the edge orientation into 9 bins.
(image from vlfeat.org)
Many more:
GIST, LBP, Texton, SSIM, ...
Jointly learning about edges and colors
Example: Bag of Words
Extract many small patches from the training images and describe each one as a visual word vector. Learn k-means centroids over these vectors to build a "vocabulary" of visual words (e.g. 1000 centroids). Each image is then encoded as a histogram of visual words: a 1000-d vector counting how often each centroid is the nearest match.
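A sketch of that pipeline (assuming NumPy and scikit-learn's KMeans; the descriptor source and k = 1000 follow the slide's example):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(patch_descriptors, k=1000):
    # patch_descriptors: (N, D) array of descriptors sampled from training images
    return KMeans(n_clusters=k).fit(patch_descriptors)   # centroids = visual words

def bow_histogram(image_descriptors, vocab):
    words = vocab.predict(image_descriptors)             # nearest centroid per patch
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)                   # normalized 1000-d histogram
```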
Two pipelines for image classification:
(1) Feature extraction: [32x32x3] image → feature extraction → vector describing various image statistics → f → 10 numbers indicating class scores. Training adjusts only f.
(2) End-to-end: [32x32x3] image → f → 10 numbers indicating class scores. Training adjusts the whole of f.
Neural Network:
(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
x (3072-d input) → W1 → h (100 hidden units) → W2 → s (10 class scores)
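A minimal sketch of that forward pass (assuming NumPy; the weight scale is illustrative):

```python
import numpy as np

# Shapes match the slide: x is 3072-d (32*32*3), h is 100-d, s is 10 scores.
W1 = 0.01 * np.random.randn(100, 3072)
W2 = 0.01 * np.random.randn(10, 100)

x = np.random.randn(3072)        # flattened input image
h = np.maximum(0, W1.dot(x))     # hidden layer: elementwise max(0, .)
s = W2.dot(h)                    # 10 class scores
```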
Stacking one more layer gives a 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
Full implementation of training a 2-layer Neural Network needs ~11 lines:
from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
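For reference, the code from that post is roughly the following (paraphrased from the linked page; a 2-layer sigmoid network trained by backprop, with the learning rate folded into the update):

```python
import numpy as np

X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])   # 4 training examples
y = np.array([[0, 1, 1, 0]]).T                        # targets
syn0 = 2 * np.random.random((3, 4)) - 1               # weights: input -> hidden
syn1 = 2 * np.random.random((4, 1)) - 1               # weights: hidden -> output
for j in range(60000):
    l1 = 1 / (1 + np.exp(-X.dot(syn0)))               # forward: sigmoid hidden layer
    l2 = 1 / (1 + np.exp(-l1.dot(syn1)))              # forward: sigmoid output
    l2_delta = (y - l2) * (l2 * (1 - l2))             # error times sigmoid derivative
    l1_delta = l2_delta.dot(syn1.T) * (l1 * (1 - l1)) # backprop into hidden layer
    syn1 += l1.T.dot(l2_delta)                        # weight updates (lr folded in)
    syn0 += X.T.dot(l1_delta)
```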
Assignment: Writing a 2-layer Net
Stage your forward/backward computation!
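A sketch of what "staging" means (assuming NumPy; shapes follow the 2-layer example above): give every intermediate value a name in the forward pass, then write the backward pass as the same stages in reverse.

```python
import numpy as np

x = np.random.randn(3072)
W1 = 0.01 * np.random.randn(100, 3072)
W2 = 0.01 * np.random.randn(10, 100)
dscores = np.random.randn(10)              # pretend upstream gradient from the loss

# forward, in named stages
hidden_pre = W1.dot(x)                     # stage 1: affine
hidden = np.maximum(0, hidden_pre)         # stage 2: ReLU
scores = W2.dot(hidden)                    # stage 3: affine

# backward, mirroring the stages in reverse
dW2 = np.outer(dscores, hidden)            # stage 3 backward
dhidden = W2.T.dot(dscores)
dhidden_pre = dhidden * (hidden_pre > 0)   # stage 2 backward: ReLU mask
dW1 = np.outer(dhidden_pre, x)             # stage 1 backward
```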
A single neuron computes a weighted sum of its inputs and passes it through a sigmoid activation function: σ(x) = 1 / (1 + e^(-x)).
Hubel and Wiesel demo.
Be very careful with your brain analogies!
Biological neurons are far more complex: [Dendritic Computation. London and Häusser]
Activation Functions
Sigmoid: σ(x) = 1 / (1 + e^(-x))
tanh: tanh(x)
ReLU: max(0, x)
Leaky ReLU: max(0.1x, x)
Maxout: max(w1^T x + b1, w2^T x + b2)
ELU: x if x > 0, else α(e^x − 1)
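The elementwise ones are one-liners (a NumPy sketch; Maxout is omitted since it carries its own weights rather than acting elementwise):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x):
    return np.maximum(0.1 * x, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

# tanh is available directly as np.tanh
```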
Neural Networks: Architectures
“Fully-connected” layers
"2-layer Neural Net", or "1-hidden-layer Neural Net"
"3-layer Neural Net", or "2-hidden-layer Neural Net"
Example Feed-forward computation of a Neural Network
We can efficiently evaluate an entire layer of neurons.
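A sketch in the spirit of the course notes (assuming NumPy; the layer sizes 3 → 4 → 4 → 1 are illustrative): each layer is one matrix multiply plus a bias, followed by an elementwise nonlinearity, so all neurons in a layer are evaluated at once.

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid activation (elementwise)

x = np.random.randn(3, 1)                # random input vector (3x1)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

h1 = f(np.dot(W1, x) + b1)    # first hidden layer: all 4 neurons at once
h2 = f(np.dot(W2, h1) + b2)   # second hidden layer
out = np.dot(W3, h2) + b3     # output neuron (1x1)
```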
Setting the number of layers and their sizes
more neurons = more capacity
(You can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
Do not use the size of the neural network as a regularizer. Use stronger regularization instead (e.g. a larger L2 regularization strength).
Summary
For a complex graph with inputs x and outputs y:
Reverse-mode differentiation: use when you want the effect of many things on one thing (one backward pass gives the gradient of a single output with respect to many different x).
Forward-mode differentiation: use when you want the effect of one thing on many things (one forward pass gives the derivatives of many different y with respect to a single input).