Building Blocks
of
Deep Learning
Aaditya Prakash
April 9, 2018
Art Lives Forever
Scream (Munch)
The Shipwreck of the Minotaur (Turner)
Udnie (Picabia)
Wave (Hokusai)
Rain Princess (Afremov)
What we will Learn
Learning Process
Neural Networks
Convolution Networks
Textbook: http://www.deeplearningbook.org
Course: http://cs231n.stanford.edu
Machine Learning recap
Apply a prediction function to a feature representation of the given input (x) and get the desired output.
f( image ) = "apple"
f( image ) = "bird"
f( image ) = "bike"
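A toy sketch of what such a prediction function f could look like in Python; the feature names and rules here are invented purely for illustration:

```python
def f(features):
    # toy prediction function: maps a feature representation to a label
    if features.get("has_feathers"):
        return "bird"
    if features.get("has_wheels"):
        return "bike"
    return "apple"

print(f({"has_wheels": True}))   # -> "bike"
print(f({"is_red": True}))       # -> "apple"
```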
Machine Learning recap
Training data (x, y) → model f → output ŷ = f(x)
Error: ŷ - y
Loss / Objective: L(ŷ, y)
Loss function L quantifies how unhappy you would be if you used ‘f’ to make predictions on x. It is the objective we want to minimize.
ŷ = f(x)
Per-example loss: Loss(f(x_i), y_i)
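A concrete example of one possible loss function, the squared loss (chosen here only as an illustration; the slides do not fix a particular L):

```python
def squared_loss(y_pred, y_true):
    # 0 when the prediction is perfect, grows as the prediction gets worse
    return (y_pred - y_true) ** 2

print(squared_loss(2.5, 3.0))  # 0.25
```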
Machine Learning recap
Find ‘f’ such that it minimizes the ‘training loss’, and hope that this is also true for the ‘test’ loss.
TrainLoss(f) = Σ_i Loss(f(x_i), y_i)
Model = argmin_f TrainLoss(f)
W ← W - η ∇_W TrainLoss(f)
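A bare-bones sketch of this update rule on a toy 1-D linear model with squared loss; the data, learning rate, and step count are all made up for illustration:

```python
import numpy as np

# toy data: roughly y = 2x with a little noise
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.1, 2.1, 3.9, 6.0])

w = 0.0     # single weight to learn
lr = 0.05   # learning rate (eta)

for step in range(200):
    y_hat = w * X                          # forward pass: predictions
    grad = np.mean(2 * (y_hat - Y) * X)    # d(TrainLoss)/dw for squared loss
    w = w - lr * grad                      # W <- W - eta * gradient

print(w)  # close to 2.0
```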
Machine Learning recap
The process of updating the ‘weights’ like this is called gradient descent.
Model = argmin_f TrainLoss(f)
Gradient Descent
(loss surface plotted over weights w1 and w2)
1.
Artificial Neural Network
Organic is overrated
Neuron
Artificial Neuron
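A minimal sketch of a single artificial neuron: a weighted sum of inputs plus a bias, passed through an activation (a sigmoid is assumed here; the weights and inputs are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    # weighted sum of inputs plus bias, passed through a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.3])   # weights (illustrative values)
b = 0.2                          # bias
print(neuron(x, w, b))
```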
Artificial Neural Networks
y = F(x, W)
A neural network is a function F that maps inputs x to outputs y, parameterized by weights W.
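A sketch of F(x, W) as a tiny two-layer network; the layer sizes and the ReLU nonlinearity are assumptions for illustration, not something fixed by the slides:

```python
import numpy as np

def F(x, W):
    # W is a list of (weight matrix, bias vector) pairs, one per layer
    h = x
    for i, (Wi, bi) in enumerate(W):
        h = Wi @ h + bi
        if i < len(W) - 1:          # hidden layers use a ReLU nonlinearity
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
W = [(rng.normal(size=(4, 3)), np.zeros(4)),   # input 3 -> hidden 4
     (rng.normal(size=(2, 4)), np.zeros(2))]   # hidden 4 -> output 2
x = np.array([0.5, -1.0, 2.0])
print(F(x, W))
```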
Image Classification
Choose among the following:
(a) Cairn terrier  (b) Norwich terrier  (c) Australian terrier
Using Magic
Neural networks and backpropagation
Back Propagation
Forward pass → Compute error → Update weights
Back Propagation
Forward Pass
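A sketch of a forward pass for a one-hidden-layer network that also caches the intermediate values the backward pass will need; the function and variable names are assumed:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # forward pass, keeping intermediate values for the backward pass
    z1 = W1 @ x + b1          # pre-activation of the hidden layer
    h1 = np.maximum(z1, 0.0)  # ReLU activation
    y_hat = W2 @ h1 + b2      # output layer (linear)
    cache = (x, z1, h1)
    return y_hat, cache
```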
Back Propagation
Compute error
Back Propagation
Derivative Chain Rule
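A sketch of the corresponding backward pass, applying the chain rule layer by layer (squared loss and the one-hidden-layer network from the forward-pass sketch are assumed):

```python
import numpy as np

def backward(y_hat, y, cache, W2):
    # chain rule: propagate the loss gradient backwards, layer by layer
    x, z1, h1 = cache
    dL_dy = 2 * (y_hat - y)        # derivative of the squared loss
    dW2 = np.outer(dL_dy, h1)      # gradient for the output weights
    db2 = dL_dy
    dh1 = W2.T @ dL_dy             # error propagated to the hidden layer
    dz1 = dh1 * (z1 > 0)           # ReLU derivative
    dW1 = np.outer(dz1, x)         # gradient for the hidden weights
    db1 = dz1
    return dW1, db1, dW2, db2
```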
Compute Error
Back Propagation
Update Weights
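Putting the three steps together: one full training step built on the forward and backward sketches above; the initialization and learning rate are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = np.array([0.5, -1.0, 2.0]), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)) * 0.1, np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)) * 0.1, np.zeros(1)
lr = 0.01

# one training step: forward pass, compute error, update weights
y_hat, cache = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(y_hat, y, cache, W2)
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```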
N. Profit???
Summary
...
Learning Rate
(plots: effect of the learning rate on the loss curve)
Gradient Descent
(loss surface plotted over weights w1 and w2)
Stochastic Gradient Descent
Mini-batch weight update #1
Mini-batch weight update #2
Stochastic Gradient Descent with Momentum
Credit: Sebastian Ruder
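A sketch of the SGD-with-momentum update; the learning rate and the momentum coefficient of 0.9 are common defaults assumed here, and the gradients are faked for illustration:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # velocity accumulates an exponentially decaying average of past gradients
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.zeros(3)
v = np.zeros(3)
for grad in [np.array([1.0, -2.0, 0.5])] * 5:   # pretend mini-batch gradients
    w, v = sgd_momentum_step(w, grad, v)
print(w)
```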
Choices
Initializations
Activation
Optimizers
Cost
3.
Convolution Networks
Convolution
Convolution with Padding
Convolution with Padding + Strides
Deconvolution / Convolution Transposed
Max Pool
Size: 2x2, Stride: 2
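A small NumPy sketch of a plain 2-D convolution (no padding, stride 1) and a 2x2 max pool with stride 2, matching the operations named above; the image and kernel values are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    # valid convolution, stride 1, no padding (cross-correlation, as in most DL libraries)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2, stride=2):
    # 2x2 max pooling with stride 2: keep the largest value in each window
    oh, ow = x.shape[0] // stride, x.shape[1] // stride
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(max_pool(conv2d(image, kernel)))
```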
Geoff Hinton on Pooling:
The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.
If the pools do not overlap, pooling loses valuable information about where things are. We need this information to detect precise relationships between the parts of an object.
Convolutional Neural Network
How many layers do I need?
But what do the layers do?
Separating Data
Summary
Selfie! <= or =>
Thanks!
Credits: from tensorflow import *
Any questions?
You can find me at
iamaaditya.github.io
aprakash@brandeis.edu