1 of 80

Building Blocks

of

Deep Learning

Aaditya Prakash

2 of 80

Scream

Shipwreck of the Minotaur

Udnie

Wave

Rain Princess

Art Lives Forever

3 of 80

Munch

Turner

Picabia

Hokusai

Afremov

Art Lives Forever

ARTIST

4 of 80

5 of 80

What we will Learn

  • Learning Process
  • Neural Networks
  • Convolution Networks

6 of 80

Textbook: http://www.deeplearningbook.org

Course: http://cs231n.stanford.edu

Quick Guide: http://neuralnetworksanddeeplearning.com

YouTube: 3Blue1Brown

7 of 80

Machine Learning recap

Apply a prediction function to a feature representation of given input (x) and get the desired output.

f( [image] ) = “apple”

f( [image] ) = “bird”

f( [image] ) = “bike”

8 of 80

Machine Learning recap

[Diagram: training data (x, y) → model f → output ỹ = f(x); comparing ỹ against y gives the error ỹ − y and the Loss/Objective L(ỹ, y)]

Loss function L quantifies how unhappy you would be if you used ‘f’ to make predictions on x. It is the objective we want to minimize.
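A concrete choice makes this tangible: the squared error. A minimal sketch in plain Python (the numbers are made up for illustration):

```python
def squared_error(y_pred, y_true):
    """L(y_pred, y_true): how unhappy we are if the model predicted y_pred but the truth was y_true."""
    return (y_pred - y_true) ** 2

# Example: the model predicted 2.5, the true value was 3.0.
print(squared_error(2.5, 3.0))  # 0.25
```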

9 of 80

Machine Learning recap

Find ‘f’ such that it minimizes the ‘training loss’, and hope that this is also true for the ‘test’ loss.

TrainLoss(f) = Σᵢ Loss(f(xᵢ), yᵢ)

Model = argmin_f TrainLoss(f)
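In code, TrainLoss is just the per-example loss summed over the dataset. A minimal NumPy sketch, with squared error standing in for Loss and a toy linear model standing in for f (both are illustrative assumptions):

```python
import numpy as np

def train_loss(f, xs, ys):
    """TrainLoss(f) = sum_i Loss(f(x_i), y_i), here with squared error as Loss."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.1, 3.9, 6.2])
print(train_loss(lambda x: 2 * x, xs, ys))  # training loss of the model f(x) = 2x
```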

10 of 80

Machine Learning recap

Model = argmin_f TrainLoss(f)

W ← W − η ∇W TrainLoss(f)

The process of updating the ‘weights’ like this is called gradient descent.
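A minimal sketch of that update rule for a one-parameter model f(x) = w·x with squared-error loss (NumPy assumed; the gradient is worked out by hand, and the data and learning rate are made up):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])        # generated by the "true" w = 2

w, lr = 0.0, 0.05                     # initial weight and learning rate eta
for step in range(100):
    # TrainLoss(w) = sum_i (w*x_i - y_i)^2, so dTrainLoss/dw = sum_i 2*(w*x_i - y_i)*x_i
    grad = np.sum(2 * (w * xs - ys) * xs)
    w = w - lr * grad                 # W <- W - eta * gradient
print(w)                              # close to 2.0
```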

11 of 80

Gradient Descent

[Figure: loss surface over weights w1 and w2]

12 of 80

1.

Artificial Neural Network

13 of 80

1.

Artificial Neural Network

Organic is overrated

14 of 80

Neuron

15 of 80

Artificial Neuron

16 of 80

Artificial Neuron

17 of 80

Artificial Neuron

18 of 80

Artificial Neural Networks

[Diagram: input x and output y]

19 of 80

Artificial Neural Networks

[Diagram: input x, output y, weights W]

20 of 80

Artificial Neural Networks

[Diagram: y = F(x, W)]

21 of 80

Artificial Neural Networks

[Diagram: y = F(x, W) with a bias term b]
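A minimal NumPy sketch of a single artificial neuron, reading the diagram in the usual way (a weighted combination of the inputs plus a bias b, followed by a nonlinearity); the specific numbers are made up:

```python
import numpy as np

def neuron(x, W, b):
    """y = activation(W.x + b)."""
    z = np.dot(W, x) + b                 # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

x = np.array([0.5, -1.0, 2.0])           # inputs
W = np.array([0.1, 0.4, -0.2])           # weights
b = 0.3                                  # bias
print(neuron(x, W, b))
```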

22 of 80

Image Classification

Choose among the following:

(a) Cairn terrier (b) Norwich terrier (c) Australian terrier

23 of 80

Image Classification

Choose among the following:

(a) Cairn terrier (b) Norwich terrier (c) Australian terrier

24 of 80

Using Magic

Neural networks and backpropagation

25 of 80

Back Propagation

Forward pass → Compute error → Update weights

26 of 80

Back Propagation

Forward Pass

27 of 80

Forward Pass

28 of 80

Forward Pass

29 of 80

Forward Pass

30 of 80

Forward Pass

31 of 80

Forward Pass

32 of 80

Forward Pass

33 of 80

Forward Pass
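A minimal sketch of what a forward pass computes for a two-layer network (NumPy assumed; the layer sizes and random inputs are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # input vector

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 1: 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # layer 2: 4 hidden units -> 2 outputs

h = sigmoid(W1 @ x + b1)                       # hidden activations
y = sigmoid(W2 @ h + b2)                       # network output
print(y)
```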

34 of 80

Back Propagation

Compute error

35 of 80

Back Propagation

Derivative Chain Rule

36 of 80

Back Propagation

Derivative Chain Rule

37 of 80

Back Propagation

Derivative Chain Rule

38 of 80

Back Propagation

Derivative Chain Rule

39 of 80

Back Propagation

Derivative Chain Rule

40 of 80

Back Propagation

Derivative Chain Rule

41 of 80

Back Propagation

Derivative Chain Rule
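The chain rule written out for a single sigmoid neuron with a squared-error cost; a minimal sketch (NumPy assumed, values made up) where every local derivative appears explicitly:

```python
import numpy as np

x = np.array([0.5, -1.0])
W = np.array([0.8, 0.2])
b = 0.1
y_true = 1.0

# Forward pass: z = W.x + b, y_hat = sigmoid(z), E = (y_hat - y)^2
z = W @ x + b
y_hat = 1.0 / (1.0 + np.exp(-z))
E = (y_hat - y_true) ** 2

# Backward pass: dE/dW = dE/dy_hat * dy_hat/dz * dz/dW  (the chain rule)
dE_dyhat = 2 * (y_hat - y_true)
dyhat_dz = y_hat * (1 - y_hat)     # derivative of the sigmoid
dE_dW = dE_dyhat * dyhat_dz * x    # dz/dW = x
dE_db = dE_dyhat * dyhat_dz        # dz/db = 1
print(dE_dW, dE_db)
```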

42 of 80

Compute Error

43 of 80

Compute Error

44 of 80

Compute Error

45 of 80

Compute Error

46 of 80

Compute Error

47 of 80

Back Propagation

Update weights

48 of 80

Back Propagation

Update weights

49 of 80

Back Propagation

Update weights

50 of 80

Update Weights

51 of 80

Update Weights

52 of 80

Update Weights

53 of 80

Summary

  1. Random weights
  2. Forward Pass
  3. Compute Cost
  4. Backward Pass
  5. Update weights
  ...
  N. Profit???
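All of the summary steps in one place: a minimal NumPy training loop for a tiny one-hidden-layer network on toy data (the architecture, data, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))                               # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy labels

# 1. Random weights
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(500):
    # 2. Forward pass
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # 3. Compute cost (mean squared error here)
    cost = np.mean((y_hat - y) ** 2)
    # 4. Backward pass (chain rule, written out by hand)
    d_yhat = 2 * (y_hat - y) / len(X)
    d_z2 = d_yhat * y_hat * (1 - y_hat)
    dW2, db2 = h.T @ d_z2, d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)
    # 5. Update weights
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
print(cost)   # should be small(ish) after training
```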

54 of 80

Summary

  • Random weights
  • Forward Pass
  • Compute Cost
  • Backward Pass
  • Update weights
  ...
  N. Profit???

55 of 80

Learning Rate

[Figure: loss curves under different learning rates]

56 of 80

Learning Rate

[Figure: loss curves under different learning rates]

57 of 80

Gradient Descent

[Figure: loss surface over weights w1 and w2]

58 of 80

Gradient Descent

59 of 80

Stochastic Gradient Descent

Mini-batch weight update #1

Mini-batch weight update #2
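A minimal sketch of the mini-batch loop: reshuffle the data each epoch and take one weight update per mini-batch (a toy linear-regression model with made-up data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr, batch_size = 0.1, 16
for epoch in range(20):
    idx = rng.permutation(len(X))                     # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]         # mini-batch #1, #2, ...
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient of MSE on this batch only
        w -= lr * grad                                # one update per mini-batch
print(w)   # close to [1.0, -2.0, 0.5]
```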

60 of 80

Stochastic Gradient Descent with Momentum
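Momentum keeps a running 'velocity' and steps along it instead of the raw gradient; a minimal sketch of just the update rule on a toy quadratic loss (the values of beta, the learning rate, and the loss itself are made up):

```python
import numpy as np

A = np.diag([10.0, 1.0])             # toy loss L(w) = 0.5 * w.T A w: a long, narrow valley
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
v = np.zeros_like(w)                 # velocity
lr, beta = 0.02, 0.9
for step in range(200):
    v = beta * v + grad(w)           # accumulate a running direction
    w = w - lr * v                   # step along the velocity, not the raw gradient
print(w)                             # near [0, 0], the minimum
```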

61 of 80

Stochastic Gradient Descent

Credit: Sebastian Ruder

62 of 80

Choices

Initializations

  • Uniform
  • Gaussian
  • Truncated Normal
  • Glorot Uniform
  • Xavier
  • Kaiming

Activations

  • TanH
  • Sigmoid
  • ReLU
  • PReLU
  • Leaky ReLU
  • Swish

Optimizers

  • SGD
  • Momentum
  • RMSProp
  • ADAGRAD
  • ADADELTA
  • Adam

Costs

  • Cross entropy
  • MSE
  • L1 loss
  • KL divergence
  • JS divergence
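A few of these choices written out directly; a minimal NumPy sketch covering only a small sample of the lists above:

```python
import numpy as np

# Activations
relu = lambda z: np.maximum(0.0, z)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
leaky_relu = lambda z, a=0.01: np.where(z > 0, z, a * z)

# Costs
def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def cross_entropy(p_hat, y_onehot, eps=1e-12):
    # y_onehot: one-hot labels; p_hat: predicted class probabilities
    return -np.mean(np.sum(y_onehot * np.log(p_hat + eps), axis=1))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), leaky_relu(z))
```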

63 of 80

3.

Convolution Networks

64 of 80

65 of 80

Convolution
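A minimal sketch of a plain 2D convolution (strictly speaking a cross-correlation, as in most deep learning libraries), with no padding and stride 1; NumPy assumed:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge filter
print(conv2d(image, kernel))                # 3x3 output from a 5x5 input
```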

66 of 80

Convolution with Padding

67 of 80

Convolution with Padding + Strides
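Padding and stride only change which windows the kernel visits; the output size follows floor((n + 2p − k) / s) + 1. A minimal sketch extending the convolution above (NumPy assumed):

```python
import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """2D convolution with zero padding and a stride."""
    image = np.pad(image, padding)           # zero-pad all four sides
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1           # = floor((n + 2p - k)/s) + 1 on the unpadded size
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kH, c:c + kW] * kernel)
    return out

print(conv2d(np.ones((5, 5)), np.ones((3, 3)), padding=1, stride=2).shape)  # (3, 3)
```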

68 of 80

Deconvolution / Convolution Transposed
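A transposed convolution goes the other way and upsamples: every input pixel 'stamps' a copy of the kernel into the output, giving output size s·(n − 1) + k − 2p. A minimal sketch (square inputs and kernels assumed, no output padding):

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=1, padding=0):
    """Each input pixel adds a kernel-sized patch into the (larger) output."""
    n, k = x.shape[0], kernel.shape[0]
    out = np.zeros((stride * (n - 1) + k,) * 2)
    for i in range(n):
        for j in range(n):
            out[i*stride:i*stride + k, j*stride:j*stride + k] += x[i, j] * kernel
    if padding:
        out = out[padding:-padding, padding:-padding]
    return out                                 # size: s*(n - 1) + k - 2*p

print(conv2d_transpose(np.ones((2, 2)), np.ones((3, 3)), stride=2).shape)  # (5, 5)
```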

69 of 80

Max Pool

Size: 2x2, Stride: 2
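A minimal sketch of 2x2 max pooling with stride 2 (NumPy assumed; the input height and width are assumed to be even):

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 window."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [5, 1, 2, 2],
              [0, 1, 3, 4]], dtype=float)
print(max_pool_2x2(x))   # [[4. 1.]
                         #  [5. 4.]]
```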

70 of 80

Geoff Hinton on Pooling:

"The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."

"If the pools do not overlap, pooling loses valuable information about where things are. We need this information to detect precise relationships between the parts of an object."

71 of 80

Convolutional Neural Network
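Stacking the pieces above into a small convolutional network; a minimal sketch with tf.keras (assuming TensorFlow is installed, with layer sizes chosen arbitrarily for 28x28 grayscale inputs and 10 classes):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```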

72 of 80

How many layers do I need?

But what do the layers do?

73 of 80

Separating Data

74 of 80

75 of 80

76 of 80

77 of 80

78 of 80

Summary

79 of 80

Selfie! <= or =>

80 of 80

thanks!

Credits: from tensorflow import *

Any questions?

You can find me at

iamaaditya.github.io

aprakash@brandeis.edu