1 of 80

Building Blocks

of

Deep Learning

Aaditya Prakash

2 of 80

Scream

Shipwreck of the Minotaur

Udnie

Wave

Rain Princess

Art Lives Forever

3 of 80

Munch

Turner

Picabia

Hokusai

Afremov

Art Lives Forever

ARTIST

4 of 80

5 of 80

What we will Learn

  • Learning Process
  • Neural Networks
  • Convolution Networks

6 of 80

Textbook: http://www.deeplearningbook.org

Course: http://cs231n.stanford.edu

Quick Guide: http://neuralnetworksanddeeplearning.com

YouTube: 3Blue1Brown

7 of 80

Machine Learning recap

Apply a prediction function to a feature representation of given input (x) and get the desired output.

f( [image] ) = “apple”

f( [image] ) = “bird”

f( [image] ) = “bike”

8 of 80

Machine Learning recap

[Diagram: training data (x, y) → model f → output ỹ = f(x); comparing ỹ against y gives the error ỹ − y and the Loss/Objective L(ỹ, y)]

Loss function L quantifies how unhappy you would be if you used ‘f’ to make predictions on x. It is the objective we want to minimize.
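A concrete choice makes this tangible: the squared error. A minimal sketch in plain Python (the numbers are made up for illustration):

```python
def squared_error(y_pred, y_true):
    """L(y_pred, y_true): how unhappy we are if the model predicted y_pred but the truth was y_true."""
    return (y_pred - y_true) ** 2

# Example: the model predicted 2.5, the true value was 3.0.
print(squared_error(2.5, 3.0))  # 0.25
```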

9 of 80

Machine Learning recap

Find ‘f’ such that it minimizes the ‘training loss’, and hope that this is also true for the ‘test’ loss.

TrainLoss(f) = Σᵢ Loss(f(xᵢ), yᵢ)

Model = argmin_f TrainLoss(f)
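In code, TrainLoss is just the per-example loss summed over the dataset. A minimal NumPy sketch, with squared error standing in for Loss and a toy linear model standing in for f (both are illustrative assumptions):

```python
import numpy as np

def train_loss(f, xs, ys):
    """TrainLoss(f) = sum_i Loss(f(x_i), y_i), here with squared error as Loss."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.1, 3.9, 6.2])
print(train_loss(lambda x: 2 * x, xs, ys))  # training loss of the model f(x) = 2x
```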

10 of 80

Machine Learning recap

Model = argmin_f TrainLoss(f)

W ← W − η ∇W TrainLoss(f)

The process of updating the ‘weights’ like this is called gradient descent.
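A minimal sketch of that update rule for a one-parameter model f(x) = w·x with squared-error loss (NumPy assumed; the gradient is worked out by hand, and the data and learning rate are made up):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])        # generated by the "true" w = 2

w, lr = 0.0, 0.05                     # initial weight and learning rate eta
for step in range(100):
    # TrainLoss(w) = sum_i (w*x_i - y_i)^2, so dTrainLoss/dw = sum_i 2*(w*x_i - y_i)*x_i
    grad = np.sum(2 * (w * xs - ys) * xs)
    w = w - lr * grad                 # W <- W - eta * gradient
print(w)                              # close to 2.0
```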

11 of 80

Gradient Descent

[Figure: loss surface over weights w1 and w2]

12 of 80

1.

Artificial Neural Network

13 of 80

1.

Artificial Neural Network

Organic is overrated

14 of 80

Neuron

15 of 80

Artificial Neuron

16 of 80

Artificial Neuron

17 of 80

Artificial Neuron

18 of 80

Artificial Neural Networks

[Diagram: input x and output y]

19 of 80

Artificial Neural Networks

[Diagram: input x, output y, weights W]

20 of 80

Artificial Neural Networks

[Diagram: y = F(x, W)]

21 of 80

Artificial Neural Networks

[Diagram: y = F(x, W) with a bias term b]
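A minimal NumPy sketch of a single artificial neuron, reading the diagram in the usual way (a weighted combination of the inputs plus a bias b, followed by a nonlinearity); the specific numbers are made up:

```python
import numpy as np

def neuron(x, W, b):
    """y = activation(W.x + b)."""
    z = np.dot(W, x) + b                 # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

x = np.array([0.5, -1.0, 2.0])           # inputs
W = np.array([0.1, 0.4, -0.2])           # weights
b = 0.3                                  # bias
print(neuron(x, W, b))
```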

22 of 80

Image Classification

Choose among the following:

(a) Cairn terrier (b) Norwich terrier (c) Australian terrier

23 of 80

Image Classification

Choose among the following:

(a) Cairn terrier (b) Norwich terrier (c) Australian terrier

24 of 80

Using Magic

Neural networks and backpropagation

25 of 80

Back Propagation

Forward pass → Compute error → Update weights

26 of 80

Back Propagation

Forward Pass

27 of 80

Forward Pass

28 of 80

Forward Pass

29 of 80

Forward Pass

30 of 80

Forward Pass

31 of 80

Forward Pass

32 of 80

Forward Pass

33 of 80

Forward Pass
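A minimal sketch of what a forward pass computes for a two-layer network (NumPy assumed; the layer sizes and random inputs are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # input vector

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 1: 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # layer 2: 4 hidden units -> 2 outputs

h = sigmoid(W1 @ x + b1)                       # hidden activations
y = sigmoid(W2 @ h + b2)                       # network output
print(y)
```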

34 of 80

Back Propagation

Compute error

35 of 80

Back Propagation

Derivative Chain Rule

36 of 80

Back Propagation

Derivative Chain Rule

37 of 80

Back Propagation

Derivative Chain Rule

38 of 80

Back Propagation

Derivative Chain Rule

39 of 80

Back Propagation

Derivative Chain Rule

40 of 80

Back Propagation

Derivative Chain Rule

41 of 80

Back Propagation

Derivative Chain Rule
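The chain rule written out for a single sigmoid neuron with a squared-error cost; a minimal sketch (NumPy assumed, values made up) where every local derivative appears explicitly:

```python
import numpy as np

x = np.array([0.5, -1.0])
W = np.array([0.8, 0.2])
b = 0.1
y_true = 1.0

# Forward pass: z = W.x + b, y_hat = sigmoid(z), E = (y_hat - y)^2
z = W @ x + b
y_hat = 1.0 / (1.0 + np.exp(-z))
E = (y_hat - y_true) ** 2

# Backward pass: dE/dW = dE/dy_hat * dy_hat/dz * dz/dW  (the chain rule)
dE_dyhat = 2 * (y_hat - y_true)
dyhat_dz = y_hat * (1 - y_hat)     # derivative of the sigmoid
dE_dW = dE_dyhat * dyhat_dz * x    # dz/dW = x
dE_db = dE_dyhat * dyhat_dz        # dz/db = 1
print(dE_dW, dE_db)
```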

42 of 80

Compute Error

43 of 80

Compute Error

44 of 80

Compute Error

45 of 80

Compute Error

46 of 80

Compute Error

47 of 80

Back Propagation

Update weights

48 of 80

Back Propagation

Update weights

49 of 80

Back Propagation

Update weights

50 of 80

Update Weights

51 of 80

Update Weights

52 of 80

Update Weights

53 of 80

Summary

  1. Random weights
  2. Forward Pass
  3. Compute Cost
  4. Backward Pass
  5. Update weights
  ...
  N. Profit???
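All of the summary steps in one place: a minimal NumPy training loop for a tiny one-hidden-layer network on toy data (the architecture, data, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))                               # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy labels

# 1. Random weights
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(500):
    # 2. Forward pass
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # 3. Compute cost (mean squared error here)
    cost = np.mean((y_hat - y) ** 2)
    # 4. Backward pass (chain rule, written out by hand)
    d_yhat = 2 * (y_hat - y) / len(X)
    d_z2 = d_yhat * y_hat * (1 - y_hat)
    dW2, db2 = h.T @ d_z2, d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)
    # 5. Update weights
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
print(cost)   # should be small(ish) after training
```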

54 of 80

Summary

  • Random weights
  • Forward Pass
  • Compute Cost
  • Backward Pass
  • Update weights
  ...
  N. Profit???

55 of 80

Learning Rate

[Figure: loss curves under different learning rates]

56 of 80

Learning Rate

[Figure: loss curves under different learning rates]

57 of 80

Gradient Descent

[Figure: loss surface over weights w1 and w2]

58 of 80

Gradient Descent

59 of 80

Stochastic Gradient Descent

Mini-batch weight update #1

Mini-batch weight update #2
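A minimal sketch of the mini-batch loop: reshuffle the data each epoch and take one weight update per mini-batch (a toy linear-regression model with made-up data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr, batch_size = 0.1, 16
for epoch in range(20):
    idx = rng.permutation(len(X))                     # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]         # mini-batch #1, #2, ...
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient of MSE on this batch only
        w -= lr * grad                                # one update per mini-batch
print(w)   # close to [1.0, -2.0, 0.5]
```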

60 of 80

Stochastic Gradient Descent with Momentum
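Momentum keeps a running 'velocity' and steps along it instead of the raw gradient; a minimal sketch of just the update rule on a toy quadratic loss (the values of beta, the learning rate, and the loss itself are made up):

```python
import numpy as np

A = np.diag([10.0, 1.0])             # toy loss L(w) = 0.5 * w.T A w: a long, narrow valley
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
v = np.zeros_like(w)                 # velocity
lr, beta = 0.02, 0.9
for step in range(200):
    v = beta * v + grad(w)           # accumulate a running direction
    w = w - lr * v                   # step along the velocity, not the raw gradient
print(w)                             # near [0, 0], the minimum
```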

61 of 80

Stochastic Gradient Descent

Credit: Sebastian Ruder

62 of 80

Choices

Initializations

  • Uniform
  • Gaussian
  • Truncated Normal
  • Glorot Uniform
  • Xavier
  • Kaiming

Activations

  • TanH
  • Sigmoid
  • ReLU
  • PReLU
  • Leaky ReLU
  • Swish

Optimizers

  • SGD
  • Momentum
  • RMSProp
  • ADAGRAD
  • ADADELTA
  • Adam

Costs

  • Cross entropy
  • MSE
  • L1 loss
  • KL divergence
  • JS divergence
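A few of these choices written out directly; a minimal NumPy sketch covering only a small sample of the lists above:

```python
import numpy as np

# Activations
relu = lambda z: np.maximum(0.0, z)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
leaky_relu = lambda z, a=0.01: np.where(z > 0, z, a * z)

# Costs
def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def cross_entropy(p_hat, y_onehot, eps=1e-12):
    # y_onehot: one-hot labels; p_hat: predicted class probabilities
    return -np.mean(np.sum(y_onehot * np.log(p_hat + eps), axis=1))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), leaky_relu(z))
```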

63 of 80

3.

Convolution Networks

64 of 80

65 of 80

Convolution
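A minimal sketch of a plain 2D convolution (strictly speaking a cross-correlation, as in most deep learning libraries), with no padding and stride 1; NumPy assumed:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge filter
print(conv2d(image, kernel))                # 3x3 output from a 5x5 input
```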

66 of 80

Convolution with Padding

67 of 80

Convolution with Padding + Strides
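Padding and stride only change which windows the kernel visits; the output size follows floor((n + 2p − k) / s) + 1. A minimal sketch extending the convolution above (NumPy assumed):

```python
import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """2D convolution with zero padding and a stride."""
    image = np.pad(image, padding)           # zero-pad all four sides
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1           # = floor((n + 2p - k)/s) + 1 on the unpadded size
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kH, c:c + kW] * kernel)
    return out

print(conv2d(np.ones((5, 5)), np.ones((3, 3)), padding=1, stride=2).shape)  # (3, 3)
```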

68 of 80

Deconvolution / Convolution Transposed
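A transposed convolution goes the other way and upsamples: every input pixel 'stamps' a copy of the kernel into the output, giving output size s·(n − 1) + k − 2p. A minimal sketch (square inputs and kernels assumed, no output padding):

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=1, padding=0):
    """Each input pixel adds a kernel-sized patch into the (larger) output."""
    n, k = x.shape[0], kernel.shape[0]
    out = np.zeros((stride * (n - 1) + k,) * 2)
    for i in range(n):
        for j in range(n):
            out[i*stride:i*stride + k, j*stride:j*stride + k] += x[i, j] * kernel
    if padding:
        out = out[padding:-padding, padding:-padding]
    return out                                 # size: s*(n - 1) + k - 2*p

print(conv2d_transpose(np.ones((2, 2)), np.ones((3, 3)), stride=2).shape)  # (5, 5)
```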

69 of 80

Max Pool

Size: 2x2, Stride: 2
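A minimal sketch of 2x2 max pooling with stride 2 (NumPy assumed; the input height and width are assumed to be even):

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 window."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [5, 1, 2, 2],
              [0, 1, 3, 4]], dtype=float)
print(max_pool_2x2(x))   # [[4. 1.]
                         #  [5. 4.]]
```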

70 of 80

Geoff Hinton on Pooling:

"The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."

"If the pools do not overlap, pooling loses valuable information about where things are. We need this information to detect precise relationships between the parts of an object."

71 of 80

Convolutional Neural Network
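Stacking the pieces above into a small convolutional network; a minimal sketch with tf.keras (assuming TensorFlow is installed, with layer sizes chosen arbitrarily for 28x28 grayscale inputs and 10 classes):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```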

72 of 80

How many layers do I need?

But what do the layers do?

73 of 80

Separating Data

74 of 80

75 of 80

76 of 80

77 of 80

78 of 80

Summary

79 of 80

Selfie! <= or =>

80 of 80

thanks!

Credits: from tensorflow import *

Any questions?

You can find me at

iamaaditya.github.io

aprakash@brandeis.edu