
The Gradient

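The gradient points in the direction of steepest ascent of the loss, so we push the weights in the opposite direction.

The gradient is the vector of partial derivatives of the loss with respect to each weight (for a single weight, it is just the derivative).

Subtracting the gradient gives us descent: w = w - α ∂L/∂w, where α is the learning rate.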
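A minimal sketch of the update rule on a one-weight loss. The loss L(w) = (w - 3)², the starting point, the learning rate and the step count are illustrative assumptions, not values from the slides.

```python
# Hypothetical one-weight loss L(w) = (w - 3)^2, minimized at w = 3.
def loss(w: float) -> float:
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    # dL/dw: points uphill, toward the steepest ascent of the loss
    return 2.0 * (w - 3.0)


def gradient_descent(w: float, alpha: float = 0.1, steps: int = 100) -> float:
    for _ in range(steps):
        # Subtracting the gradient moves w downhill
        w = w - alpha * grad(w)
    return w


w_final = gradient_descent(0.0)
print(w_final)  # very close to 3.0
```

Each step shrinks the distance to the minimum by a factor of (1 - 2α) = 0.8 here, so after 100 steps the error is negligible.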

Slide 3

TJ Machine Learning Club


Calculating One Neural Network Iteration

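Weights (from the network diagram): W₁₃ = 1, W₂₃ = 3, W₁₄ = 4, W₂₄ = 2, W₁₅ = 3, W₂₅ = -2, W₃₆ = -1, W₄₆ = -3, W₅₆ = 2

Linear activation function (identity: output = input) and no biases; inputs n₁ = 3, n₂ = -2; target y = 9; learning rate α = 0.1

Forward pass:

n₃ = W₁₃n₁ + W₂₃n₂ = (1)(3) + (3)(-2) = -3
n₄ = W₁₄n₁ + W₂₄n₂ = (4)(3) + (2)(-2) = 8
n₅ = W₁₅n₁ + W₂₅n₂ = (3)(3) + (-2)(-2) = 13
n₆ = W₃₆n₃ + W₄₆n₄ + W₅₆n₅ = (-1)(-3) + (-3)(8) + (2)(13) = 5

E = ½(n₆ - y)²

Updating W₃₆:

∂E/∂W₃₆ = (n₆ - y) * n₃ = (5 - 9)(-3) = -4 * -3 = 12

W₃₆ = W₃₆ - α ∂E/∂W₃₆ = -1 - 0.1(12) = -1 - 1.2 = -2.2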
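The iteration above can be reproduced in a few lines as a sanity check. The variable names are our own; the last line is an illustrative extra showing how "and so on" continues to an input-layer weight, where the chain rule passes through n₃ (using the old value of W₃₆).

```python
# Values from the slide: identity activation, no biases
n1, n2 = 3.0, -2.0          # inputs
y, alpha = 9.0, 0.1         # target and learning rate
W13, W23, W14, W24, W15, W25 = 1.0, 3.0, 4.0, 2.0, 3.0, -2.0
W36, W46, W56 = -1.0, -3.0, 2.0

# Forward pass: each node is a plain weighted sum
n3 = W13 * n1 + W23 * n2
n4 = W14 * n1 + W24 * n2
n5 = W15 * n1 + W25 * n2
n6 = W36 * n3 + W46 * n4 + W56 * n5
E = 0.5 * (n6 - y) ** 2

# Backward pass for W36: dE/dW36 = (n6 - y) * dn6/dW36 = (n6 - y) * n3
dE_dW36 = (n6 - y) * n3
W36_new = W36 - alpha * dE_dW36   # -2.2 up to floating-point rounding

# Illustrative extra: dE/dW13 = (n6 - y) * W36 * n1 (chain rule through n3)
dE_dW13 = (n6 - y) * W36 * n1
print(n3, n4, n5, n6, E, dE_dW36, W36_new, dE_dW13)
```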

Slide 4


And so on for the remaining weights... but how do we backpropagate through a CNN?

Slide 5


Let’s start with the Convolutional Layer:

X is the input from a previous layer
K is the Kernel
B is the Bias
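Let L be the loss, and assume we are given ∂L/∂Z, the gradient of the loss with respect to the layer's output Z = conv(X, K) + B

[Figure: convolutional layer with kernel K and bias B]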

Slide 6


Let’s start with the Convolutional Layer:

Let’s start by finding ∂L/∂K

Z₁₁ = X₁₁K₁₁+ X₁₂K₁₂+ X₂₁K₂₁+ X₂₂K₂₂+ B

Z₁₂ = X₁₂K₁₁+ X₁₃K₁₂+ X₂₂K₂₁+ X₂₃K₂₂+ B

Z₂₁ = X₂₁K₁₁+ X₂₂K₁₂+ X₃₁K₂₁+ X₃₂K₂₂+ B

Z₂₂ = X₂₂K₁₁+ X₂₃K₁₂+ X₃₂K₂₁+ X₃₃K₂₂+ B
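A minimal NumPy sketch of the forward pass written out above and of ∂L/∂K, assuming stride 1, a single channel, and that "conv" means cross-correlation (no kernel flip), which is what the four Z equations describe. Because ∂Z_ij/∂K_ab = X_(i+a),(j+b), each ∂L/∂K_ab is a weighted sum of X entries, i.e. a valid cross-correlation of X with ∂L/∂Z. The helper name `conv2d_valid` and the random inputs are hypothetical.

```python
import numpy as np


def conv2d_valid(X: np.ndarray, K: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation (stride 1, no kernel flip), as in the Z equations."""
    kh, kw = K.shape
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(X[i:i + kh, j:j + kw] * K)
    return out


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))   # 3x3 input
K = rng.standard_normal((2, 2))   # 2x2 kernel
B = 0.5                           # scalar bias (illustrative value)
Z = conv2d_valid(X, K) + B        # 2x2 output

# Z11 written out by hand (code indices are 0-based)
Z11 = X[0, 0] * K[0, 0] + X[0, 1] * K[0, 1] + X[1, 0] * K[1, 0] + X[1, 1] * K[1, 1] + B

# Hypothetical upstream gradient dL/dZ.
# dL/dK_ab = sum_ij dL/dZ_ij * X_(i+a),(j+b): a valid cross-correlation of X with dL/dZ.
dL_dZ = rng.standard_normal(Z.shape)
dL_dK = conv2d_valid(X, dL_dZ)
```

The result has the same shape as K, as a weight gradient must.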

Slide 7


Slide 8


Let’s start with the Convolutional Layer:

Now let’s find ∂L/∂B

Z₁₁ = X₁₁K₁₁+ X₁₂K₁₂+ X₂₁K₂₁+ X₂₂K₂₂+ B

Z₁₂ = X₁₂K₁₁+ X₁₃K₁₂+ X₂₂K₂₁+ X₂₃K₂₂+ B

Z₂₁ = X₂₁K₁₁+ X₂₂K₁₂+ X₃₁K₂₁+ X₃₂K₂₂+ B

Z₂₂ = X₂₂K₁₁+ X₂₃K₁₂+ X₃₂K₂₁+ X₃₃K₂₂+ B
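B appears in every Z equation with coefficient 1, so ∂Z_ij/∂B = 1 and the chain rule reduces ∂L/∂B to the sum of all entries of ∂L/∂Z. A tiny sketch, with a hypothetical upstream gradient:

```python
import numpy as np

# Hypothetical upstream gradient for the 2x2 output Z
dL_dZ = np.array([[0.5, -1.0],
                  [2.0, 0.25]])

# dL/dB = sum_ij dL/dZ_ij * dZ_ij/dB = sum_ij dL/dZ_ij * 1
dL_dB = dL_dZ.sum()
print(dL_dB)  # 1.75
```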

Slide 9


Slide 10


Let’s start with the Convolutional Layer:

Now let’s find ∂L/∂X

Z₁₁ = X₁₁K₁₁+ X₁₂K₁₂+ X₂₁K₂₁+ X₂₂K₂₂+ B

Z₁₂ = X₁₂K₁₁+ X₁₃K₁₂+ X₂₂K₂₁+ X₂₃K₂₂+ B

Z₂₁ = X₂₁K₁₁+ X₂₂K₁₂+ X₃₁K₂₁+ X₃₂K₂₂+ B

Z₂₂ = X₂₂K₁₁+ X₂₃K₁₂+ X₃₂K₂₁+ X₃₃K₂₂+ B
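From the equations, each X_pq collects the kernel entries it was multiplied by: ∂L/∂X_pq = Σ ∂L/∂Z_ij · K_(p-i),(q-j). This is a "full" convolution: zero-pad ∂L/∂Z by (kernel size - 1) on every side, then cross-correlate with K rotated by 180°. A sketch under the same assumptions as before (stride 1, single channel, hypothetical helper names):

```python
import numpy as np


def conv2d_valid(X, K):
    """'Valid' 2D cross-correlation (stride 1, no kernel flip)."""
    kh, kw = K.shape
    out = np.empty((X.shape[0] - kh + 1, X.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + kh, j:j + kw] * K)
    return out


def conv_input_grad(dL_dZ, K):
    """dL/dX: zero-pad dL/dZ by (kernel size - 1) on every side, then
    cross-correlate with K rotated 180 degrees."""
    kh, kw = K.shape
    padded = np.pad(dL_dZ, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # pads with zeros
    return conv2d_valid(padded, np.rot90(K, 2))


K = np.array([[1.0, 2.0],
              [3.0, 4.0]])
dL_dZ = np.ones((2, 2))            # hypothetical: every output gradient is 1
dL_dX = conv_input_grad(dL_dZ, K)
# dL_dX == [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
```

With all upstream gradients equal to 1, the center entry X₂₂ appears in all four Z equations and so collects K₁₁ + K₁₂ + K₂₁ + K₂₂ = 10, while the corner X₁₁ appears only in Z₁₁ and collects K₁₁ = 1.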

Slide 11


Slide 12


Slide 13


Alright, that was the hard part. Now the full network!

Let’s take this as our network:

Slide 14


Forward Propagation

Layer 1:

Z₁ = conv(Input, K₁) + B₁
C₁ = ReLU(Z₁)
P₁ = AvgPool(C₁)

Layer 2:

Z₂ = conv(P₁, K₂) + B₂
C₂ = ReLU(Z₂)
P₂ = MaxPool(C₂)

Layer 3:

f = Flatten(P₂)
Z₃ = W₃ · f + B₃
y_pred = sigmoid(Z₃)
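A minimal sketch of this forward pass. Since the network diagram did not survive, the sizes are assumptions: a single-channel 12×12 input, a 3×3 K₁, a 2×2 K₂, scalar biases, stride 1, and non-overlapping 2×2 pooling. All values are random placeholders.

```python
import numpy as np


def conv2d_valid(X, K):
    """'Valid' 2D cross-correlation (stride 1, no kernel flip)."""
    kh, kw = K.shape
    out = np.empty((X.shape[0] - kh + 1, X.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + kh, j:j + kw] * K)
    return out


def relu(Z):
    return np.maximum(Z, 0.0)


def pool(C, size, reduce):
    """Non-overlapping size x size pooling; assumes both dims divide by size."""
    h, w = C.shape[0] // size, C.shape[1] // size
    return reduce(C.reshape(h, size, w, size), axis=(1, 3))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(42)
Input = rng.standard_normal((12, 12))
K1, B1 = rng.standard_normal((3, 3)), 0.1
K2, B2 = rng.standard_normal((2, 2)), -0.1

# Layer 1: 12x12 -> conv 3x3 -> 10x10 -> average pool 2 -> 5x5
Z1 = conv2d_valid(Input, K1) + B1
C1 = relu(Z1)
P1 = pool(C1, 2, np.mean)

# Layer 2: 5x5 -> conv 2x2 -> 4x4 -> max pool 2 -> 2x2
Z2 = conv2d_valid(P1, K2) + B2
C2 = relu(Z2)
P2 = pool(C2, 2, np.max)

# Layer 3: flatten (row-major) to a column vector, then one sigmoid unit
f = P2.reshape(-1, 1)               # shape (4, 1)
W3, B3 = rng.standard_normal((1, 4)), np.zeros((1, 1))
Z3 = W3 @ f + B3
y_pred = sigmoid(Z3)                # shape (1, 1), a probability
```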

Slide 15


Backpropagation - Layer 3

Layer 3:

f = Flatten(P₂)
Z₃ = W₃ · f + B₃
y_pred = sigmoid(Z₃)

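This is a dense neural network with no hidden layers. With binary cross-entropy loss (the loss for which ∂L/∂Z₃ takes this simple form with a sigmoid output), normal backpropagation gives, writing dZ₃ for ∂L/∂Z₃:

∂L/∂Z₃ = y_pred - y

∂L/∂W₃ = dZ₃ · fᵀ

∂L/∂B₃ = dZ₃

∂L/∂f = W₃ᵀ · dZ₃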
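A sketch of these four gradients, checked against finite differences in the test. It assumes binary cross-entropy loss, a 4-element flattened f, and one output unit. The values and the helper `bce` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y):
    """Binary cross-entropy loss (assumed; the slide does not name the loss)."""
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).sum())


rng = np.random.default_rng(1)
f = rng.standard_normal((4, 1))     # flattened P2 (hypothetical values)
W3 = rng.standard_normal((1, 4))
B3 = np.zeros((1, 1))
y = np.array([[1.0]])               # target label

Z3 = W3 @ f + B3
y_pred = sigmoid(Z3)

dZ3 = y_pred - y                    # dL/dZ3 for sigmoid + BCE
dW3 = dZ3 @ f.T                     # shape (1, 4), same as W3
dB3 = dZ3                           # shape (1, 1), same as B3
df = W3.T @ dZ3                     # shape (4, 1), passed back to layer 2
```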

Slide 16


Backpropagation - Layer 2

Layer 2:

Z₂ = conv(P₁, K₂) + B₂
C₂ = ReLU(Z₂)
P₂ = MaxPool(C₂)

Need to find:

∂L/∂P₂
∂L/∂C₂
∂L/∂Z₂

Slide 17


Backpropagation - Layer 2

Layer 2:

Z₂ = conv(P₁, K₂) + B₂
C₂ = ReLU(Z₂)
P₂ = MaxPool(C₂)

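∂L/∂P₂ = ∂L/∂f reshaped back to the shape of P₂ (undoing Flatten)
∂L/∂C₂,ij = ∂L/∂P₂,xy if C₂,ij is the maximum of its pooling window (x, y), otherwise 0
∂L/∂Z₂ = ∂L/∂C₂ ⊙ ∂C₂/∂Z₂ (element-wise product)

∂C₂,mn/∂Z₂,mn = 1 if Z₂,mn > 0, 0 otherwise

∂L/∂K₂ = conv(P₁, ∂L/∂Z₂)
∂L/∂B₂ = sum(∂L/∂Z₂)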
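A sketch of the first three steps: un-flattening ∂L/∂f, routing each pooled gradient to the maximum in its window, and masking with the ReLU derivative. It assumes non-overlapping 2×2 max pooling, row-major Flatten, and that ties go to the first maximum (a choice made here, not stated on the slide). The arrays and helper names are hypothetical.

```python
import numpy as np


def max_pool_backward(C, dP, size=2):
    """Route each dP[x, y] to the position of the max in its size x size window of C.
    Ties go to the first max (np.argmax); every other position gets 0."""
    dC = np.zeros_like(C)
    for x in range(dP.shape[0]):
        for y in range(dP.shape[1]):
            window = C[x * size:(x + 1) * size, y * size:(y + 1) * size]
            a, b = np.unravel_index(np.argmax(window), window.shape)
            dC[x * size + a, y * size + b] = dP[x, y]
    return dC


def relu_backward(Z, dC):
    """dL/dZ = dL/dC * (1 where Z > 0, else 0), element-wise."""
    return dC * (Z > 0)


Z2 = np.array([[ 1.0,  5.0, -1.0,  2.0],
               [ 3.0,  2.0,  4.0,  1.0],
               [-2.0,  1.0,  6.0, -3.0],
               [ 2.0, -1.0,  3.0,  9.0]])   # hypothetical pre-activation
C2 = np.maximum(Z2, 0.0)                    # ReLU

df = np.array([[10.0], [20.0], [30.0], [40.0]])  # hypothetical dL/df from layer 3
dP2 = df.reshape(2, 2)                      # undo Flatten (row-major order)
dC2 = max_pool_backward(C2, dP2)
dZ2 = relu_backward(Z2, dC2)
dB2 = dZ2.sum()
```

Only the four window maxima (5, 4, 2 and 9) receive gradient; ∂L/∂K₂ then follows from `conv2d_valid(P1, dZ2)` as in the single-layer case.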

Slide 18


Backpropagation - Layer 1

Layer 1:

Z₁ = conv(Input, K₁) + B₁
C₁ = ReLU(Z₁)
P₁ = AvgPool(C₁)

Need to find:

∂L/∂P₁
∂L/∂C₁
∂L/∂Z₁

Slide 19


Backpropagation - Layer 1

Layer 1:

Z₁ = conv(Input, K₁) + B₁
C₁ = ReLU(Z₁)
P₁ = AvgPool(C₁)

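∂L/∂P₁ = conv(zero_pad(∂L/∂Z₂), rotate180(K₂)), i.e. pad ∂L/∂Z₂ with (k - 1) zeros on every side for a k × k kernel K₂, then cross-correlate with K₂ rotated by 180°
∂L/∂C₁,ij = 1/(p·q) · ∂L/∂P₁,xy, where p × q is the pooling window size and (x, y) is the pooling window containing (i, j)
∂L/∂Z₁ = ∂L/∂C₁ ⊙ ∂C₁/∂Z₁ (element-wise product)

∂C₁,mn/∂Z₁,mn = 1 if Z₁,mn > 0, 0 otherwise

∂L/∂K₁ = conv(Input, ∂L/∂Z₁)
∂L/∂B₁ = sum(∂L/∂Z₁)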
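The average-pool step is the only new piece in layer 1. Each pooled value is 1/(p·q) times the sum of its window, so every input in the window receives the same share of the upstream gradient. A sketch assuming non-overlapping 2×2 windows (p = q = 2) and a hypothetical upstream gradient; `np.kron` with a block of ones copies each entry of ∂L/∂P₁ over its window.

```python
import numpy as np


def avg_pool_backward(dP, size=2):
    """Spread each dP[x, y] evenly over its size x size window: dC = dP / (size * size)."""
    return np.kron(dP, np.ones((size, size))) / (size * size)


dP1 = np.array([[4.0, 8.0],
                [-4.0, 0.0]])      # hypothetical dL/dP1 for a 2x2 pooled map
dC1 = avg_pool_backward(dP1)       # 4x4, each entry is 1/4 of its window's gradient
# dC1 == [[1, 1, 2, 2], [1, 1, 2, 2], [-1, -1, 0, 0], [-1, -1, 0, 0]]
```

After this, ∂L/∂Z₁, ∂L/∂K₁ and ∂L/∂B₁ follow exactly as in layer 2: mask with the ReLU derivative, cross-correlate the input with ∂L/∂Z₁, and sum ∂L/∂Z₁.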

Slide 20
