1 of 21

CNNs: Backpropagation

TJ Machine Learning

Slide 1

TJ Machine Learning Club

2 of 21

Review: Backpropagation in ANNs


3 of 21

The Gradient

  • The gradient points in the direction of steepest ascent of the loss, so we push the weights in the opposite direction

Update rule: W = W - α · ∂E/∂W

  • ∂E/∂W is the gradient (also called a derivative) of the loss E with respect to the weight W
  • Subtraction gives us descent


4 of 21

Calculating One Neural Network Iteration

Node values (forward pass): n1 = 3, n2 = -2, n3 = -3, n4 = 8, n5 = 13, n6 = 5

W13 = 1

W36 = -1

W23 = 3

W14 = 4

W15 = 3

W24 = 2

W25 = -2

W56 = 2

W46 = -3

Linear activation function (output = input) and no biases; n1 = 3, n2 = -2, target y = 9, learning rate α = 0.1

E = ½(n6 - y)²

n6 = W36·n3 + W46·n4 + W56·n5

∂E/∂W36 = ∂E/∂n6 · ∂n6/∂W36 = (n6 - y) · n3 = (-4)(-3) = 12

W36 = W36 - α · ∂E/∂W36 = -1 - 0.1(12) = -1 - 1.2 = -2.2
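
The numbers on this slide can be checked with a few lines of plain Python. This is only a sketch: the dictionary layout and the names `grad_W36` and `new_W36` are ours, not part of the slides.

```python
# Weights from the slide, keyed by (source node, target node).
W = {(1, 3): 1, (2, 3): 3, (1, 4): 4, (2, 4): 2, (1, 5): 3,
     (2, 5): -2, (3, 6): -1, (4, 6): -3, (5, 6): 2}
n = {1: 3, 2: -2}    # inputs n1, n2
y, alpha = 9, 0.1    # target and learning rate

# Forward pass with linear activations and no biases.
for j in (3, 4, 5):
    n[j] = W[(1, j)] * n[1] + W[(2, j)] * n[2]
n[6] = sum(W[(j, 6)] * n[j] for j in (3, 4, 5))

# Chain rule: dE/dW36 = dE/dn6 * dn6/dW36 = (n6 - y) * n3.
grad_W36 = (n[6] - y) * n[3]
new_W36 = W[(3, 6)] - alpha * grad_W36

print(n[3], n[4], n[5], n[6])        # -3 8 13 5
print(grad_W36, round(new_W36, 2))   # 12 -2.2
```

The other eight weights follow the same pattern, with the chain rule extended through the hidden nodes for W13 through W25.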


5 of 21

And so on…but how do we backpropagate in a CNN?


6 of 21

Let’s start with the Convolutional Layer:

  • X is the input from a previous layer
  • K is the Kernel
  • B is the Bias
  • Let L be the loss, and assume we are given ∂L/∂Z (the gradient passed back from the next layer)

Z = conv(X, K) + B


7 of 21

Let’s start with the Convolutional Layer:

Let’s start by finding ∂L/∂K

Z = conv(X, K) + B, with a 3×3 input X, a 2×2 kernel K, and a 2×2 output Z:

Z11 = X11·K11 + X12·K12 + X21·K21 + X22·K22 + B

Z12 = X12·K11 + X13·K12 + X22·K21 + X23·K22 + B

Z21 = X21·K11 + X22·K12 + X31·K21 + X32·K22 + B

Z22 = X22·K11 + X23·K12 + X32·K21 + X33·K22 + B
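
Each kernel entry appears in all four outputs, so the chain rule sums over them. Here is a sketch of the result, read directly off the four equations above. The shorthand δ_ij = ∂L/∂Z_ij is introduced here for brevity and is not from the slides.

```latex
\begin{aligned}
\frac{\partial L}{\partial K_{11}} &= \delta_{11}X_{11} + \delta_{12}X_{12} + \delta_{21}X_{21} + \delta_{22}X_{22} \\
\frac{\partial L}{\partial K_{12}} &= \delta_{11}X_{12} + \delta_{12}X_{13} + \delta_{21}X_{22} + \delta_{22}X_{23} \\
\frac{\partial L}{\partial K_{21}} &= \delta_{11}X_{21} + \delta_{12}X_{22} + \delta_{21}X_{31} + \delta_{22}X_{32} \\
\frac{\partial L}{\partial K_{22}} &= \delta_{11}X_{22} + \delta_{12}X_{23} + \delta_{21}X_{32} + \delta_{22}X_{33}
\end{aligned}
```

Each line is the 2×2 grid of δ values laid over a 2×2 patch of X, which is the same sliding-window pattern as the forward pass. In other words, ∂L/∂K = conv(X, ∂L/∂Z).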


9 of 21

Let’s start with the Convolutional Layer:

Now let’s find ∂L/∂B

Z = conv(X, K) + B, with a 3×3 input X, a 2×2 kernel K, and a 2×2 output Z:

Z11 = X11·K11 + X12·K12 + X21·K21 + X22·K22 + B

Z12 = X12·K11 + X13·K12 + X22·K21 + X23·K22 + B

Z21 = X21·K11 + X22·K12 + X31·K21 + X32·K22 + B

Z22 = X22·K11 + X23·K12 + X32·K21 + X33·K22 + B
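
The bias is added once to every output, so ∂Z_ij/∂B = 1 for each of the four equations above. A sketch of the resulting chain-rule sum:

```latex
\frac{\partial L}{\partial B}
  = \sum_{i,j} \frac{\partial L}{\partial Z_{ij}} \cdot \frac{\partial Z_{ij}}{\partial B}
  = \frac{\partial L}{\partial Z_{11}} + \frac{\partial L}{\partial Z_{12}}
  + \frac{\partial L}{\partial Z_{21}} + \frac{\partial L}{\partial Z_{22}}
```

That is, ∂L/∂B = sum(∂L/∂Z).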


11 of 21

Let’s start with the Convolutional Layer:

Now let’s find ∂L/∂X

Z = conv(X, K) + B, with a 3×3 input X, a 2×2 kernel K, and a 2×2 output Z:

Z11 = X11·K11 + X12·K12 + X21·K21 + X22·K22 + B

Z12 = X12·K11 + X13·K12 + X22·K21 + X23·K22 + B

Z21 = X21·K11 + X22·K12 + X31·K21 + X32·K22 + B

Z22 = X22·K11 + X23·K12 + X32·K21 + X33·K22 + B
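
Unlike K and B, each input entry appears in a different number of outputs: the corner X11 appears only in Z11, while the center X22 appears in all four. Collecting the terms from the equations above gives the following sketch, where δ_ij = ∂L/∂Z_ij is shorthand introduced here, not from the slides.

```latex
\begin{aligned}
\frac{\partial L}{\partial X_{11}} &= \delta_{11}K_{11} &
\frac{\partial L}{\partial X_{12}} &= \delta_{11}K_{12} + \delta_{12}K_{11} &
\frac{\partial L}{\partial X_{13}} &= \delta_{12}K_{12} \\
\frac{\partial L}{\partial X_{21}} &= \delta_{11}K_{21} + \delta_{21}K_{11} &
\frac{\partial L}{\partial X_{22}} &= \delta_{11}K_{22} + \delta_{12}K_{21} + \delta_{21}K_{12} + \delta_{22}K_{11} &
\frac{\partial L}{\partial X_{23}} &= \delta_{12}K_{22} + \delta_{22}K_{12} \\
\frac{\partial L}{\partial X_{31}} &= \delta_{21}K_{21} &
\frac{\partial L}{\partial X_{32}} &= \delta_{21}K_{22} + \delta_{22}K_{21} &
\frac{\partial L}{\partial X_{33}} &= \delta_{22}K_{22}
\end{aligned}
```

The same nine values come out of sliding the 180°-rotated kernel over ∂L/∂Z padded with one ring of zeros, so ∂L/∂X = conv(zero_pad(∂L/∂Z), K rotated by 180°).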


14 of 21

Alright, that was the hard part. Now the full network!

Let’s take this as our network:


15 of 21

Forward Propagation

Layer 1:

  • Z1 = conv(Input, K1) + B1
  • C1 = ReLU(Z1)
  • P1 = AvgPool(C1)

Layer 2:

  • Z2 = conv(P1, K2) + B2
  • C2 = ReLU(Z2)
  • P2 = MaxPool(C2)

Layer 3:

  • f = Flatten(P2)
  • Z3 = W3 · f + B3
  • y_pred = sigmoid(Z3)
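
A minimal NumPy sketch of this forward pass. The slides do not give sizes, so everything concrete here is an assumption: a single-channel 12×12 input, a 3×3 K1, a 2×2 K2, 2×2 non-overlapping pooling, and scalar biases. The helper names `conv`, `pool` and `forward` are ours.

```python
import numpy as np

def conv(x, k):
    """Valid, stride-1 sliding-window product (what the slides call conv)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def pool(x, size, reduce):
    """Non-overlapping size x size pooling; reduce is np.mean or np.max."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return reduce(blocks, axis=(1, 3))

def forward(inp, K1, B1, K2, B2, W3, B3):
    # Layer 1
    Z1 = conv(inp, K1) + B1
    C1 = np.maximum(Z1, 0)             # ReLU
    P1 = pool(C1, 2, np.mean)          # AvgPool
    # Layer 2
    Z2 = conv(P1, K2) + B2
    C2 = np.maximum(Z2, 0)             # ReLU
    P2 = pool(C2, 2, np.max)           # MaxPool
    # Layer 3
    f = P2.reshape(-1, 1)              # Flatten to a column vector
    Z3 = W3 @ f + B3
    return 1.0 / (1.0 + np.exp(-Z3))   # sigmoid

rng = np.random.default_rng(0)
inp = rng.standard_normal((12, 12))
K1 = rng.standard_normal((3, 3))
K2 = rng.standard_normal((2, 2))
W3 = rng.standard_normal((1, 4))
y_pred = forward(inp, K1, 0.1, K2, -0.2, W3, 0.0)
print(y_pred.shape)  # (1, 1)
```

With these sizes the shapes run 12×12 → 10×10 → 5×5 → 4×4 → 2×2 → 4×1 → 1×1.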


16 of 21

Backpropagation - Layer 3

Layer 3:

  • f = Flatten(P2)
  • Z3 = W3 · f + B3
  • y_pred = sigmoid(Z3)

This is a dense neural network with no hidden layers. Using normal backpropagation (and assuming a binary cross-entropy loss, for which the sigmoid derivative cancels), you get:

∂L/∂Z3 = y_pred - y

∂L/∂W3 = ∂L/∂Z3 · fᵀ

∂L/∂B3 = ∂L/∂Z3

∂L/∂f = W3ᵀ · ∂L/∂Z3
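
A short NumPy sketch of these four gradients, with a finite-difference check on one weight. It assumes a binary cross-entropy loss, which the slides do not state but which is what makes ∂L/∂Z3 equal to y_pred - y. The function names and the 4-element f are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer3_backward(f, W3, B3, y):
    """Gradients for Z3 = W3 @ f + B3, y_pred = sigmoid(Z3), BCE loss (assumed)."""
    y_pred = sigmoid(W3 @ f + B3)
    dZ3 = y_pred - y     # dL/dZ3
    dW3 = dZ3 @ f.T      # dL/dW3, same shape as W3
    dB3 = dZ3            # dL/dB3
    df = W3.T @ dZ3      # dL/df, handed back to the pooling layer
    return dZ3, dW3, dB3, df

def bce(f, W3, B3, y):
    """Binary cross-entropy of the layer's prediction (assumed loss)."""
    p = sigmoid(W3 @ f + B3)
    return (-(y * np.log(p) + (1 - y) * np.log(1 - p))).item()

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 1))
W3 = rng.standard_normal((1, 4))
B3 = np.zeros((1, 1))
y = np.array([[1.0]])
dZ3, dW3, dB3, df = layer3_backward(f, W3, B3, y)

# Finite-difference check on a single weight, W3[0, 2].
eps = 1e-6
W_plus, W_minus = W3.copy(), W3.copy()
W_plus[0, 2] += eps
W_minus[0, 2] -= eps
numeric = (bce(f, W_plus, B3, y) - bce(f, W_minus, B3, y)) / (2 * eps)
print(np.isclose(numeric, dW3[0, 2], atol=1e-5))
```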


17 of 21

Backpropagation - Layer 2

Layer 2:

  • Z2 = conv(P1, K2) + B2
  • C2 = ReLU(Z2)
  • P2 = MaxPool(C2)

Need to find:

  • ∂L/∂P2
  • ∂L/∂C2
  • ∂L/∂Z2


18 of 21

Backpropagation - Layer 2

Layer 2:

  • Z2 = conv(P1, K2) + B2
  • C2 = ReLU(Z2)
  • P2 = MaxPool(C2)
  • ∂L/∂P2 = ∂L/∂f reshaped
  • ∂L/∂C2,ij = ∂L/∂P2,xy if C2,ij is the max of the pooling window that produced P2,xy, otherwise 0
  • ∂L/∂Z2 = ∂L/∂C2 · ∂C2/∂Z2 (element-wise)
    • ∂C2/∂Z2,mn = 1 if Z2,mn > 0, 0 otherwise
  • ∂L/∂K2 = conv(P1, ∂L/∂Z2)
  • ∂L/∂B2 = sum(∂L/∂Z2)
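
The Layer 2 steps can be sketched in NumPy as follows. Assumptions not in the slides: a single channel, 2×2 non-overlapping max pooling, a 5×5 P1 with a 2×2 K2, and a random stand-in for ∂L/∂f. The function names are ours.

```python
import numpy as np

def conv(x, k):
    """Valid, stride-1 sliding-window product (what the slides call conv)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool_backward(C, dP, size):
    """Send each pooled gradient to the max position of its window; 0 elsewhere."""
    dC = np.zeros_like(C)
    for x in range(dP.shape[0]):
        for y in range(dP.shape[1]):
            window = C[x * size:(x + 1) * size, y * size:(y + 1) * size]
            i, j = np.unravel_index(np.argmax(window), window.shape)
            dC[x * size + i, y * size + j] = dP[x, y]
    return dC

def layer2_backward(P1, Z2, df, size=2):
    C2 = np.maximum(Z2, 0)                                       # ReLU output
    dP2 = df.reshape(Z2.shape[0] // size, Z2.shape[1] // size)   # undo Flatten
    dC2 = maxpool_backward(C2, dP2, size)
    dZ2 = dC2 * (Z2 > 0)                                         # ReLU gate
    dK2 = conv(P1, dZ2)
    dB2 = dZ2.sum()
    return dZ2, dK2, dB2

rng = np.random.default_rng(1)
P1 = rng.standard_normal((5, 5))
K2 = rng.standard_normal((2, 2))
B2 = 0.3
df = rng.standard_normal((4, 1))    # stand-in for dL/df from Layer 3
dZ2, dK2, dB2 = layer2_backward(P1, conv(P1, K2) + B2, df)
print(dZ2.shape, dK2.shape)  # (4, 4) (2, 2)
```

At most one entry per pooling window of dZ2 is nonzero, which is why max pooling passes back a sparse gradient.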


19 of 21

Backpropagation - Layer 1

Layer 1:

  • Z1 = conv(Input, K1) + B1
  • C1 = ReLU(Z1)
  • P1 = AvgPool(C1)

Need to find:

  • ∂L/∂P1
  • ∂L/∂C1
  • ∂L/∂Z1


20 of 21

Backpropagation - Layer 1

Layer 1:

  • Z1 = conv(Input, K1) + B1
  • C1 = ReLU(Z1)
  • P1 = AvgPool(C1)
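  • ∂L/∂P1 = conv(zero_pad(∂L/∂Z2), inverted(K2)), where inverted(K2) is K2 rotated by 180° and zero_pad adds (kernel size - 1) zeros on every side
  • ∂L/∂C1,ij = 1/(h*w) * ∂L/∂P1,xy, where h×w is the pooling window size and P1,xy is the pooled output whose window contains C1,ij
  • ∂L/∂Z1 = ∂L/∂C1 · ∂C1/∂Z1 (element-wise)
    • ∂C1/∂Z1,mn = 1 if Z1,mn > 0, 0 otherwise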
  • ∂L/∂K1 = conv(Input, ∂L/∂Z1)
  • ∂L/∂B1 = sum(∂L/∂Z1)
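
A NumPy sketch of the Layer 1 steps. Assumptions not in the slides: a single channel, stride-1 convolutions, 2×2 non-overlapping average pooling whose window divides Z1 evenly, a 12×12 input with a 3×3 K1, and a random stand-in for ∂L/∂Z2. The function names are ours.

```python
import numpy as np

def conv(x, k):
    """Valid, stride-1 sliding-window product (what the slides call conv)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def input_grad(dZ, K):
    """dL/dX for Z = conv(X, K): zero-pad dZ, then slide the 180-degree rotated kernel."""
    kh, kw = K.shape
    padded = np.pad(dZ, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return conv(padded, np.rot90(K, 2))

def avgpool_backward(dP, size):
    """Spread each pooled gradient evenly over its size x size window."""
    return np.kron(dP, np.ones((size, size))) / (size * size)

def layer1_backward(inp, Z1, dZ2, K2, size=2):
    dP1 = input_grad(dZ2, K2)
    dC1 = avgpool_backward(dP1, size)
    dZ1 = dC1 * (Z1 > 0)              # ReLU gate
    dK1 = conv(inp, dZ1)
    dB1 = dZ1.sum()
    return dP1, dZ1, dK1, dB1

rng = np.random.default_rng(2)
inp = rng.standard_normal((12, 12))
K1 = rng.standard_normal((3, 3))
K2 = rng.standard_normal((2, 2))
Z1 = conv(inp, K1) + 0.1
dZ2 = rng.standard_normal((4, 4))     # stand-in for dL/dZ2 from Layer 2
dP1, dZ1, dK1, dB1 = layer1_backward(inp, Z1, dZ2, K2)
print(dP1.shape, dZ1.shape, dK1.shape)  # (5, 5) (10, 10) (3, 3)
```

Average pooling sends a share of the gradient to every entry in the window, unlike max pooling, which sends all of it to a single entry.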


21 of 21

Useful Sources

  • Carnegie Mellon Presentation
  • Coding Lane Videos
    • Part 1
    • Part 2
