Slide 1: Backpropagation

TJ Machine Learning Club

Slide 2: Review

Slide 3: The Perceptron

  • A linear function followed by a nonlinear activation function (sketched in code below)

[Diagram: a perceptron, with its weights and bias labeled]
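
Added note (not part of the original slides): a minimal Python sketch of the perceptron described above, a weighted sum of the inputs plus a bias passed through a nonlinear activation. The sigmoid used here, and the example inputs, weights, and bias, are assumed choices for illustration; the slide does not name a specific activation.

    import math

    def perceptron(inputs, weights, bias):
        """A linear function (weighted sum plus bias) followed by a nonlinear activation."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias   # linear part
        return 1.0 / (1.0 + math.exp(-z))                        # nonlinear part: sigmoid (assumed choice)

    # Example call with made-up inputs, weights, and bias
    print(perceptron([1.0, -2.0], weights=[0.5, 0.3], bias=0.1))   # ≈ 0.5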

Slide 4: Today’s Goal

  • Understand how we find the best values for the weights and biases through gradient descent

[Diagram: the same perceptron, with its weights and bias labeled]

Slide 5: Error for a very simple function

We can construct a graph of the loss.

[Plot: loss curve, with the point (3, 220) marked]

Slide 6: Minimizing Error: Gradient Descent

Slide 7: The Intuitive Explanation

  • Think of the loss as a hill: a ball placed on it will roll downhill

[Diagram: loss curve with a ball on the slope and an arrow marking the direction we push the weight]

Slide 8: The Gradient

  • The gradient (for a single weight, also called the derivative) is the direction of steepest ascent of the loss, which is the opposite of the direction we push in
  • Update rule: w = w - α ∂E/∂w, where subtracting the gradient is what gives us descent (see the sketch below)
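
Added note: a minimal Python sketch of this update rule for a single weight. The quadratic loss E(w) = (w - 5)² is a made-up example function, chosen only to give the gradient something concrete to point at.

    def loss_gradient(w):
        """dE/dw for the assumed example loss E(w) = (w - 5)**2."""
        return 2 * (w - 5)

    alpha = 0.1   # learning rate
    w = 0.0       # starting weight

    grad = loss_gradient(w)   # gradient: direction of steepest ascent of the loss
    w = w - alpha * grad      # subtracting the gradient steps downhill
    print(w)                  # 1.0 -- one step toward the minimum at w = 5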

Slide 9: The Gradient as Slope

Slide 10: The Intuitive Explanation

  • The loss lives in a very high-dimensional space: each weight and bias is a different axis
  • Why not just find the global minimum directly? That is hard to do in high-dimensional spaces; there are too many possibilities

Slide 11: The Learning Rate

  • α is the learning rate, and is generally a small positive value
  • It scales how big a step we make
    • Large α = big step
    • Small α = small step

Slide 12: Optimizing the Learning Rate

  • Getting the learning rate right is one of the most important parts of neural network training! (See the sketch below.)
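
Added illustration (not from the original deck): the sketch below runs a few gradient-descent steps on the same assumed quadratic loss E(w) = (w - 5)² with three learning rates; the specific α values are arbitrary, picked to show that too small a step makes slow progress while too large a step overshoots and diverges.

    def loss_gradient(w):
        """dE/dw for the assumed example loss E(w) = (w - 5)**2."""
        return 2 * (w - 5)

    for alpha in (0.01, 0.1, 1.1):   # small, moderate, and too-large learning rates
        w = 0.0
        for _ in range(10):
            w = w - alpha * loss_gradient(w)
        print(f"alpha={alpha}: w after 10 steps = {w:.3f}")

    # alpha=0.01 creeps slowly toward the minimum at w = 5, alpha=0.1 gets close,
    # and alpha=1.1 overshoots further on every step and diverges.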

Slide 14: Calculating One Neural Network Iteration

[Diagram: a small network with inputs n1 and n2, hidden neurons n3, n4, n5, and output n6]

Weights: W13 = 1, W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W36 = -1, W46 = -3, W56 = 2

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W36

W36 = W36 - α ∂E/∂W36

Slide 15: Calculating One Neural Network Iteration

[Diagram: the same network with the forward-pass values filled in: n1 = 3, n2 = -2, n3 = -3, n4 = 8, n5 = 13, n6 = 5]

Weights: W13 = 1, W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W36 = -1, W46 = -3, W56 = 2

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W36

W36 = W36 - α ∂E/∂W36
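
Added note: the short Python sketch below reproduces the forward pass shown on this slide (linear activation, no biases) and confirms the node values n3 = -3, n4 = 8, n5 = 13, n6 = 5. All weights and inputs come straight from the slides; only the code itself is an addition.

    # Weights from the slides (Wij connects neuron i to neuron j)
    W13, W23 = 1, 3
    W14, W24 = 4, 2
    W15, W25 = 3, -2
    W36, W46, W56 = -1, -3, 2

    # Inputs
    n1, n2 = 3, -2

    # Forward pass with a linear (identity) activation and no biases
    n3 = W13 * n1 + W23 * n2             # 1*3 + 3*(-2)    = -3
    n4 = W14 * n1 + W24 * n2             # 4*3 + 2*(-2)    =  8
    n5 = W15 * n1 + W25 * n2             # 3*3 + (-2)*(-2) = 13
    n6 = W36 * n3 + W46 * n4 + W56 * n5  # (-1)*(-3) + (-3)*8 + 2*13 = 5

    print(n3, n4, n5, n6)  # -3 8 13 5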

Slide 16: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as the previous slide]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5

∂E/∂W36 = ∂E/∂n6 · ∂n6/∂W36

Slide 17: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5

∂E/∂W36 = (n6 - y) · n3 = (-4) · (-3) = 12

W36 = W36 - α ∂E/∂W36
W36 = -1 - 0.1(12) = -1 - 1.2 = -2.2
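
Added note: a small Python check of this slide's arithmetic. It computes ∂E/∂W36 = (n6 - y)·n3 analytically and also nudges W36 by a tiny amount as a finite-difference sanity check; the finite-difference part is an addition, not something the slides do.

    # Hidden-layer values and weights from the forward pass on the earlier slides
    n3, n4, n5 = -3, 8, 13
    W36, W46, W56 = -1, -3, 2
    y, alpha = 9, 0.1

    def error(w36):
        """E = 1/2 (n6 - y)^2, with n6 recomputed from the hidden-layer values."""
        n6 = w36 * n3 + W46 * n4 + W56 * n5
        return 0.5 * (n6 - y) ** 2

    n6 = W36 * n3 + W46 * n4 + W56 * n5
    grad = (n6 - y) * n3                                          # analytic: (-4)*(-3) = 12
    eps = 1e-6
    numeric = (error(W36 + eps) - error(W36 - eps)) / (2 * eps)   # finite-difference sanity check

    print(grad, round(numeric, 3))          # 12 12.0
    print(round(W36 - alpha * grad, 3))     # -2.2, matching the slide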

Slide 18: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

Slide 19: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

∂E/∂W13 = ?

Slide 20: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5
n3 = W13n1 + W23n2

Slide 21: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

∂E/∂W13 = ∂E/∂n6 · ∂n6/∂n3 · ∂n3/∂W13 = (n6 - y) · W36 · n1 = (-4) · (-1) · 3 = 12

W13 = 1 - 0.1 · 12 = -0.2
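
Added note: the sketch below computes the same chain-rule product factor by factor, which makes explicit where each term of (n6 - y)·W36·n1 comes from. All values are taken from the slides; the variable names mirror the slide's notation.

    # Values taken from the slides
    n1, n6 = 3, 5
    W13, W36 = 1, -1
    y, alpha = 9, 0.1

    # Chain rule, one factor per step on the path from E back to W13
    dE_dn6   = n6 - y   # ∂E/∂n6,   since E = 1/2 (n6 - y)^2            -> -4
    dn6_dn3  = W36      # ∂n6/∂n3,  since n6 = W36*n3 + W46*n4 + W56*n5 -> -1
    dn3_dW13 = n1       # ∂n3/∂W13, since n3 = W13*n1 + W23*n2          ->  3

    dE_dW13 = dE_dn6 * dn6_dn3 * dn3_dW13   # (-4) * (-1) * 3 = 12
    print(dE_dW13)                          # 12
    print(round(W13 - alpha * dE_dW13, 3))  # -0.2, matching the slide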

Slide 22: Calculating One Neural Network Iteration

[Diagram: same network and forward-pass values, now showing the updated weights]

Updated weights: W13 = -0.2 and W36 = -2.2; all other weights unchanged (W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W46 = -3, W56 = 2)

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1
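
To close out the worked example, here is an added end-to-end Python sketch (not part of the original slides) that runs the forward pass, computes the two gradients from the deck, and applies both updates, reproducing the final weights W36 = -2.2 and W13 = -0.2. Only the two weights the slides update are touched; a full backpropagation pass would update every weight the same way.

    # One full iteration for the example network in the slides:
    # linear (identity) activations, no biases, squared-error loss E = 1/2 (n6 - y)^2.
    W13, W23 = 1, 3
    W14, W24 = 4, 2
    W15, W25 = 3, -2
    W36, W46, W56 = -1, -3, 2
    n1, n2 = 3, -2       # inputs
    y, alpha = 9, 0.1    # target and learning rate

    # Forward pass
    n3 = W13 * n1 + W23 * n2
    n4 = W14 * n1 + W24 * n2
    n5 = W15 * n1 + W25 * n2
    n6 = W36 * n3 + W46 * n4 + W56 * n5

    # Backward pass for the two weights the slides update
    dE_dn6 = n6 - y                 # -4
    dE_dW36 = dE_dn6 * n3           # 12
    dE_dW13 = dE_dn6 * W36 * n1     # 12

    # Gradient-descent updates
    W36 = W36 - alpha * dE_dW36
    W13 = W13 - alpha * dE_dW13

    print(round(W36, 3), round(W13, 3))   # -2.2 -0.2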