Slide 1: Backpropagation

TJ Machine Learning Club

Slide 2: Review

Slide 3: The Perceptron

  • A linear function followed by a nonlinear activation function (sketched in code below)

[Diagram: a perceptron, with its weights and bias labeled]
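
Added note (not part of the original slides): a minimal Python sketch of the perceptron described above, a weighted sum of the inputs plus a bias passed through a nonlinear activation. The sigmoid used here, and the example inputs, weights, and bias, are assumed choices for illustration; the slide does not name a specific activation.

    import math

    def perceptron(inputs, weights, bias):
        """A linear function (weighted sum plus bias) followed by a nonlinear activation."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias   # linear part
        return 1.0 / (1.0 + math.exp(-z))                        # nonlinear part: sigmoid (assumed choice)

    # Example call with made-up inputs, weights, and bias
    print(perceptron([1.0, -2.0], weights=[0.5, 0.3], bias=0.1))   # ≈ 0.5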

Slide 4: Today’s Goal

  • Understand how we find the best values for the weights and biases through gradient descent

[Diagram: the same perceptron, with its weights and bias labeled]

Slide 5: Error for a very simple function

We can construct a graph of the loss.

[Plot: loss curve, with the point (3, 220) marked]

Slide 6: Minimizing Error: Gradient Descent

Slide 7: The Intuitive Explanation

  • Think of the loss as a hill: a ball placed on it will roll downhill

[Diagram: loss curve with a ball on the slope and an arrow marking the direction we push the weight]

Slide 8: The Gradient

  • The gradient (for a single weight, also called the derivative) is the direction of steepest ascent of the loss, which is the opposite of the direction we push in
  • Update rule: w = w - α ∂E/∂w, where subtracting the gradient is what gives us descent (see the sketch below)
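
Added note: a minimal Python sketch of this update rule for a single weight. The quadratic loss E(w) = (w - 5)² is a made-up example function, chosen only to give the gradient something concrete to point at.

    def loss_gradient(w):
        """dE/dw for the assumed example loss E(w) = (w - 5)**2."""
        return 2 * (w - 5)

    alpha = 0.1   # learning rate
    w = 0.0       # starting weight

    grad = loss_gradient(w)   # gradient: direction of steepest ascent of the loss
    w = w - alpha * grad      # subtracting the gradient steps downhill
    print(w)                  # 1.0 -- one step toward the minimum at w = 5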

Slide 9: The Gradient as Slope

Slide 10: The Intuitive Explanation

  • The loss lives in a very high-dimensional space: each weight and bias is a different axis
  • Why not just find the global minimum directly? That is hard to do in high-dimensional spaces; there are too many possibilities

Slide 11: The Learning Rate

  • α is the learning rate, and is generally a small positive value
  • It scales how big a step we make
    • Large α = big step
    • Small α = small step

Slide 12: Optimizing the Learning Rate

  • Getting the learning rate right is one of the most important parts of neural network training! (See the sketch below.)
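
Added illustration (not from the original deck): the sketch below runs a few gradient-descent steps on the same assumed quadratic loss E(w) = (w - 5)² with three learning rates; the specific α values are arbitrary, picked to show that too small a step makes slow progress while too large a step overshoots and diverges.

    def loss_gradient(w):
        """dE/dw for the assumed example loss E(w) = (w - 5)**2."""
        return 2 * (w - 5)

    for alpha in (0.01, 0.1, 1.1):   # small, moderate, and too-large learning rates
        w = 0.0
        for _ in range(10):
            w = w - alpha * loss_gradient(w)
        print(f"alpha={alpha}: w after 10 steps = {w:.3f}")

    # alpha=0.01 creeps slowly toward the minimum at w = 5, alpha=0.1 gets close,
    # and alpha=1.1 overshoots further on every step and diverges.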

Slide 14: Calculating One Neural Network Iteration

[Diagram: a small network with inputs n1 and n2, hidden neurons n3, n4, n5, and output n6]

Weights: W13 = 1, W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W36 = -1, W46 = -3, W56 = 2

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W36

W36 = W36 - α ∂E/∂W36

Slide 15: Calculating One Neural Network Iteration

[Diagram: the same network with the forward-pass values filled in: n1 = 3, n2 = -2, n3 = -3, n4 = 8, n5 = 13, n6 = 5]

Weights: W13 = 1, W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W36 = -1, W46 = -3, W56 = 2

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W36

W36 = W36 - α ∂E/∂W36
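
Added note: the short Python sketch below reproduces the forward pass shown on this slide (linear activation, no biases) and confirms the node values n3 = -3, n4 = 8, n5 = 13, n6 = 5. All weights and inputs come straight from the slides; only the code itself is an addition.

    # Weights from the slides (Wij connects neuron i to neuron j)
    W13, W23 = 1, 3
    W14, W24 = 4, 2
    W15, W25 = 3, -2
    W36, W46, W56 = -1, -3, 2

    # Inputs
    n1, n2 = 3, -2

    # Forward pass with a linear (identity) activation and no biases
    n3 = W13 * n1 + W23 * n2             # 1*3 + 3*(-2)    = -3
    n4 = W14 * n1 + W24 * n2             # 4*3 + 2*(-2)    =  8
    n5 = W15 * n1 + W25 * n2             # 3*3 + (-2)*(-2) = 13
    n6 = W36 * n3 + W46 * n4 + W56 * n5  # (-1)*(-3) + (-3)*8 + 2*13 = 5

    print(n3, n4, n5, n6)  # -3 8 13 5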

Slide 16: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as the previous slide]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5

∂E/∂W36 = ∂E/∂n6 · ∂n6/∂W36

Slide 17: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5

∂E/∂W36 = (n6 - y) · n3 = (-4) · (-3) = 12

W36 = W36 - α ∂E/∂W36
W36 = -1 - 0.1(12) = -1 - 1.2 = -2.2
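
Added note: a small Python check of this slide's arithmetic. It computes ∂E/∂W36 = (n6 - y)·n3 analytically and also nudges W36 by a tiny amount as a finite-difference sanity check; the finite-difference part is an addition, not something the slides do.

    # Hidden-layer values and weights from the forward pass on the earlier slides
    n3, n4, n5 = -3, 8, 13
    W36, W46, W56 = -1, -3, 2
    y, alpha = 9, 0.1

    def error(w36):
        """E = 1/2 (n6 - y)^2, with n6 recomputed from the hidden-layer values."""
        n6 = w36 * n3 + W46 * n4 + W56 * n5
        return 0.5 * (n6 - y) ** 2

    n6 = W36 * n3 + W46 * n4 + W56 * n5
    grad = (n6 - y) * n3                                          # analytic: (-4)*(-3) = 12
    eps = 1e-6
    numeric = (error(W36 + eps) - error(W36 - eps)) / (2 * eps)   # finite-difference sanity check

    print(grad, round(numeric, 3))          # 12 12.0
    print(round(W36 - alpha * grad, 3))     # -2.2, matching the slide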

Slide 18: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

Slide 19: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

∂E/∂W13 = ?

Slide 20: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

E = ½(n6 - y)²
n6 = W36n3 + W46n4 + W56n5
n3 = W13n1 + W23n2

Slide 21: Calculating One Neural Network Iteration

[Diagram: same network, weights, and forward-pass values as before]

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1

Goal: Update W13

W13 = W13 - α ∂E/∂W13

∂E/∂W13 = ∂E/∂n6 · ∂n6/∂n3 · ∂n3/∂W13 = (n6 - y) · W36 · n1 = (-4) · (-1) · 3 = 12

W13 = 1 - 0.1 · 12 = -0.2
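
Added note: the sketch below computes the same chain-rule product factor by factor, which makes explicit where each term of (n6 - y)·W36·n1 comes from. All values are taken from the slides; the variable names mirror the slide's notation.

    # Values taken from the slides
    n1, n6 = 3, 5
    W13, W36 = 1, -1
    y, alpha = 9, 0.1

    # Chain rule, one factor per step on the path from E back to W13
    dE_dn6   = n6 - y   # ∂E/∂n6,   since E = 1/2 (n6 - y)^2            -> -4
    dn6_dn3  = W36      # ∂n6/∂n3,  since n6 = W36*n3 + W46*n4 + W56*n5 -> -1
    dn3_dW13 = n1       # ∂n3/∂W13, since n3 = W13*n1 + W23*n2          ->  3

    dE_dW13 = dE_dn6 * dn6_dn3 * dn3_dW13   # (-4) * (-1) * 3 = 12
    print(dE_dW13)                          # 12
    print(round(W13 - alpha * dE_dW13, 3))  # -0.2, matching the slide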

Slide 22: Calculating One Neural Network Iteration

[Diagram: same network and forward-pass values, now showing the updated weights]

Updated weights: W13 = -0.2 and W36 = -2.2; all other weights unchanged (W23 = 3, W14 = 4, W24 = 2, W15 = 3, W25 = -2, W46 = -3, W56 = 2)

Linear activation function (y = x) and no biases; inputs n1 = 3, n2 = -2; target y = 9; α = 0.1
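
To close out the worked example, here is an added end-to-end Python sketch (not part of the original slides) that runs the forward pass, computes the two gradients from the deck, and applies both updates, reproducing the final weights W36 = -2.2 and W13 = -0.2. Only the two weights the slides update are touched; a full backpropagation pass would update every weight the same way.

    # One full iteration for the example network in the slides:
    # linear (identity) activations, no biases, squared-error loss E = 1/2 (n6 - y)^2.
    W13, W23 = 1, 3
    W14, W24 = 4, 2
    W15, W25 = 3, -2
    W36, W46, W56 = -1, -3, 2
    n1, n2 = 3, -2       # inputs
    y, alpha = 9, 0.1    # target and learning rate

    # Forward pass
    n3 = W13 * n1 + W23 * n2
    n4 = W14 * n1 + W24 * n2
    n5 = W15 * n1 + W25 * n2
    n6 = W36 * n3 + W46 * n4 + W56 * n5

    # Backward pass for the two weights the slides update
    dE_dn6 = n6 - y                 # -4
    dE_dW36 = dE_dn6 * n3           # 12
    dE_dW13 = dE_dn6 * W36 * n1     # 12

    # Gradient-descent updates
    W36 = W36 - alpha * dE_dW36
    W13 = W13 - alpha * dE_dW13

    print(round(W36, 3), round(W13, 3))   # -2.2 -0.2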