1 of 40

ME 4990: Intro to CS - Object-Oriented Programming & Machine Learning 101

Neural Network Training: Backpropagation

2 of 40

Outline

  • Forward Propagation
    • Forward propagation
    • Loss Function
  • Training
    • Gradient Descent Introduction
    • Backward Propagation

3 of 40

Forward Propagation Demonstration

  •  

 

 

 

 

 

 

[Diagram: input → Fully Connected → Hidden 1 → ReLU → Hidden 1a → FC → Hidden 2 → ReLU → Hidden 2a → FC → output]
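  • A minimal PyTorch sketch of the pictured architecture; the layer widths used here (2 → 3 → 3 → 1) are placeholders, not the slide's actual sizes.

import torch
import torch.nn as nn

# FC -> ReLU -> FC -> ReLU -> FC, matching the diagram above.
model = nn.Sequential(
    nn.Linear(2, 3),   # input     -> Hidden 1
    nn.ReLU(),         # Hidden 1  -> Hidden 1a
    nn.Linear(3, 3),   # Hidden 1a -> Hidden 2
    nn.ReLU(),         # Hidden 2  -> Hidden 2a
    nn.Linear(3, 1),   # Hidden 2a -> output
)

x = torch.tensor([[1.0, 2.0]])   # one sample with 2 features
y_hat = model(x)                 # forward propagation
print(y_hat.shape)               # torch.Size([1, 1])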

4 of 40

Forward Propagation

  • What are the dimensions of the weight matrix and the bias vector for each fully connected layer?
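  • As a general fact (not the slide's specific numbers): a fully connected layer mapping n inputs to m outputs has a weight matrix of shape m × n and a bias vector of length m. In PyTorch:

import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)   # example sizes only
print(fc.weight.shape)   # torch.Size([3, 4]) -> (out_features, in_features)
print(fc.bias.shape)     # torch.Size([3])    -> (out_features,)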

5 of 40

Forward Propagation

  •  

6 of 40

Forward Propagation

  •  

7 of 40

Forward Propagation

  • After ReLU

 

 

 

 

 

 

[Diagram: input → Fully Connected → Hidden 1 → ReLU → Hidden 1a → FC → Hidden 2 → ReLU → Hidden 2a → FC → output]
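  • For reference, ReLU simply clamps negative activations to zero, element-wise:

import torch
import torch.nn.functional as F

h = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.relu(h))   # negatives become 0: [0., 0., 0., 1.5]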

8 of 40

Forward Propagation

  •  

9 of 40

Forward Propagation

  •  

10 of 40

Forward Propagation

  • After ReLU

 

 

 

 

 

 

[Diagram: input → Fully Connected → Hidden 1 → ReLU → Hidden 1a → FC → Hidden 2 → ReLU → Hidden 2a → FC → output]

11 of 40

Forward Propagation

  •  

12 of 40

Forward Propagation

  •  

13 of 40

Forward Propagation

  • Forward Propagation

 

 

 

 

 

 

[Diagram: input → Fully Connected → Hidden 1 → ReLU → Hidden 1a → FC → Hidden 2 → ReLU → Hidden 2a → FC → output]

14 of 40

Mean squared error (MSE)

  • MSE = (1/N) Σᵢ (ŷᵢ − yᵢ)²: the average of the squared differences between the predictions ŷ and the labels y.

x   Label y   Prediction ŷ
1   2         3
2   4         5
3   6         6
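  • Taking the third column of the table as the model's predictions ŷ (an assumption about the reconstructed table), PyTorch's built-in MSE loss gives the average squared error:

import torch
import torch.nn as nn

y_hat = torch.tensor([3.0, 5.0, 6.0])   # predictions (third column above)
y     = torch.tensor([2.0, 4.0, 6.0])   # labels
mse = nn.MSELoss()(y_hat, y)            # mean of squared differences
print(mse)                              # (1 + 1 + 0) / 3 ≈ 0.6667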

15 of 40

Mean squared error (MSE)

  •  

 

 

 

 

 

 

[Diagram: input → Fully Connected → Hidden 1 → ReLU → Hidden 1a → FC → Hidden 2 → ReLU → Hidden 2a → FC → output]

16 of 40

Log loss

  • Log loss (binary cross-entropy): L = −(1/N) Σᵢ [ yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ) ], used when the network outputs a probability for a binary label.
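  • A quick check with PyTorch's built-in binary cross-entropy loss (the probabilities and labels below are made-up example values):

import torch
import torch.nn as nn

y_hat = torch.tensor([0.9, 0.2, 0.7])   # predicted probabilities
y     = torch.tensor([1.0, 0.0, 1.0])   # binary labels
loss = nn.BCELoss()(y_hat, y)           # log loss / binary cross-entropy
print(loss)   # -(log 0.9 + log 0.8 + log 0.7) / 3 ≈ 0.2284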

17 of 40

Outline

  • Forward Propagation
    • Forward propagation
    • Loss Function
  • Training
    • Gradient Descent Introduction
    • Backward Propagation

18 of 40

Training

  •  

19 of 40

Training

  • In PyTorch it is easy, as the sketch below shows.
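  • A minimal sketch of such a training loop; the data, model sizes, learning rate, and epoch count below are placeholders, not the slide's actual values.

import torch
import torch.nn as nn

x = torch.randn(8, 2)                    # placeholder inputs
y = torch.randn(8, 1)                    # placeholder targets
model = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 1))

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    y_hat = model(x)             # forward propagation
    loss = loss_fn(y_hat, y)     # measure the error
    optimizer.zero_grad()        # clear old gradients
    loss.backward()              # backward propagation (compute gradients)
    optimizer.step()             # gradient descent update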

20 of 40

Training

  •  

21 of 40

Gradient Descent

  •  

22 of 40

Gradient

  • We have a topographic map.

  • If we are standing at the point marked 489, in which direction should we go to ascend fastest?
  • In which direction should we go to descend fastest?

23 of 40

Gradient

  • The gradient ∇f points in the direction of steepest ascent; the negative gradient −∇f points in the direction of steepest descent.

24 of 40

Use an iterative approach to find the minimum

  • Iterating step by step, we find a local minimum (see the sketch below).

  • It is popular simply because it is simple and applicable to any function (strictly, any differentiable function).
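  • A minimal sketch of the iteration on f(x) = (x − 3)²; the starting point and learning rate are arbitrary.

def f_prime(x):
    return 2 * (x - 3)          # derivative of f(x) = (x - 3)^2

x = 0.0                         # arbitrary starting point
lr = 0.1                        # learning rate (step size)
for _ in range(50):
    x = x - lr * f_prime(x)     # step against the gradient
print(x)                        # close to the minimizer x = 3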

25 of 40

Gradient Descent: learning rate

  • If the learning rate is too small, progress toward the minimum is very slow.

  • If the learning rate is too big, we overshoot the minimum (and may even diverge); both cases are illustrated below.
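  • Both failure modes on the same f(x) = (x − 3)², with learning rates chosen only for illustration:

def step(x, lr):
    return x - lr * 2 * (x - 3)      # one gradient step on f(x) = (x - 3)^2

x_small, x_big = 0.0, 0.0
for _ in range(20):
    x_small = step(x_small, 0.001)   # too small: barely moves toward 3
    x_big   = step(x_big,   1.1)     # too big: overshoots and diverges
print(x_small, x_big)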

26 of 40

Gradient Descent on 2D

  • Frankly, gradient descent on a 1D function doesn't make much sense, as there is no real choice of "direction".

  • But in 2D or above, to reach the minimum fastest with an iterative approach:
    • Always step in the negative gradient direction (steepest descent), as in the sketch below!
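  • A small sketch in 2D on the arbitrary bowl-shaped function f(x, y) = x² + 2y²: each step moves against the gradient.

import numpy as np

def grad(p):
    return np.array([2 * p[0], 4 * p[1]])   # gradient of f(x, y) = x^2 + 2*y^2

p = np.array([4.0, -3.0])                   # arbitrary starting point
lr = 0.1
for _ in range(100):
    p = p - lr * grad(p)                    # step in the negative gradient direction
print(p)                                    # approaches the minimum at (0, 0)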

27 of 40

Training

  •  

28 of 40

Training

  •  

29 of 40

Outline

  • Loss function
  • Backpropagation

30 of 40

Backpropagation

  • A cool visualization that builds intuition, even if it does not give true understanding by itself:

https://youtu.be/Ilg3gGewQ5U?si=p7j0ajHG2yH72syQ

31 of 40

Chain Rule

  •  
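  • As a generic example of the chain rule: for y = f(g(x)), dy/dx = f′(g(x)) · g′(x). A quick autograd check, using f(u) = u² and g(x) = sin x as example functions:

import torch

x = torch.tensor(1.3, requires_grad=True)
y = torch.sin(x) ** 2                        # y = f(g(x)) with f(u) = u^2, g(x) = sin(x)
y.backward()                                 # autograd applies the chain rule

manual = (2 * torch.sin(x) * torch.cos(x)).detach()   # f'(g(x)) * g'(x)
print(x.grad, manual)                        # the two values match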

32 of 40

Chain Rule

  •  

33 of 40

Chain Rule

  •  

34 of 40

Let’s make it fancier

  •  

35 of 40

Let’s make it fancier

  •  

36 of 40

Backpropagation

  •  
  •  
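  • A generic scalar example (not the slide's network): backpropagation is just the chain rule applied layer by layer, and a hand-derived gradient matches autograd.

import torch

# Tiny network: h = w1*x + b1, a = relu(h), y_hat = w2*a + b2, loss = (y_hat - y)^2
x, y = 2.0, 1.0
w1 = torch.tensor(0.5, requires_grad=True)
b1 = torch.tensor(0.1, requires_grad=True)
w2 = torch.tensor(-0.3, requires_grad=True)
b2 = torch.tensor(0.2, requires_grad=True)

h = w1 * x + b1
a = torch.relu(h)
y_hat = w2 * a + b2
loss = (y_hat - y) ** 2
loss.backward()                              # backpropagation via autograd

# Chain rule by hand for dloss/dw1 (here h > 0, so the ReLU derivative is 1):
# dloss/dy_hat = 2*(y_hat - y), dy_hat/da = w2, da/dh = 1, dh/dw1 = x
manual = (2 * (y_hat - y) * w2 * 1.0 * x).detach()
print(w1.grad, manual)                       # the two gradients agree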

37 of 40

Backpropagation

  •  

38 of 40

Backpropagation

  •  
  •  

39 of 40

Backpropagation

  •  

40 of 40

Training

  •