Neural Networks Training
CMSC 320 - Intro to Data Science
Follow Textbook: https://ffalam.github.io/CMSC320TextBook/chapter18/Chapter18_0.html
How Neural Networks Learn
(Summary)
Training a Neural Network involves adjusting its weights and biases to minimize the error (or loss) between the predicted output and the actual target values. This process is achieved through Backpropagation and Gradient Descent.
Step: Forward Propagation
Step 1: FeedForward Pass
Step 1.1 : Forward Pass: The input data is passed forward through the network, producing an output prediction.
Step: Compute Loss
Compute Loss / Error
Step 1.2 : Calculating Loss / Error: Quantifies the difference between predicted and actual values.
After the feedforward pass, the neural network produces a predicted output ŷ. The true label is y.
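The loss computation on this slide can be sketched in a few lines (a minimal illustration; the ½ factor matches the squared-error loss used later in these slides):

```python
# Squared-error loss for a single prediction: L = 1/2 * (y_hat - y)^2.
# The example values are made up for illustration.
def squared_error(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

loss = squared_error(0.75, 0.01)  # a prediction far from its target
print(loss)
```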
Step: Backpropagation and Gradient Descent
Neural Network Training: Backpropagation
Backpropagation is an algorithm used to train neural networks by minimizing the error (loss) through gradient descent.
Backpropagation & Gradient Descent in Neural Networks
Backpropagation:
The gradients tell us how much to adjust each weight and bias to reduce the loss.
A gradient is the derivative of the loss function with respect to the model parameters (weights and biases). It tells us how the loss will change when the parameters are adjusted.
Backpropagation along with Gradient Descent forms the backbone and powerhouse of neural networks.
Backpropagation & Gradient Descent in Neural Networks
Gradient Descent Algorithm:
Gradient descent updates each parameter in the opposite direction of its gradient: w ← w − α · ∂L/∂w, where α is the learning rate.
Repeating this update step by step progressively reduces the loss.
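The gradient descent update can be sketched on a one-parameter toy problem (an illustration, not from the slides; the loss L(w) = (w − 3)² and the learning rate are chosen for demonstration):

```python
# Minimal gradient descent on a one-parameter loss L(w) = (w - 3)**2,
# whose minimum is at w = 3. A real network has many weights and uses
# backpropagation to obtain each gradient.
def grad(w):
    return 2 * (w - 3)       # dL/dw, derived by hand

w = 0.0
alpha = 0.1                  # learning rate (chosen for illustration)
for _ in range(100):
    w = w - alpha * grad(w)  # step opposite the gradient
print(w)                     # converges toward 3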
Backpropagation Key Idea
Backpropagation is the algorithm by which neural networks are trained.
Gradient of the loss function indicates how much each parameter contributed to the prediction error.
The main idea: for every training example, we compute the loss, then iteratively update the weights of the network by calculating the gradient of the loss with respect to each weight, using the chain rule of calculus. This allows the neural network to learn by progressively reducing the error during training.
RECAP: Key Steps in Neural Network: Summary
(Figure: network diagram, with a bias term feeding into each hidden and output neuron.)
Backpropagation: Backward Pass
Step 2: Backward Pass (Backpropagation):
Once we calculate the loss (after completing forward pass):
Backpropagation: Computing Derivative/Gradients
Say we have a loss function L.
What we would like to know is ∂L/∂w for every weight w in the network.
For all the different weights in the network, weights with a high ∂L/∂w strongly contributed to the incorrect classification (indicating they had a significant impact on the error); conversely, weights with a low ∂L/∂w had less influence on the incorrect classification.
∂L/∂w represents the partial derivative of the loss function (L) w.r.t. a weight (w) of the neural network. This derivative indicates the rate of change of the loss with respect to that particular weight.
Backpropagation: Use Chain Rule to find Derivative
∂L/∂w is actually a composition of simpler derivatives, because the loss depends on a weight only through the nodes that sit between that weight and the output.
Using the chain rule, we can compute ∂L/∂w for every node.
Once we do, we can say:
By applying the chain rule, we can calculate the influence or impact of each node on the final outcome. This helps us understand the role of each node in the network.
*** The chain rule tells us how to find the derivative of a composite function (one function nested inside another).
Next: Backpropagation: a simple example
Backpropagation: a simple example
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
(consider all weights are 1 and bias is 0 for simplicity)
Backpropagation: a simple example
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
q = x + y = -2 + 5 = 3
f = q.z = 3 * -4 = -12
Backpropagation: a simple example
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
q = x + y = -2 + 5 = 3
f = q.z = 3 * -4 = -12
Next: compute the gradients ∂f/∂x, ∂f/∂y, ∂f/∂z (of the output w.r.t. each of the inputs x, y, z)
Backpropagation: Compute ∂f/∂f
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
Start with the base case: ∂f/∂f = 1 (the derivative of the output with respect to itself)
Backpropagation: Compute ∂f/∂z
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
Start at the output: we know f = qz
∂f/∂z = q = 3
Backpropagation: Compute ∂f/∂q
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
We know f = qz
∂f/∂q = z = -4
Backpropagation: Compute ∂f/∂y
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
Continue the process to the left.
Here, the value of y is not directly connected to the output value f → so we need the chain rule to compute the derivative.
Backpropagation: Compute ∂f/∂y
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
Here, the chain rule takes into account the influence of y on the intermediate variable q: ∂f/∂y = ∂f/∂q · ∂q/∂y.
Chain Rule
If y = f(g(x)), then y' = f'(g(x)) · g'(x).
The chain rule states that the instantaneous rate of change of f relative to g, multiplied by the rate of change of g relative to x, gives the instantaneous rate of change of f relative to x.
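The chain rule statement can be checked numerically (an illustrative sketch; the functions f, g and the test point are made up for this check):

```python
# Chain-rule check for y = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 1,
# so y' = f'(g(x)) * g'(x) = 2*(3*x + 1) * 3.
def g(x): return 3 * x + 1
def f(u): return u ** 2

x = 2.0
analytic = 2 * g(x) * 3                          # chain rule result
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central difference
print(analytic, numeric)                          # both ~42
```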
HELPING SLIDES
Backpropagation: Compute ∂f/∂y
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
∂f/∂y = ∂q/∂y · ∂f/∂q = 1 * -4 = -4
Backpropagation: Compute ∂f/∂x
Given, f(x,y,z) = (x+y).z
e.g. x = -2, y = 5, z = -4
Backward Pass: Compute Derivatives
∂f/∂x = ∂q/∂x · ∂f/∂q = 1 * -4 = -4
Pattern: (local gradient) × (upstream gradient) = downstream gradient
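The whole worked example can be reproduced in a few lines (a sketch of the slides' forward and backward passes through f(x, y, z) = (x + y)·z):

```python
# Backprop through f(x, y, z) = (x + y) * z by hand.
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y              # q = 3
f = q * z              # f = -12

# Backward pass: local gradient * upstream gradient
df_df = 1.0            # base case
df_dq = z * df_df      # f = q*z, so df/dq = z = -4
df_dz = q * df_df      # df/dz = q = 3
df_dx = 1.0 * df_dq    # dq/dx = 1, chain rule -> -4
df_dy = 1.0 * df_dq    # dq/dy = 1, chain rule -> -4
print(f, df_dx, df_dy, df_dz)
```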
Summary: Neural Networks
The End
A Neural Network with Pytorch (NOT FOR EXAM)
More Complicated Example
Check next calculations by yourselves!
Example: given neural network
A neural network with two inputs, two hidden neurons, and two output neurons. A bias is included in the hidden and output neurons.
Example: given neural network
Here are the initial weights, the biases, and training inputs/outputs
For the rest of this example, we’re going to work with a single training set. Given,
Next: Feed Forward Pass:
Focus: what does the neural network currently predict given the weights, biases, and inputs of 0.05 and 0.10?
To do this we'll feed those inputs i1, i2 forward through the network.
Feed Forward Pass:
Step 1: Calculate the total net input to each hidden layer neuron:
Total net, x = ∑ (j=1..n) ij·wj + b
Step 2: Squash the total net input using an activation function (e.g.: use the Sigmoid function):
out = σ(Total net, x), where σ(x) = 1 / (1 + e^-x)
Repeat the process with the output layer neurons.
Feed Forward Pass:
Step 1: Calculate the total net input to each hidden layer neuron.
First hidden node h1:
Step 1: Calculate the total net input
neth1=(w1*i1)+(w2*i2)+b1
= (.15*0.05) +(.20*.10)+(0.35*1)
= 0.3775
Step 2: Apply Activation Function: squash it using the Sigmoid function to get the output of h1:
outh1 = σ(0.3775) = 1 / (1 + e^-0.3775) = 0.593269992
Feed Forward Pass:
Step 1: Calculate the total net input to each hidden layer neuron.
Second hidden node h2:
Step 1: Calculate the total net input
neth2=(w3*i1)+(w4*i2)+(b1*1)
= 0.3925
Step 2: Apply Activation Function: squash it using the Sigmoid function to get the output of h2:
outh2 = σ(0.3925) = 0.596884378
Feed Forward Pass:
Next: We repeat this process for the output layer neurons, using the output from the hidden layer neurons as inputs.
neto1=(w5*outh1)+(w6*outh2)+(b2*1)
= (.40*0.593269992) + (.45*0.596884378) + (0.60*1)
= 1.105905967
outo1 = σ(1.105905967) = 0.75136507
Feed Forward Pass:
Next: We repeat this process for the second output neuron.
neto2=(w7*outh1)+(w8*outh2)+(b2*1)
= 1.224921404
outo2 = σ(1.224921404) = 0.772928465
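The feedforward pass above can be reproduced in code. Note: w1, w2, w5, w6, the biases, and the inputs appear on these slides, but w3, w4, w7, w8 do not; the values below (0.25, 0.30, 0.50, 0.55) are assumptions chosen to be consistent with the printed outputs:

```python
import math

# Forward pass for the 2-2-2 network in the slides.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # w3, w4 are assumed
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # w7, w8 are assumed
b1, b2 = 0.35, 0.60

# Hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1           # 0.3775
net_h2 = w3 * i1 + w4 * i2 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer, using hidden outputs as inputs
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)
print(out_o1, out_o2)  # ~0.75136507, ~0.772928465
```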
Next: Calculating the Total Error
We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:
Error = 1/2 (output/predicted − actual/target)²
The 1/2 is included so that the exponent is cancelled when we differentiate later on. The result is eventually multiplied by a learning rate anyway, so it doesn't matter that we introduce a constant here.
Next: Calculating the Total Error
So, error for o1 is:
Eo1 = 1/2 (output/predicted − actual/target)²
= 1/2 (0.75136507 − 0.01)²
= 1/2 * 0.549622167
= 0.274811083
Next: Calculating the Total Error
Error for o2 is, Eo2 = 0.023560026
Total Error, ETotal = Eo1 + Eo2
= 0.274811083 + 0.023560026
= 0.298371109
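The error calculation can be checked in code (the target 0.01 for o1 appears on the error slide; the target 0.99 for o2 is an assumption consistent with Eo2 = 0.023560026):

```python
# Total squared error from the network outputs computed earlier.
targets = (0.01, 0.99)                 # 0.99 for o2 is assumed
outputs = (0.75136507, 0.772928465)    # out_o1, out_o2 from the slides

errors = [0.5 * (o - t) ** 2 for o, t in zip(outputs, targets)]
total = sum(errors)
print(errors, total)  # ~[0.274811083, 0.023560026], total ~0.298371109
```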
Recap: Backward Pass (Back Propagation)
Goal with backpropagation: Update each of the weights in the network
Why? So that they cause the actual output to be closer to the target output.
How? Minimize the error for each output neuron and the network as a whole.
Backpropagation
Next, we will describe how we can calculate the contribution of weight w5 to the overall error (ETotal).
Backward Pass
Output Layer: Consider w5.
We want to know how much a change in w5 affects the total error ETotal:
∂ETotal/∂w5 = the partial derivative of ETotal with respect to w5
(we can also say: the gradient with respect to w5)
By applying the chain rule we know:
∂ETotal/∂w5 = ∂ETotal/∂outo1 · ∂outo1/∂neto1 · ∂neto1/∂w5
In the backpropagation process, working backward involves understanding how to start from the total error (Etotal) and trace back to the weight w5.
Therefore, the chain of dependencies follows:
ETotal → outo1 → neto1 → w5
adjust w5 to minimize the total error
We need to figure out each piece in this equation.
After we compute its respective gradient (∂ETotal/∂w5), we update the weight w5 using the gradient descent formula:
w5new = w5 − α · ∂ETotal/∂w5, where α is the learning rate.
The same process is applicable for the other weights.
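The chain-rule gradient for w5 can be computed directly from the slide's values (the target 0.01 for o1 is from the error slide; the learning rate α = 0.5 is an assumption for illustration):

```python
# dE_total/dw5 via the chain rule, using values from the forward pass.
out_o1 = 0.75136507
out_h1 = 0.593269992
target_o1 = 0.01

dE_dout   = out_o1 - target_o1        # d[1/2 (out - t)^2] / d out
dout_dnet = out_o1 * (1.0 - out_o1)   # sigmoid derivative at net_o1
dnet_dw5  = out_h1                    # net_o1 = w5*out_h1 + w6*out_h2 + b2

grad_w5 = dE_dout * dout_dnet * dnet_dw5
print(grad_w5)                        # ~0.082167

# Gradient-descent update (alpha is an assumption)
alpha = 0.5
w5_new = 0.40 - alpha * grad_w5
print(w5_new)
```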
Additional (If you want to see more detailed calculation)
https://hmkcode.com/ai/backpropagation-step-by-step/