Deep Learning (DEEP-0001)
Prof. André E. Lazzaretti
https://sites.google.com/site/andrelazzaretti/graduate-courses/deep-learning-cpgei/2025
7 – Gradients
Loss function
L[{xi, yi}, φ], or for short: L[φ]
Returns a scalar that is smaller when the model maps inputs to outputs better.
Example
Problem 1: Computing gradients
Loss: sum of individual terms: L[φ] = Σi ℓi
SGD Algorithm: φ ← φ − α · Σi∈Bt ∂ℓi/∂φ (step downhill on a random batch Bt)
Parameters: φ = {β0, Ω0, β1, Ω1, …}
Need to compute the gradients ∂ℓi/∂φ with respect to every parameter, for every example in the batch.
Why is this such a big deal?
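As a concrete sketch, here is the shape of the SGD loop in code. This is a minimal NumPy version, assuming a hypothetical grad(params, x, y) function that returns ∂ℓi/∂φ for a single example; computing that gradient efficiently is exactly Problem 1.

```python
import numpy as np

# Minimal SGD loop. `params` is the parameter vector phi; `grad(params, x, y)`
# is a hypothetical function returning d(loss_i)/d(phi) for one example.
def sgd(params, grad, xs, ys, lr=0.01, epochs=10, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n = len(xs)
    for _ in range(epochs):
        perm = rng.permutation(n)                 # shuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = perm[start:start + batch_size]
            # Average the per-example gradients over the mini-batch B_t,
            # then step downhill: phi <- phi - alpha * gradient.
            g = np.mean([grad(params, xs[i], ys[i]) for i in batch], axis=0)
            params = params - lr * g
    return params
```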
Problem 2: initialization
Where should we initialize the parameters before we run SGD?
Gradients
Backpropagation: an algorithm to compute the gradients efficiently
BackProp intuition #1: the forward pass
Run the network forwards, computing and storing the intermediate values at every layer; these will be needed to compute the derivatives.
BackProp intuition #2: the backward pass
To calculate how a small change in a weight or bias feeding into hidden layer h3 modifies the loss, we need to know how a change in h3 changes the model output, and how a change in the output changes the loss.

BackProp intuition #2: the backward pass
To calculate how a small change in a weight or bias feeding into hidden layer h2 modifies the loss, we additionally need to know how a change in h2 changes h3.

BackProp intuition #2: the backward pass
To calculate how a small change in a weight or bias feeding into hidden layer h1 modifies the loss, we additionally need to know how a change in h1 changes h2. Working backwards, each layer reuses the quantities already computed for the layers after it.
Gradients
Toy function
f[x, φ] = β3 + ω3 · cos[ β2 + ω2 · exp[ β1 + ω1 · sin[ β0 + ω0 · x ] ] ]
with parameters φ = {β0, ω0, β1, ω1, β2, ω2, β3, ω3} and per-example loss ℓi = (f[xi, φ] − yi)².
Derivatives
Recall the derivatives of the component functions: ∂/∂z sin[z] = cos[z], ∂/∂z exp[z] = exp[z], and ∂/∂z cos[z] = −sin[z].
Gradients of toy function
We want to calculate: ∂ℓi/∂β0, ∂ℓi/∂ω0, ∂ℓi/∂β1, ∂ℓi/∂ω1, ∂ℓi/∂β2, ∂ℓi/∂ω2, ∂ℓi/∂β3, and ∂ℓi/∂ω3.
Gradients of composed functions
Calculating these expressions by hand with the chain rule produces long products of terms, and each parameter's derivative repeats most of the algebra of the others. Backpropagation organizes the computation so that shared terms are computed only once.
Forward pass
1. Write the function as a series of intermediate calculations:
f0 = β0 + ω0·x
h1 = sin[f0]
f1 = β1 + ω1·h1
h2 = exp[f1]
f2 = β2 + ω2·h2
h3 = cos[f2]
f3 = β3 + ω3·h3
ℓi = (f3 − yi)²
2. Compute and store these intermediate quantities.
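A minimal NumPy sketch of this forward pass; the parameter values are arbitrary illustrations, and every intermediate quantity is kept for later use.

```python
import numpy as np

def forward(x, beta, omega):
    """Forward pass of the toy function; every intermediate is returned."""
    f0 = beta[0] + omega[0] * x
    h1 = np.sin(f0)
    f1 = beta[1] + omega[1] * h1
    h2 = np.exp(f1)
    f2 = beta[2] + omega[2] * h2
    h3 = np.cos(f2)
    f3 = beta[3] + omega[3] * h3
    return f0, h1, f1, h2, f2, h3, f3

beta = [0.5, -0.2, 0.3, 0.1]     # arbitrary example values
omega = [1.4, 2.0, -0.5, 0.8]
x, y = 1.2, 0.4
*rest, f3 = forward(x, beta, omega)
print("loss =", (f3 - y) ** 2)
```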
Backward pass
1. Compute the derivatives of the loss with respect to these intermediate quantities, but in reverse order:
∂ℓi/∂f3 = 2(f3 − yi)
∂ℓi/∂h3 = ω3 · ∂ℓi/∂f3
∂ℓi/∂f2 = −sin[f2] · ∂ℓi/∂h3
∂ℓi/∂h2 = ω2 · ∂ℓi/∂f2
∂ℓi/∂f1 = exp[f1] · ∂ℓi/∂h2
∂ℓi/∂h1 = ω1 · ∂ℓi/∂f1
∂ℓi/∂f0 = cos[f0] · ∂ℓi/∂h1
At each step, the new factor (e.g. ω3, −sin[f2]) is a simple local derivative, and the other factor was already computed in the previous step.
Backward pass
2. Find how the loss changes as a function of the parameters β and ω:
∂ℓi/∂βk = ∂ℓi/∂fk
∂ℓi/∂ωk = hk · ∂ℓi/∂fk (where h0 = x)
The derivatives ∂ℓi/∂fk were already calculated in part 1.
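Putting the two parts together, here is a self-contained NumPy sketch of the full backward pass for the toy function, checked against a finite-difference approximation. The parameter values are arbitrary.

```python
import numpy as np

def forward(x, beta, omega):
    f0 = beta[0] + omega[0] * x
    h1 = np.sin(f0)
    f1 = beta[1] + omega[1] * h1
    h2 = np.exp(f1)
    f2 = beta[2] + omega[2] * h2
    h3 = np.cos(f2)
    f3 = beta[3] + omega[3] * h3
    return f0, h1, f1, h2, f2, h3, f3

def backward(x, y, beta, omega):
    f0, h1, f1, h2, f2, h3, f3 = forward(x, beta, omega)
    # Part 1: derivatives w.r.t. the intermediates, in reverse order.
    dl_df3 = 2.0 * (f3 - y)
    dl_dh3 = omega[3] * dl_df3          # local factor omega3
    dl_df2 = -np.sin(f2) * dl_dh3       # local factor -sin[f2]
    dl_dh2 = omega[2] * dl_df2
    dl_df1 = np.exp(f1) * dl_dh2
    dl_dh1 = omega[1] * dl_df1
    dl_df0 = np.cos(f0) * dl_dh1
    # Part 2: parameter derivatives reuse dl/dfk from part 1 (h0 = x).
    dl_dbeta = [dl_df0, dl_df1, dl_df2, dl_df3]
    dl_domega = [x * dl_df0, h1 * dl_df1, h2 * dl_df2, h3 * dl_df3]
    return dl_dbeta, dl_domega

# Finite-difference sanity check on beta1.
beta, omega = [0.5, -0.2, 0.3, 0.1], [1.4, 2.0, -0.5, 0.8]
x, y, eps = 1.2, 0.4, 1e-6
loss = lambda b: (forward(x, b, omega)[-1] - y) ** 2
b_plus = list(beta); b_plus[1] += eps
print((loss(b_plus) - loss(beta)) / eps)     # numerical estimate
print(backward(x, y, beta, omega)[0][1])     # analytic dl/dbeta1
```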
Examples: [figures with worked numerical evaluations]
Backpropagation
[Figure sequence: the forward and backward passes illustrated step by step on the network.]
Gradients
Matrix calculus
Scalar function f[a] of a vector a: the derivative ∂f/∂a is a vector of the same size as a, with entries ∂f/∂ai.
Matrix calculus
Scalar function f[A] of a matrix A: the derivative ∂f/∂A is a matrix of the same size as A, with entries ∂f/∂Aij.
Matrix calculus
Vector function f[a] of a vector a: the derivative ∂f/∂a is a matrix (the Jacobian), with entries ∂fi/∂aj.
Comparing vector and matrix derivatives
Scalar derivatives: if f = ω·h, then ∂f/∂h = ω, and the chain rule multiplies by ω.
Matrix derivatives: if f = Ω·h, then the chain rule multiplies by the transpose: ∂ℓ/∂h = Ωᵀ · ∂ℓ/∂f.
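A quick numerical sketch of the matrix rule. It uses a toy quadratic loss l = ‖f‖² (an assumption chosen purely for illustration) and checks that ∂ℓ/∂h = Ωᵀ · ∂ℓ/∂f agrees with finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
Omega = rng.normal(size=(3, 4))    # illustrative shapes
beta = rng.normal(size=3)
h = rng.normal(size=4)

# Toy scalar loss l = ||f||^2 with f = Omega h + beta, so dl/df = 2f.
f = Omega @ h + beta
dl_df = 2 * f
dl_dh = Omega.T @ dl_df            # matrix chain rule: the transpose appears

# Finite-difference check of dl/dh, one component at a time.
eps = 1e-6
for j in range(4):
    h_eps = h.copy(); h_eps[j] += eps
    fd = (np.sum((Omega @ h_eps + beta) ** 2) - np.sum(f ** 2)) / eps
    print(j, fd, dl_dh[j])         # the two columns should agree closely
```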
Gradients
The forward pass
1. Write the network as a series of intermediate calculations:
f0 = β0 + Ω0·x
h1 = a[f0]
f1 = β1 + Ω1·h1
h2 = a[f1]
…
fK = βK + ΩK·hK
ℓi = l[fK, yi]
2. Compute and store these intermediate quantities.
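A minimal NumPy sketch of this forward pass for a ReLU network; the function names and layer sizes are illustrative, not part of the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, betas, Omegas):
    """f_k = beta_k + Omega_k h_k, then h_{k+1} = a[f_k] (ReLU).
    All pre-activations f and activations h are stored for the backward pass."""
    fs, hs = [], [x]                  # h0 is the input itself
    for k, (beta, Omega) in enumerate(zip(betas, Omegas)):
        fs.append(beta + Omega @ hs[-1])
        if k < len(betas) - 1:        # no activation after the final layer
            hs.append(relu(fs[-1]))
    return fs, hs

# Illustrative two-hidden-layer network with random parameters.
rng = np.random.default_rng(0)
sizes = [2, 4, 3, 1]
Omegas = [rng.normal(size=(b, a)) for a, b in zip(sizes, sizes[1:])]
betas = [rng.normal(size=b) for b in sizes[1:]]
fs, hs = forward(rng.normal(size=2), betas, Omegas)
```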
Gradients
The backward pass
3. Take derivatives of the loss with respect to the intermediate quantities, in reverse order, starting from ∂ℓi/∂fK at the loss. Written out naively, a derivative such as ∂ℓi/∂f0 expands into a long product of Jacobian matrices. Yikes! Computed backwards, however, each step reuses the previous one:
∂ℓi/∂hk = Ωkᵀ · ∂ℓi/∂fk
∂ℓi/∂fk−1 = (∂hk/∂fk−1)ᵀ · ∂ℓi/∂hk
The remaining ingredient is ∂hk/∂fk−1, the derivative of the activation function.
Derivative of ReLU
The derivative of ReLU[z] is zero for z < 0 and one for z > 0: the "indicator function" I[z > 0].

Derivative of ReLU
1. Consider: h = a[f], where the activation a[•] is a ReLU applied pointwise.
2. We could equivalently write: hj = I[fj > 0] · fj.
3. Taking the derivative: ∂hj/∂fj = I[fj > 0] and ∂hj/∂fk = 0 for j ≠ k, so ∂h/∂f is a diagonal matrix with the indicators on its diagonal.
4. Multiplying by this diagonal matrix is equivalent to pointwise multiplying by the vector I[f > 0], so:
∂ℓi/∂fk−1 = I[fk−1 > 0] ⊙ (Ωkᵀ · ∂ℓi/∂fk)
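A short NumPy check of point 4, showing that multiplying by the diagonal matrix diag(I[f > 0]) and pointwise multiplying by the indicator vector give the same result (values are arbitrary):

```python
import numpy as np

f = np.array([-1.0, 0.5, 2.0, -0.3])
indicator = (f > 0).astype(float)             # I[f > 0], the ReLU derivative

upstream = np.array([0.1, -0.2, 0.3, 0.4])    # some dl/dh arriving from above

via_diag = np.diag(indicator) @ upstream      # multiply by the diagonal matrix
via_pointwise = indicator * upstream          # pointwise multiply instead
print(np.allclose(via_diag, via_pointwise))   # True
```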
The backward pass
4. Take derivatives with respect to the parameters:
∂ℓi/∂βk = ∂ℓi/∂fk
∂ℓi/∂Ωk = ∂ℓi/∂fk · hkᵀ
These reuse the derivatives ∂ℓi/∂fk from step 3; only the outer product with the stored activations hk is new.
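Combining steps 1 to 4, here is a self-contained NumPy sketch of backpropagation for a small ReLU network with a least-squares loss (layer sizes and values are arbitrary), checked against a finite difference on one weight.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, betas, Omegas):
    fs, hs = [], [x]
    for k, (beta, Omega) in enumerate(zip(betas, Omegas)):
        fs.append(beta + Omega @ hs[-1])
        if k < len(betas) - 1:
            hs.append(relu(fs[-1]))
    return fs, hs

def backward(x, y, betas, Omegas):
    fs, hs = forward(x, betas, Omegas)
    K = len(betas) - 1
    dl_df = 2.0 * (fs[-1] - y)                  # least-squares loss: dl/df_K
    dl_dbetas = [None] * (K + 1)
    dl_dOmegas = [None] * (K + 1)
    for k in range(K, -1, -1):
        dl_dbetas[k] = dl_df                    # dl/dbeta_k = dl/df_k
        dl_dOmegas[k] = np.outer(dl_df, hs[k])  # dl/dOmega_k = dl/df_k h_k^T
        if k > 0:  # dl/df_{k-1} = I[f_{k-1} > 0] (pointwise) Omega_k^T dl/df_k
            dl_df = (fs[k - 1] > 0) * (Omegas[k].T @ dl_df)
    return dl_dbetas, dl_dOmegas

# Check one weight gradient against a finite difference.
rng = np.random.default_rng(1)
sizes = [2, 4, 3, 1]                      # illustrative layer widths
Omegas = [rng.normal(size=(b, a)) for a, b in zip(sizes, sizes[1:])]
betas = [rng.normal(size=b) for b in sizes[1:]]
x, y = rng.normal(size=2), np.array([0.5])

analytic = backward(x, y, betas, Omegas)[1][1][0, 0]
loss = lambda: np.sum((forward(x, betas, Omegas)[0][-1] - y) ** 2)
base, eps = loss(), 1e-6
Omegas[1][0, 0] += eps
print((loss() - base) / eps, analytic)    # the two should agree closely
```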
Backprop summary
Forward pass: compute and store the pre-activations fk and activations hk at every layer.
Backward pass: starting from the derivative of the loss ∂ℓi/∂fK, work backwards through the network computing ∂ℓi/∂fk−1 = I[fk−1 > 0] ⊙ (Ωkᵀ · ∂ℓi/∂fk).
At each layer, read off the parameter gradients ∂ℓi/∂βk = ∂ℓi/∂fk and ∂ℓi/∂Ωk = ∂ℓi/∂fk · hkᵀ.
Pros and cons
Pros: extremely efficient; computing all the gradients costs the same order of operations as the forward pass itself.
Cons: every intermediate value fk and hk must be stored, so memory can become the bottleneck.
Gradients
Algorithmic differentiation
Modern deep learning frameworks perform this process automatically: the user specifies only the forward pass, and the framework derives the backward pass by applying the chain rule to each operation (algorithmic, or automatic, differentiation).
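For instance, in PyTorch a single backward() call computes all the gradients of the toy function from earlier. This is a minimal sketch, not the course's reference code; the parameter values are arbitrary.

```python
import torch

# The same toy function, with derivatives computed by PyTorch's
# algorithmic differentiation instead of a hand-written backward pass.
beta = torch.tensor([0.5, -0.2, 0.3, 0.1], requires_grad=True)
omega = torch.tensor([1.4, 2.0, -0.5, 0.8], requires_grad=True)
x, y = 1.2, 0.4

f3 = beta[3] + omega[3] * torch.cos(
        beta[2] + omega[2] * torch.exp(
            beta[1] + omega[1] * torch.sin(beta[0] + omega[0] * x)))
loss = (f3 - y) ** 2

loss.backward()                 # one call runs the whole backward pass
print(beta.grad, omega.grad)    # d(loss)/d(beta), d(loss)/d(omega)
```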