Deep Learning (DEEP-0001)
Prof. André E. Lazzaretti
https://sites.google.com/site/andrelazzaretti/graduate-courses/deep-learning-cpgei/2025
8 – Initialization
Initialization
Exploding gradients
Vanishing gradients
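A minimal sketch of both effects, assuming a fully connected ReLU network with zero-mean Gaussian weights. The function name `gradient_norms`, the width, the depth, and the three weight variances are illustrative choices, not values from the slides. It runs a forward pass, then backpropagates a vector of ones and records the gradient norm after each layer.

```python
import math
import random

def gradient_norms(sigma2, width=50, depth=20, seed=0):
    """Backpropagate through a random ReLU network; return the gradient norm per layer."""
    rng = random.Random(seed)
    std = math.sqrt(sigma2)

    # Forward pass: store each weight matrix and the ReLU on/off pattern.
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    weights, masks = [], []
    for _ in range(depth):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        f = [sum(w * x for w, x in zip(row, h)) for row in W]
        masks.append([1.0 if v > 0 else 0.0 for v in f])
        h = [max(0.0, v) for v in f]
        weights.append(W)

    # Backward pass: start from a gradient of ones at the last layer's output.
    g = [1.0] * width
    norms = []
    for W, m in zip(reversed(weights), reversed(masks)):
        gm = [gi * mi for gi, mi in zip(g, m)]  # derivative of ReLU
        g = [sum(W[i][j] * gm[i] for i in range(width)) for j in range(width)]
        norms.append(math.sqrt(sum(v * v for v in g)))
    return norms

width = 50
small = gradient_norms(0.5 / width)   # variance too small: gradients shrink
middle = gradient_norms(2.0 / width)  # variance 2 / width: gradients stay stable
large = gradient_norms(8.0 / width)   # variance too large: gradients grow

print(small[-1], middle[-1], large[-1])
```

With these assumed settings the gradient norm at the first layer differs by many orders of magnitude between the three cases, which is the behavior the two labels describe.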
Expectations
Interpretation: what is the average value of g[x] when taking into account the probability of x?
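As a small illustration of this interpretation, the expectation E[g[x]] can be approximated by drawing samples of x from its distribution and averaging g over them. The helper name `expectation`, the choice of a standard normal for x, and g[x] = x^2 are assumptions made for this sketch.

```python
import random

def expectation(g, sampler, n=100_000, seed=0):
    """Monte Carlo estimate of E[g[x]]: average g over samples drawn from Pr(x)."""
    rng = random.Random(seed)
    return sum(g(sampler(rng)) for _ in range(n)) / n

# Example: x ~ Normal(0, 1) and g[x] = x^2, whose exact expectation is 1.
estimate = expectation(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0))
print(estimate)
```

Values of x that are more probable are sampled more often, so they contribute more to the average.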
Rules for manipulating expectation
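For reference, the four standard identities for expectations are sketched below. The numbering is assumed to match the Rule 1 to Rule 4 labels used in the derivations; k denotes a constant and f, g are arbitrary functions.

```latex
\begin{align*}
\text{Rule 1:}\quad & \mathbb{E}[k] = k \\
\text{Rule 2:}\quad & \mathbb{E}\big[k \cdot g[x]\big] = k \cdot \mathbb{E}\big[g[x]\big] \\
\text{Rule 3:}\quad & \mathbb{E}\big[f[x] + g[x]\big] = \mathbb{E}\big[f[x]\big] + \mathbb{E}\big[g[x]\big] \\
\text{Rule 4:}\quad & \mathbb{E}\big[f[x]\, g[y]\big] = \mathbb{E}\big[f[x]\big]\, \mathbb{E}\big[g[y]\big]
  \quad \text{if } x \text{ and } y \text{ are independent}
\end{align*}
```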
Initialization
Aim: keep the variance the same between two layers (initialization for the forward pass)
Consider the mean of the pre-activations:
Each step of the derivation applies one of the four rules for manipulating expectation (Rules 1 to 4).
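A sketch of the mean computation, under assumed notation that is not given in the text: pre-activations f' = β + Ω h with h = a[f], biases β initialized to zero, and weights Ω_ij drawn independently of h with zero mean and variance σ_Ω². D_h is the number of hidden units h_j.

```latex
\begin{align*}
\mathbb{E}[f'_i]
  &= \mathbb{E}\Big[\beta_i + \sum_{j=1}^{D_h} \Omega_{ij} h_j\Big] \\
  &= \mathbb{E}[\beta_i] + \sum_{j=1}^{D_h} \mathbb{E}[\Omega_{ij} h_j]
     && \text{(expectation of a sum)} \\
  &= \mathbb{E}[\beta_i] + \sum_{j=1}^{D_h} \mathbb{E}[\Omega_{ij}]\, \mathbb{E}[h_j]
     && \text{(} \Omega_{ij} \text{ and } h_j \text{ independent)} \\
  &= 0 + \sum_{j=1}^{D_h} 0 \cdot \mathbb{E}[h_j] = 0
\end{align*}
```

Under these assumptions the pre-activations at the next layer have zero mean regardless of the distribution of h.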
The variance of the pre-activations follows from the same four rules (Rules 1 to 4), applied step by step.
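A sketch of the variance computation, under assumed notation that is not given in the text: f' = β + Ω h, h = ReLU[f], zero biases, independent zero-mean weights with variance σ_Ω², and pre-activations f_j distributed symmetrically about zero with variance σ_f². The mean of f'_i is taken to be zero.

```latex
\begin{align*}
\sigma_{f'}^2
  &= \mathbb{E}\big[{f'_i}^{2}\big] - \mathbb{E}[f'_i]^2 \\
  &= \mathbb{E}\Big[\Big(\beta_i + \sum_{j=1}^{D_h} \Omega_{ij} h_j\Big)^{2}\Big] - 0 \\
  &= \mathbb{E}\Big[\Big(\sum_{j=1}^{D_h} \Omega_{ij} h_j\Big)^{2}\Big]
     && \text{(} \beta_i = 0 \text{)} \\
  &= \sum_{j=1}^{D_h} \mathbb{E}\big[\Omega_{ij}^2\big]\, \mathbb{E}\big[h_j^2\big]
     && \text{(cross terms vanish: independent, zero-mean weights)} \\
  &= \sigma_\Omega^2 \sum_{j=1}^{D_h} \mathbb{E}\big[h_j^2\big] \\
  &= \frac{1}{2}\, D_h\, \sigma_\Omega^2\, \sigma_f^2
     && \text{(ReLU keeps half of a symmetric distribution, so } \mathbb{E}[h_j^2] = \sigma_f^2 / 2 \text{)}
\end{align*}
```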
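Should choose: $\sigma_\Omega^2 = 2 / D_h$, where $\sigma_\Omega^2$ is the variance of the zero-mean weights and $D_h$ is the number of hidden units feeding into the layer (the fan-in).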
This is called He initialization.
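A rough numerical check of this choice, assuming a single ReLU layer with zero biases whose incoming pre-activations have unit variance. The function name `relu_layer_variance` and the layer sizes are illustrative. With weight variance 2 / D_h the next layer's pre-activation variance should stay close to 1, while a smaller weight variance shrinks it.

```python
import math
import random

def relu_layer_variance(sigma2_omega, d_h=1000, d_out=500, seed=0):
    """Empirical variance of f' = Omega h, with h = ReLU(f) and f ~ Normal(0, 1)."""
    rng = random.Random(seed)
    h = [max(0.0, rng.gauss(0.0, 1.0)) for _ in range(d_h)]
    std = math.sqrt(sigma2_omega)
    # Each output unit gets its own row of independent zero-mean weights.
    f_next = [sum(rng.gauss(0.0, std) * hj for hj in h) for _ in range(d_out)]
    mean = sum(f_next) / d_out
    return sum((v - mean) ** 2 for v in f_next) / d_out

d_h = 1000
var_he = relu_layer_variance(2.0 / d_h, d_h=d_h)     # weight variance 2 / D_h
var_small = relu_layer_variance(1.0 / d_h, d_h=d_h)  # half of that weight variance

print(var_he, var_small)
```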
PyTorch code
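A minimal sketch of how this might look in PyTorch, assuming a small fully connected ReLU network. The helper name `init_he` and the layer sizes are illustrative. `nn.init.kaiming_normal_` with `mode="fan_in"` and `nonlinearity="relu"` draws weights with standard deviation sqrt(2 / fan_in).

```python
import math

import torch
import torch.nn as nn

def init_he(module):
    """Apply He (Kaiming) normal initialization to every linear layer."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        nn.init.zeros_(module.bias)

torch.manual_seed(0)
d_h = 512  # assumed hidden width
model = nn.Sequential(
    nn.Linear(d_h, d_h), nn.ReLU(),
    nn.Linear(d_h, d_h), nn.ReLU(),
    nn.Linear(d_h, d_h),
)
model.apply(init_he)  # calls init_he on each submodule

# The empirical weight standard deviation should be close to sqrt(2 / D_h).
weight_std = model[0].weight.std().item()
expected_std = math.sqrt(2.0 / d_h)
print(weight_std, expected_std)
```

Passing `mode="fan_out"` instead would scale by the number of output units, which targets the backward pass rather than the forward pass.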