1 of 26

Deep Learning (DEEP-0001)

Prof. André E. Lazzaretti

lazzaretti@utfpr.edu.br

https://sites.google.com/site/andrelazzaretti/graduate-courses/deep-learning-cpgei/2025

8 – Initialization

2 of 26

Initialization

Need for initialization
He initialization
Interlude: Expectations
Show that
Write variance of pre-activations f’ in terms of activations h in previous layer

Write variance of pre-activations f’ in terms of pre-activations f in previous layer

3 of 26

Initialization

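Consider the standard building block of a neural network in terms of pre-activations: $f' = \beta + \Omega h$ with $h = a[f]$, where f are the pre-activations of the previous layer, h the corresponding activations, a[•] the activation function, and f’ the pre-activations of the next layer.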

How do we initialize the biases and weights?

6 of 26

Exploding gradients

Vanishing gradients
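The two regimes named above can be reproduced numerically. The following is a minimal sketch, not the code behind the original figure: it assumes a fully connected ReLU network with zero biases and weights drawn from Normal(0, sigma_sq), and the depth, width, and the three variances are arbitrary illustrative choices. It backpropagates a gradient of ones from the output to the input and reports its norm.

```python
import numpy as np

def input_gradient_norm(sigma_sq, depth=50, width=100, seed=0):
    """Run a random input through `depth` ReLU layers, then backpropagate."""
    rng = np.random.default_rng(seed)
    # Assumption: biases are zero, weights are drawn from Normal(0, sigma_sq).
    weights = [rng.normal(0.0, np.sqrt(sigma_sq), size=(width, width))
               for _ in range(depth)]

    # Forward pass: f' = Omega @ relu(f); remember the input f of each layer.
    f = rng.normal(size=width)
    layer_inputs = []
    for omega in weights:
        layer_inputs.append(f)
        f = omega @ np.maximum(f, 0.0)

    # Backward pass: start from a gradient of ones at the output and apply
    # the chain rule layer by layer (the ReLU derivative is 1 where f > 0).
    grad = np.ones(width)
    for omega, f_in in zip(reversed(weights), reversed(layer_inputs)):
        grad = (omega.T @ grad) * (f_in > 0)
    return float(np.linalg.norm(grad))

width = 100
grad_small = input_gradient_norm(0.5 / width)  # variance too small
grad_he = input_gradient_norm(2.0 / width)     # variance 2 / width
grad_large = input_gradient_norm(8.0 / width)  # variance too large
print(grad_small, grad_he, grad_large)
```

With the small variance the gradient norm collapses towards zero (vanishing gradients), with the large variance it grows by many orders of magnitude (exploding gradients), and the intermediate choice keeps it at a moderate scale.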

9 of 26

Expectations

Definition: $\mathbb{E}[g[x]] = \int g[x] \, Pr(x) \, dx$

Interpretation: what is the average value of g[x] when taking into account the probability of x?
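This interpretation suggests a simple numerical recipe: draw many samples of x from Pr(x) and average g[x] over them. The sketch below assumes, purely for illustration, that x follows a normal distribution with mean 1 and standard deviation 2; the helper name `expectation` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def expectation(g, samples):
    """Monte Carlo estimate of E[g[x]]: average g over samples drawn from Pr(x)."""
    return float(np.mean(g(samples)))

# Assumed distribution for illustration: x ~ Normal(mean=1, std=2).
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

mean_est = expectation(lambda v: v, x)               # g[x] = x gives the mean
var_est = expectation(lambda v: (v - 1.0) ** 2, x)   # g[x] = (x - mu)^2 gives the variance
print(mean_est, var_est)
```

The two estimates should land close to the true mean (1) and variance (4) of the assumed distribution, up to sampling noise.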

11 of 26

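Rules for manipulating expectation

Rule 1: $\mathbb{E}[k] = k$ (k is a constant)

Rule 2: $\mathbb{E}[k \cdot g[x]] = k \cdot \mathbb{E}[g[x]]$

Rule 3: $\mathbb{E}[f[x] + g[x]] = \mathbb{E}[f[x]] + \mathbb{E}[g[x]]$

Rule 4: $\mathbb{E}[f[x] \, g[y]] = \mathbb{E}[f[x]] \, \mathbb{E}[g[y]]$ if x and y are independent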
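The standard rules (expectation of a constant, of a scaled function, of a sum, and of a product of functions of independent variables) can be checked numerically. This is a sketch under assumed distributions and test functions, all chosen only for illustration; sample averages stand in for exact expectations, so the product rule holds only up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(0.5, 1.0, size=n)
y = rng.uniform(0.0, 2.0, size=n)   # drawn independently of x

def E(values):
    """Sample average, used as an estimate of the expectation."""
    return float(np.mean(values))

k = 3.0                  # an arbitrary constant
f = lambda v: v ** 2     # arbitrary test functions
g = lambda v: v + 1.0

# Each gap is |left-hand side - right-hand side| of the corresponding rule.
rule1_gap = abs(E(np.full(n, k)) - k)                          # E[k] = k
rule2_gap = abs(E(k * g(x)) - k * E(g(x)))                     # E[k g[x]] = k E[g[x]]
rule3_gap = abs(E(f(x) + g(x)) - (E(f(x)) + E(g(x))))          # E[f + g] = E[f] + E[g]
rule4_gap = abs(E(f(x) * g(y)) - E(f(x)) * E(g(y)))            # needs x, y independent
print(rule1_gap, rule2_gap, rule3_gap, rule4_gap)
```

The first three gaps are zero up to floating-point rounding; the fourth shrinks as the number of samples grows, and would not shrink if y depended on x.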

13 of 26

Aim: keep variance same between two layers (Initialization for forward pass)

Consider the mean of the pre-activations:
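The equations for this step did not survive extraction. A reconstruction under the usual He-initialization assumptions (not stated in the surviving text): biases are initialized to zero, weights are drawn as $\Omega_{ij} \sim \text{Normal}(0, \sigma_\Omega^2)$ independently of the activations h, and $D_h$ (an assumed symbol) is the number of hidden units in the previous layer.

```latex
\begin{aligned}
\mathbb{E}[f'_i]
  &= \mathbb{E}\Big[\beta_i + \sum_{j=1}^{D_h} \Omega_{ij} h_j\Big] \\
  &= \mathbb{E}[\beta_i] + \sum_{j=1}^{D_h} \mathbb{E}[\Omega_{ij} h_j]
     && \text{(expectation of a sum)} \\
  &= \mathbb{E}[\beta_i] + \sum_{j=1}^{D_h} \mathbb{E}[\Omega_{ij}] \, \mathbb{E}[h_j]
     && \text{(independence of } \Omega_{ij} \text{ and } h_j \text{)} \\
  &= 0 + \sum_{j=1}^{D_h} 0 \cdot \mathbb{E}[h_j] = 0
\end{aligned}
```

So the pre-activations f’ have zero mean regardless of the distribution of h, which simplifies the variance calculation that follows.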

18 of 26

Aim: keep variance same between two layers
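The derivation for this step was lost with the slide images. A reconstruction of the variance of the pre-activations f’ in terms of the activations h, under the same assumed setup (zero biases, weights $\Omega_{ij} \sim \text{Normal}(0, \sigma_\Omega^2)$ independent of each other and of h, $D_h$ hidden units in the previous layer, and $\mathbb{E}[f'_i] = 0$):

```latex
\begin{aligned}
\sigma_{f'}^2
  &= \mathbb{E}[f_i'^2] - \mathbb{E}[f'_i]^2 \\
  &= \mathbb{E}\Big[\Big(\beta_i + \sum_{j=1}^{D_h} \Omega_{ij} h_j\Big)^2\Big] - 0 \\
  &= \mathbb{E}\Big[\Big(\sum_{j=1}^{D_h} \Omega_{ij} h_j\Big)^2\Big] \\
  &= \sum_{j=1}^{D_h} \mathbb{E}[\Omega_{ij}^2] \, \mathbb{E}[h_j^2]
     && \text{(cross terms vanish: } \mathbb{E}[\Omega_{ij}\Omega_{ik}] = 0 \text{ for } j \neq k \text{)} \\
  &= \sigma_\Omega^2 \sum_{j=1}^{D_h} \mathbb{E}[h_j^2]
\end{aligned}
```

The variance of the next layer therefore depends on the weight variance and on the second moment of the activations, not on their variance.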

25 of 26

Aim: keep variance same between two layers
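The step that expresses the variance in terms of the previous pre-activations f is missing from the extracted text. A reconstruction, assuming a ReLU activation and pre-activations $f_j$ distributed symmetrically about zero with variance $\sigma_f^2$ (both assumptions are standard for this argument but are not stated in the surviving text); it starts from the standard intermediate result $\sigma_{f'}^2 = \sigma_\Omega^2 \sum_j \mathbb{E}[h_j^2]$:

```latex
\begin{aligned}
\mathbb{E}[h_j^2]
  &= \mathbb{E}\big[\mathrm{ReLU}[f_j]^2\big]
   = \int_0^\infty f_j^2 \, Pr(f_j) \, df_j
   = \frac{1}{2} \int_{-\infty}^{\infty} f_j^2 \, Pr(f_j) \, df_j
   = \frac{\sigma_f^2}{2} \\[4pt]
\sigma_{f'}^2
  &= \sigma_\Omega^2 \sum_{j=1}^{D_h} \mathbb{E}[h_j^2]
   = \sigma_\Omega^2 \sum_{j=1}^{D_h} \frac{\sigma_f^2}{2}
   = \frac{1}{2} D_h \, \sigma_\Omega^2 \, \sigma_f^2
\end{aligned}
```

The ReLU clips the negative half of the distribution, which is where the factor 1/2 comes from. Requiring $\sigma_{f'}^2 = \sigma_f^2$ then fixes the weight variance.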

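Should choose: $\sigma_\Omega^2 = 2 / D_h$, where $D_h$ is the number of hidden units in the previous layer (the layer the weights are applied to).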

This is called He initialization.
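The effect of the weight variance on one layer can be checked empirically. This is a rough sketch: the layer width, batch size, and input variance are arbitrary assumptions, the activation is assumed to be ReLU, and biases are zero. It compares the variance of f’ with the variance of f for a weight variance of 2 / D_h (the He choice) and, for contrast, 1 / D_h.

```python
import numpy as np

rng = np.random.default_rng(0)
D_h, D_next, batch = 500, 500, 200   # illustrative sizes
sigma_f_sq = 3.0                     # assumed variance of the pre-activations f

# Pre-activations of the previous layer and their ReLU activations.
f = rng.normal(0.0, np.sqrt(sigma_f_sq), size=(batch, D_h))
h = np.maximum(f, 0.0)

def variance_ratio(sigma_omega_sq):
    """Return Var[f'] / Var[f] for weights drawn from Normal(0, sigma_omega_sq)."""
    omega = rng.normal(0.0, np.sqrt(sigma_omega_sq), size=(D_next, D_h))
    f_next = h @ omega.T             # biases are zero
    return float(f_next.var() / sigma_f_sq)

ratio_he = variance_ratio(2.0 / D_h)     # variance 2 / D_h
ratio_half = variance_ratio(1.0 / D_h)   # variance 1 / D_h, for contrast
print(ratio_he, ratio_half)
```

The first ratio should come out close to 1 (variance preserved), while the second should be close to 1/2, so the variance would halve at every layer of a deep network.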

26 of 26

PyTorch code

Define a neural network
Initialize params with He initialization
Define loss function
Choose optimization algorithm
Choose initial learning rate
Choose learning rates schedule
Make some random data
Train for 100 batches
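The code from this slide did not survive extraction. The listing below is a sketch that follows the listed steps, not the original code: layer sizes, batch size, loss, optimizer, learning rate, and schedule are all illustrative assumptions. He initialization uses `nn.init.kaiming_normal_`, which for ReLU draws weights with variance 2 / fan_in.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Define a neural network (layer sizes are assumptions for illustration).
D_in, D_hidden, D_out = 10, 40, 5
model = nn.Sequential(
    nn.Linear(D_in, D_hidden), nn.ReLU(),
    nn.Linear(D_hidden, D_hidden), nn.ReLU(),
    nn.Linear(D_hidden, D_out),
)

# Initialize params with He initialization: weights ~ Normal(0, 2 / fan_in), biases = 0.
def he_init(layer):
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="relu")
        nn.init.zeros_(layer.bias)

model.apply(he_init)

# Define loss function (least squares, an assumed choice).
criterion = nn.MSELoss()
# Choose optimization algorithm and initial learning rate (assumed values).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Choose learning rate schedule: halve the learning rate every 5 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Make some random data: 100 examples, served in batches of 10.
x = torch.randn(100, D_in)
y = torch.randn(100, D_out)
loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)

# Train for 100 batches (10 epochs of 10 batches each).
num_batches = 0
for epoch in range(10):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        num_batches += 1
    scheduler.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```

`kaiming_uniform_` would be an equally valid choice; it draws from a uniform distribution with the same variance.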