1 of 37

Deep Learning (DEEP-0001)

9 – Performance

2 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

3 of 37

MNIST Dataset

4 of 37

MNIST 1D Dataset

5 of 37

Network

  • 40 inputs
  • 10 outputs
  • 4000 training examples (~400 per class)
  • Two hidden layers
    • 100 hidden units each
  • SGD with batch size 100, learning rate 0.1
  • 6000 steps (150 epochs)
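
Below is a minimal PyTorch sketch of this setup. It is a hypothetical stand-in: random tensors replace the real MNIST-1D data, whose loading (e.g., via the mnist1d package) is omitted.

```python
import torch
import torch.nn as nn

# Stand-ins for the MNIST-1D data: 4000 examples, 40 inputs, 10 classes.
x_train = torch.randn(4000, 40)
y_train = torch.randint(0, 10, (4000,))

# Two hidden layers of 100 units each, 10 output logits.
model = nn.Sequential(
    nn.Linear(40, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# 6000 steps of batch size 100 over 4000 examples = 150 epochs.
for step in range(6000):
    idx = torch.randint(0, 4000, (100,))   # random batch of 100
    loss = loss_fn(model(x_train[idx]), y_train[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```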

6 of 37

Results

7 of 37

Need to use separate test data

8 of 37

Need to use separate test data

The model has not generalized well to the new data
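
Continuing the sketch above, a minimal way to expose this gap is to score the same model on data it never saw (x_test and y_test below are hypothetical held-out stand-ins):

```python
import torch

x_test = torch.randn(1000, 40)           # stand-in for held-out inputs
y_test = torch.randint(0, 10, (1000,))   # stand-in for held-out labels

@torch.no_grad()
def accuracy(model, x, y):
    pred = model(x).argmax(dim=1)        # most likely class per example
    return (pred == y).float().mean().item()

# On the real MNIST-1D data this gap is the point of the slide:
# near-perfect training accuracy, much lower test accuracy.
print("train accuracy:", accuracy(model, x_train, y_train))
print("test accuracy: ", accuracy(model, x_test, y_test))
```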

9 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

10 of 37

Regression example

11 of 37

Toy model

  • K hidden units
  • First layer fixed so “joints” divide interval evenly
  • Second layer trained
  • But… the model is now linear in h
    • so the cost function is convex
    • the best solution can be found in closed form (see the sketch below)
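
A short NumPy sketch of this closed-form fit (the target function and noise level are illustrative assumptions):

```python
import numpy as np

def design(x, K):
    """Bias column plus K ReLU ramps whose joints divide [0, 1] evenly."""
    joints = np.arange(K) / K
    h = np.maximum(0.0, x[:, None] - joints[None, :])   # (N, K) activations
    return np.column_stack([np.ones_like(x), h])

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy target

# The model is linear in the hidden activations, so ordinary least squares
# gives the optimal second-layer weights in closed form.
beta, *_ = np.linalg.lstsq(design(x, K=6), y, rcond=None)
y_fit = design(x, K=6) @ beta
```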

12 of 37

Noise, bias, and variance

  • Noise in measurements
  • Some variables not observed
  • Data mislabeled

13 of 37

Noise, bias, and variance

14 of 37

Noise, bias, and variance

15 of 37

Noise, bias, and variance

  • Variance is the uncertainty in the fitted model due to the choice of training set
  • Bias is the systematic deviation of the average fitted model from the true underlying function, due to limitations in the model
  • Noise is the inherent uncertainty in the true mapping from input to output
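
These three terms can be estimated empirically. The sketch below (a toy polynomial model and a known true function, both assumptions for illustration) fits the same model to many resampled training sets: the spread of the fits is the variance, the gap between their mean and the true function is the bias, and the label noise is fixed by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)     # known true function (toy)
sigma = 0.2                                  # noise standard deviation
x_grid = np.linspace(0, 1, 100)

# Fit the same cubic model to many independently sampled training sets.
fits = []
for trial in range(200):
    x = rng.uniform(0, 1, 15)
    y = true_f(x) + sigma * rng.standard_normal(15)
    fits.append(np.polyval(np.polyfit(x, y, deg=3), x_grid))
fits = np.array(fits)

variance = fits.var(axis=0).mean()                          # spread of fits
bias_sq = ((fits.mean(axis=0) - true_f(x_grid))**2).mean()  # systematic gap
print(f"bias^2 = {bias_sq:.4f}  variance = {variance:.4f}  "
      f"noise = {sigma**2:.4f}")
```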

16 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

17 of 37

Variance

18 of 37

Variance

19 of 37

Variance

Can reduce variance by adding more samples
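
The sketch below repeats the experiment above with growing training sets (the sizes are illustrative); the variance of the fitted curve shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: np.sin(2 * np.pi * x)
x_grid = np.linspace(0, 1, 100)

for n in (10, 100, 1000):
    fits = []
    for trial in range(200):                     # resample the training set
        x = rng.uniform(0, 1, n)
        y = true_f(x) + 0.2 * rng.standard_normal(n)
        fits.append(np.polyval(np.polyfit(x, y, deg=3), x_grid))
    var = np.array(fits).var(axis=0).mean()      # variance across fits
    print(f"n = {n:4d}  variance of fitted curve = {var:.5f}")
```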

20 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

21 of 37

Reducing bias

22 of 37

Reducing bias

23 of 37

Why does variance increase? Overfitting

The more flexible model describes the training data better, but not the true underlying function (black curve)

[Figure: fitted model with three regions vs. fitted model with ten regions]

24 of 37

Bias and variance trade-off

[Figure: bias and variance as a function of model capacity (number of hidden units / linear regions in the range of the data)]
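
The trade-off can be reproduced with the toy model from earlier; the sketch below (K values, noise, and sample sizes are assumptions) sweeps the number of hidden units and compares train and test error.

```python
import numpy as np

def design(x, K):
    joints = np.arange(K) / K                    # evenly spaced joints
    h = np.maximum(0.0, x[:, None] - joints[None, :])
    return np.column_stack([np.ones_like(x), h])

rng = np.random.default_rng(3)
true_f = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(0, 1, 15)
y_tr = true_f(x_tr) + 0.2 * rng.standard_normal(15)
x_te = rng.uniform(0, 1, 500)
y_te = true_f(x_te) + 0.2 * rng.standard_normal(500)

# Low K: high bias, both errors high.  High K: low bias but high variance,
# so train error falls while test error rises again.
for K in (2, 4, 8, 14):
    beta, *_ = np.linalg.lstsq(design(x_tr, K), y_tr, rcond=None)
    tr = np.mean((design(x_tr, K) @ beta - y_tr) ** 2)
    te = np.mean((design(x_te, K) @ beta - y_te) ** 2)
    print(f"K = {K:2d}  train MSE = {tr:.4f}  test MSE = {te:.4f}")
```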

25 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

26 of 37

Number of datapoints

27 of 37

Double descent

28 of 37

29 of 37

  • Note that the training error is very close to zero.
  • Whatever is happening isn’t happening at the training data points
  • So it must be happening between the data points
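
One way to poke at this, a hedged sketch rather than the slide's own experiment: push the toy model past its interpolation threshold. np.linalg.lstsq returns the minimum-norm solution when the system is underdetermined, which acts as an implicit preference for smoother fits, so the test error can fall again after the classical peak.

```python
import numpy as np

def design(x, K):
    joints = np.arange(K) / K
    h = np.maximum(0.0, x[:, None] - joints[None, :])
    return np.column_stack([np.ones_like(x), h])

rng = np.random.default_rng(4)
true_f = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(0, 1, 15)
y_tr = true_f(x_tr) + 0.2 * rng.standard_normal(15)
x_te = np.linspace(0, 1, 500)

# Below K ~ 14 the model behaves classically; beyond it, every model
# interpolates the training data (train MSE ~ 0) and the minimum-norm
# solution often generalizes better again as K grows.
for K in (8, 14, 30, 100, 400):
    beta, *_ = np.linalg.lstsq(design(x_tr, K), y_tr, rcond=None)
    tr = np.mean((design(x_tr, K) @ beta - y_tr) ** 2)
    te = np.mean((design(x_te, K) @ beta - true_f(x_te)) ** 2)
    print(f"K = {K:3d}  train MSE = {tr:.5f}  test MSE = {te:.5f}")
```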

30 of 37

Potential explanation:

    • more hidden units can make smoother functions
    • being smooth between the data points is a reasonable thing to do

But why?

31 of 37

  • All of these solutions are equivalent in terms of loss.
  • Why should the model choose the smooth one?
  • The tendency of a model to choose one solution over another is called its inductive bias

32 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

33 of 37

Curse of dimensionality

  • As dimensionality increases, the volume of space grows so fast that the amount of data needed to densely sample it increases exponentially. This phenomenon is known as the curse of dimensionality.
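
Back-of-the-envelope arithmetic for this claim: placing just 10 sample points along each axis needs 10^d points for a d-dimensional grid, so even the 40-dimensional MNIST-1D input space is hopeless to sample densely.

```python
# Samples needed for a grid with 10 points per axis in d dimensions.
for d in (1, 2, 3, 10, 40):
    print(f"d = {d:2d}: 10^{d} = {float(10**d):.1e} samples")
```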

34 of 37

Weird properties of high-dimensional space

  • As the distance from the center increases, the probability decreases, but the volume of space at that radius (i.e., the area between adjacent evenly spaced circles) increases.
  • These factors trade off so that the histogram of distances of samples from the center has a pronounced peak.
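
In symbols, for a standard normal distribution in d dimensions (this is the chi distribution), the distance r of a sample from the center has density

$$p(r) \;\propto\; r^{\,d-1}\, e^{-r^2/2},$$

where the factor r^{d-1} is the growing volume of the shell at radius r, e^{-r^2/2} is the falling probability density, and their product peaks at r = sqrt(d-1).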

35 of 37

Weird properties of high-dimensional space

  • In higher dimensions, this effect becomes more extreme, and the probability of observing a sample close to the mean becomes vanishingly small. Although the most likely point is at the mean of the distribution, the typical samples are found in a relatively narrow shell.
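
A short NumPy sketch of this concentration (sample sizes and dimensions are illustrative): distances of standard-normal samples from the center cluster in a shell near sqrt(d), with nearly constant width.

```python
import numpy as np

rng = np.random.default_rng(5)
for d in (1, 10, 100, 1000):
    x = rng.standard_normal((10000, d))      # 10000 samples in d dimensions
    r = np.linalg.norm(x, axis=1)            # distance from the center
    print(f"d = {d:4d}  mean distance = {r.mean():7.2f}  "
          f"std = {r.std():.2f}  sqrt(d) = {np.sqrt(d):.2f}")
```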

36 of 37

Measuring performance

  • MNIST1D dataset model and performance
  • Noise, bias, and variance
  • Reducing variance
  • Reducing bias & bias-variance trade-off
  • Double descent
  • Curse of dimensionality & weird properties of high dimensional space
  • Choosing hyperparameters

37 of 37

Choosing hyperparameters

  • Don’t know bias or variance
  • Don’t know how much capacity to add
  • How do we choose capacity in practice?
    • Or model structure
    • Or training algorithm
    • Or learning rate
  • Third data set – validation set
    • Train models with different hyperparameters on training set
    • Choose best hyperparameters with validation set
    • Test once with test set
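
A hedged sketch of this protocol (the widths, step counts, and random stand-in data are assumptions): sweep one hyperparameter on the training set, select on the validation set, and touch the test set exactly once.

```python
import torch
import torch.nn as nn

def make_model(width):
    return nn.Sequential(nn.Linear(40, width), nn.ReLU(),
                         nn.Linear(width, 10))

def fit(model, x, y, steps=500):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):                  # full-batch gradient steps
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

# Stand-in splits; real code would split the MNIST-1D data the same way.
splits = {name: (torch.randn(n, 40), torch.randint(0, 10, (n,)))
          for name, n in [("train", 3000), ("val", 500), ("test", 500)]}

best_width, best_acc = None, -1.0
for width in (10, 50, 100, 200):            # candidate hyperparameter values
    model = fit(make_model(width), *splits["train"])
    acc = accuracy(model, *splits["val"])   # select on validation only
    if acc > best_acc:
        best_width, best_acc, best_model = width, acc, model

print(f"chosen width = {best_width}, val acc = {best_acc:.3f}, "
      f"test acc = {accuracy(best_model, *splits['test']):.3f}")  # test once
```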