
Pattern Recognition (RPD-0041)

Lecture 10 - Deep Feedforward Neural Networks and Backpropagation

Introduction

  • Feedforward: information flows from the input, through the intermediate computations used to define the function, and finally to the output.
  • Networks: layers of composed functions, f(x) = f⁽³⁾(f⁽²⁾(f⁽¹⁾(x))) (a small sketch follows this list).
  • Depth of the model: the length of this chain; the term deep learning arose from this terminology.
  • Neural because they are loosely inspired by neuroscience.
    • Each unit resembles a neuron - it receives input from many other units and computes its own activation value.
  • Intrinsically nonlinear.
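
As a concrete illustration of this chain of layers, here is a minimal NumPy sketch; the shapes, the ReLU/identity activation choices, and the name "layer" are illustrative assumptions, not taken from the slides:

    import numpy as np

    def layer(W, b, activation):
        # One layer of the chain: x -> activation(W x + b)
        return lambda x: activation(W @ x + b)

    relu = lambda z: np.maximum(z, 0.0)
    identity = lambda z: z

    rng = np.random.default_rng(0)
    f1 = layer(rng.normal(size=(4, 3)), np.zeros(4), relu)      # first hidden layer
    f2 = layer(rng.normal(size=(4, 4)), np.zeros(4), relu)      # second hidden layer
    f3 = layer(rng.normal(size=(2, 4)), np.zeros(2), identity)  # output layer

    x = rng.normal(size=3)
    y = f3(f2(f1(x)))   # f(x) = f(3)(f(2)(f(1)(x)))
    print(y)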


Basic Structure

Learning XOR

[Figure: worked XOR forward pass for input (1, 1). Hidden unit 1: 1·1 + 1·1 + 1·(-1.5) = 0.5 → 1; hidden unit 2: 1·1 + 1·1 + 1·(-0.5) = 1.5 → 1; output unit: 1·(-2) + 1·1 + 1·(-0.5) = -1.5 → 0.]
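
The same forward pass as a minimal Python sketch, using unit-step activations and the weights recovered from the figure (the exact activation function used on the slide is an assumption):

    def step(z):
        # Unit-step activation: 1 if the weighted sum is non-negative, else 0
        return 1 if z >= 0 else 0

    def xor_net(x1, x2):
        # Hidden units (weights and biases as read from the slide's figure)
        h1 = step(1*x1 + 1*x2 - 1.5)   # 1 only when both inputs are 1
        h2 = step(1*x1 + 1*x2 - 0.5)   # 1 when at least one input is 1
        # Output unit combines them to produce XOR
        return step(-2*h1 + 1*h2 - 0.5)

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_net(a, b))     # prints the XOR truth table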


How to train?

  • Designing and training a neural network is not much different from training any other machine learning model.
  • Steps:
    • Cost Function;
    • Architecture: output units, hidden units;
    • Training (backpropagation).


Loss Functions: Classification

Multiclass (cross-entropy):

Binary (logistic):
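
The formulas themselves did not survive the slide export; a standard formulation (my reconstruction, with ŷ denoting the predicted probability or probabilities) is:

    % Multiclass cross-entropy, one-hot target y, predicted class probabilities \hat{y}:
    L(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k

    % Binary (logistic) loss, target y \in \{0, 1\}, predicted probability \hat{y}:
    L(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]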


Architecture


Hidden Units

  • Rectified Linear Units (ReLU); formulas sketched after this list.
  • Variations:
    • Leaky ReLU: fixes αᵢ to a small value, e.g. 0.01.
    • Parametric ReLU (PReLU): treats αᵢ as a learnable parameter.
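
The activation formulas did not survive the export; the standard definitions (my reconstruction) are:

    % ReLU:
    g(z) = \max(0, z)

    % Leaky ReLU (fixed \alpha_i, e.g. 0.01) and PReLU (\alpha_i learned) share the form:
    g(z_i) = \max(0, z_i) + \alpha_i \min(0, z_i)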


Output Units

  • Linear units: regression (formulas sketched below).
  • Sigmoid units: binary classification.
  • Softmax units: multiclass classification.
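
The corresponding formulas, in a standard formulation with h the last hidden activation (my reconstruction rather than the slide's exact notation):

    % Linear output (regression):
    \hat{y} = W^{\top} h + b

    % Sigmoid output (binary classification):
    \hat{y} = \sigma(w^{\top} h + b) = \frac{1}{1 + e^{-(w^{\top} h + b)}}

    % Softmax output (multiclass classification):
    \hat{y}_k = \mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \quad z = W^{\top} h + b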


Effect of depth


L2 Parameter Regularization

[Figure: decision functions learned with small weights are smooth; with large weights they become abrupt.]
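
In formula form (a standard statement of L2 regularization, not reproduced from the slide), the penalty added to the cost pulls the weights toward zero, which is what produces the smoother fit:

    % Regularized objective (\lambda is the regularization strength):
    \tilde{J}(w) = J(w) + \frac{\lambda}{2} \lVert w \rVert_2^2

    % Its gradient adds a weight-decay term to the usual gradient:
    \nabla_w \tilde{J}(w) = \nabla_w J(w) + \lambda w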


Early Stopping
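
The slide illustrates this graphically; as a minimal Python sketch of the mechanism (the train_epoch/validation_loss callables and the patience value are illustrative placeholders, not an actual API):

    def early_stopping_fit(train_epoch, validation_loss, max_epochs=200, patience=10):
        # Stop training when the validation loss has not improved for `patience` epochs.
        best_loss, best_epoch, epochs_without_improvement = float("inf"), 0, 0
        for epoch in range(max_epochs):
            train_epoch()                       # one pass over the training set
            loss = validation_loss()            # evaluate on held-out data
            if loss < best_loss:
                best_loss, best_epoch = loss, epoch
                epochs_without_improvement = 0  # ideally also snapshot the weights here
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                       # validation loss stopped improving
        return best_epoch, best_loss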


Dropout

  • Consists of 'turning off' random neurons of the network at each iteration, with a user-defined probability (only during training); a minimal sketch follows this list.
  • Forces all neurons to contribute to the final result.
  • Prevents co-adaptation of neurons (the network does better when hidden units detect features independently of each other, i.e., uncorrelated).
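
A minimal sketch of inverted dropout (the choice of the inverted variant, the rate, and the array shapes are illustrative assumptions):

    import numpy as np

    def dropout(h, rate, training, rng):
        # Inverted dropout: zero each unit with probability `rate` during training
        # and rescale the survivors so the expected activation is unchanged;
        # at test time the activations are left untouched.
        if not training or rate == 0.0:
            return h
        keep = rng.random(h.shape) >= rate   # mask of surviving units
        return h * keep / (1.0 - rate)       # rescale to preserve the expectation

    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))              # activations of a hidden layer
    print(dropout(h, rate=0.5, training=True, rng=rng))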


Dropout

  • Generally used on fully connected layers.
  • Typical values of the dropout rate (the probability of turning off a neuron):
    • 0.5 for fully connected layers.
    • 0.2 for convolutional layers.


How to train? Gradient-Based Learning


Backpropagation

  • Gradient-based (a minimal sketch follows this list).
  • Two alternating steps:
    • Forward propagation: compute each unit's activation σ(xᵢwᵢ), from the inputs toward the output.
    • Backward propagation: propagate the gradients of the error, ∂ℇ/∂wᵢ, from the output back to every weight.

[Figure: network diagram showing the forward pass of activations σ(x₁w₁), …, σ(x₆w₆) and the backward pass of gradients ∂ℇ/∂w₁, …, ∂ℇ/∂w₆.]
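
A minimal NumPy sketch of these two steps for a tiny one-hidden-layer network with sigmoid units, squared error, and a plain gradient step w ← w - lr·∂ℇ/∂w (the architecture, shapes, and learning rate are illustrative assumptions, not the network drawn on the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)          # input
    y = np.array([1.0])             # target
    W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)   # hidden layer
    W2 = rng.normal(size=(1, 4)); b2 = np.zeros(1)   # output layer
    lr = 0.1                        # learning rate

    for step in range(100):
        # forward propagation
        z1 = W1 @ x + b1; h = sigmoid(z1)
        z2 = W2 @ h + b2; y_hat = sigmoid(z2)
        loss = 0.5 * np.sum((y_hat - y) ** 2)

        # backward propagation (chain rule, layer by layer)
        d_z2 = (y_hat - y) * y_hat * (1 - y_hat)   # dE/dz2
        d_W2 = np.outer(d_z2, h); d_b2 = d_z2      # dE/dW2, dE/db2
        d_h = W2.T @ d_z2
        d_z1 = d_h * h * (1 - h)                   # dE/dz1
        d_W1 = np.outer(d_z1, x); d_b1 = d_z1      # dE/dW1, dE/db1

        # gradient-based update: w <- w - lr * dE/dw
        W2 -= lr * d_W2; b2 -= lr * d_b2
        W1 -= lr * d_W1; b1 -= lr * d_b1

    print(loss)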

Backpropagation

  • Vectorized Implementation (a generic sketch follows at the end of this list):
    • This link (at [37:35]); this link may also help.

  • Example in Python: link.
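
For reference, a minimal sketch of what "vectorized" means here: the same forward and backward computations expressed as matrix operations over a whole mini-batch. The shapes, loss, and names are illustrative assumptions; this is not the linked example.

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 3))                 # mini-batch of 32 inputs
    Y = rng.integers(0, 2, size=(32, 1)).astype(float)
    W1 = rng.normal(size=(3, 4)); b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

    # vectorized forward pass: one matrix product per layer for the whole batch
    H = sigmoid(X @ W1 + b1)                     # (32, 4)
    Y_hat = sigmoid(H @ W2 + b2)                 # (32, 1)

    # vectorized backward pass (squared error, averaged over the batch)
    dZ2 = (Y_hat - Y) * Y_hat * (1 - Y_hat)      # (32, 1)
    dW2 = H.T @ dZ2 / len(X); db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)             # (32, 4)
    dW1 = X.T @ dZ1 / len(X); db1 = dZ1.mean(axis=0)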