
Pattern Recognition (RPD-0041)

Lecture 10 - Deep Feedforward Neural Networks and Backpropagation

Introduction

  • Feedforward: information flows from the input, through the intermediate computations used to define the function, and finally to the output.
  • Networks: layers of composed functions, f(x) = f⁽³⁾(f⁽²⁾(f⁽¹⁾(x))) (a small sketch follows this list).
  • Depth of the model: the length of this chain; the term deep learning arose from this terminology.
  • Neural because they are loosely inspired by neuroscience.
    • Each unit resembles a neuron - it receives input from many other units and computes its own activation value.
  • Intrinsically nonlinear.
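
As a concrete illustration of this chain of layers, here is a minimal NumPy sketch; the shapes, the ReLU/identity activation choices, and the name "layer" are illustrative assumptions, not taken from the slides:

    import numpy as np

    def layer(W, b, activation):
        # One layer of the chain: x -> activation(W x + b)
        return lambda x: activation(W @ x + b)

    relu = lambda z: np.maximum(z, 0.0)
    identity = lambda z: z

    rng = np.random.default_rng(0)
    f1 = layer(rng.normal(size=(4, 3)), np.zeros(4), relu)      # first hidden layer
    f2 = layer(rng.normal(size=(4, 4)), np.zeros(4), relu)      # second hidden layer
    f3 = layer(rng.normal(size=(2, 4)), np.zeros(2), identity)  # output layer

    x = rng.normal(size=3)
    y = f3(f2(f1(x)))   # f(x) = f(3)(f(2)(f(1)(x)))
    print(y)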


Basic Structure

Learning XOR

[Figure: worked XOR forward pass for input (1, 1). Hidden unit 1: 1·1 + 1·1 + 1·(-1.5) = 0.5 → 1; hidden unit 2: 1·1 + 1·1 + 1·(-0.5) = 1.5 → 1; output unit: 1·(-2) + 1·1 + 1·(-0.5) = -1.5 → 0.]
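
The same forward pass as a minimal Python sketch, using unit-step activations and the weights recovered from the figure (the exact activation function used on the slide is an assumption):

    def step(z):
        # Unit-step activation: 1 if the weighted sum is non-negative, else 0
        return 1 if z >= 0 else 0

    def xor_net(x1, x2):
        # Hidden units (weights and biases as read from the slide's figure)
        h1 = step(1*x1 + 1*x2 - 1.5)   # 1 only when both inputs are 1
        h2 = step(1*x1 + 1*x2 - 0.5)   # 1 when at least one input is 1
        # Output unit combines them to produce XOR
        return step(-2*h1 + 1*h2 - 0.5)

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_net(a, b))     # prints the XOR truth table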


How to train?

  • Designing and training a neural network is not much different from training any other machine learning model.
  • Steps:
    • Cost Function;
    • Architecture: output units, hidden units;
    • Training (backpropagation).


Loss Functions: Classification

Multiclass (cross-entropy):

Binary (logistic):
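
The formulas themselves did not survive the slide export; a standard formulation (my reconstruction, with ŷ denoting the predicted probability or probabilities) is:

    % Multiclass cross-entropy, one-hot target y, predicted class probabilities \hat{y}:
    L(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k

    % Binary (logistic) loss, target y \in \{0, 1\}, predicted probability \hat{y}:
    L(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]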


Architecture


Hidden Units

  • Rectified Linear Units (ReLU); formulas sketched after this list.
  • Variations:
    • Leaky ReLU: fixes αᵢ to a small value, e.g. 0.01.
    • Parametric ReLU (PReLU): treats αᵢ as a learnable parameter.
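
The activation formulas did not survive the export; the standard definitions (my reconstruction) are:

    % ReLU:
    g(z) = \max(0, z)

    % Leaky ReLU (fixed \alpha_i, e.g. 0.01) and PReLU (\alpha_i learned) share the form:
    g(z_i) = \max(0, z_i) + \alpha_i \min(0, z_i)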


Output Units

  • Linear units: regression (formulas sketched below).
  • Sigmoid units: binary classification.
  • Softmax units: multiclass classification.
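
The corresponding formulas, in a standard formulation with h the last hidden activation (my reconstruction rather than the slide's exact notation):

    % Linear output (regression):
    \hat{y} = W^{\top} h + b

    % Sigmoid output (binary classification):
    \hat{y} = \sigma(w^{\top} h + b) = \frac{1}{1 + e^{-(w^{\top} h + b)}}

    % Softmax output (multiclass classification):
    \hat{y}_k = \mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \quad z = W^{\top} h + b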


Effect of depth


L2 Parameter Regularization

[Figure: decision functions learned with small weights are smooth; with large weights they become abrupt.]
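
In formula form (a standard statement of L2 regularization, not reproduced from the slide), the penalty added to the cost pulls the weights toward zero, which is what produces the smoother fit:

    % Regularized objective (\lambda is the regularization strength):
    \tilde{J}(w) = J(w) + \frac{\lambda}{2} \lVert w \rVert_2^2

    % Its gradient adds a weight-decay term to the usual gradient:
    \nabla_w \tilde{J}(w) = \nabla_w J(w) + \lambda w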


Early Stopping
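
The slide illustrates this graphically; as a minimal Python sketch of the mechanism (the train_epoch/validation_loss callables and the patience value are illustrative placeholders, not an actual API):

    def early_stopping_fit(train_epoch, validation_loss, max_epochs=200, patience=10):
        # Stop training when the validation loss has not improved for `patience` epochs.
        best_loss, best_epoch, epochs_without_improvement = float("inf"), 0, 0
        for epoch in range(max_epochs):
            train_epoch()                       # one pass over the training set
            loss = validation_loss()            # evaluate on held-out data
            if loss < best_loss:
                best_loss, best_epoch = loss, epoch
                epochs_without_improvement = 0  # ideally also snapshot the weights here
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                       # validation loss stopped improving
        return best_epoch, best_loss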


Dropout

  • Consists of 'turning off' random neurons of the network at each iteration, with a user-defined probability (only during training); a minimal sketch follows this list.
  • Forces all neurons to contribute to the final result.
  • Prevents co-adaptation of neurons (the network does better when hidden units detect features independently of each other, i.e., uncorrelated).
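
A minimal sketch of inverted dropout (the choice of the inverted variant, the rate, and the array shapes are illustrative assumptions):

    import numpy as np

    def dropout(h, rate, training, rng):
        # Inverted dropout: zero each unit with probability `rate` during training
        # and rescale the survivors so the expected activation is unchanged;
        # at test time the activations are left untouched.
        if not training or rate == 0.0:
            return h
        keep = rng.random(h.shape) >= rate   # mask of surviving units
        return h * keep / (1.0 - rate)       # rescale to preserve the expectation

    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))              # activations of a hidden layer
    print(dropout(h, rate=0.5, training=True, rng=rng))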


Dropout

  • Generally used on fully connected layers.
  • Typical values of the dropout rate (the probability of turning off a neuron):
    • 0.5 for fully connected layers.
    • 0.2 for convolutional layers.


How to train? Gradient-Based Learning


Backpropagation

  • Gradient-based (a minimal sketch follows this list).
  • Two alternating steps:
    • Forward propagation: compute each unit's activation σ(xᵢwᵢ), from the inputs toward the output.
    • Backward propagation: propagate the gradients of the error, ∂ℇ/∂wᵢ, from the output back to every weight.

[Figure: network diagram showing the forward pass of activations σ(x₁w₁), …, σ(x₆w₆) and the backward pass of gradients ∂ℇ/∂w₁, …, ∂ℇ/∂w₆.]
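
A minimal NumPy sketch of these two steps for a tiny one-hidden-layer network with sigmoid units, squared error, and a plain gradient step w ← w - lr·∂ℇ/∂w (the architecture, shapes, and learning rate are illustrative assumptions, not the network drawn on the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)          # input
    y = np.array([1.0])             # target
    W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)   # hidden layer
    W2 = rng.normal(size=(1, 4)); b2 = np.zeros(1)   # output layer
    lr = 0.1                        # learning rate

    for step in range(100):
        # forward propagation
        z1 = W1 @ x + b1; h = sigmoid(z1)
        z2 = W2 @ h + b2; y_hat = sigmoid(z2)
        loss = 0.5 * np.sum((y_hat - y) ** 2)

        # backward propagation (chain rule, layer by layer)
        d_z2 = (y_hat - y) * y_hat * (1 - y_hat)   # dE/dz2
        d_W2 = np.outer(d_z2, h); d_b2 = d_z2      # dE/dW2, dE/db2
        d_h = W2.T @ d_z2
        d_z1 = d_h * h * (1 - h)                   # dE/dz1
        d_W1 = np.outer(d_z1, x); d_b1 = d_z1      # dE/dW1, dE/db1

        # gradient-based update: w <- w - lr * dE/dw
        W2 -= lr * d_W2; b2 -= lr * d_b2
        W1 -= lr * d_W1; b1 -= lr * d_b1

    print(loss)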

Backpropagation

  • Vectorized Implementation (a generic sketch follows at the end of this list):
    • This link (at [37:35]); this link may also help.

  • Example in Python: link.
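
For reference, a minimal sketch of what "vectorized" means here: the same forward and backward computations expressed as matrix operations over a whole mini-batch. The shapes, loss, and names are illustrative assumptions; this is not the linked example.

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 3))                 # mini-batch of 32 inputs
    Y = rng.integers(0, 2, size=(32, 1)).astype(float)
    W1 = rng.normal(size=(3, 4)); b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

    # vectorized forward pass: one matrix product per layer for the whole batch
    H = sigmoid(X @ W1 + b1)                     # (32, 4)
    Y_hat = sigmoid(H @ W2 + b2)                 # (32, 1)

    # vectorized backward pass (squared error, averaged over the batch)
    dZ2 = (Y_hat - Y) * Y_hat * (1 - Y_hat)      # (32, 1)
    dW2 = H.T @ dZ2 / len(X); db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)             # (32, 4)
    dW1 = X.T @ dZ1 / len(X); db1 = dZ1.mean(axis=0)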