1 of 44

Neural Networks and Fuzzy Systems

The Multilayer Perceptron

Rizoan Toufiq

Assistant Professor

Department of Computer Science & Engineering

Rajshahi University of Engineering & Technology

2 of 44

Altering The Perceptron Model

  • The Problem:
    • The perceptrons in the second layer do not know which of the real inputs were on or off;
    • it is therefore impossible to strengthen the correct parts of the network.
    • This is the credit assignment problem: the network is unable to determine which of the input weights should be increased and which should not.

3 of 44

Altering The Perceptron Model

  • The Problem:
    • How are we to overcome the problem of being unable to solve linearly inseparable problems with our perceptron?

4 of 44

Altering The Perceptron Model

  • The Solution:
    • A couple of possibilities for the new thresholding function
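The candidate functions on this slide appear only as figures in the original. Two smooth alternatives to the hard step commonly proposed at this point are the logistic sigmoid and the hyperbolic tangent (a sketch, not necessarily the exact pair shown on the slide):

```latex
f(net) = \frac{1}{1 + e^{-k\,net}}, \qquad f(net) = \tanh(net)
```

Both are continuously differentiable, which is what the new learning rule will require.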

5 of 44

The New Model

6 of 44

The New Learning Rule

  • The “generalised delta rule”, or the “backpropagation rule”, was
    • suggested in 1986 by Rumelhart, McClelland, and Williams.
  • Parker had published similar results in 1982.
  • Werbos had described essentially the same work as early as 1974.

7 of 44

The New Learning Rule

  • The “generalised delta rule”, or the “backpropagation rule”:
    • Define an error function that represents the difference between the network’s current output and the correct output that we want it to produce.
    • We want to continually reduce the value of this error function by
      • adjusting the weights on the links between the units. The generalised delta rule does this by calculating the value of the error function for that particular input, and then back-propagating (hence the name!) the error from one layer to the previous one.

8 of 44

The New Learning Rule

The Mathematics

    • Ep → the error function for pattern p,
    • tpj → the target output for pattern p on node j,
    • Opj → the actual output for pattern p on node j,
    • wij → the weight from node i to node j.

Hidden Layer to output layer
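The equations on this slide are not reproduced in the extracted text. Using the notation above, the standard output-layer relations in Beale and Jackson's treatment can be reconstructed as a sketch (with net_pj = Σᵢ wij·Opi an assumed definition of a unit's net input, and η the gain term):

```latex
E_p = \tfrac{1}{2} \sum_j \left( t_{pj} - O_{pj} \right)^2, \qquad
\Delta_p w_{ij} = \eta\, \delta_{pj}\, O_{pi}
```

That is, the weight change on each link is proportional to the delta of the receiving node and the output of the sending node.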

13 of 44

The New Learning Rule

The Mathematics

Hidden Layer to output layer

We now need to know what δpj is for each of the units; if we know this, then we can decrease E.

Using (4.6) and the chain rule,
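The derivation itself is missing from the extracted text. A reconstruction of the standard chain-rule step, consistent with the notation above (a sketch, with f_j the unit's activation function):

```latex
\delta_{pj} \;=\; -\frac{\partial E_p}{\partial net_{pj}}
\;=\; -\frac{\partial E_p}{\partial O_{pj}} \cdot \frac{\partial O_{pj}}{\partial net_{pj}}
\;=\; \left( t_{pj} - O_{pj} \right) f_j'(net_{pj})
```

This gives the delta for an output unit directly in terms of the target, the actual output, and the derivative of the activation function.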


15 of 44

The New Learning Rule

The Mathematics

Hidden Layer to output layer

16 of 44

The New Learning Rule

The Mathematics

Input layer to Hidden Layer

17 of 44

The New Learning Rule

The Mathematics

Input layer to Hidden Layer

Equations (4.12) and (4.15) together define how we can train our multilayer networks.
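The hidden-unit equation itself is not in the extracted text. The standard form of (4.15), in which a hidden unit's error is the weighted sum of the deltas of the units it feeds into, can be sketched as:

```latex
\delta_{pj} \;=\; f_j'(net_{pj}) \sum_k \delta_{pk}\, w_{jk}
```

Since a hidden unit has no target output of its own, its delta is obtained by propagating the output-layer deltas backwards through the weights, which is what gives the rule its name.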

18 of 44

The New Learning Rule

Advantage of Sigmoid Function
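The advantage referred to is that the sigmoid's derivative can be computed from its output alone. For the logistic function with gain k:

```latex
f(net) = \frac{1}{1 + e^{-k\,net}}, \qquad
f'(net) = k\, f(net)\,\bigl(1 - f(net)\bigr)
```

So for a unit with output Opj = f(net_pj), the derivative is simply k·Opj·(1 − Opj), requiring no re-evaluation of the exponential during the backward pass.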

19 of 44

The Multilayer Perceptron Algorithm

20 of 44

The Multilayer Perceptron Algorithm

21 of 44

The Multilayer Perceptron Algorithm

22 of 44
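The algorithm slides above are figure-only in the extracted text. The following is a minimal runnable sketch of the multilayer perceptron training loop (my own variable names, not from the slides): a 2-2-1 network trained on the XOR patterns with batch updates, assuming the logistic sigmoid with gain k = 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR training patterns and targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One hidden layer of two units, one output unit; small random weights
W1 = rng.normal(0.0, 0.5, size=(2, 2))   # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(0.0, 0.5, size=(2, 1))   # hidden -> output weights
b2 = np.zeros(1)
eta = 0.5                                # gain (learning rate)

def forward(X):
    H = sigmoid(X @ W1 + b1)             # hidden-unit outputs
    O = sigmoid(H @ W2 + b2)             # output-unit outputs
    return H, O

def sse(O):
    return 0.5 * np.sum((T - O) ** 2)    # summed squared error, all patterns

_, O = forward(X)
initial_error = sse(O)

for epoch in range(5000):
    H, O = forward(X)
    # Output-layer deltas: (t - o) * f'(net), with f'(net) = o * (1 - o)
    delta_o = (T - O) * O * (1.0 - O)
    # Hidden-layer deltas: back-propagate output deltas through W2
    delta_h = H * (1.0 - H) * (delta_o @ W2.T)
    # Weight changes proportional to delta and the sending unit's output
    W2 += eta * H.T @ delta_o
    b2 += eta * delta_o.sum(axis=0)
    W1 += eta * X.T @ delta_h
    b1 += eta * delta_h.sum(axis=0)

_, O = forward(X)
final_error = sse(O)
print(initial_error, final_error)
```

Repeated presentations of the whole training set are needed; as the slides note later, convergence to the correct mapping is not guaranteed for every random initialisation, but the error should decrease.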

The XOR problem Revisited

The hidden unit acts as a feature detector.

23 of 44

The XOR problem Revisited

24 of 44

The XOR problem Revisited

One of the more interesting cases is when there is no direct connection from the input to the output.

26 of 44

The XOR problem Revisited

The learning rule is not guaranteed to converge, however, and it is possible for the network to fall into a situation in which it is unable to learn the correct output.

The network shown in figure 4.7 will correctly respond to the input patterns 00 and 10, but fails to produce the correct output for the patterns 01 or 11.

This local minimum occurs infrequently, about 1% of the time in the XOR problem.

27 of 44

Visualising Network Behavior

28 of 44

Visualising Network Behavior

29 of 44

Visualising Network Behavior

30 of 44

Multilayer perceptrons as classifiers

31 of 44

Multilayer perceptrons as classifiers

32 of 44

Multilayer perceptrons as classifiers

33 of 44

Multilayer perceptrons as classifiers

if we add another layer of perceptrons.

34 of 44

Multilayer perceptrons as classifiers

We never need more than three layers in a network, a statement that is referred to as the Kolmogorov theorem.

37 of 44

Multilayer perceptrons as classifiers

38 of 44

Generalization

One of the major features of neural networks is their ability to generalise, that is, to successfully classify patterns that have not been previously presented.

Neural networks are good at interpolation, but not so good at extrapolation.

39 of 44

Fault Tolerance

40 of 44

Learning Difficulties

Local minima

Underfitting

Overfitting

Divergence

Often there is only a slight “lip” to cross before reaching an actual deeper minimum.

There are alternative approaches to minimizing these occurrences, which are outlined below.

    • Lowering the gain term
    • Addition of internal nodes
    • Momentum term
    • Addition of noise
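As an illustrative sketch of the momentum-term remedy (the variable names and the quadratic test function are my own, not from the slides), each weight change is augmented with a fraction α of the previous change, which smooths the descent and helps carry the search over small lips in the error surface:

```python
# Gradient descent with a momentum term on a simple quadratic, f(w) = w**2.
# eta is the gain term; alpha is the (hypothetical) momentum coefficient.
eta, alpha = 0.05, 0.9

w = 5.0              # starting weight
delta_w = 0.0        # previous weight change

for step in range(50):
    grad = 2.0 * w                            # df/dw for f(w) = w**2
    delta_w = -eta * grad + alpha * delta_w   # momentum-augmented update
    w += delta_w

print(abs(w))
```

With α = 0 this reduces to plain gradient descent; with α near 1 the update accumulates a "velocity" along persistent gradient directions.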

41 of 44

Learning Difficulties

Other Learning Problems

The method of gradient descent is intrinsically slow to converge in a complex error landscape.

One solution: add a momentum term.

Another solution: take account of second-order effects in the gradient descent algorithm.

42 of 44

Summary


  • Multilayer perceptron - layers of perceptron-like units.
  • Feedforward, supervised learning.
  • Uses a continuously differentiable thresholding function (usually sigmoid).
  • Back-propagation algorithm (generalised delta rule) trains network by passing errors back down the net.
  • Three layers of active units can represent any pattern classification.
  • Net develops internal representations of the input’s structure.
  • Repeated presentations of training data required for learning.
  • Described by energy landscape.
  • Learning process will not always converge.
  • Variety of approaches to overcome learning difficulties.

43 of 44

Read Task


B1: Neural Computing: An Introduction, R. Beale and T. Jackson, Adam Hilger (IOP Publishing Ltd), 1990.

Chapter 4: The Multilayer Perceptron

44 of 44

Questions?
