
NEURAL NETWORKS AND DEEP LEARNING

by

Dr. Vikrant Chole

Amity School of Engineering & Technology


MODULE - III: Probabilistic Neural Network


Probabilistic Neural Network

A Probabilistic Neural Network (PNN) is a type of supervised machine learning algorithm used for classification tasks. It is based on Bayesian theory and uses Parzen window estimators to approximate the probability density function (PDF) of each class.

Probabilistic Neural Networks (PNNs) are a neural network architecture designed for classification tasks, built on principles from Bayesian statistics and probability theory.

A Probabilistic Neural Network (PNN) is a type of feed-forward ANN that does not use computation-intensive backpropagation for training.

It is a classifier that can estimate the PDF of a given set of data. PNNs are a scalable alternative to traditional backpropagation neural networks in classification and pattern recognition applications.

When applied to classification problems, these networks use probability theory to reduce the number of incorrect classifications.



The structure of PNNs consists of four layers:

  1. Input Layer: Represents the features of the input data.
  2. Pattern Layer: Each neuron in this layer represents a training example. It computes the distance from the input vector to the training samples.
  3. Summation Layer: This layer sums the contributions from the pattern layer neurons belonging to the same class to estimate the probability that a given input vector belongs to a certain class.
  4. Output Layer: Decides the classification of the input by selecting the class with the highest probability.


Architecture of PNN

The image below shows the architecture of a PNN, which consists of four layers:

  • Input Layer
  • Pattern Layer
  • Summation Layer
  • Output Layer


1. Input Layer

The Input layer is the first stage of the network, where external data enters. Each neuron in this layer corresponds to one input feature. The input layer simply takes the data in and passes it on for further processing.

2. Pattern Layer:

The Pattern layer is what distinguishes the PNN architecture from other networks. Each pattern-layer neuron corresponds to one training example from the dataset. Neurons in this layer use radial basis functions (RBFs) to measure how similar the input is to each stored example: the RBF computes the distance between the input pattern and the training example in feature space and outputs an activation value based on this distance.


3. Summation Layer:

The outputs of the Pattern layer are aggregated in the Summation layer. Each summation-layer neuron represents one class and sums the outputs of the pattern neurons belonging to that class. In effect, this layer computes a weighted sum of the RBF activations for each class.

4. Output Layer:

The Output layer is the final stage of the network, where the computed class scores are normalized to produce the result. Each neuron in the Output layer stands for a class, and this layer applies the softmax function to obtain the normalized probability of the corresponding class. The softmax function makes the probabilities across all classes add up to one, which gives a valid probability distribution over the classes.
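As an illustrative aside (not from the original slides), the short Python snippet below shows how softmax turns a set of made-up per-class scores into probabilities that sum to one.

```python
import numpy as np

# Hypothetical per-class scores from the summation layer (made-up values).
scores = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate and normalize so the probabilities sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```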


Working Principle of Probabilistic Neural Networks

The core operation of Probabilistic Neural Networks (PNNs) revolves around the concept of the Parzen window, a non-parametric approach for estimating probability density functions (PDFs). This methodology is central to PNNs' ability to handle uncertainties and variabilities in input data, enabling them to make highly accurate decisions. 

Parzen Window Estimation

The Parzen window method, also known as kernel density estimation (KDE), is used in PNNs to estimate the PDF of a random variable in a non-parametric way. This method does not assume any underlying distribution for the data, which is particularly useful in real-world scenarios where the data may not follow known or standard distributions.

How it Works:

  • Kernel Function: At the heart of the Parzen window method is the kernel function, typically a Gaussian function, which smooths out the data points to create a continuous density function. Each point in the dataset contributes to the overall probability estimate, with its influence determined by the kernel function centered on that point.
  • Bandwidth Selection: The effectiveness of KDE depends significantly on the choice of bandwidth (the width of the kernel). A smaller bandwidth leads to a bumpier estimate that can capture subtle nuances in the data distribution, while a larger bandwidth provides a smoother estimate that may overlook finer details; the short sketch below illustrates this trade-off.
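A minimal sketch of this estimate, using made-up one-dimensional sample points and two arbitrary bandwidth values:

```python
import numpy as np

def gaussian_kde(x, samples, bandwidth):
    """Parzen-window (KDE) estimate of the density at point x."""
    # Gaussian kernel centred on each sample; average the contributions.
    kernels = np.exp(-((x - samples) ** 2) / (2 * bandwidth ** 2))
    kernels /= bandwidth * np.sqrt(2 * np.pi)      # normalise each kernel
    return kernels.mean()

samples = np.array([1.0, 1.2, 1.9, 2.1, 4.0])      # made-up 1-D data points

# A small bandwidth gives a bumpier estimate, a large one a smoother estimate.
for h in (0.2, 1.0):
    density_at_2 = gaussian_kde(2.0, samples, h)
    print(f"bandwidth={h}: estimated density at x=2.0 -> {density_at_2:.3f}")
```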


Example Problem: Classifying Animals

Let’s say we want to classify animals into three categories:

  • Mammal
  • Bird
  • Reptile

Based on these simplified features:

  1. Body Temperature (Warm=1, Cold=0)
  2. Has Feathers (Yes=1, No=0)
  3. Gives Birth (Yes=1, No=0)
  4. Can Fly (Yes=1, No=0)

Animal  | Temp | Feathers | Birth | Fly | Class
--------|------|----------|-------|-----|--------
Human   |  1   |    0     |   1   |  0  | Mammal
Bat     |  1   |    0     |   1   |  1  | Mammal
Eagle   |  1   |    1     |   0   |  1  | Bird
Penguin |  1   |    1     |   0   |  0  | Bird
Lizard  |  0   |    0     |   0   |  0  | Reptile
Snake   |  0   |    0     |   0   |  0  | Reptile

Dataset (Training Samples)


How a PNN Works Here

  1. Input Layer:
    • Receives the feature vector of an unknown animal.
    • Example: X = [1, 0, 1, 0] (Warm-blooded, no feathers, gives birth, can't fly)

  2. Pattern Layer:
    • Each training sample becomes a pattern neuron.
    • Uses a Gaussian kernel to measure similarity between the input and the training samples.

  3. Summation Layer:
    • Aggregates similarity scores for each class.
    • Computes the estimated probability that the input belongs to each class.

  4. Output Layer:
    • Chooses the class with the highest probability.


Example Classification

Given an unknown animal with:

  • Temp=1, Feathers=0, Birth=1, Fly=0 → input vector [1, 0, 1, 0]

Let’s compare it using Gaussian distance to:

  • Human: [1, 0, 1, 0] → perfect match
  • Eagle: [1, 1, 0, 1] → very different
  • Lizard: [0, 0, 0, 0] → even more different

The PNN would assign the highest probability to "Mammal", classifying this animal as a mammal.
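To make this concrete, here is a small, self-contained Python sketch of the four PNN steps on the toy animal dataset above; the smoothing parameter sigma is an arbitrary choice for illustration, not part of the original example.

```python
import numpy as np

# Toy training data from the table: [Temp, Feathers, Birth, Fly] and class labels.
X_train = np.array([
    [1, 0, 1, 0],  # Human   - Mammal
    [1, 0, 1, 1],  # Bat     - Mammal
    [1, 1, 0, 1],  # Eagle   - Bird
    [1, 1, 0, 0],  # Penguin - Bird
    [0, 0, 0, 0],  # Lizard  - Reptile
    [0, 0, 0, 0],  # Snake   - Reptile
])
y_train = np.array(["Mammal", "Mammal", "Bird", "Bird", "Reptile", "Reptile"])

def pnn_classify(x, X, y, sigma=0.5):
    # Pattern layer: Gaussian kernel activation for every training sample.
    dists = np.sum((X - x) ** 2, axis=1)
    activations = np.exp(-dists / (2 * sigma ** 2))
    # Summation layer: average activation per class (Parzen density estimate).
    classes = np.unique(y)
    class_scores = np.array([activations[y == c].mean() for c in classes])
    # Output layer: normalize and pick the class with the highest probability.
    probs = class_scores / class_scores.sum()
    return classes[np.argmax(probs)], dict(zip(classes, probs.round(3)))

x_new = np.array([1, 0, 1, 0])  # warm-blooded, no feathers, gives birth, can't fly
print(pnn_classify(x_new, X_train, y_train))
# The "Mammal" class receives the highest probability.
```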


Applications of Probabilistic Neural Networks (PNNs)

  • Medical Diagnosis: PNNs classify patient data into various diagnostic categories based on test results and symptoms in order to diagnose diseases.
  • Finance: By examining patterns in customer data, PNNs assist the financial sector in risk management and credit scoring.
  • Quality Control: In manufacturing, PNNs classify products into acceptable and defective categories based on quality metrics.
  • Image Recognition: PNNs are helpful in security and surveillance systems because they can categorize images according to the presence or lack of specific features.


Advantages of Probabilistic Neural Networks

  • Speed: PNNs require only a single pass over the training data, so they are quicker to train than backpropagation-based neural networks.
  • Efficiency in Classification: When a large sample size is available, they can yield high accuracy in classification tasks.
  • Robust to Noise: PNNs are robust to noise and variations in the input data because of their statistical design.


Limitations of Probabilistic Neural Networks

  • Scalability: PNNs are computationally expensive and slow at inference time as the size of the training data increases.
  • Overfitting: There is a risk of overfitting if the training data is not representative of the general population.
  • Memory Intensive: As each neuron in the pattern layer represents a training sample, PNNs can be memory intensive.


Convolutional Neural Network

  • A Convolutional Neural Network (CNN) is a specialized form of artificial neural network (ANN), designed primarily to extract features from grid-like matrix data. This is particularly useful for visual data such as images or videos, where spatial patterns play a crucial role.

  • A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed for processing structured grid data, such as images, videos, and audio. CNNs are widely used in computer vision tasks like image classification, object detection, and segmentation due to their ability to automatically learn spatial hierarchies of features.


Architecture of a basic Convolutional Neural Network (CNN)


Layers in CNN Architecture

CNNs consist of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers (a minimal code sketch follows the list below).

  1. Input Layer: The raw image data is fed into the network.
  2. Convolutional Layer: Applies filters (kernels) to the input image to extract features like edges and textures.
  3. Activation Function (ReLU): Introduces non-linearity, allowing the network to learn complex patterns.
  4. Pooling Layer: Reduces the spatial dimensions (width and height) of the feature maps, decreasing computational load and helping to prevent overfitting.
  5. Fully Connected Layer: Flattens the pooled feature maps and connects them to a classifier to make final predictions.
  6. Output Layer: Produces the final output, such as class probabilities in classification tasks.
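A minimal sketch of this layer stack, assuming PyTorch is available; the 28x28 grayscale input size, channel counts, and 10 output classes are arbitrary illustrative choices, not a definitive architecture.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: extracts local features
    nn.ReLU(),                                   # non-linearity
    nn.MaxPool2d(2),                             # pooling: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),                                # flatten the pooled feature maps
    nn.Linear(32 * 7 * 7, 10),                   # fully connected classifier
)

x = torch.randn(8, 1, 28, 28)    # a batch of 8 dummy grayscale images
logits = model(x)                # shape: (8, 10)
probs = logits.softmax(dim=1)    # class probabilities for the output layer
print(probs.shape)
```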


Advantages of CNNs

  1. Good at detecting patterns and features in images, videos, and audio signals.
  2. Robust to translation and, to some extent, rotation and scaling of the input.
  3. End-to-end training; no need for manual feature extraction.
  4. Can handle large amounts of data and achieve high accuracy.

Disadvantages of CNNs

  1. Computationally expensive to train and require a lot of memory.
  2. Prone to overfitting when there is not enough data or proper regularization is not used.
  3. Require large amounts of labeled data.
  4. Limited interpretability; it is hard to understand what the network has learned.


Applications:

  • Image Classification (e.g., ResNet, VGG).
  • Object Detection (e.g., YOLO, Faster R-CNN).
  • Medical Imaging (e.g., tumor detection).
  • Natural Language Processing (e.g., text classification with 1D convolutions).

Challenges:

  • Requires large labeled datasets (mitigated by transfer learning).
  • Computationally expensive (GPUs/TPUs needed for training).


Recurrent Neural Network

  • A Recurrent Neural Network (RNN) is a type of artificial neural network designed to recognize patterns in sequences of data, such as time series, text, or speech. Unlike traditional feedforward neural networks, RNNs have connections that form directed cycles, allowing them to maintain a memory of previous inputs via hidden states. This makes them especially useful for tasks involving sequential data.

  • Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequences of data. They work especially well for tasks involving sequences, such as time-series data, speech, and natural language.

  • An RNN works on the principle of saving the output of a layer and feeding it back to the input, so that the output at each step depends on the previous steps.


Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural Network:

The nodes in the different layers of the neural network are compressed to form a single recurrent layer. A, B, and C are the parameters of the network.


Why Recurrent Neural Networks?

RNNs were created because feed-forward neural networks have a few limitations:

  • Cannot handle sequential data
  • Considers only the current input
  • Cannot memorize previous inputs

The solution to these issues is the RNN.

An RNN can handle sequential data, taking into account both the current input and previously received inputs. RNNs can memorize previous inputs thanks to their internal memory.


How Do Recurrent Neural Networks Work?

  • In a recurrent neural network, information cycles through a loop in the hidden layer.
  • The input layer ‘x’ takes in the input to the network, processes it, and passes it on to the middle (hidden) layer.
  • The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation function, weights, and biases. In an ordinary feed-forward network these layers are independent of earlier inputs, i.e. the network has no memory; a recurrent neural network is used when that memory is needed.
  • The recurrent neural network gives every hidden step the same activation function, weights, and biases, so each step shares the same parameters. Instead of creating multiple hidden layers, it creates one and loops over it as many times as required, as the sketch below illustrates.
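The NumPy sketch below illustrates this loop with random weights and made-up sizes: the same parameters are reused at every time step, and the hidden state h carries information forward.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, seq_len = 3, 5, 4           # made-up sizes
W_xh = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))          # a random input sequence
h = np.zeros(hidden_size)                            # initial hidden state (the "memory")

for t, x_t in enumerate(xs):
    # Same parameters reused at every time step; h feeds back into the next step.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    print(f"step {t}: hidden state = {h.round(2)}")
```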


How RNN Differs from Feedforward Neural Networks?

  • Feedforward Neural Networks (FNNs) process data in one direction, from input to output, without retaining information from previous inputs. This makes them suitable for tasks with independent inputs, like image classification. However, FNNs struggle with sequential data since they lack memory.

  • Recurrent Neural Networks (RNNs) solve this by incorporating loops that allow information from previous steps to be fed back into the network. This feedback enables RNNs to remember prior inputs, making them ideal for tasks where context is important.


Types Of Recurrent Neural Networks

There are four types of RNNs based on the number of inputs and outputs in the network:

1. One-to-One RNN

This is the simplest type of neural network architecture where there is a single input and a single output. It is used for straightforward classification tasks such as binary classification where no sequential data is involved.


2. One-to-Many RNN

In a One-to-Many RNN the network processes a single input to produce multiple outputs over time. This is useful in tasks where one input triggers a sequence of predictions (outputs). For example, in image captioning, a single image is used as input to generate a sequence of words as a caption.

3. Many-to-One RNN

The Many-to-One RNN receives a sequence of inputs and generates a single output. This type is useful when the overall context of the input sequence is needed to make one prediction. In sentiment analysis, for example, the model receives a sequence of words (such as a sentence) and produces a single output such as positive, negative, or neutral.


4. Many-to-Many RNN

The Many-to-Many RNN type processes a sequence of inputs and generates a sequence of outputs. In a language translation task, for example, a sequence of words in one language is given as input and a corresponding sequence in another language is generated as output.


Variants of Recurrent Neural Networks (RNNs)

There are several variations of RNNs, each designed to address specific challenges or optimize for certain tasks:

1. Vanilla RNN

This simplest form of RNN consists of a single hidden layer where weights are shared across time steps. Vanilla RNNs are suitable for learning short-term dependencies but are limited by the vanishing gradient problem, which hampers long-sequence learning.

2. Bidirectional RNNs

Bidirectional RNNs process inputs in both forward and backward directions, capturing both past and future context for each time step. This architecture is ideal for tasks where the entire sequence is available, such as named entity recognition and question answering.
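As an illustrative one-liner (PyTorch assumed available, sizes made up), a bidirectional recurrent layer can be requested with bidirectional=True; its output concatenates the forward and backward hidden states.

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 20, 8)   # 4 dummy sequences of length 20
out, _ = rnn(x)
print(out.shape)            # torch.Size([4, 20, 32]) -- forward + backward states
```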


3. Long Short-Term Memory Networks (LSTMs)

Long Short-Term Memory Networks (LSTMs) introduce a memory mechanism to overcome the vanishing gradient problem. Each LSTM cell has three gates:

  • Input Gate: Controls how much new information should be added to the cell state.
  • Forget Gate: Decides what past information should be discarded.
  • Output Gate: Regulates what information should be output at the current step.

This selective memory enables LSTMs to handle long-term dependencies, making them ideal for tasks where earlier context is critical; a minimal sketch of one LSTM cell step follows below.
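A minimal sketch of one LSTM cell step, with random weights and made-up sizes, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4                 # made-up sizes
# One weight matrix per gate plus the candidate cell update,
# each acting on the concatenation [h_prev, x_t].
W_i, W_f, W_o, W_c = (rng.normal(size=(hidden_size, hidden_size + input_size)) for _ in range(4))
b_i = b_f = b_o = b_c = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W_i @ z + b_i)                 # input gate: how much new info to add
    f = sigmoid(W_f @ z + b_f)                 # forget gate: how much past info to keep
    o = sigmoid(W_o @ z + b_o)                 # output gate: how much state to expose
    c_tilde = np.tanh(W_c @ z + b_c)           # candidate cell update
    c = f * c_prev + i * c_tilde               # new cell state (the long-term memory)
    h = o * np.tanh(c)                         # new hidden state (the output)
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h.round(2), c.round(2))
```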

4. Gated Recurrent Units (GRUs)

Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates into a single update gate and streamlining the output mechanism. This design is computationally efficient, often performing similarly to LSTMs, and is useful in tasks where simplicity and faster training are beneficial.


Advantages of Recurrent Neural Networks

  • Sequential Memory: RNNs retain information from previous inputs, making them well suited for time-series prediction, where past data is crucial. Variants such as Long Short-Term Memory (LSTM) networks extend this memory to longer sequences.
  • Enhanced Pixel Neighborhoods: RNNs can be combined with convolutional layers to capture extended pixel neighborhoods, improving performance in image and video data processing.

Limitations of Recurrent Neural Networks (RNNs)

While RNNs excel at handling sequential data, they face two main training challenges, the vanishing gradient and the exploding gradient problems:

  1. Vanishing Gradient: During backpropagation, gradients diminish as they pass through each time step, leading to minimal weight updates. This limits the RNN’s ability to learn long-term dependencies, which is crucial for tasks like language translation.
  2. Exploding Gradient: Sometimes, gradients grow uncontrollably, causing excessively large weight updates that destabilize training. Gradient clipping is a common technique to manage this issue.

These challenges can hinder the performance of standard RNNs on complex, long-sequence tasks.
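As a small illustration of the gradient-clipping remedy mentioned above (PyTorch assumed available; the model, data, and threshold are placeholders):

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 20, 8)            # dummy batch: 4 sequences of length 20
out, _ = model(x)
loss = out.pow(2).mean()             # dummy loss just to produce gradients

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global norm is at most 1.0 (limits exploding gradients).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```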


Applications of Recurrent Neural Networks

RNNs are used in various applications where data is sequential or time-based:

  • Time-Series Prediction: RNNs excel in forecasting tasks, such as stock market predictions and weather forecasting.
  • Natural Language Processing (NLP): RNNs are fundamental in NLP tasks like language modeling, sentiment analysis, and machine translation.
  • Speech Recognition: RNNs capture temporal patterns in speech data, aiding in speech-to-text and other audio-related applications.
  • Image and Video Processing: When combined with convolutional layers, RNNs help analyze video sequences, facial expressions, and gesture recognition.
