UNIT - 4
Recurrent Neural Networks
Recurrent Neural Network (RNN) Cell Architecture
Types of Recurrent Neural Networks
Mathematical Formulation of Recurrent Neural Networks
Backpropagation Through Time (BPTT)
In BPTT, gradients are backpropagated through each time step. This is essential for updating network parameters based on temporal dependencies.
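The step above can be sketched in code. This is a minimal, illustrative BPTT pass for a vanilla tanh RNN with a squared-error loss on the final hidden state; all variable names and sizes here are assumptions for the example, not a reference implementation.

```python
import numpy as np

# Minimal BPTT sketch: h_t = tanh(W_x x_t + W_h h_{t-1}),
# loss = 0.5 * ||h_T - target||^2 on the last hidden state.
rng = np.random.default_rng(0)
T, n_in, n_h = 4, 3, 5
W_x = rng.normal(scale=0.1, size=(n_h, n_in))
W_h = rng.normal(scale=0.1, size=(n_h, n_h))
xs = [rng.normal(size=n_in) for _ in range(T)]
target = np.zeros(n_h)

# Forward pass: keep every hidden state for the backward pass.
hs = [np.zeros(n_h)]
for x in xs:
    hs.append(np.tanh(W_x @ x + W_h @ hs[-1]))

# Backward pass: the gradient flows back through each time step.
dW_x = np.zeros_like(W_x)
dW_h = np.zeros_like(W_h)
dh = hs[-1] - target                      # dL/dh_T
for t in reversed(range(T)):
    dpre = dh * (1.0 - hs[t + 1] ** 2)    # through tanh
    dW_x += np.outer(dpre, xs[t])
    dW_h += np.outer(dpre, hs[t])
    dh = W_h.T @ dpre                     # propagate to h_{t-1}

print(dW_h.shape)  # (5, 5)
```

The repeated multiplication by `W_h.T` in the loop is exactly what makes the vanishing and exploding gradient problems below possible.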
1. Vanishing Gradient Problem
Definition:
The Vanishing Gradient Problem occurs when gradients become very small (close to zero) during backpropagation.
Why it happens in RNNs:
During BPTT, gradients are multiplied repeatedly by the recurrent weights and by activation derivatives (e.g., of tanh or sigmoid) that are less than 1, so they shrink exponentially over many time steps.
Effects:
Early time steps receive almost no gradient signal, so the network fails to learn long-term dependencies and training stalls.
Example:
If gradient = 0.5 and it is multiplied through many steps:
0.5 × 0.5 × 0.5 × 0.5 → becomes almost 0
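The shrinking effect is easy to verify numerically; this tiny sketch just repeats the multiplication from the example over 20 steps:

```python
# Multiplying a gradient of 0.5 through 20 time steps drives it toward zero.
g = 1.0
for _ in range(20):
    g *= 0.5
print(g)  # 9.5367431640625e-07 — effectively zero
```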
2. Exploding Gradient Problem
Definition:
The Exploding Gradient Problem occurs when gradients become very large during backpropagation.
Why it happens in RNNs:
When the recurrent weights (or their repeated products) are greater than 1, gradients grow exponentially as they are multiplied across many time steps.
Effects:
Weight updates become unstable; the loss can oscillate or overflow to NaN. Gradient clipping is a common remedy.
Example:
If gradient = 2 and multiplied repeatedly:
2 × 2 × 2 × 2 × 2 → becomes very large
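The growth is just as easy to check, and the sketch also shows gradient clipping, the standard remedy mentioned above (the threshold of 5.0 is an illustrative choice, not a fixed rule):

```python
# Multiplying a gradient of 2 through 20 time steps makes it explode.
g = 1.0
for _ in range(20):
    g *= 2.0
print(g)  # 1048576.0

# Gradient clipping: cap the magnitude at a chosen threshold.
threshold = 5.0
clipped = max(-threshold, min(threshold, g))
print(clipped)  # 5.0
```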
Variants of Recurrent Neural Networks (RNNs)
Long Short-Term Memory (LSTM)
LSTM Architecture
The LSTM architecture involves a memory cell which is controlled by three gates:
1. Forget Gate
2. Input Gate
3. Output Gate
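The three gates above can be sketched as a single LSTM time step. This is an illustrative NumPy version (weights packed into one matrix `W`; the names and sizes are assumptions for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev; x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    n = h_prev.size
    f = sigmoid(z[:n])          # forget gate: what to erase from the cell state
    i = sigmoid(z[n:2*n])       # input gate: what new information to write
    o = sigmoid(z[2*n:3*n])     # output gate: what to expose as the hidden state
    g = np.tanh(z[3*n:])        # candidate cell update
    c = f * c_prev + i * g      # new cell state (the long-term "memory")
    h = o * np.tanh(c)          # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_h, n_in = 4, 3
W = rng.normal(scale=0.1, size=(4 * n_h, n_h + n_in))
b = np.zeros(4 * n_h)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`), gradients along it are not forced through a repeated matrix product, which is why LSTMs mitigate the vanishing gradient problem.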
Applications of LSTM
Gated Recurrent Units (GRU)
The GRU consists of two main gates:
1. Update Gate (zₜ)
2. Reset Gate (rₜ)
3. Candidate Hidden State
4. Final Hidden State
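The four items above map directly onto one GRU time step. A minimal sketch (separate weight matrices `Wz`, `Wr`, `Wh` are an illustrative choice; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU time step following the gate list above."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ hx)                       # 1. update gate z_t
    r = sigmoid(Wr @ hx)                       # 2. reset gate r_t
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # 3. candidate state
    return (1 - z) * h_prev + z * h_cand       # 4. final hidden state

rng = np.random.default_rng(0)
n_h, n_in = 4, 3
Wz = rng.normal(scale=0.1, size=(n_h, n_h + n_in))
Wr = rng.normal(scale=0.1, size=(n_h, n_h + n_in))
Wh = rng.normal(scale=0.1, size=(n_h, n_h + n_in))
h = gru_step(rng.normal(size=n_in), np.zeros(n_h), Wz, Wr, Wh)
print(h.shape)  # (4,)
```

Note how the GRU merges the LSTM's cell and hidden state into one vector and uses two gates instead of three, which is the main source of its lower parameter count.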
Difference Between LSTM and GRU
Bidirectional LSTMs
A Bidirectional LSTM (BiLSTM) consists of two separate LSTM layers: one processes the input sequence forward (left to right), and the other processes it backward (right to left).
Mathematically, the final output at time t is computed as:
hₜ = [hₜᶠ ; hₜᵇ]
where hₜᶠ and hₜᵇ are the hidden states of the forward and backward LSTMs at time t, and [· ; ·] denotes concatenation.
Bidirectional RNNs
Example:
Consider the sentence: "I like apple. It is very healthy."
In a traditional unidirectional RNN, the network might struggle to tell whether "apple" refers to the fruit or the company based on the first sentence alone. A BRNN has no such issue.
By processing the sentence in both directions, it can easily understand that "apple" refers to the fruit, thanks to the future context provided by the second sentence ("It is very healthy.").
1. Inputting a Sequence: A sequence of data points, each represented as a vector of the same dimensionality, is fed into the BRNN. Sequences may vary in length.
2. Dual Processing: BRNNs process data in two directions: a forward layer reads the sequence from the first element to the last, while a backward layer reads it from the last element to the first.
3. Computing the Hidden State: A non-linear activation function is applied to the weighted sum of the input and the previous hidden state, creating a memory mechanism that lets the network retain information from earlier steps.
4. Determining the Output: A non-linear activation function is applied to the weighted sum of the combined hidden states and the output weights to compute the output at each step. This output can either be the final prediction for that time step or the input to a subsequent layer.
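The four steps above can be sketched with plain tanh RNN cells in place of LSTMs (an illustrative simplification; shapes and weight names are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h = 5, 3, 4
xs = [rng.normal(size=n_in) for _ in range(T)]      # 1. input sequence
W_x = rng.normal(scale=0.1, size=(n_h, n_in))
W_h = rng.normal(scale=0.1, size=(n_h, n_h))
W_y = rng.normal(scale=0.1, size=(2, 2 * n_h))      # output weights

def run_direction(seq):
    """3. Hidden states: tanh of weighted input plus previous hidden state."""
    h, out = np.zeros(n_h), []
    for x in seq:
        h = np.tanh(W_x @ x + W_h @ h)
        out.append(h)
    return out

fwd = run_direction(xs)                 # 2. forward pass
bwd = run_direction(xs[::-1])[::-1]     # 2. backward pass (re-reversed to align)

# 4. Output at each step from the concatenated forward/backward hidden states.
ys = [np.tanh(W_y @ np.concatenate([f, b])) for f, b in zip(fwd, bwd)]
print(len(ys), ys[0].shape)  # 5 (2,)
```

Because `ys[t]` depends on both `fwd[t]` (the past) and `bwd[t]` (the future), every output sees the whole sequence, which is exactly what resolves the "apple" ambiguity in the example above.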
Convolutional Neural Network (CNN)
CNNs are composed of two main parts: a feature-extraction part (convolutional and pooling layers) and a classification part (fully connected layers).
Components:
Applications
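The core operation of the feature-extraction part is the 2D convolution. A minimal sketch (valid padding, stride 1; the edge-detector kernel is an illustrative choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and take the elementwise-product sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])   # simple vertical-edge detector
feat = conv2d(image, edge_kernel)
print(feat.shape)  # (3, 3)
```

In a real CNN, many such kernels are learned by gradient descent rather than hand-designed, and pooling layers downsample the resulting feature maps.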