Ahmad Kalhor-University of Tehran
Chapter 5
Recurrent Neural Networks
Memory Neural Networks
NNs which associate an input pattern (or a sequence of input patterns) with an output pattern (or a sequence of output patterns) (supervised learning)
* Such NNs form memories for pattern association.
Recurrent NN
x1, x2, x3, …, xN
Input patterns are taken sequentially (often through time).
y: the output is a single pattern or a sequence of patterns.
Some Applications of RNNs
1. Language Modeling
Given a sequence of words, we want to predict the probability of each word given the previous words. Language models let us measure how likely a sentence is, which is an important input for machine translation (since high-probability sentences are typically correct).
2. Machine Translation
Machine translation is similar to language modeling in that our input is a sequence of words in the source language (e.g. German). We want to output a sequence of words in the target language (e.g. English).
3. Speech Recognition
Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.
4. Generating Image Descriptions
Challenges in learning of RNNs
MNNs with sequenced input patterns
* RNNs form memories that are robust against distortion, disturbances, and dimension variations of input patterns, as well as different sampling rates.
Training RNNs
Back Propagation Through Time (BPTT)
Updating Rules
Loss function
A Typical simple RNN
Back Propagation Through Time (BPTT)
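As a concrete sketch of BPTT for the simple RNN above, assume h_t = tanh(U x_t + W h_{t-1}) with a softmax output ŷ_t = softmax(V h_t) and cross-entropy loss summed over time (the output matrix V and all variable names here are illustrative, not from the slides):

```python
import numpy as np

def bptt(x_seq, y_seq, U, W, V):
    """BPTT for the simple RNN h_t = tanh(U x_t + W h_{t-1}),
    y_hat_t = softmax(V h_t), cross-entropy loss summed over time.
    x_seq: list of input vectors; y_seq: list of target class indices."""
    T, H = len(x_seq), W.shape[0]
    h, y_hat = {-1: np.zeros(H)}, {}
    # ---- forward pass, unfolded through time ----
    for t in range(T):
        h[t] = np.tanh(U @ x_seq[t] + W @ h[t - 1])
        z = V @ h[t]
        e = np.exp(z - z.max())              # numerically stable softmax
        y_hat[t] = e / e.sum()
    # ---- backward pass through time ----
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(H)                    # gradient flowing in from step t+1
    for t in reversed(range(T)):
        dz = y_hat[t].copy()
        dz[y_seq[t]] -= 1.0                  # d(loss)/dz for softmax + cross-entropy
        dV += np.outer(dz, h[t])
        dh = V.T @ dz + dh_next
        draw = (1.0 - h[t] ** 2) * dh        # back through tanh
        dU += np.outer(draw, x_seq[t])
        dW += np.outer(draw, h[t - 1])
        dh_next = W.T @ draw                 # propagate to the earlier time step
    return dU, dW, dV

# toy usage with arbitrary sizes
rng = np.random.default_rng(0)
D, H, C = 3, 5, 4
U = rng.normal(scale=0.1, size=(H, D))
W = rng.normal(scale=0.1, size=(H, H))
V = rng.normal(scale=0.1, size=(C, H))
x_seq = [rng.normal(size=D) for _ in range(6)]
y_seq = [1, 3, 0, 2, 2, 1]
dU, dW, dV = bptt(x_seq, y_seq, U, W, V)
```

The updating rules then subtract a learning rate times these accumulated gradients from U, W, and V.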
Main limitation of RNNs
The derivative of the sigmoid function is less than one.
In BPTT, when the RNN is unfolded many times, the back-propagated gradient vanishes for inputs taken at older time steps.
Learning long-term dependencies is therefore no longer effective: error gradients vanish exponentially quickly with the size of the time lag between important events.
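The exponential decay can be illustrated numerically. In this sketch (random recurrent weights and tanh units; the size, weight scale, and step count are arbitrary choices), one BPTT step multiplies the gradient by W^T diag(1 − h_t²), so its norm shrinks at every unfolded step:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16
W = rng.normal(scale=0.25 / np.sqrt(H), size=(H, H))  # small recurrent weights
h, grad = np.zeros(H), np.ones(H)
gnorms = []
for t in range(50):                        # unfold 50 time steps
    h = np.tanh(W @ h + rng.normal(size=H))
    grad = W.T @ ((1.0 - h ** 2) * grad)   # one BPTT step back through tanh
    gnorms.append(np.linalg.norm(grad))
# the gradient norm decays roughly exponentially with the time lag
print(gnorms[0], gnorms[-1])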
RNNs are good at forming short-term memory:
The clouds are in the SKY
(implicit input → target)
I grew up in France … I speak fluent French
RNNs are not good at forming long-term memory.
RNN Extensions
LSTM (Long Short-Term Memory), Hochreiter & Schmidhuber (1997)
Module in RNN: the repeating module in a standard RNN contains a single layer:
h_t = tanh(U x_t + W h_{t-1})
(h_t is the same as s_t)
Module in LSTM: the repeating module in an LSTM contains four interacting layers; the LSTM can read, write, and delete information from its memory.
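The single-layer RNN module above can be sketched in a few lines of numpy (the matrix sizes here are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W):
    """One step of the vanilla RNN module: h_t = tanh(U x_t + W h_{t-1})."""
    return np.tanh(U @ x_t + W @ h_prev)

# run a short random sequence through the module
rng = np.random.default_rng(1)
U, W = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
h = np.zeros(8)
for x in rng.normal(size=(5, 4)):
    h = rnn_step(x, h, U, W)
```

Because tanh squashes its input, every component of the state h stays in (−1, 1).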
Some Concepts
Two key concepts in LSTMs:
1. Cell state
2. Gate
Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation.
Step-by-Step LSTM Walk-Through
Step 1: decide what information we're going to throw away from the cell state.
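In the standard formulation, this "forget gate" layer looks at h_{t-1} and x_t and outputs a number between 0 and 1 for each entry of the cell state C_{t-1}:

```latex
f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```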
Step 2: decide what new information we're going to store in the cell state.
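In the standard formulation this has two parts: an "input gate" layer decides which values to update, and a tanh layer proposes candidate values:

```latex
i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```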
Step 3: drop the information about the old subject's gender and add the new information, as decided in the previous steps.
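In equation form, the old cell state is scaled by the forget gate and the gated candidate values are added (⊙ denotes pointwise multiplication):

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```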
Step 4: decide what we're going to output.
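In the standard formulation, an "output gate" decides which parts of the (tanh-squashed) cell state become the hidden state:

```latex
o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
```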
Compact Form Equations
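The compact-form equations can be sketched as a single LSTM step in numpy (a minimal sketch: the weight names Wf, Wi, Wc, Wo, the biases, and the convention of concatenating [h_{t-1}, x_t] follow the standard formulation, and all sizes below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM step: forget, input, and output gates computed from
    the concatenated [h_{t-1}, x_t], then cell and hidden state updates."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)           # forget gate
    i = sigmoid(Wi @ z + bi)           # input gate
    c_tilde = np.tanh(Wc @ z + bc)     # candidate cell state
    c = f * c_prev + i * c_tilde       # new cell state
    o = sigmoid(Wo @ z + bo)           # output gate
    h = o * np.tanh(c)                 # new hidden state
    return h, c

# toy usage: run a short sequence through one cell
rng = np.random.default_rng(2)
D, H = 3, 4
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(H, H + D)) for _ in range(4))
bf, bi, bc, bo = (np.zeros(H) for _ in range(4))
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, Wf, Wi, Wc, Wo, bf, bi, bc, bo)
```

Note that the cell state c is carried forward additively, which is what lets gradients survive longer time lags than in the vanilla RNN.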
LSTM Learning methods
LSTM Forward and Backward Pass, Arun Mallya
Extended Versions
1. One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding “peephole connections.” This means that we let the gate layers look at the cell state.
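In the standard notation, with peepholes added to all three gates, the gate layers become (a sketch; some variants add only some of the peepholes):

```latex
f_t = \sigma\!\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right),\quad
i_t = \sigma\!\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right),\quad
o_t = \sigma\!\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right)
```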
2. Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what new information to add, we make those decisions together: we only forget when we're going to input something in its place, and we only input new values to the state when we forget something older.
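In equation form, the coupled variant replaces the separate input gate i_t with 1 − f_t in the cell-state update:

```latex
C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t
```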
3. A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho et al. (2014). It combines the forget and input gates into a single "update gate." It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than standard LSTM models and has been growing increasingly popular.
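A GRU step can be sketched in numpy as follows (a minimal sketch of the standard GRU equations; the weight names Wz, Wr, Wh and all sizes are illustrative, and bias terms are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: update gate z, reset gate r, and a single merged state h."""
    zin = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ zin)                              # update gate
    r = sigmoid(Wr @ zin)                              # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1.0 - z) * h_prev + z * h_tilde            # interpolate old/new state

# toy usage: run a short sequence through one cell
rng = np.random.default_rng(3)
D, H = 3, 4
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(H, H + D)) for _ in range(3))
h = np.zeros(H)
for x in rng.normal(size=(5, D)):
    h = gru_step(x, h, Wz, Wr, Wh)
```

Because the new state is a convex combination of h_{t-1} and a tanh candidate, it stays in (−1, 1), and a single gate z plays the role of both the LSTM's forget and input gates.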
End of Chapter 5
Thank you