1 of 24

Deep Learning (DEEP-0001)

12 – RNNs

2 of 24

Biological sequence

Sequence learning problems

In sequence problems, the current output depends on previous inputs, and the length of the input may not be fixed.

Time series

Natural Language Processing

Computer Vision

Protein secondary structure prediction

Stock market price prediction

Machine Translation, Language generation

Human Action Recognition

3 of 24

Recurrent Neural Networks

  • Recurrent Neural Networks (RNNs) are a family of neural networks for processing sequential data.
  • The input consists of a sequence of values (X), and RNNs can process sequences of variable length.
  • RNNs have internal memory (h), which allows them to capture and process sequential information.

4 of 24

Recurrent Neural Network

RNN

Simple/Standard

MLP

input

output

5 of 24

Recurrent Neural Network Structure

  • Left: typical RNN structure.
  • Right: the unfolded version, where information from previous time steps is passed on to later time steps.

6 of 24

Recurrent Neural Network Structure

7 of 24

RNN Models

RNN

MLP

8 of 24

RNN Models

9 of 24

Recurrent Neural Network

10 of 24

RNN Forward Pass

Wxh (hidden_size, vocab_size)

Whh (hidden_size, hidden_size)

Why (vocab_size, hidden_size)
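The forward pass with these three weight matrices can be sketched as follows. This is a minimal NumPy illustration, assuming a character-level model with one-hot input vectors; the concrete sizes, seeds, and bias terms are chosen here for demonstration and are not from the slides.

```python
import numpy as np

# Hypothetical sizes for illustration (not from the slides)
vocab_size, hidden_size = 5, 4
rng = np.random.default_rng(0)

# Weight matrices with the shapes given above
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01   # input  -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01   # hidden -> output
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

def rnn_forward(inputs, h):
    """inputs: list of one-hot column vectors; h: initial hidden state."""
    ys = []
    for x in inputs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)  # update the internal memory h
        y = Why @ h + by                     # logits over the vocabulary
        ys.append(y)
    return ys, h

# One-hot encode the toy index sequence [0, 2, 1] and run the forward pass
seq = [np.eye(vocab_size)[:, [i]] for i in [0, 2, 1]]
ys, h = rnn_forward(seq, np.zeros((hidden_size, 1)))
```

Note how the same three matrices are reused at every time step; only the hidden state h changes as the sequence is consumed.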

11 of 24

Backpropagation Through Time

12 of 24

RNN – Challenges

  • Exploding gradients:
    • Mitigated by gradient clipping, typically combined with truncated BPTT.

  • Vanishing gradients (long-term dependency problem):
    • Addressed by the LSTM architecture.
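Gradient clipping for the exploding-gradient case can be sketched as below: if the global norm of all gradients exceeds a threshold, the whole gradient vector is rescaled. The threshold value here is an illustrative assumption.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale all gradients if their combined (global) norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm  # shrink direction-preservingly
        grads = [g * scale for g in grads]
    return grads

# An artificially exploded gradient (norm 200) gets rescaled down to norm 5
grads = clip_gradients([np.full((2, 2), 100.0)], max_norm=5.0)
```

Clipping by global norm preserves the direction of the update, which is why it is usually preferred over clipping each element independently.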

13 of 24

Long Short-Term Memory (LSTM)

  • LSTM is a type of RNN.
  • Designed to avoid the long-term dependency problem.
  • A gating mechanism allows writing, reading, and erasing information from memory.

“I grew up in France… I speak fluent French.”

14 of 24

Long Short-Term Memory (LSTM)

15 of 24

Cell State (Long-Term)

  • Gates: remove or add information to the cell state.
  • An LSTM has three of these gates, to protect and control the cell state.

16 of 24

Forget gate layer

  • Example: predict the next word based on all the previous ones.
    • The cell state might include the gender of the present subject, so that the correct pronouns can be used.
    • When we see a new subject, we want to forget the gender of the old subject.

17 of 24

Input Gate Layer

  • Information to store in the cell state:
    • Sigmoid layer decides which values will be updated.
    • Tanh layer creates a vector of new candidate values.
  • In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.

18 of 24

Input Gate Layer

  • It’s now time to update the old cell state into the new cell state.
  • In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information.

19 of 24

Output Gate (Short-Term)

  • First, we run a sigmoid layer which decides what parts of the cell state we’re going to output.
  • Then, we put the cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
  • For the language model example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.
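The three gates described on the previous slides can be combined into one time step. The sketch below is a minimal NumPy version of an LSTM cell, assuming the four gate weight matrices are stacked into a single matrix W; the toy dimensions and random seed are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the four gate weight matrices; its rows
    split into forget (f), input (i), candidate (g), and output (o) parts."""
    H = h_prev.shape[0]
    z = W @ np.vstack([x, h_prev]) + b
    f = sigmoid(z[0:H])          # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2 * H])      # input gate: which values to update
    g = np.tanh(z[2 * H:3 * H])  # candidate values to write
    o = sigmoid(z[3 * H:4 * H])  # output gate: what to expose as hidden state
    c = f * c_prev + i * g       # new long-term cell state
    h = o * np.tanh(c)           # new short-term hidden state
    return h, c

# Toy dimensions (assumed for illustration)
D, H = 3, 2
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros((4 * H, 1))
h, c = lstm_step(rng.standard_normal((D, 1)),
                 np.zeros((H, 1)), np.zeros((H, 1)), W, b)
```

The line `c = f * c_prev + i * g` is the whole story of the slides above in one expression: forget part of the old state, then write the selected new candidates.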

20 of 24

Dimensionality

21 of 24

Variations - Bidirectional RNN

  • Two independent RNNs combined.
  • Allows the network to have both backward and forward information about the sequence at every time step.
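A bidirectional RNN can be sketched by running two independent RNNs, one over the sequence and one over its reverse, and concatenating their hidden states at each time step. This is a minimal NumPy illustration with assumed toy dimensions.

```python
import numpy as np

def simple_rnn(xs, Wxh, Whh):
    """Plain tanh RNN returning the hidden state at every time step."""
    h = np.zeros((Whh.shape[0], 1))
    hs = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)
        hs.append(h)
    return hs

def bidirectional_rnn(xs, fwd, bwd):
    """Run one RNN forward and an independent one backward, then
    concatenate the two hidden states at each time step."""
    hs_f = simple_rnn(xs, *fwd)
    hs_b = simple_rnn(xs[::-1], *bwd)[::-1]  # reverse back to original order
    return [np.vstack([hf, hb]) for hf, hb in zip(hs_f, hs_b)]

# Toy dimensions (assumed for illustration)
D, H = 3, 4
rng = np.random.default_rng(2)

def make_weights():
    return (rng.standard_normal((H, D)) * 0.1,
            rng.standard_normal((H, H)) * 0.1)

xs = [rng.standard_normal((D, 1)) for _ in range(5)]
hs = bidirectional_rnn(xs, make_weights(), make_weights())
```

Each output state has size 2H because the forward and backward states are concatenated, which is why every time step sees context from both directions.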

22 of 24

Variations - Deep/Stacked RNN

23 of 24

Convolutional RNN

24 of 24

Sliding Window technique on RNN