1 of 36

AI@MIT Workshop Series

Presentation based on Nikhil’s Coursera course “An Introduction to Practical Deep Learning”

Workshop 5:

Recurrent Neural Networks

2 of 36

Types of Networks

MLP (Multilayer Perceptron)

CNN

(Convolutional Neural Networks)

RNN

(Recurrent Neural Networks)

Sources: http://bit.ly/2GHV0uS, http://bit.ly/2G3ynDk, http://bit.ly/2GJG13N

3 of 36

Types of Networks

MLP (Multilayer Perceptron)

CNN

(Convolutional Neural Networks)

RNN

(Recurrent Neural Networks)

Sources: http://bit.ly/2GHV0uS, http://bit.ly/2G3ynDk, http://bit.ly/2GJG13N

4 of 36

Today’s Agenda

  • RNNs: Motivation & Mechanics
  • Tricks for Improvement
  • Architectures
  • Some Applications
  • PyTorch!

5 of 36

Review

Training Procedure

Initialize weights

Fetch a batch of data

Forward-pass

Cost

Backward-pass

Update weights

Sources: “An Introduction to Practical Deep Learning” Coursera Course
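
Below is a minimal PyTorch sketch of this loop (not from the slides); the tiny linear model and random batches are placeholders for whatever model and data you actually use.

```python
import torch
import torch.nn as nn

# Placeholders for illustration only: any model, loss, and data source work here.
model = nn.Linear(10, 2)                              # initialize weights
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    x = torch.randn(32, 10)                           # fetch a batch of data
    y = torch.randint(0, 2, (32,))
    logits = model(x)                                 # forward-pass
    loss = loss_fn(logits, y)                         # cost
    optimizer.zero_grad()
    loss.backward()                                   # backward-pass
    optimizer.step()                                  # update weights
```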

6 of 36

Review

Sources: “An Introduction to Practical Deep Learning” Coursera Course

Inference Procedure

Fetch data

Forward-pass

7 of 36

The man ate his mushrooms raw.

8 of 36

Recurrent Neural Network: A Better Idea

  • Designed to capture temporal dependence
  • Recursive, so it accepts variable-length inputs
  • Allows for temporal indifference when needed
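
A bare-bones sketch of the recurrence (weight names and sizes here are illustrative, not from the slides): the same weights are applied at every time step, so inputs of any length fit, and the hidden state carries information forward in time.

```python
import torch

input_size, hidden_size = 4, 3                     # illustrative sizes
W_x = torch.randn(hidden_size, input_size)         # input-to-hidden weights
W_h = torch.randn(hidden_size, hidden_size)        # hidden-to-hidden weights (the recurrence)
b = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # New hidden state depends on the current input and the previous hidden state.
    return torch.tanh(W_x @ x_t + W_h @ h_prev + b)

h = torch.zeros(hidden_size)                       # initial hidden state
for x_t in torch.randn(7, input_size):             # a length-7 sequence; any length works
    h = rnn_step(x_t, h)
```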

9 of 36

Recurrent Neural Network: A Better Idea

10 of 36

Recurrent Neural Network: A Better Idea

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

11 of 36

RNN Architectures

12 of 36

Recurrent Neuron

13 of 36

Unrolling

14 of 36

Unrolling Example:
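
In PyTorch the unrolling is done for you: `nn.RNN` applies the same cell at every time step and returns both the per-step outputs and the final hidden state (sizes below are illustrative).

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(2, 7, 4)       # batch of 2 sequences, 7 time steps, 4 features each
outputs, h_n = rnn(x)          # unrolled over all 7 steps internally
print(outputs.shape)           # torch.Size([2, 7, 3]) - hidden state at every step
print(h_n.shape)               # torch.Size([1, 2, 3]) - final hidden state
```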

15 of 36

Issues with RNNs

  • Long-range dependencies
  • Exploding/vanishing gradients
  • Directionality
  • Hard to interpret

16 of 36

Vanishing/exploding gradients
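
A toy demonstration of why this happens (not from the slides): backpropagating through T time steps multiplies the gradient by the recurrent weight matrix T times, so its norm shrinks or grows geometrically depending on that matrix's largest singular value.

```python
import torch

for scale in (0.5, 1.5):                  # contractive vs. expansive recurrent weights
    W_h = scale * torch.eye(3)            # toy recurrent weight matrix
    grad = torch.ones(3)
    for _ in range(50):                   # backprop through 50 time steps
        grad = W_h.T @ grad               # repeated multiplication by W_h
    print(scale, grad.norm().item())      # ~1e-15 (vanishing) vs. ~1e+9 (exploding)
```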

17 of 36

Bidirectional RNNs
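
The idea: one RNN reads the sequence left-to-right, another right-to-left, and their hidden states are concatenated. A PyTorch sketch (sizes illustrative):

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=4, hidden_size=3, bidirectional=True, batch_first=True)
x = torch.randn(2, 7, 4)
outputs, h_n = birnn(x)
print(outputs.shape)    # torch.Size([2, 7, 6]) - forward and backward states concatenated
print(h_n.shape)        # torch.Size([2, 2, 3]) - one final state per direction
```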

18 of 36

Deep RNNs

[Figure: RNN layers stacked from Layer 1 up to Layer L]
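
Stacking layers is a single argument in PyTorch (a sketch with illustrative sizes): each layer's sequence of hidden states becomes the input sequence for the layer above.

```python
import torch
import torch.nn as nn

deep_rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=3, batch_first=True)
x = torch.randn(2, 7, 4)
outputs, h_n = deep_rnn(x)
print(outputs.shape)    # torch.Size([2, 7, 3]) - top layer's hidden state at every step
print(h_n.shape)        # torch.Size([3, 2, 3]) - final hidden state of each of the 3 layers
```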

19 of 36

LSTMs

20 of 36

LSTMs (Long Short-Term Memory modules)

  • Forget Gate
  • Memory (Input) Gates
  • Output Gate
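
One LSTM cell step written out by hand so the gates are visible (weight names and sizes are illustrative, and biases are omitted; in practice use `torch.nn.LSTM` or `torch.nn.LSTMCell`):

```python
import torch

hidden = 3                                    # illustrative; input size == hidden size here
x_t, h_prev, c_prev = torch.randn(hidden), torch.zeros(hidden), torch.zeros(hidden)
W_f, W_i, W_g, W_o = (torch.randn(hidden, 2 * hidden) for _ in range(4))
xh = torch.cat([x_t, h_prev])                 # gates look at input and previous hidden state

f = torch.sigmoid(W_f @ xh)     # forget gate: what to erase from memory
i = torch.sigmoid(W_i @ xh)     # input (memory) gate: what to write
g = torch.tanh(W_g @ xh)        # candidate new memory content
o = torch.sigmoid(W_o @ xh)     # output gate: what to expose

c_t = f * c_prev + i * g        # updated cell state (long-term memory)
h_t = o * torch.tanh(c_t)       # new hidden state (short-term output)
```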

21 of 36

GRUs
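
GRUs merge the cell and hidden state and use two gates (reset and update); in PyTorch the drop-in module is `torch.nn.GRU` (a sketch, sizes illustrative):

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(2, 7, 4)
outputs, h_n = gru(x)      # same interface as nn.RNN, but gated and with no separate cell state
print(outputs.shape)       # torch.Size([2, 7, 3])
```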

22 of 36

Attention

23 of 36

Attention: the Transformer
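
The core computation is scaled dot-product attention: every position builds its output as a weighted average of all values, with weights given by how well its query matches each key. A minimal sketch (names and sizes are illustrative):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))  # query-key similarities
    weights = torch.softmax(scores, dim=-1)                   # attention weights per position
    return weights @ V                                        # weighted average of the values

Q = K = V = torch.randn(2, 7, 16)     # self-attention: batch of 2, 7 positions, 16 dims
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                      # torch.Size([2, 7, 16])
```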

24 of 36

Transformers

25 of 36

Transformers
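
PyTorch ships ready-made Transformer building blocks; a usage sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
x = torch.randn(2, 7, 64)    # batch of 2 sequences, 7 tokens, 64-dim embeddings
out = encoder(x)             # every token attends to every other token in one shot
print(out.shape)             # torch.Size([2, 7, 64])
```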

26 of 36

Applications

27 of 36

28 of 36

Task-Dependent

N.B. This is how word vectors are computed!

Encoding!
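
One common task-dependent setup (a sketch, not from the slides): embed the tokens, run them through an RNN, and keep only the final hidden state as a fixed-size encoding of the whole sequence, which can then feed a classifier such as the toxic/non-toxic example on the next slide.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=10_000, embedding_dim=32)    # word vectors
encoder = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
classify = nn.Linear(64, 2)                                      # e.g. non-toxic vs. toxic

token_ids = torch.randint(0, 10_000, (2, 12))    # batch of 2 sentences, 12 tokens each
_, h_n = encoder(embed(token_ids))
encoding = h_n[-1]                               # final hidden state = sequence encoding
logits = classify(encoding)                      # shape: torch.Size([2, 2])
```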

29 of 36

 

[Figure: example predictions labeled Non-Toxic vs. Toxic]

30 of 36

31 of 36

32 of 36

Cool Results from Andrej Karpathy

33 of 36

Cool Results from Andrej Karpathy

34 of 36

Cool Results from Andrej Karpathy

35 of 36

Summary

  • RNNs allow us to process sequential data.
    • They keep a hidden state which encodes all previous computations.
  • Researchers are always looking for ways to better capture long-range dependencies.
    • Tricks: Depth, Bidirectionality, Attention
    • Architectures: LSTM, GRU, Transformer
  • RNNs are good for more than just translation!!!

36 of 36

Thank you!

Log attendance at tinyurl.com/aimws5

and enjoy the lab!