1 of 68

Recurrent Neural Networks

Aaditya Prakash

Sep 26, 2018

RNN, LSTM, Seq2Seq, NMT and Attention

2 of 68

Art with deep recurrent neural networks

The crow crooked on more beautiful and free,

He journeyed off into the quarter sea.

his radiant ribs girdled empty and very –

least beautiful as dignified to see.

Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy Marylen Hammine Janye Marlise Jacacrie Hendred Romand Charienna Nenotto Ette Dorane Wallen Marly Darine Salina Elvyn Ersia Maralena Minoria Ellia Charmin Antley Nerille Chelon Walmor Evena Jeryly Stachon Charisa Allisa Anatha Cathanie Geetra Alexie Jerin Cassen Herbett Cossie Velen Daurenge Robester Shermond Terisa Licia Roselen Ferine Jayn Lusine Charyanne Sales Sanny Resa Wallon Martine Merus Jelen Candica Wallin Tel

Poetry

Baby Names

Image Captions

3 of 68

Art with deep recurrent neural networks

The crow crooked on more beautiful and free,

He journeyed off into the quarter sea.

his radiant ribs girdled empty and very –

least beautiful as dignified to see.

Poetry

4 of 68

Language Model

Once you are even a few words into a sentence, there are far too many possible histories to condition on: exponentially many.
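As a rough, illustrative calculation (the vocabulary size and prefix length below are assumed numbers, not from the slides):

    # Assumed figures for illustration: a 10,000-word vocabulary and a 10-word history.
    vocab_size = 10_000
    prefix_length = 10
    num_histories = vocab_size ** prefix_length
    print(format(num_histories, ".0e"))  # 1e+40 distinct histories to condition on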

5 of 68

Language Model - Fix: Markov Assumption

Problem: A very small window gives poor predictions.

Solution: Smoothing, and attention (discussed later).
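A minimal sketch of the Markov fix: a bigram model with add-one (Laplace) smoothing over a toy corpus. The corpus and the smoothing constant are assumed purely for illustration.

    from collections import Counter

    # Toy corpus, assumed for illustration.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    vocab = sorted(set(corpus))

    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def bigram_prob(prev, word, alpha=1.0):
        # P(word | prev) with add-alpha smoothing, so unseen bigrams get non-zero mass.
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * len(vocab))

    print(round(bigram_prob("the", "cat"), 3))  # seen bigram
    print(round(bigram_prob("cat", "dog"), 3))  # unseen bigram: small but non-zero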

6 of 68

Language Model - Recurrent Model

7 of 68

Language Model - Recurrent Model

8 of 68

Language Model - Recurrent Model

9 of 68

Language Model - Recurrent Model

10 of 68

Artificial Neuron

11 of 68

Recurrent Neuron

12 of 68

Recurrent Neuron - Unrolled

An unrolled recurrent neural network.
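A minimal NumPy sketch of the recurrent neuron being unrolled: the same weights are applied at every time step, and the hidden state carries information forward. Sizes and random inputs are assumed for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size, seq_len = 4, 8, 5

    # The same parameters are reused at every time step -- that is the recurrence.
    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b_h = np.zeros(hidden_size)

    def rnn_step(x_t, h_prev):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    h = np.zeros(hidden_size)
    for t in range(seq_len):
        x_t = rng.normal(size=input_size)  # stand-in for the input at step t
        h = rnn_step(x_t, h)
    print(h.shape)  # (8,) -- hidden state after unrolling 5 steps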

13 of 68

RNN - Structure

14 of 68

Forward propagation through time

15 of 68

Backpropagation through time

16 of 68

Backpropagation through time

17 of 68

BPTT

18 of 68

BPTT

19 of 68

RNN: vanishing & exploding gradient
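The issue can be seen numerically: backpropagation through time multiplies many Jacobian-like factors together, and that product either shrinks towards zero or blows up. A small NumPy illustration with assumed, random recurrent weights:

    import numpy as np

    rng = np.random.default_rng(0)
    steps = 50

    for scale in (0.5, 1.5):  # rough "size" of the recurrent weights
        W = scale * np.eye(8) + rng.normal(scale=0.01, size=(8, 8))
        product = np.eye(8)
        for _ in range(steps):
            product = product @ W  # 50 factors, as in backprop through 50 time steps
        print(scale, np.linalg.norm(product))  # ~0 (vanishing) vs. huge (exploding)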

20 of 68

RNN - Structure

21 of 68

Solution - Long Memory and Short Memory

22 of 68

Solution - Long Memory and Short Memory

Cell State

Why a state? E.g. to remember the subject's gender, so that the proper pronoun can be used.

23 of 68

Solution - Long Memory and Short Memory

Forget Gate Layer

Why forget, then? Perhaps a new subject appears, with a different gender.

24 of 68

Solution - Long Memory and Short Memory

Input Gate Layer

25 of 68

Solution - Long Memory and Short Memory

Combine to make current state

26 of 68

Solution - Long Memory and Short Memory

Output Gate Layer

Why include the current input in the state? So that things like the plurality of the subject can be determined.
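Putting the gate slides together, a minimal NumPy sketch of one full LSTM step; the parameter shapes, initialization, and input are assumed purely for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    input_size, hidden_size = 4, 8
    # One weight matrix per gate, each acting on the concatenation [h_{t-1}; x_t].
    W_f, W_i, W_c, W_o = (rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
                          for _ in range(4))
    b = np.zeros(hidden_size)

    def lstm_step(x_t, h_prev, c_prev):
        v = np.concatenate([h_prev, x_t])
        f = sigmoid(W_f @ v + b)        # forget gate: what to erase from the cell state
        i = sigmoid(W_i @ v + b)        # input gate: what new information to write
        c_tilde = np.tanh(W_c @ v + b)  # candidate values
        c = f * c_prev + i * c_tilde    # new cell state (the long memory)
        o = sigmoid(W_o @ v + b)        # output gate: what to expose
        h = o * np.tanh(c)              # new hidden state (the short memory)
        return h, c

    h = c = np.zeros(hidden_size)
    h, c = lstm_step(rng.normal(size=input_size), h, c)
    print(h.shape, c.shape)  # (8,) (8,)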

27 of 68

Even better LSTM

LSTM with “peephole connections”

The gate layers also look at the cell state (Gers & Schmidhuber, 2000).

28 of 68

GRU - Gated Recurrent Unit

Merges the cell state and the hidden state into a single state.

Combines the forget and input gates into a single update gate (Cho et al., 2014).
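For comparison, a minimal NumPy sketch of one GRU step with assumed shapes: there is a single state h, and a single update gate z plays the role of the LSTM's forget and input gates.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    input_size, hidden_size = 4, 8
    W_z, W_r, W_h = (rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
                     for _ in range(3))

    def gru_step(x_t, h_prev):
        v = np.concatenate([h_prev, x_t])
        z = sigmoid(W_z @ v)  # update gate: forget and input combined
        r = sigmoid(W_r @ v)  # reset gate
        h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate state
        return (1 - z) * h_prev + z * h_tilde  # interpolate between old and new state

    h = gru_step(rng.normal(size=input_size), np.zeros(hidden_size))
    print(h.shape)  # (8,)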

29 of 68

Learned Representation

2D visualization of the 'vectors' learned for sentences. Similar sentences are close together in 'vector' space (Sutskever et al., 2014).

30 of 68

Word Vectors

  • To get a dense representation of words.
  • These dense vectors encode the meaning of a word based on its context.
  • Because: "You shall know a word by the company it keeps" (J. R. Firth, 1957).
  • These vectors usually have 100 to 1,000 dimensions (much smaller than the vocabulary size, e.g. 100K).

31 of 68

Word Vectors

  • CBOW - predict the target word given the source context words.
  • Skip-Gram - predict the source context words given the target word (see the sketch after this list).
  • These dense vectors encode the meaning of a word based on its context.
  • Because: "You shall know a word by the company it keeps" (J. R. Firth, 1957).
  • These vectors usually have 100 to 1,000 dimensions (much smaller than the vocabulary size, e.g. 100K).
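A minimal sketch of the skip-gram setup: extract (target, context) training pairs with a small window and look words up in a dense embedding matrix. The sentence, window size, and embedding dimension are assumed; a real model would then train the embeddings with negative sampling or a softmax.

    import numpy as np

    sentence = "you shall know a word by the company it keeps".split()
    vocab = {w: i for i, w in enumerate(sorted(set(sentence)))}
    window = 2
    embedding_dim = 8  # real word vectors are usually 100-1000 dimensional

    # Skip-gram pairs: (target word, one of its context words).
    pairs = [(sentence[i], sentence[j])
             for i in range(len(sentence))
             for j in range(max(0, i - window), min(len(sentence), i + window + 1))
             if i != j]
    print(pairs[:4])

    # A dense embedding matrix: one small vector per word instead of a huge one-hot vector.
    rng = np.random.default_rng(0)
    embeddings = rng.normal(scale=0.1, size=(len(vocab), embedding_dim))
    print(embeddings[vocab["word"]])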

32 of 68

Sequence to Sequence

33 of 68

Sequence to Sequence

34 of 68

Sequence to Sequence
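A minimal sketch of the encoder-decoder (sequence-to-sequence) idea with untrained NumPy RNN cells: the encoder compresses the source sequence into one vector, and the decoder starts from that vector and emits target tokens one at a time. All token IDs, sizes, and weights are assumed for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden, vocab_size, emb = 16, 10, 8
    E = rng.normal(scale=0.1, size=(vocab_size, emb))  # token embeddings
    W_enc = rng.normal(scale=0.1, size=(hidden, hidden + emb))
    W_dec = rng.normal(scale=0.1, size=(hidden, hidden + emb))
    W_out = rng.normal(scale=0.1, size=(vocab_size, hidden))

    def step(W, h, x_emb):
        return np.tanh(W @ np.concatenate([h, x_emb]))

    # Encoder: read the whole source sequence, keep only the final hidden state.
    source = [3, 1, 4, 1, 5]
    h = np.zeros(hidden)
    for tok in source:
        h = step(W_enc, h, E[tok])

    # Decoder: start from the encoder summary and emit tokens greedily.
    tok, output = 0, []  # token 0 plays the role of <BOS>
    for _ in range(5):
        h = step(W_dec, h, E[tok])
        tok = int(np.argmax(W_out @ h))  # greedy; beam search would keep K hypotheses
        output.append(tok)
    print(output)  # meaningless here, since the weights are untrained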

35 of 68

Applications - Plenty

  1. Dialog generation
  2. Machine translation
  3. Image captioning
  4. Paraphrasing
  5. Speech Recognition
  6. Handwriting Recognition

Reddit Comments - Demo

US Elections Dialog agent - Demo

Image captioning - Demo

Handwriting Recognition - Demo

36 of 68

Neural Machine Translation - Alignment

37 of 68

NMT - Attention
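A small NumPy sketch of the attention mechanism used in NMT: at each decoding step, every encoder hidden state is scored against the current decoder state, the scores are softmax-normalized into a soft alignment, and the weighted average becomes the context vector. Dot-product scoring and random values are assumed for brevity.

    import numpy as np

    rng = np.random.default_rng(0)
    src_len, hidden = 6, 16
    encoder_states = rng.normal(size=(src_len, hidden))  # one vector per source word
    decoder_state = rng.normal(size=hidden)              # current decoder hidden state

    scores = encoder_states @ decoder_state  # how well each source position matches
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax -> soft alignment over source words
    context = weights @ encoder_states       # weighted average of the encoder states

    print(np.round(weights, 3))  # attention weights over the 6 source positions
    print(context.shape)         # (16,) context vector fed to the decoder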

38 of 68

Single-modal Learning vs. Multi-modal Learning

Images

  • Classification
  • Segmentation
  • Detection

Text

  • Parsing
  • Translation
  • Question Answering

Images

Text

39 of 68

Single-modal Learning vs. Multi-modal Learning

Images

  • Classification
  • Segmentation
  • Detection

Text

  • Parsing
  • Translation
  • Question Answering

  1. Visual Question Answering
  2. Image Captioning
  3. Video summarization (images + audio)

40 of 68

Image Captioning - Show and Tell

  • Given an image, output possible sentences to describe the image.
  • The sentence could have varying length.
  • Use CNN for image modelling and RNN for language generation.

41 of 68

Image Captioning - Show and Tell

  • Given an image, output possible sentences to describe the image.
  • The sentence could have varying length.
  • Use CNN for image modelling and RNN for language generation.
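A minimal sketch of the Show and Tell recipe: a CNN image feature (here just a random stand-in vector) initializes an RNN language model, which then emits caption tokens one at a time. All sizes, tokens, and weights are assumed for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    feat_dim, hidden, vocab_size, emb = 2048, 16, 12, 8

    image_feature = rng.normal(size=feat_dim)  # stand-in for a CNN image embedding
    W_init = rng.normal(scale=0.01, size=(hidden, feat_dim))  # image -> initial RNN state
    E = rng.normal(scale=0.1, size=(vocab_size, emb))
    W_rnn = rng.normal(scale=0.1, size=(hidden, hidden + emb))
    W_out = rng.normal(scale=0.1, size=(vocab_size, hidden))

    h = np.tanh(W_init @ image_feature)  # the image conditions the language model
    tok, caption = 0, []                 # token 0 plays the role of <BOS>
    for _ in range(6):                   # in practice, stop when <EOS> is produced
        h = np.tanh(W_rnn @ np.concatenate([h, E[tok]]))
        tok = int(np.argmax(W_out @ h))
        caption.append(tok)
    print(caption)  # untrained weights, so the tokens are meaningless here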

42 of 68

Visual Attention - the Show, Attend and Tell paper

Let every step of an RNN pick which information to look at from some larger collection of information.

43 of 68

Attention Model in action

44 of 68

Visual Question Answering

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

  • Fine-grained recognition - "What kind of cheese?"
  • Object Detection - “How many bikes?”
  • Activity Recognition - “Is this man crying?”
  • Reasoning - “Is this pizza vegetarian?”
  • Common sense - “Does this person have 20/20 vision?”

45 of 68

Visual Question Answering

46 of 68

Visual Question Answering - Attention

Source: Jiasen

47 of 68

Soft Attention

48 of 68

Soft Attention

  • Still uses the whole input.
  • Constrained to a fixed grid (see the sketch below).
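A sketch of soft attention over a fixed grid of CNN features (e.g. a 7x7 map): every location gets a positive weight, so the whole input is still used, just re-weighted. Shapes and values are assumed.

    import numpy as np

    rng = np.random.default_rng(0)
    grid, channels, hidden = 7, 32, 16
    features = rng.normal(size=(grid * grid, channels))  # 49 locations, one vector each
    h = rng.normal(size=hidden)                          # current RNN hidden state
    W = rng.normal(scale=0.1, size=(channels, hidden))

    scores = features @ (W @ h)              # one score per grid location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # soft attention: all 49 weights stay positive
    context = weights @ features             # weighted average over the whole grid
    print(weights.reshape(grid, grid).round(2))  # where the model is "looking"
    print(context.shape)                     # (32,)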

49 of 68

Hard Attention

50 of 68

Hard Attention - Use Case

51 of 68

Neural Paraphrase Generation

Source: Yours kindly

52 of 68

Neural Paraphrase Generation

Source: Yours kindly

53 of 68

Beam Search (encourage diversity)

  • Greedy search is efficient but not optimal.
  • Maybe the second-best word is a good choice given the rest of the words that will become part of the sequence.
  • Beam Search - maintain K hypotheses at a time (a minimal sketch follows this list).
  • Expand each hypothesis.
  • Pick the top-K hypotheses at each time step.
  • Demo
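A minimal beam-search sketch over a stand-in scoring function (a random model here, standing in for a trained decoder): at each step every hypothesis is expanded with every token, and only the top-K by total log-probability are kept.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, K, steps = 8, 3, 4

    def next_log_probs(prefix):
        # Stand-in for a trained decoder's log P(next token | prefix).
        logits = rng.normal(size=vocab_size)
        return logits - np.log(np.exp(logits).sum())

    beams = [([], 0.0)]  # (token sequence, total log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            log_p = next_log_probs(seq)
            for tok in range(vocab_size):  # expand every hypothesis with every token
                candidates.append((seq + [tok], score + log_p[tok]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]  # keep the K best

    for seq, score in beams:
        print(seq, round(score, 2))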

54 of 68

Sequence to Sequence - Modality-agnostic

55 of 68

Recurrence in learning

56 of 68

Recurrent Neural Networks - A recap

Vanilla Neural Networks

57 of 68

Recurrent Neural Networks - A recap

Image Captioning: image -> sequence of words

58 of 68

Recurrent Neural Networks - A recap

Sentiment Classification: sequence of words -> sentiment

59 of 68

Recurrent Neural Networks - A recap

Machine Translation: sequence of words -> sequence of words

60 of 68

Recurrent Neural Networks - A recap

Video classification (frame level). VQA - ??? More on this later.

61 of 68

RMVA - Recurrent Models of Visual Attention

- Glimpse sensor: a bandwidth-limited sensor of the input image. For example, if the input image is 28x28 (height x width), the RAM may only be able to sense an 8x8 area at any given time step; these patches are called glimpses.
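A minimal sketch of the glimpse sensor: crop an 8x8 patch of a 28x28 image around a given location (the multi-resolution glimpses and learned location policy of the paper are omitted; the image here is random).

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((28, 28))  # stand-in for a 28x28 input image
    glimpse_size = 8

    def glimpse(img, center_y, center_x, size=glimpse_size):
        # Extract a size x size patch around (center_y, center_x), clipped to the image.
        half = size // 2
        y0 = int(np.clip(center_y - half, 0, img.shape[0] - size))
        x0 = int(np.clip(center_x - half, 0, img.shape[1] - size))
        return img[y0:y0 + size, x0:x0 + size]

    patch = glimpse(image, 14, 14)  # the sensor only sees this 8x8 window per time step
    print(patch.shape)  # (8, 8)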

62 of 68

Unreasonable effectiveness of RNN/LSTM - Images

Even traditional areas where CNNs have excelled are being improved by the use of RNNs. The model reads the number left to right, one step at a time.

Work by DeepMind: http://arxiv.org/abs/1412.7755

63 of 68

Unreasonable effectiveness of RNN/LSTM - Literature

  • Character-level LSTM
  • Trained on all the works of Shakespeare
  • 3-layer RNN with 512 hidden nodes
  • Learns perfect spelling!

64 of 68

Unreasonable effectiveness of RNN/LSTM - Math & LaTeX

65 of 68

Unreasonable effectiveness of RNN/LSTM - Math, LaTeX & Drawing

66 of 68

Unreasonable effectiveness of RNN/LSTM - Linux Source Code

67 of 68

Unreasonable effectiveness of RNN/LSTM - Bible!

68 of 68

References

  1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  2. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  3. http://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
  4. Neural Turing Machines, Graves et al., http://arxiv.org/pdf/1410.5401v2.pdf
  5. www.technologyreview.com/view/532156/googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/
  6. Recurrent Models of Visual Attention, Mnih et al., http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
  7. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Kelvin Xu et al., http://arxiv.org/pdf/1502.03044v2.pdf
  8. RNN Tutorial Nervana Systems, https://www.nervanasys.com/recurrent-neural-networks/
  9. Cho, NMT Tutorial http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf
  10. Word2Vec Tutorial https://www.tensorflow.org/tutorials/word2vec