1 of 18

Neural Turing Machines (NTM)

2 of 18

Copy Task

  • Input: a random sequence of k-bit vectors, terminated by an EOF delimiter
  • Output: the same sequence as the input (without the EOF delimiter)
  • No input is presented while the network is producing the output!
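As a concrete illustration (our own sketch, not from the slides; the paper encodes EOF on a separate delimiter channel, here it is an all-ones placeholder vector), copy-task data can be generated like this:

```python
import random

def make_copy_example(k=8, max_len=10):
    """Generate one copy-task example: a random sequence of k-bit
    vectors plus an EOF marker; the target is the sequence itself."""
    length = random.randint(1, max_len)
    seq = [[random.randint(0, 1) for _ in range(k)] for _ in range(length)]
    eof = [1] * k  # placeholder EOF marker (assumption, not the paper's encoding)
    inputs = seq + [eof]
    target = seq  # output equals the input, without EOF
    return inputs, target
```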

3 of 18

How does LSTM perform?

  • Badly
  • Using a 3-layer LSTM with 128 hidden units per layer
  • 8-bit random vectors, with random sequence length from 1 to 10
  • Loss function: the cross entropy between the input (target) and the output
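The per-bit cross-entropy loss above can be sketched as follows (a minimal illustration; the function name is ours, not from the slides):

```python
import math

def copy_task_loss(target_bits, output_probs):
    """Binary cross entropy summed over all bits of the sequence.

    target_bits: list of 0/1 vectors (the input sequence, which is the target).
    output_probs: matching list of predicted per-bit probabilities.
    """
    loss = 0.0
    for t_vec, p_vec in zip(target_bits, output_probs):
        for t, p in zip(t_vec, p_vec):
            loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss
```

Perfect predictions drive this loss toward zero, which is what the NTM achieves on the copy task (slide 17).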

4 of 18

Why is NTM stronger than LSTM?

  • Such tasks (e.g. copying/reversing sequences) can be decomposed into two parts:
    • 1. operations on the input sequence (e.g. write/erase/shift/sharpen ...)
    • 2. memorizing the whole input sequence

  • A standard LSTM (RNN) has to do both components at the same time:
    • Store the input sequence.
      • LSTM has limited capacity to memorize long input sequences.
    • Memorize the operations.
      • Some operations (e.g. shift/sharpen) are hard to implement in an LSTM.

5 of 18

Advantages of NTM over LSTM

  • In NTM, an external memory is introduced to store the input sequence.
    • This releases the LSTM from storing the long input sequence itself.
  • Dedicated operations (e.g. shift/sharpen ...) are designed to interact with the external memory.
  • The LSTM controller in the NTM only needs to memorize how to operate on the stored input sequence.

6 of 18

Pipeline

7 of 18

Pipeline

8 of 18

Addressing

9 of 18

Content Addressing
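The content-addressing equation was not preserved in this export; reconstructed from the original NTM paper's notation: the controller emits a key $\mathbf{k}_t$ and key strength $\beta_t$, and each memory row $\mathbf{M}_t(i)$ is scored by cosine similarity, then normalized by a softmax:

```latex
w^c_t(i) = \frac{\exp\!\big(\beta_t \, K[\mathbf{k}_t, \mathbf{M}_t(i)]\big)}
                {\sum_j \exp\!\big(\beta_t \, K[\mathbf{k}_t, \mathbf{M}_t(j)]\big)},
\qquad
K[\mathbf{u},\mathbf{v}] = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
```

A larger $\beta_t$ makes the focus sharper around the best-matching memory row.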

10 of 18

Gate Interpolation
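The interpolation equation (lost in this export; reconstructed from the NTM paper) blends the content-based weights with the previous step's weights via a scalar gate $g_t \in (0,1)$:

```latex
\mathbf{w}^g_t = g_t \, \mathbf{w}^c_t + (1 - g_t)\, \mathbf{w}_{t-1}
```

With $g_t = 0$ the content addressing is ignored entirely and the head keeps its previous location.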

11 of 18

Convolutional Shift
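The shift equation (lost in this export; reconstructed from the NTM paper) rotates the gated weights by a circular convolution with a shift distribution $\mathbf{s}_t$ over the $N$ memory locations:

```latex
\tilde{w}_t(i) = \sum_{j=0}^{N-1} w^g_t(j)\, s_t\big((i - j) \bmod N\big)
```

This is what lets the head move to adjacent memory locations, e.g. advancing one slot per time step during copying.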

12 of 18

Sharpening
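The sharpening equation (lost in this export; reconstructed from the NTM paper) counteracts the blurring introduced by the convolutional shift, using an exponent $\gamma_t \geq 1$:

```latex
w_t(i) = \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_j \tilde{w}_t(j)^{\gamma_t}}
```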

13 of 18

Pipeline
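The whole addressing pipeline (content addressing, gate interpolation, convolutional shift, sharpening) can be sketched in NumPy as follows; this is our own illustration following the NTM paper's equations, and all names are ours:

```python
import numpy as np

def ntm_address(memory, w_prev, key, beta, g, shift, gamma):
    """One NTM addressing step: content -> interpolate -> shift -> sharpen.

    memory: (N, M) array of N memory rows; w_prev: (N,) previous weights;
    key: (M,) lookup key; beta: key strength; g: interpolation gate in (0, 1);
    shift: (N,) shift distribution; gamma: sharpening exponent (>= 1).
    """
    # 1. Content addressing: softmax over scaled cosine similarities.
    sim = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w_c = np.exp(beta * sim)
    w_c /= w_c.sum()
    # 2. Gate interpolation with the previous weights.
    w_g = g * w_c + (1 - g) * w_prev
    # 3. Circular convolution with the shift distribution.
    n = len(w_g)
    w_s = np.array([sum(w_g[j] * shift[(i - j) % n] for j in range(n))
                    for i in range(n)])
    # 4. Sharpening: renormalized power.
    w = w_s ** gamma
    return w / w.sum()
```

Each step preserves the property that the weights form a distribution over the N memory locations.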

14 of 18

Read
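The read equation (lost in this export; reconstructed from the NTM paper) returns a convex combination of the memory rows under the head's weights:

```latex
\mathbf{r}_t = \sum_i w_t(i)\, \mathbf{M}_t(i)
```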

15 of 18

Write (Erase/Add)
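The write equations (lost in this export; reconstructed from the NTM paper) apply an erase vector $\mathbf{e}_t \in (0,1)^M$ followed by an add vector $\mathbf{a}_t$, both scaled by the head's weight at each location:

```latex
\tilde{\mathbf{M}}_t(i) = \mathbf{M}_{t-1}(i)\big[\mathbf{1} - w_t(i)\,\mathbf{e}_t\big],
\qquad
\mathbf{M}_t(i) = \tilde{\mathbf{M}}_t(i) + w_t(i)\,\mathbf{a}_t
```

Erase happens before add, so a fully weighted head with $\mathbf{e}_t = \mathbf{1}$ can overwrite a row completely.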

16 of 18

Results of NTM

  • Parameters
    • Controller: 1-layer RNN with 64/128 hidden units
    • Memory size: 20×8 / 128×20
    • 1 read head & 1 write head
    • 8-bit vectors & sequence length 1–10 / 1–15
  • NTM can learn the “algorithm” of copying
    • First stage: read the input and write it to the external memory sequentially
    • Second stage: output the stored information from the external memory sequentially
  • It learns to copy the sequence, no matter what the sequence is.

[Figure: read/write head location w over time during the copy task]

17 of 18

Results of NTM

  • Very effective and accurate on the copy task
    • Loss is nearly zero
    • Converged in ~4000 batches (10 sequences per batch)

[Figure: copy-task learning curves, LSTM vs. NTM]

18 of 18