1 of 17

Memory-Augmented Neural Networks

Layton Hayes

2 of 17

Memory in Neural Networks: RNNs

(Recurrent Neural Networks)

  • Enable the network to link dependencies across examples / timesteps
  • Many different variants, the LSTM being especially powerful and popular

LSTM (Long Short-Term Memory):

  • Each cell stores some information about previous activations
  • Memory is updated to some degree on every example / timestep, though the degree varies (see the sketch below)
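
A minimal sketch of a single LSTM step (the standard cell equations written in NumPy; shapes and names here are illustrative), showing how the cell state is partly erased and partly rewritten at every timestep:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell_step(x, h_prev, c_prev, W, b):
        """One LSTM step: the cell state c is partly erased (forget gate)
        and partly overwritten (input gate) on every single timestep.
        W has shape (4H, X+H), b has shape (4H,)."""
        z = W @ np.concatenate([x, h_prev]) + b       # all four gates at once
        f, i, o, g = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
        g = np.tanh(g)                                # candidate memory content
        c = f * c_prev + i * g                        # memory is rewritten here
        h = o * np.tanh(c)
        return h, c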

3 of 17

Problems with RNNs

  • Short-term memory -- the memory isn't designed to last very long
    • Why?
      • Memory is overwritten and forgotten to some degree at every step
      • Even if the degree of overwrite is small, it compounds quickly (worked out in the sketch below)
      • There is always some amount of distortion
  • Cell-by-cell updates mean fragmentation
  • Difficult to compartmentalize distinct memories

All because the memory is built into the network.
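
A quick numeric illustration of the compounding claim above (the 0.95 retention factor is just an assumed example of a gentle per-step overwrite):

    # Even a gentle per-step overwrite erodes old memory quickly.
    retention_per_step = 0.95            # assumed forget-gate value, close to 1
    for steps in (10, 50, 100):
        remaining = retention_per_step ** steps
        print(f"after {steps:3d} steps: {remaining:.3f} of the original signal")
    # after  10 steps: 0.599
    # after  50 steps: 0.077
    # after 100 steps: 0.006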

4 of 17

Memory outside Neural Networks: MANNs

(Memory-Augmented Neural Networks)

  • Uses a neural network as the interface to an external memory
  • Designed to solve the RNN problems from the previous slide

NTM (Neural Turing Machine):

  • Simple, early implementation of a MANN
  • Controller outputs vectors that control the read and write heads
  • The read / write heads interact with the external memory
  • Whole model is end-to-end differentiable

5 of 17

NTMs continued: how they work

Addressing Mechanism
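
The addressing figure isn't reproduced here. As a rough sketch, the content-based half of the NTM addressing mechanism scores every memory row by cosine similarity to a key emitted by the controller, sharpens the scores with a key strength beta, and normalizes with a softmax; the location-based steps (interpolation, shifting, sharpening) then operate on this weighting. The function name and the small epsilon below are my own:

    import numpy as np

    def content_addressing(memory, key, beta):
        """memory: N x M matrix, key: length-M vector, beta: scalar strength >= 0.
        Returns a length-N weighting over memory rows (sums to 1)."""
        # Cosine similarity between the key and every memory row.
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
        similarity = memory @ key / norms
        # Softmax with "key strength" beta sharpens or flattens the focus.
        scores = np.exp(beta * similarity)
        return scores / scores.sum()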

6 of 17

NTM: reading and writing basic equations

M_t -> the N x M memory matrix at time t

w_t -> weighting vector over the N memory locations, length N

r_t -> read vector, length M

e_t -> erase vector, length M, elements in (0,1)

a_t -> add vector, length M

Reading:

r_t = Σ_i w_t(i) M_t(i)

Writing:

erase:  M~_t(i) = M_{t-1}(i) [1 - w_t(i) e_t]

add:    M_t(i) = M~_t(i) + w_t(i) a_t
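
A minimal NumPy sketch of those reads and writes, directly transcribing the equations above (function names are mine):

    import numpy as np

    def read(M, w):
        # r_t = sum_i w_t(i) * M_t(i): a weighted sum of memory rows.
        return w @ M                      # length-M read vector

    def write(M, w, e, a):
        # Erase: each row is scaled down where the head attends and e is large.
        M = M * (1.0 - np.outer(w, e))    # M~_t(i) = M_{t-1}(i) [1 - w_t(i) e_t]
        # Add: new content is blended in with the same weighting.
        M = M + np.outer(w, a)            # M_t(i) = M~_t(i) + w_t(i) a_t
        return M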

7 of 17

NTM vs LSTM: Copy problem

Input: sequence of length L, then nothing for L steps.

Output: nothing for L steps, then the input sequence repeated.
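
A sketch of how copy-task examples could be generated; the binary vectors and the extra delimiter channel follow the NTM paper's setup, but the exact shapes and encoding here are assumptions:

    import numpy as np

    def make_copy_example(L, width=8, rng=np.random):
        """Input: L random binary vectors, a delimiter, then L blank steps.
        Target: blank until the delimiter has passed, then the original sequence."""
        seq = rng.randint(0, 2, size=(L, width)).astype(float)
        inputs  = np.zeros((2 * L + 1, width + 1))   # extra channel for the delimiter
        targets = np.zeros((2 * L + 1, width))
        inputs[:L, :width] = seq
        inputs[L, width] = 1.0                        # delimiter marks end of input
        targets[L + 1:, :] = seq                      # model must reproduce the sequence
        return inputs, targets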

8 of 17

NTM vs LSTM: Copy problem generalization

Trained on the copy problem with sequences of length L.

Results for test sequences of length L, L*2, L*4, etc.:

NTM:

LSTM:

9 of 17

NTM vs LSTM: repeat copy problem

Input: sequence of length L, the number of repeats (X), then nothing for L * X steps.

Output: nothing for L + 1 steps, then the input sequence repeated X times.

10 of 17

Exciting applications: one-shot learning

  • Neural networks are powerful, but require huge amounts of data
  • This limits real-world applications, where small datasets are insufficient
  • Training requires large amounts of computing power and time (expensive and slow)
  • Can't learn well in real time -- it takes too long and needs too much data

Solution: don’t just learn; learn how to learn first.

11 of 17

One-shot Learning with Memory-Augmented Neural Networks

Input: an image, plus the class label of the previous image

Output: the class of the current image

The classes used, the labels assigned to each class, and the specific samples are all shuffled between episodes.
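
A sketch of that episode construction -- labels arrive one step after their image, so the model must store image-label bindings in memory and retrieve them when a class reappears. The dataset layout and names here are assumptions:

    import numpy as np

    def make_episode(images_by_class, n_classes=5, shots=10, rng=np.random):
        """Each episode: sample classes, shuffle their labels, and present
        (image_t, label_{t-1}) pairs; the target is label_t."""
        classes = rng.choice(len(images_by_class), size=n_classes, replace=False)
        labels  = rng.permutation(n_classes)          # labels re-shuffled every episode
        episode = []
        for c, lab in zip(classes, labels):
            samples = images_by_class[c]
            picks = rng.choice(len(samples), size=shots, replace=False)
            episode += [(samples[i], lab) for i in picks]
        rng.shuffle(episode)
        images  = np.stack([img for img, _ in episode])
        targets = np.array([lab for _, lab in episode])
        prev_labels = np.roll(targets, 1)
        prev_labels[0] = 0                            # no previous label at the first step
        return images, prev_labels, targets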

12 of 17

NTM++: Differentiable Neural Computer (DNC)

  • Improves upon the NTM with dynamic memory allocation and explicit temporal links between writes
  • Writes go to freshly allocated locations instead of blurring over old ones, so memories decay less and can be retained longer
  • Maintains end-to-end differentiability
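
As a hedged sketch, the explicit temporal connections are kept in a link matrix recording which memory row was written after which, following the update rules in the DNC paper (variable names are mine). Reading forward in write order is then link @ previous_read_weighting, and backward is link.T @ previous_read_weighting:

    import numpy as np

    def update_temporal_links(link, precedence, write_w):
        """link: N x N matrix, precedence: length-N vector, write_w: length-N write weighting.
        Tracks the order in which memory rows were written."""
        w_i = write_w[:, None]                     # column vector
        w_j = write_w[None, :]                     # row vector
        link = (1 - w_i - w_j) * link + w_i * precedence[None, :]
        np.fill_diagonal(link, 0.0)                # no row links to itself
        precedence = (1 - write_w.sum()) * precedence + write_w
        return link, precedence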

13 of 17

DNC: more detail

14 of 17

DNC: task demonstrations

15 of 17

bAbI (question-answering benchmark)

16 of 17

Issues with MANNs

  • Memory size, addressing mechanism, and the number of read and write heads are additional hyperparameters -- makes training more difficult
  • Scales poorly with memory size (soft addressing touches every memory location at every step)
  • Additional complexity in implementation and experimental design

17 of 17

Source Papers

Neural Turing Machines (Dec 2014)

https://arxiv.org/abs/1410.5401

One-shot Learning with Memory-Augmented Neural Networks (May 2016)

https://arxiv.org/abs/1605.06065

Hybrid computing using a neural network with dynamic external memory (Oct 2016)

https://www.nature.com/articles/nature20101