1 of 17

Memory-Augmented Neural Networks

Layton Hayes

2 of 17

Memory in Neural Networks: RNNs

(Recurrent Neural Networks)

  • Enable the network to link dependencies across examples / timesteps
  • Many different variants, the LSTM being especially powerful and popular

LSTM (Long Short-Term Memory):

  • Each cell stores some information about previous activations
  • Memory is updated to some degree on every example / timestep, though the degree varies (see the sketch below)
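
A minimal sketch of a single LSTM step (the standard cell equations written in NumPy; shapes and names here are illustrative), showing how the cell state is partly erased and partly rewritten at every timestep:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell_step(x, h_prev, c_prev, W, b):
        """One LSTM step: the cell state c is partly erased (forget gate)
        and partly overwritten (input gate) on every single timestep.
        W has shape (4H, X+H), b has shape (4H,)."""
        z = W @ np.concatenate([x, h_prev]) + b       # all four gates at once
        f, i, o, g = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
        g = np.tanh(g)                                # candidate memory content
        c = f * c_prev + i * g                        # memory is rewritten here
        h = o * np.tanh(c)
        return h, c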

3 of 17

Problems with RNNs

  • Short-term memory -- the memory isn't designed to last very long
    • Why?
      • Memory is overwritten and forgotten to some degree at every step
      • Even if the degree of overwrite is small, it compounds quickly (worked out in the sketch below)
      • There is always some amount of distortion
  • Cell-by-cell updates mean fragmentation
  • Difficult to compartmentalize distinct memories

All because the memory is built into the network.
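
A quick numeric illustration of the compounding claim above (the 0.95 retention factor is just an assumed example of a gentle per-step overwrite):

    # Even a gentle per-step overwrite erodes old memory quickly.
    retention_per_step = 0.95            # assumed forget-gate value, close to 1
    for steps in (10, 50, 100):
        remaining = retention_per_step ** steps
        print(f"after {steps:3d} steps: {remaining:.3f} of the original signal")
    # after  10 steps: 0.599
    # after  50 steps: 0.077
    # after 100 steps: 0.006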

4 of 17

Memory outside Neural Networks: MANNs

(Memory-Augmented Neural Networks)

  • Uses a neural network as the interface to an external memory
  • Designed to solve the RNN problems from the previous slide

NTM (Neural Turing Machine):

  • Simple, early implementation of a MANN
  • Controller outputs vectors that control the read and write heads
  • The read / write heads interact with the external memory
  • Whole model is end-to-end differentiable

5 of 17

NTMs continued: how they work

Addressing Mechanism
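
The addressing figure isn't reproduced here. As a rough sketch, the content-based half of the NTM addressing mechanism scores every memory row by cosine similarity to a key emitted by the controller, sharpens the scores with a key strength beta, and normalizes with a softmax; the location-based steps (interpolation, shifting, sharpening) then operate on this weighting. The function name and the small epsilon below are my own:

    import numpy as np

    def content_addressing(memory, key, beta):
        """memory: N x M matrix, key: length-M vector, beta: scalar strength >= 0.
        Returns a length-N weighting over memory rows (sums to 1)."""
        # Cosine similarity between the key and every memory row.
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
        similarity = memory @ key / norms
        # Softmax with "key strength" beta sharpens or flattens the focus.
        scores = np.exp(beta * similarity)
        return scores / scores.sum()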

6 of 17

NTM: reading and writing basic equations

M_t -> the N x M memory matrix at time t

w_t -> weighting vector over the N memory locations, length N

r_t -> read vector, length M

e_t -> erase vector, length M, elements in (0,1)

a_t -> add vector, length M

Reading:

r_t = Σ_i w_t(i) M_t(i)

Writing:

erase:  M~_t(i) = M_{t-1}(i) [1 - w_t(i) e_t]

add:    M_t(i) = M~_t(i) + w_t(i) a_t
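
A minimal NumPy sketch of those reads and writes, directly transcribing the equations above (function names are mine):

    import numpy as np

    def read(M, w):
        # r_t = sum_i w_t(i) * M_t(i): a weighted sum of memory rows.
        return w @ M                      # length-M read vector

    def write(M, w, e, a):
        # Erase: each row is scaled down where the head attends and e is large.
        M = M * (1.0 - np.outer(w, e))    # M~_t(i) = M_{t-1}(i) [1 - w_t(i) e_t]
        # Add: new content is blended in with the same weighting.
        M = M + np.outer(w, a)            # M_t(i) = M~_t(i) + w_t(i) a_t
        return M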

7 of 17

NTM vs LSTM: Copy problem

Input: sequence of length L, then nothing for L steps.

Output: nothing for L steps, then the input sequence repeated.
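
A sketch of how copy-task examples could be generated; the binary vectors and the extra delimiter channel follow the NTM paper's setup, but the exact shapes and encoding here are assumptions:

    import numpy as np

    def make_copy_example(L, width=8, rng=np.random):
        """Input: L random binary vectors, a delimiter, then L blank steps.
        Target: blank until the delimiter has passed, then the original sequence."""
        seq = rng.randint(0, 2, size=(L, width)).astype(float)
        inputs  = np.zeros((2 * L + 1, width + 1))   # extra channel for the delimiter
        targets = np.zeros((2 * L + 1, width))
        inputs[:L, :width] = seq
        inputs[L, width] = 1.0                        # delimiter marks end of input
        targets[L + 1:, :] = seq                      # model must reproduce the sequence
        return inputs, targets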

8 of 17

NTM vs LSTM: Copy problem generalization

Trained on the copy problem with sequences of length L.

Results for test sequences of length L, L*2, L*4, etc.:

NTM:

LSTM:

9 of 17

NTM vs LSTM: repeat copy problem

Input: sequence of length L, the number of repeats (X), then nothing for L * X steps.

Output: nothing for L + 1 steps, then the input sequence repeated X times.

10 of 17

Exciting applications: one-shot learning

  • Neural networks are powerful, but require huge amounts of data
  • This limits real-world applications, where small datasets are insufficient
  • Training requires large amounts of computing power and time (expensive and slow)
  • Can't learn well in real time -- it takes too long and needs too much data

Solution: don’t just learn; learn how to learn first.

11 of 17

One-shot Learning with Memory-Augmented Neural Networks

Input: an image, plus the class label of the previous image

Output: the class of the current image

The classes used, the labels assigned to each class, and the specific samples are all shuffled between episodes.
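
A sketch of that episode construction -- labels arrive one step after their image, so the model must store image-label bindings in memory and retrieve them when a class reappears. The dataset layout and names here are assumptions:

    import numpy as np

    def make_episode(images_by_class, n_classes=5, shots=10, rng=np.random):
        """Each episode: sample classes, shuffle their labels, and present
        (image_t, label_{t-1}) pairs; the target is label_t."""
        classes = rng.choice(len(images_by_class), size=n_classes, replace=False)
        labels  = rng.permutation(n_classes)          # labels re-shuffled every episode
        episode = []
        for c, lab in zip(classes, labels):
            samples = images_by_class[c]
            picks = rng.choice(len(samples), size=shots, replace=False)
            episode += [(samples[i], lab) for i in picks]
        rng.shuffle(episode)
        images  = np.stack([img for img, _ in episode])
        targets = np.array([lab for _, lab in episode])
        prev_labels = np.roll(targets, 1)
        prev_labels[0] = 0                            # no previous label at the first step
        return images, prev_labels, targets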

12 of 17

NTM++: Differentiable Neural Computer (DNC)

  • Improves upon the NTM with dynamic memory allocation and explicit temporal links between writes
  • Writes go to freshly allocated locations instead of blurring over old ones, so memories decay less and can be retained longer
  • Maintains end-to-end differentiability
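
As a hedged sketch, the explicit temporal connections are kept in a link matrix recording which memory row was written after which, following the update rules in the DNC paper (variable names are mine). Reading forward in write order is then link @ previous_read_weighting, and backward is link.T @ previous_read_weighting:

    import numpy as np

    def update_temporal_links(link, precedence, write_w):
        """link: N x N matrix, precedence: length-N vector, write_w: length-N write weighting.
        Tracks the order in which memory rows were written."""
        w_i = write_w[:, None]                     # column vector
        w_j = write_w[None, :]                     # row vector
        link = (1 - w_i - w_j) * link + w_i * precedence[None, :]
        np.fill_diagonal(link, 0.0)                # no row links to itself
        precedence = (1 - write_w.sum()) * precedence + write_w
        return link, precedence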

13 of 17

DNC: more detail

14 of 17

DNC: task demonstrations

15 of 17

bAbI (question-answering benchmark)

16 of 17

Issues with MANNs

  • Memory size, addressing mechanism, and the number of read and write heads are additional hyperparameters -- makes training more difficult
  • Scales poorly with memory size (soft addressing touches every memory location at every step)
  • Additional complexity in implementation and experimental design

17 of 17

Source Papers

Neural Turing Machines (Dec 2014)

https://arxiv.org/abs/1410.5401

One-shot Learning with Memory-Augmented Neural Networks (May 2016)

https://arxiv.org/abs/1605.06065

Hybrid computing using a neural network with dynamic external memory (Oct 2016)

https://www.nature.com/articles/nature20101