1 of 14

Meta Learned Conversational Neural Memory in a Transformer Based Chatbot

Mustafa Abdool (Moose) Carson Lam

2 of 14

Possible obstacles to Human Level AI

  1. Models do not consolidate memory
    • Large language models (LLMs) have no inherent form of compressed memory that consolidates past interactions the way human memory consolidation does. Instead, their typical memory systems rely on external mechanisms such as Retrieval-Augmented Generation (RAG) or summarization. A RAG store grows over time rather than consolidating, and a piece of “memory” has no way of improving the rest of the system unless it is retrieved. Summarization is a non-expanding tradeoff between context-window usage and lossy compression.
  2. Training and inference are separate
    • Models with the remarkable autocomplete capabilities of LLMs cannot learn from single examples seen at test time, i.e., online learning. Few-shot in-context learning approximates this, but it consumes more and more context, increasing test-time memory and compute and degrading quality over time.
  3. No World Model
  4. No “Top of the Hierarchy” Objective function

3 of 14

Expectation:

Me: uhh . . . Hi

Bot: Hey, what’s your name?

Me: Carson

Bot: That’s a weird name

Me: What is my name?

Bot: it’s Carson, you just told me

4 of 14

Reality:

Me: My dog’s name is Fanny. Fanny is my dog.

bot: That is a beautiful name for a dog

Me: What is my dog’s name?

bot: I would not know that

Me: Fanny is my dog’s name

bot: How old is your dog?

Me: What is my dog’s name?

bot: It’s a secret

5 of 14

Short Term and Linearly Accumulating Memory Solutions

Complex Decision Trees connected to a Traditional Database

Other ways to solve the coreference resolution problem and to build conversational memory in real time

  • Keep previous timesteps, (t - n) -> t, in memory
  • Extend the input window to include the entire conversation
  • Preserve the hidden state (e.g., LSTMs), or use memory caches (e.g., Transformer-XL, Memory Transformers)
  • Pointer Networks
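
A minimal sketch of the first two options above, a rolling context window that linearly accumulates recent turns and splices them back into the prompt (the class and names here are illustrative, not from the slides):

```python
# Rolling-context baseline (illustrative): keep the last n turns and prepend them
# to every prompt. Memory grows linearly with the conversation until truncated.
from collections import deque

class RollingContextMemory:
    """Keeps the last `max_turns` (user, bot) exchanges for prompt construction."""

    def __init__(self, max_turns: int = 8):
        self.turns = deque(maxlen=max_turns)  # older turns are silently dropped

    def add_turn(self, user_utterance: str, bot_utterance: str) -> None:
        self.turns.append((user_utterance, bot_utterance))

    def build_prompt(self, new_user_utterance: str) -> str:
        history = "\n".join(f"User: {u}\nBot: {b}" for u, b in self.turns)
        return f"{history}\nUser: {new_user_utterance}\nBot:"

memory = RollingContextMemory(max_turns=2)
memory.add_turn("My dog's name is Fanny.", "That is a beautiful name for a dog")
print(memory.build_prompt("What is my dog's name?"))
```

The failure mode is visible in the constructor: anything older than `max_turns` is gone, and keeping more turns costs context length at every step.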

6 of 14

External, Compressed, Long Term Memory

What This Is

Some AI tasks will require an agent to store neural memory with long-term persistence: decoupled from working memory, protected from unrelated current computation, but accessible at the time of relevance. This is so-called “external memory”, the kind that would hypothetically be appropriate for lifelong learning.
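
A minimal sketch of what “external memory” means here (my own construction, not the DNC or the metalearned memory architecture): a store that persists across turns, is written and read explicitly, and is untouched by the rest of the computation unless addressed.

```python
# Content-addressable external memory (illustrative): persistent key/value slots,
# read by softmax attention over cosine similarity, decoupled from working memory.
import numpy as np

class ExternalMemory:
    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))    # one row per stored memory
        self.values = np.empty((0, dim))

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        """Persist a (key, value) pair; working memory is not touched."""
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query: np.ndarray) -> np.ndarray:
        """Soft read: attention weights over stored keys, weighted sum of values."""
        if len(self.keys) == 0:
            return np.zeros_like(query)
        scores = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        weights = np.exp(scores) / np.exp(scores).sum()  # softmax over slots
        return weights @ self.values
```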

7 of 14

Differentiable Neural Computer (Graves et al., 2016)

8 of 14

Meta Learned Neural Memory

9 of 14

Meta-Learned Memory

[Diagram: an agent backed by a meta-learned memory. The memory associates the key “crazy dude’s name” with the value “carson”; asked “What’s crazy dude’s name again?”, the agent answers “It’s Carson”.]

10 of 14

MNM Training

[Diagram: the same agent–memory loop during training. Query: “What’s that crazy dude’s name again?”; answer: “It’s Carson”; memory key: “crazy dude’s name”; value: “carson”.]

s_t: the agent’s representation of the state

m_t: the memory’s response to the state

h_t = controller(s_t, h_{t-1})
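
A rough PyTorch sketch of this loop (my simplification; module choices and dimensions are assumptions, not the exact MNM architecture): the controller produces h_t from s_t and h_{t-1}, the memory network returns m_t, and the agent’s output is conditioned on both.

```python
import torch
import torch.nn as nn

class AgentStep(nn.Module):
    def __init__(self, state_dim: int, hidden_dim: int):
        super().__init__()
        self.controller = nn.GRUCell(state_dim, hidden_dim)  # h_t = controller(s_t, h_{t-1})
        self.memory = nn.Sequential(                          # feed-forward memory network
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.output = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, s_t: torch.Tensor, h_prev: torch.Tensor):
        h_t = self.controller(s_t, h_prev)   # controller / working state
        m_t = self.memory(h_t)               # memory's response to the encoded state
        out = self.output(torch.cat([h_t, m_t], dim=-1))
        return out, h_t, m_t
```

During training, gradients flow through both the controller and the memory network’s slow weights; the fast, gradient-free writes on the next slide are what let the memory change at test time.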

11 of 14

Perceptron Learning Rule for a Gradient-Free Fast Write to Memory at Test Time

  • l is the current layer of the memory network, l - 1 is the previous layer
  • t is the current update step, t-1 is the prior update step
  • M is the weight matrix for layer l of the memory network
  • B is a learned learning rate (small if nothing to add to memory)
  • z is the current activation for layer l prior to update, z' is the target activation for layer l.

The meta-learning task during training is to learn the best B and z′ for immediately storing, or not storing, new memories.
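
The update equation itself did not survive extraction. Below is one plausible reconstruction from the variable definitions above, in the standard perceptron / outer-product form (my reconstruction, not copied from the slide or the paper), followed by a small NumPy sketch.

$$
M^{l}_{t} \;=\; M^{l}_{t-1} \;+\; B^{l}_{t}\,\bigl(z'^{\,l}_{t} - z^{l}_{t}\bigr)\,\bigl(z^{\,l-1}_{t}\bigr)^{\top}
$$

```python
# Reconstructed perceptron-style fast write (my reading of the variable list
# above, not verbatim): gradient-free, a single outer product per layer, so it
# can be applied during a forward pass at test time.
import numpy as np

def fast_write(M_prev, z, z_target, z_prev_layer, B):
    """M^l_t = M^l_{t-1} + B * (z' - z) outer z^{l-1}; B near 0 means 'store nothing'."""
    return M_prev + B * np.outer(z_target - z, z_prev_layer)
```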

12 of 14

In Training

Output on a simple short-term coreference resolution task:

> my name is adam

> hi adam

> what is my name?

> its adam

> my name is vicki

> hi vicki

> what is my name?

> its vicki

> my name is chloe

> hi chloe

> what is my name?

> its chloe

> my name is moose

> hi moose

> what is my name?

> its moose

13 of 14

Next Steps

  • Solve longer time horizon tasks
  • Memorization tasks in conversation are cool, but what would be super cool is reinforcement learning tasks in conversation
  • Inverse reinforcement learning from experts (i.e., regular talking people)

Say ‘hi carson’ at:

14 of 14

References

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471, 2016.

Yan Wu, Gregory Wayne, Karol Gregor, and Timothy Lillicrap. Learning attractor dynamics for generative memory. In Advances in Neural Information Processing Systems, pages 9401–9410, 2018.

Yan Wu, Greg Wayne, Alex Graves, and Timothy Lillicrap. The Kanerva machine: A generative distributed memory. In International Conference on Learning Representations (ICLR), 2018.