What could be the Data-structures of the Mind?
Rina Panigrahy
Mind-World interaction.
Algorithmic understanding, not physiological.
[Diagram: the World presents phenomena and (sensory) inputs from the environment to the Mind, which starts as a tabula rasa; the Mind returns predictions/actions. What are its data structures: a deep network, one or many modules, concepts, a memory table?]
with Badih Ghazi, Joshua Wang, ICML 2019
Modular Network
[Diagram: a modular network built from low-level modules (Thickness, Color, Line, Circle) through part modules (Eye, Nose, Hair, Human Eye, Tiger Eye, Human Head, Tiger Head, Human Torso, Human Hand, Human Hair, Human Nose) up to object modules (Human, Tiger).]
Want ‘sketch’
[Diagram: the same modular network as above; we want a compact 'sketch' of what it computed.]
Assume one layer of Modules. Use random subspace embedding
[Diagram: modules M1, M2, ..., MN arranged in a single layer, producing outputs x1, x2, ..., xN.]
All in one layer.
Only a few modules fire.
Combine into a sketch (sparse, mostly 0's).
Assume one layer of Modules.
[Diagram: the sketch y = R1 x1 + R2 x2 + ... + RN xN; each module Mi contributes its output xi through its own random matrix Ri.]
Dictionary learning: y = r1 x1 + ... + rN xN, where the coefficient vector x = (x1, ..., xN) is sparse.
Here: y = R1 x1 + ... + RN xN, with random matrices Ri and module outputs xi.
Most xi are 0; the nonzero ones are themselves sparse.
Multilayer: make the construction recursive.
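A minimal numpy illustration of the one-layer sketch above; sizes, sparsity pattern, and variable names are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_module, d_sketch = 20, 32, 256          # illustrative sizes
R = [rng.normal(0, 1/np.sqrt(d_sketch), (d_sketch, d_module)) for _ in range(N)]

# Module outputs: most modules do not fire (x_i = 0);
# the few that fire produce sparse vectors.
x = [np.zeros(d_module) for _ in range(N)]
for i in rng.choice(N, size=3, replace=False):      # only a few modules fire
    v = np.zeros(d_module)
    v[rng.choice(d_module, size=5, replace=False)] = rng.normal(size=5)
    x[i] = v

# Sketch: y = R_1 x_1 + ... + R_N x_N
y = sum(R_i @ x_i for R_i, x_i in zip(R, x))
print(y.shape)                                       # (256,)
```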
Recovery: Assume one layer of Modules.
[Diagram: given the sketch y = R1 x1 + ... + RN xN, recover each module output xi.]
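Since the R_i are independent random matrices (with E[R_i^T R_i] = I and E[R_i^T R_j] ≈ 0), each x_i can be approximately recovered from the sketch by projecting back through R_i. A minimal illustrative check with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d_module, d_sketch = 20, 32, 2048        # a larger sketch gives better recovery

R = [rng.normal(0, 1/np.sqrt(d_sketch), (d_sketch, d_module)) for _ in range(N)]
x = [np.zeros(d_module) for _ in range(N)]
x[3][:5] = rng.normal(size=5)               # only modules 3 and 7 fire
x[7][:5] = rng.normal(size=5)

y = sum(Ri @ xi for Ri, xi in zip(R, x))    # the sketch

# Recovery: project the sketch back through each R_i, so R_i^T y ~ x_i.
for i in (3, 7, 12):
    err = np.linalg.norm(R[i].T @ y - x[i])
    print(f"module {i}: recovery error {err:.3f}")
```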
[Diagram: points in sketch space such as 'Tiger', 'Person', 'Person X', 'Topic Y', and 'Meeting with Person X on topic Y', grouped into clusters/subspaces for Words, Animals, and Furniture.]
Memory and Modularity
[Diagram: the modular network (Line, Circle, ..., Human, Tiger) alongside a sketch repository and a knowledge graph.]
Recursive sketching: Benefits (provable)
Sketch-based Modular Continual Learning Architecture (paper)?
[Diagram: a phenomenon arrives as input from the environment, is turned into a sketch plus context, and is routed by a Routing-Module (OS) to a bucket in an LSH table of modules; the resulting output goes back to the environment.]
Buckets contain programs, pointers to frequently co-occurring sketches, expected rewards, etc.
The function F pays attention to only some fields of the sketch, for example facial features rather than clothes.
[Diagram: the same architecture placed between World and Mind; a new phenomenon can create a new module, concept, or knowledge-graph entry via the sketch, context, bucket, Routing-Module (OS), and LSH table of modules.]
Compare with
[Sparse Distr. Memories, Kanerva 02]
Traversal over a graph of concepts: Pathways
The sketch is transformed by a feed-forward (FF) layer in each node and routed via hashing; sketches may be combined using attention.
A node is an LSH bucket (a region in sketch space) and holds a small module; there may be multiple parallel pathways.
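A toy sketch of the routing loop described above: hash the current sketch to a bucket, apply that bucket's small module, and repeat for a few hops. The SimHash-style hash and the single-matrix "modules" are illustrative choices, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_buckets, n_hops = 64, 16, 3

# SimHash-style LSH: the sign pattern of a few random projections picks the bucket.
H = rng.normal(size=(int(np.log2(n_buckets)), d))
def bucket_of(s):
    bits = (H @ s > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# One tiny module (here just a linear map + ReLU) per bucket.
modules = [rng.normal(0, 1/np.sqrt(d), (d, d)) for _ in range(n_buckets)]

s = rng.normal(size=d)                  # initial sketch of the input phenomenon
path = []
for _ in range(n_hops):                 # traverse a pathway over concept nodes
    b = bucket_of(s)
    path.append(b)
    s = np.maximum(modules[b] @ s, 0.0) # the node's module transforms the sketch
print("pathway through buckets:", path)
```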
Neural Memory (with Xin Wang, Manzil Zaheer, AISTATS 2020)
[Diagram: a neural network coupled with a memory.]
Simplified implementation: the memory architecture is an LSH table.
Experiments:
Compact BERT models: memory helps the model attend to contexts.
Long-tail image classification: memory helps networks memorize few-shot examples.
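A minimal sketch of an LSH-table memory of this kind: a (key, value) pair is memorized in one write, and a slightly perturbed key usually lands in the same bucket and retrieves the value. Names and sizes are illustrative, not the AISTATS 2020 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_key, d_val, n_bits = 32, 16, 8

H = rng.normal(size=(n_bits, d_key))            # SimHash projections
def bucket(key):
    return tuple((H @ key > 0).astype(int))

table = {}                                       # bucket -> stored value

def write(key, value):
    table[bucket(key)] = value                   # one-shot memorization

def read(key):
    return table.get(bucket(key), np.zeros(d_val))

# A few-shot example can be memorized in one write ...
key, value = rng.normal(size=d_key), rng.normal(size=d_val)
write(key, value)

# ... and retrieved from a slightly perturbed version of the same key.
noisy_key = key + 0.05 * rng.normal(size=d_key)
print(np.allclose(read(noisy_key), value))       # usually True for small noise
```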
Connection to Switch Transformers: neural memory where the hash function becomes the routing function.
Switch Transformer: a tiny FF layer in each bucket (a rank-k matrix).
[Diagram: an LSH sketch memory with a rank-k matrix per bucket, used as a small FF layer alongside attention.]
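An illustrative sketch of this connection: the hash of the input acts as the routing function, and each bucket holds a tiny rank-k FF layer stored in factored form. Sizes and the hashing scheme are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_buckets = 128, 4, 32                     # rank-k expert per bucket

# Per-bucket rank-k matrix, stored in factored form: W_b = U_b @ V_b.
U = rng.normal(0, 1/np.sqrt(d), (n_buckets, d, k))
V = rng.normal(0, 1/np.sqrt(k), (n_buckets, k, d))

H = rng.normal(size=(5, d))                      # 2^5 = 32 buckets
def route(x):                                    # hash function as routing function
    bits = (H @ x > 0).astype(int)
    return int("".join(map(str, bits)), 2)

def memory_ff(x):
    b = route(x)
    return U[b] @ (V[b] @ x)                     # the bucket's tiny FF layer

x = rng.normal(size=d)
print(memory_ff(x).shape)                        # (128,)
```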
Neural memory is a form of factorization: a wide layer with sparse activations.
[Diagram: a wide dense layer vs. a wide layer with sparse activations; the sparse version amounts to a lookup operation into a memory table.]
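A small numerical check of the factorization view: if the wide layer's activations are restricted to a single firing unit, the output is just one column of the second weight matrix scaled by that unit's activation, i.e. a memory-table read. Illustrative code with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, width, d_out = 16, 1024, 16                # very wide hidden layer

A = rng.normal(size=(width, d_in))               # input -> wide hidden
B = rng.normal(size=(d_out, width))              # wide hidden -> output

def wide_sparse_layer(x):
    h = A @ x
    idx = int(np.argmax(h))                      # keep only the largest activation
    h_sparse = np.zeros(width)
    h_sparse[idx] = h[idx]
    return B @ h_sparse, idx

def table_lookup(x):
    # the memory view: pick a slot, read (and scale) the corresponding column of B
    idx = int(np.argmax(A @ x))
    return (A[idx] @ x) * B[:, idx]

x = rng.normal(size=d_in)
y1, idx = wide_sparse_layer(x)
y2 = table_lookup(x)
print(idx, np.allclose(y1, y2))                  # lookup == sparse wide layer
```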
Conclusion
Canonical memory-based problems: key-value lookup and fuzzy key-value lookup.
[Diagram: mappings from a key space to a value space, one panel for exact and one for fuzzy lookup.]
Synthetic Experiments
key-value
fuzzy key-value
set coherence
Recursive sketching formula: use block random matrices.
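One schematic way to write the recursion, consistent with the one-layer formula above; the exact normalization and block structure of the random matrices are in the ICML 2019 paper, and the notation R_{M←M'} is only illustrative:

```latex
% One-layer sketch (as above): each firing module output x_i is embedded by a
% random matrix R_i and the results are summed.
y \;=\; \sum_{i=1}^{N} R_i \, x_i

% Recursive version (schematic): the sketch s_M of a module M combines M's own
% output x_M with the sketches of the modules feeding into it, each through its
% own block of a random matrix.
s_M \;=\; R_M \, x_M \;+\; \sum_{M' \in \mathrm{children}(M)} R_{M \leftarrow M'} \, s_{M'}
```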
Recursive sketching: Floating modules
These are like reusable functions that are not necessarily "hardwired".
Examples: Curve Analyzer, Counting Unit, Clustering Unit.
Video: slithering snake, walking man, flying butterfly.
Very Deep Random networks may be Cryptographically hard
With A. Das, S. Gollapudi, R. Kumar.
[Diagram: a teacher deep network f (a black box) maps inputs x ∊ R^d to outputs y ∊ R; a student deep network f̂ maps the same inputs x ∊ R^d to outputs ŷ ∊ R.]
We prove: a random deep network behaves like a random function.
[Plot: behavior as a function of depth with random (Gaussian) inputs; ReLU and Sign activations, width = 100; the angle θ between the representations of a pair of inputs x, y.]
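A toy numerical illustration of the flavor of this claim (not the construction or proof in the paper): push two strongly correlated Gaussian inputs through a random network with sign activations (one of the two activations on the slide) and width 100, and watch their representations decorrelate with depth.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, depth = 100, 100, 20                   # width 100 as on the slide

x = rng.normal(size=d)
y = 0.95 * x + np.sqrt(1 - 0.95**2) * rng.normal(size=d)   # strongly correlated inputs

hx, hy = x, y
for layer in range(1, depth + 1):
    W = rng.normal(size=(width, len(hx)))
    hx, hy = np.sign(W @ hx), np.sign(W @ hy)    # sign activation at each layer
    if layer % 5 == 0:
        c = np.dot(hx, hy) / width               # correlation of the +/-1 vectors
        print(f"depth {layer}: correlation of representations = {c:+.2f}")
# Even strongly correlated inputs decorrelate with depth, which is the sense in
# which a very deep random network looks like a random function.
```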
Modular teacher networks may be easier to learn (sparse cuts help)
With Surbhi Goel 2018
Sparse Cuts make it easier to learn a Teacher Network
[Diagram: a teacher network on input x with a sparse cut, i.e. a sparse flow of information across the cut where only a few outputs fire.]
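A toy sketch of a teacher network with a sparse cut, assuming "sparse cut" means a middle layer where only a few units are allowed to fire; this is a hypothetical construction for illustration, not the one in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_cut, d_out, k = 64, 32, 8, 3       # only k of the d_cut units may fire

W1 = rng.normal(size=(d_cut, d_in))
W2 = rng.normal(size=(d_out, d_cut))

def teacher(x):
    h = np.maximum(W1 @ x, 0)              # lower block
    # sparse cut: keep only the top-k activations, zero out the rest
    mask = np.zeros_like(h)
    mask[np.argsort(h)[-k:]] = 1.0
    return W2 @ (h * mask)                 # upper block sees only a sparse signal

x = rng.normal(size=d_in)
print(teacher(x))
```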