Towards Symbolic Generalization For Neural Reasoning
By Yuhuai Wu
What is reasoning?
Example 1: Theorem Proving
Color: union
Type: progression
Size: constant
Example 2: Raven’s Progressive Matrices
Example 3: Playing StarCraft
Reasoning
Symbolic Generalization
Generalization vs Symbolic Generalization
data distribution
training set
test set
training distribution
training set
test set
test� distribution
shared “rules”
Existing methods
How?
Thesis of the talk:
Outline of the Talk
The Scattering Compositional Learner: Discovering Objects, Attributes and Relationships for Analogical Reasoning [paper]
Roger Grosse
Jimmy Ba
Honghua Dong
Color: union
Type: progression
Size: constant
Example
Size: Progression
Color: Progression
Size: Constant
Color: Constant
Key observation: Composition!
Color
Progression
Constant
Size
Key Idea: represent each component with a NN
Size
Progression
Constant
Color
Key Idea: represent each component with a NN
Progression
Constant
Size
Color
Key Idea: represent each component with a NN
Progression
Constant
Progression
Constant
Size
Color
Key Idea: represent each component with a NN
Progression
Constant
Attribute features
Relationship Features for every attribute
Why?
The Full Architecture
Balanced-RAVEN
PGM
Interpretable Representation: Attribute
Interpretable Representation: Relationship
Generalization to novel analogies
Summary
Outline of the Talk
OPtions as REsponses �[paper]
Maria Eckstein
Remi Leblond
Sasha Vezhnevets
Joel Leibo
Motivation
beats
beats
beats
Motivation
beats
beats
beats
Running example
Running with Scissors:
Problems = different opponent policies
Behaviours = pick R/P/S, attack, scout
Generalisation = novel opponents
opponent observation and inventory
beats
beats
beats
OPtions as REsponses (OPRE) agent
Policy over options p(z|x≤t )
Estimate of concealed info
Options 𝜂(a|z,x≤t )
Response to concealed info
Picking paper!
Opponent has rock!
KL
q(z|xt′)
xt′
concealed
information
V𝜋=∑zq(z|xt′)c(x≤t ,z)
𝜋
xt
p(z|x≤t )
agent
observation
𝜇
𝜇 =∑z p(z|x≤t )𝜂(a|z,x≤t )
𝜋 =∑z q(z|xt′) 𝜂(a|z,x≤t )
option
Credit assignment
Measuring generalisation in MARL
Held out agents
(specialists trained with pseudo-rewards)
Self-play
(6 agents)
Generalisation curves
R P
S
R bot OPRE
Episode start
R bot collects R;
OPRE moves left
OPRE sees that all R is missing; evidence for R bot jumps up
OPRE travels to P area and picks up all P
OPRE searches for the bot and tags it
Time in episode
p(z)
Latent activations
Summary
Predefined Modular Structures
Outline of the Talk
INT: an INequality Theorem Proving Dataset for �Benchmarking Generalization [paper]
Roger Grosse
Jimmy Ba
Albert Jiang
Theorem proving challenge: Learning
INT: An INequality Theorem Proving Dataset
Proof system
43
A proof in LEAN
A proof in our system
Generator : An simple example
Axioms orders: (A.C., A.C.)
2. A+B = B+A
Apply additional_commutativity (A, B)
3. (A+B)+C = C+(A+B)
Apply additional_commutativity (A+B, C)
Substitute 2 to 3
4. (A+B)+C = C+(B+A)
Six Dimensions of Generalization
45
Proof Lengths
46
Improving Generalization
Boosted by planning
48
Boosted by planning
49
Summary
Outline of the Talk
Modelling High-level Mathematical Reasoning �in Mechanised Declarative Proofs [paper]
Lei Yu
Laurence Paulson
Wenda Li
Declarative Proof vs. Tactic Proof
IsarStep Dataset
IsarStep Dataset
IsarStep Dataset Statistics
Training as a seq-2-seq task
Word Embeddings
Summary
Outline of the Talk
Learning Branching Heuristics for Propositional Model Counting [paper]
Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris Maddison, Roger Grosse, Edward Lee, Sanjit Seshia, Fahiem Bacchus
by
Sharp SAT Solver
Use a learned heuristic!
Training
Generalizing to problem of much larger size
3 days (SharpSAT) vs. 9 hours (Learned)!!
Future directions