1 of 66

Towards Symbolic Generalization For Neural Reasoning

By Yuhuai Wu

2 of 66

What is reasoning?

3 of 66

Example 1: Theorem Proving

4 of 66

Example 2: Raven’s Progressive Matrices

[Figure: an example matrix whose rows follow the rules Color: union, Type: progression, Size: constant.]

5 of 66

Example 3: Playing StarCraft

6 of 66

Reasoning

Symbolic Generalization

7 of 66

Generalization vs Symbolic Generalization

[Figure: ordinary generalization draws the training set and test set from one shared data distribution; symbolic generalization draws the training set from a training distribution and the test set from a different test distribution, linked only by shared “rules”.]

8 of 66

Existing methods

  • Not robust to distribution shift.
  • Extrapolation via interpolation:
    • Very expensive, and unlikely to cover all cases.

9 of 66

How?

Thesis of the talk:

  • Learning modular/symbolic representations.
    • Boosted by planning.

10 of 66

Outline of the Talk

  • Modularity learning.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

11 of 66

The Scattering Compositional Learner: Discovering Objects, Attributes and Relationships for Analogical Reasoning [paper]

Roger Grosse

Jimmy Ba

Honghua Dong

12 of 66

[Figure: the Raven’s Progressive Matrices example again, with rules Color: union, Type: progression, Size: constant.]

13 of 66

Example

[Figure: example puzzles governed by Size: Progression, Color: Progression, Size: Constant, and Color: Constant.]

14 of 66

Key observation: Composition!

[Figure: each rule decomposes into an attribute (Color, Size) composed with a relationship (Progression, Constant).]

15 of 66

Key Idea: represent each component with a NN

[Figure, built up across several slides: one neural network per attribute (Size, Color) and one per relationship (Progression, Constant).]

19 of 66

[Figure: attribute features, plus relationship features computed for every attribute.]

20 of 66

Why?

  • The Progression and Constant neural networks are encouraged to be compatible with both the Color and Size neural networks.

  • Recognizing a relationship (e.g., Progression) regardless of the attribute type → generalizing to novel attributes. (A code sketch of this weight sharing follows.)
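
To make the weight sharing concrete, here is a minimal sketch of the scattering idea in PyTorch. It is illustrative only, not the authors' implementation; all dimensions and module shapes are assumptions.

```python
import torch
import torch.nn as nn

n_attributes, attr_dim, n_relations = 4, 32, 2   # assumed sizes

# One small network per attribute extracts that attribute's features.
attribute_nets = nn.ModuleList(
    nn.Sequential(nn.Linear(128, attr_dim), nn.ReLU())
    for _ in range(n_attributes)
)
# One network per relationship (Progression, Constant), SHARED across
# all attributes, so it must score Color and Size features alike.
relation_nets = nn.ModuleList(
    nn.Linear(3 * attr_dim, 1) for _ in range(n_relations)
)

def score_row(panel_embeddings):
    """panel_embeddings: (3, 128) embeddings for one row of three panels."""
    scores = []
    for attr_net in attribute_nets:
        feats = attr_net(panel_embeddings).reshape(-1)  # concat the row's features
        scores.append(torch.stack([rel(feats) for rel in relation_nets]))
    return torch.stack(scores)  # (n_attributes, n_relations, 1)

print(score_row(torch.randn(3, 128)).shape)  # torch.Size([4, 2, 1])
```

Because every relationship network sees every attribute's features, it cannot specialize to one attribute, which is exactly what pushes it toward an attribute-agnostic notion of, say, Progression.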

21 of 66

The Full Architecture

22 of 66

Balanced-RAVEN

23 of 66

PGM

24 of 66

Interpretable Representation: Attribute

25 of 66

Interpretable Representation: Relationship

26 of 66

Generalization to novel analogies

27 of 66

Summary

  • Introduce an architecture that learns to discover the underlying compositional structures:
    • Extracting objects.
    • Extracting features for each object.
    • Recognizing relationships among features.

  • Modular learning enables symbolic generalization.

28 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.*
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

29 of 66

OPtions as REsponses [paper]

Maria Eckstein

Remi Leblond

Sasha Vezhnevets

Joel Leibo

30 of 66

Motivation

  • Hierarchical RL (HRL) for generalization.
    • Learn a menu of behaviors.
    • Respond with the right behavior.
    • Generalize to novel problems.

[Figure: the rock-paper-scissors cycle: rock beats scissors, scissors beats paper, paper beats rock.]

31 of 66

Motivation

  • Natural options in Multi-Agent Reinforcement Learning.
    • Options as Responses to different opponents.


32 of 66

Running example

Running with Scissors:

  • Purple = rock, cyan = paper, brown = scissors
  • Players gather resources to control their commitment to a strategy (R/P/S)
  • Partial observability

Problems = different opponent policies

Behaviours = pick R/P/S, attack, scout

Generalisation = novel opponents

  • Zero-sum.
  • Concealed information = opponent observation and inventory.
  • Non-transitive reward (rock beats scissors beats paper beats rock).

33 of 66

OPtions as REsponses (OPRE) agent

[Figure: the OPRE agent. The policy over options p(z|x≤t) carries an estimate of the concealed information, and the options 𝜂(a|z,x≤t) respond to it: e.g., “Opponent has rock!” triggers “Picking paper!”.]

34 of 66

[Figure: OPRE training. The option distribution p(z|x≤t) is computed from the agent’s own observation x_t, while a posterior q(z|x′_t) is computed from the concealed information x′_t (the opponent’s observation and inventory); a KL term ties the two distributions together.]

𝜇(a|x≤t) = ∑_z p(z|x≤t) 𝜂(a|z, x≤t)   (behavioural policy, used for acting)

𝜋(a|x≤t, x′_t) = ∑_z q(z|x′_t) 𝜂(a|z, x≤t)   (target policy, used for learning)

V_𝜋 = ∑_z q(z|x′_t) c(x≤t, z)   (value as a mixture of per-option critics)

Credit assignment thus happens at two levels: which option to use, and how well it was executed.
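
As a numeric illustration, here is a minimal sketch of the two mixture policies above (my notation; not the authors' implementation):

```python
import numpy as np

n_options, n_actions = 3, 5
rng = np.random.default_rng(0)

p = rng.dirichlet(np.ones(n_options))        # p(z|x<=t): from own observation
q = rng.dirichlet(np.ones(n_options))        # q(z|x'_t): from concealed info
# eta[z] is the action distribution of option z
eta = rng.dirichlet(np.ones(n_actions), size=n_options)

mu = p @ eta                                 # behavioural policy (acting)
pi = q @ eta                                 # target policy (learning)
kl = float(np.sum(q * np.log(q / p)))        # KL term tying the two together

print(mu.round(3), pi.round(3), round(kl, 3))
```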

35 of 66

Measuring generalisation in MARL

[Figure: training by self-play (6 agents), evaluation against held-out agents (specialists trained with pseudo-rewards), and the resulting generalisation curves.]

36 of 66

[Figure: latent option activations p(z) over an episode against an R bot. At episode start the R bot collects rock while OPRE moves left; when OPRE sees that all rock is missing, its evidence for an R bot jumps up; OPRE then travels to the paper area, picks up all the paper, searches for the bot, and tags it.]

37 of 66

Summary

  • The Options as Responses framework in MARL, with two levels of credit assignment: (1) which strategy to use, and (2) how well the strategy was executed.

  • Interpretable options.

  • Modular learning enables better generalization in RL.

38 of 66

Predefined Modular Structures

  • In the previous two works, the modular structures emerged from learning.

  • In the following two works on theorem proving, the (primitive) modular structures are given, as the symbolic rules of the proof system.

39 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.*
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

40 of 66

INT: an INequality Theorem Proving Dataset for Benchmarking Generalization [paper]

Roger Grosse

Jimmy Ba

Albert Jiang

41 of 66

Theorem proving challenge: Learning

  • Learn to generalize to theorems unseen in training.
    • An algorithm that overcomes distribution shift.

  • Systematic benchmarks for evaluation.
    • Our work: INT.

42 of 66

INT: An INequality Theorem Proving Dataset

  • Inequality theorems.
  • Various dimensions of generalization.
  • Proof interface mimics modern interactive theorem provers (ITPs).
  • Lightweight, fast simulation.

43 of 66

Proof system

  • Proof as graph manipulation

[Figure: a proof in Lean, and the same proof in our system.]

44 of 66

Generator: a simple example

Axiom order: (A.C., A.C.), where A.C. = addition_commutativity.

1. Initial conditions: A = A, B = B, C = C.

2. Apply addition_commutativity(A, B): A + B = B + A.

3. Apply addition_commutativity(A + B, C): (A + B) + C = C + (A + B).

4. Substitute 2 into 3: (A + B) + C = C + (B + A).

(A toy generator sketch follows.)
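
A toy forward generator in this spirit, hypothetical and much simplified (the real INT generator supports many axioms and a full proof interface):

```python
import random

def addition_commutativity(a, b):
    """Axiom instance: a + b = b + a, as an (lhs, rhs) equality."""
    return (f"({a}+{b})", f"({b}+{a})")

def generate(symbols, steps, seed=0):
    random.seed(seed)
    proof = [f"{s} = {s}" for s in symbols]      # initial premises A=A, B=B, C=C
    terms = list(symbols)
    for _ in range(steps):
        a, b = random.sample(terms, 2)           # pick arguments for the axiom
        lhs, rhs = addition_commutativity(a, b)
        proof.append(f"{lhs} = {rhs}")
        terms.append(lhs)                        # compound terms can be reused
    return proof                                 # the last line is the theorem

for line in generate(["A", "B", "C"], steps=2):
    print(line)
```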

45 of 66

Six Dimensions of Generalization

  • Randomness in generator algorithm
  • Varying initial premises
  • Varying axiom combinations (AB),(BC) -> (AC)
  • Varying axiom orders (ABC) -> (ACB)
  • Number of unique axioms (K)
  • Varying proof lengths (L)


46 of 66

Proof Lengths


47 of 66

Improving Generalization

  • Does planning help generalize to new theorems? (A search sketch follows.)
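
A compact UCT-style sketch of the idea: generic Monte Carlo tree search over proof states. The `env` interface (`actions`, `step`, `is_proved`) is hypothetical, not INT's actual API, and the root is assumed unproved with at least one action.

```python
import math
import random

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.N, self.W = 0, 0.0            # visit count, total value

def uct_search(env, root_state, n_sims=100, c=1.4,
               value_fn=lambda s: random.random()):
    root = Node(root_state)
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(env.actions(node.state)):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.W / (ch.N + 1e-8)
                       + c * math.sqrt(math.log(parent.N + 1) / (ch.N + 1e-8)))
            path.append(node)
        # Expansion: try one untried axiom application.
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried and not env.is_proved(node.state):
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a))
            node = node.children[a]
            path.append(node)
        # Evaluation and backup: 1 for a finished proof, else a value estimate.
        v = 1.0 if env.is_proved(node.state) else value_fn(node.state)
        for n in path:
            n.N += 1
            n.W += v
    return max(root.children.items(), key=lambda kv: kv[1].N)[0]  # best action
```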

48 of 66

Boosted by planning


49 of 66

Boosted by planning


50 of 66

Summary

  • Introduce a dataset that allows benchmarking six dimensions of generalization.
    • Randomness in generator algorithm
    • Varying initial premises
    • Varying axiom combinations
    • Varying axiom orders
    • Number of unique axioms (K)
    • Varying proof lengths (L)

  • Planning with symbolic actions greatly helps generalization.

51 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.*

52 of 66

Modelling High-level Mathematical Reasoning in Mechanised Declarative Proofs [paper]

Lei Yu

Laurence Paulson

Wenda Li

53 of 66

Declarative Proof vs. Tactic Proof

54 of 66

IsarStep Dataset

55 of 66

IsarStep Dataset

56 of 66

IsarStep Dataset Statistics

  • We have mined the largest repository of mechanised proofs:
    • Isabelle proofs.
    • 143K lemmas (growing by 10K annually).
  • Combined with the Isabelle library, we have 204K lemmas in total, compared to:
    • 71K in CoqGym, 11K in HOLStep, 29K in HOList, 1.6K in GamePad.
  • Topics include:
    • Foundational logic, advanced analysis, computer algebra, cryptography, data structures.
  • 850K data points for IsarStep.

57 of 66

Training as a sequence-to-sequence (seq2seq) task
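
A minimal seq2seq training step for this task, as an illustrative transformer baseline rather than the paper's exact model; the vocabulary size, model widths, and tokenization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 10000, 256                         # assumed vocab size / model width
emb = nn.Embedding(vocab, d)
model = nn.Transformer(d_model=d, nhead=8, num_encoder_layers=4,
                       num_decoder_layers=4, batch_first=True)
out = nn.Linear(d, vocab)

src = torch.randint(0, vocab, (2, 128))       # context: surrounding proof steps
tgt = torch.randint(0, vocab, (2, 32))        # target: missing intermediate step
mask = model.generate_square_subsequent_mask(tgt.size(1) - 1)  # causal mask

h = model(emb(src), emb(tgt[:, :-1]), tgt_mask=mask)           # teacher forcing
loss = F.cross_entropy(out(h).reshape(-1, vocab), tgt[:, 1:].reshape(-1))
loss.backward()
print(float(loss))
```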

58 of 66

Word Embeddings

59 of 66

Summary

  • The largest theorem proving dataset to date.
  • The declarative proof style enables more efficient search and planning.
  • More detailed examples of the agent performing multi-step reasoning can be found in the paper.

60 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
  • Planning.
    • MCTS on INT dataset.
    • A better symbolic representation: IsarStep Dataset.*
  • Application
    • GNNs for branching heuristics with SharpSAT. *

61 of 66

Learning Branching Heuristics for Propositional Model Counting [paper]

by Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris Maddison, Roger Grosse, Edward Lee, Sanjit Seshia, Fahiem Bacchus

62 of 66

SharpSAT Solver

Use a learned heuristic!

63 of 66

Training

  • Graph neural network (GNN) for representing formulas.
  • Evolution Strategies (ES).
    • The exploration complexity of RL grows with both the action-space size |A| and the horizon, whereas ES depends on neither. (A minimal ES sketch follows.)
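
A minimal vanilla-ES update for tuning heuristic parameters. Illustrative only: `evaluate` is a hypothetical stand-in for running the solver with the learned branching heuristic and returning a score such as the negative decision count.

```python
import numpy as np

def evaluate(theta):
    """Hypothetical stand-in: higher is better (fewer solver decisions)."""
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(10)                    # heuristic parameters
sigma, lr, pop = 0.1, 0.02, 50          # noise scale, step size, population

for step in range(200):
    eps = rng.standard_normal((pop, theta.size))     # perturbation directions
    rewards = np.array([evaluate(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize
    theta += lr / (pop * sigma) * eps.T @ rewards    # reward-weighted noise

print(theta.round(2))                   # approaches the optimum of `evaluate`
```

Note that each perturbation is scored by a whole solver run, so no per-step actions or horizons enter the update, which is the point of the bullet above.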

64 of 66

Generalizing to problems of much larger size

65 of 66

3 days (SharpSAT) vs. 9 hours (Learned)!!

66 of 66

Future directions

  • There are many opportunities!
  • Improve reasoning architectures for IsarStep.
  • Learning to discover modular structures on top of predefined primitive structures: macro theorems!
  • Theoretical understanding of neural architectures for reasoning.