1 of 66

Towards Symbolic Generalization For Neural Reasoning

By Yuhuai Wu

2 of 66

What is reasoning?

3 of 66

Example 1: Theorem Proving

4 of 66

Example 2: Raven’s Progressive Matrices

[Figure: an example matrix whose rows follow the rules Color: union, Type: progression, Size: constant.]

5 of 66

Example 3: Playing StarCraft

6 of 66

Reasoning

Symbolic Generalization

7 of 66

Generalization vs Symbolic Generalization

[Figure: ordinary generalization draws the training set and test set from one shared data distribution; symbolic generalization draws the training set from a training distribution and the test set from a different test distribution, linked only by shared “rules”.]

8 of 66

Existing methods

  • Not robust to distribution shift.
  • Extrapolation via interpolation:
    • Very expensive, and unlikely to cover all cases.

9 of 66

How?

Thesis of the talk:

  • Learning modular/symbolic representations.
    • Boosted by planning.

10 of 66

Outline of the Talk

  • Modularity learning.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

11 of 66

The Scattering Compositional Learner: Discovering Objects, Attributes and Relationships for Analogical Reasoning [paper]

Roger Grosse

Jimmy Ba

Honghua Dong

12 of 66

[Figure: the Raven’s Progressive Matrices example again, with rules Color: union, Type: progression, Size: constant.]

13 of 66

Example

[Figure: example puzzles governed by Size: Progression, Color: Progression, Size: Constant, and Color: Constant.]

14 of 66

Key observation: Composition!

[Figure: each rule decomposes into an attribute (Color, Size) composed with a relationship (Progression, Constant).]

15 of 66

Key Idea: represent each component with a NN

[Figure, built up across several slides: one neural network per attribute (Size, Color) and one per relationship (Progression, Constant).]

19 of 66

[Figure: attribute features, plus relationship features computed for every attribute.]

20 of 66

Why?

  • The Progression and Constant neural networks are encouraged to be compatible with both the Color and Size neural networks.

  • Recognizing a relationship (e.g., Progression) regardless of the attribute type → generalizing to novel attributes. (A code sketch of this weight sharing follows.)
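
To make the weight sharing concrete, here is a minimal sketch of the scattering idea in PyTorch. It is illustrative only, not the authors' implementation; all dimensions and module shapes are assumptions.

```python
import torch
import torch.nn as nn

n_attributes, attr_dim, n_relations = 4, 32, 2   # assumed sizes

# One small network per attribute extracts that attribute's features.
attribute_nets = nn.ModuleList(
    nn.Sequential(nn.Linear(128, attr_dim), nn.ReLU())
    for _ in range(n_attributes)
)
# One network per relationship (Progression, Constant), SHARED across
# all attributes, so it must score Color and Size features alike.
relation_nets = nn.ModuleList(
    nn.Linear(3 * attr_dim, 1) for _ in range(n_relations)
)

def score_row(panel_embeddings):
    """panel_embeddings: (3, 128) embeddings for one row of three panels."""
    scores = []
    for attr_net in attribute_nets:
        feats = attr_net(panel_embeddings).reshape(-1)  # concat the row's features
        scores.append(torch.stack([rel(feats) for rel in relation_nets]))
    return torch.stack(scores)  # (n_attributes, n_relations, 1)

print(score_row(torch.randn(3, 128)).shape)  # torch.Size([4, 2, 1])
```

Because every relationship network sees every attribute's features, it cannot specialize to one attribute, which is exactly what pushes it toward an attribute-agnostic notion of, say, Progression.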

21 of 66

The Full Architecture

22 of 66

Balanced-RAVEN

23 of 66

PGM

24 of 66

Interpretable Representation: Attribute

25 of 66

Interpretable Representation: Relationship

26 of 66

Generalization to novel analogies

27 of 66

Summary

  • Introduce an architecture that learns to discover the underlying compositional structures:
    • Extracting objects.
    • Extracting features for each object.
    • Recognizing relationships among features.

  • Modular learning enables symbolic generalization.

28 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.*
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

29 of 66

OPtions as REsponses [paper]

Maria Eckstein

Remi Leblond

Sasha Vezhnevets

Joel Leibo

30 of 66

Motivation

  • Hierarchical RL (HRL) for generalization.
    • Learn a menu of behaviors.
    • Respond with the right behavior.
    • Generalize to novel problems.

[Figure: the rock-paper-scissors cycle: rock beats scissors, scissors beats paper, paper beats rock.]

31 of 66

Motivation

  • Natural options in Multi-Agent Reinforcement Learning.
    • Options as Responses to different opponents.


32 of 66

Running example

Running with Scissors:

  • Purple = rock, cyan = paper, brown = scissors
  • Players gather resources to control their commitment to a strategy (R/P/S)
  • Partial observability

Problems = different opponent policies

Behaviours = pick R/P/S, attack, scout

Generalisation = novel opponents

  • Zero-sum.
  • Concealed information = opponent observation and inventory.
  • Non-transitive reward (rock beats scissors beats paper beats rock).

33 of 66

OPtions as REsponses (OPRE) agent

[Figure: the OPRE agent. The policy over options p(z|x≤t) carries an estimate of the concealed information, and the options 𝜂(a|z,x≤t) respond to it: e.g., “Opponent has rock!” triggers “Picking paper!”.]

34 of 66

[Figure: OPRE training. The option distribution p(z|x≤t) is computed from the agent’s own observation x_t, while a posterior q(z|x′_t) is computed from the concealed information x′_t (the opponent’s observation and inventory); a KL term ties the two distributions together.]

𝜇(a|x≤t) = ∑_z p(z|x≤t) 𝜂(a|z, x≤t)   (behavioural policy, used for acting)

𝜋(a|x≤t, x′_t) = ∑_z q(z|x′_t) 𝜂(a|z, x≤t)   (target policy, used for learning)

V_𝜋 = ∑_z q(z|x′_t) c(x≤t, z)   (value as a mixture of per-option critics)

Credit assignment thus happens at two levels: which option to use, and how well it was executed.
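
As a numeric illustration, here is a minimal sketch of the two mixture policies above (my notation; not the authors' implementation):

```python
import numpy as np

n_options, n_actions = 3, 5
rng = np.random.default_rng(0)

p = rng.dirichlet(np.ones(n_options))        # p(z|x<=t): from own observation
q = rng.dirichlet(np.ones(n_options))        # q(z|x'_t): from concealed info
# eta[z] is the action distribution of option z
eta = rng.dirichlet(np.ones(n_actions), size=n_options)

mu = p @ eta                                 # behavioural policy (acting)
pi = q @ eta                                 # target policy (learning)
kl = float(np.sum(q * np.log(q / p)))        # KL term tying the two together

print(mu.round(3), pi.round(3), round(kl, 3))
```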

35 of 66

Measuring generalisation in MARL

[Figure: training by self-play (6 agents), evaluation against held-out agents (specialists trained with pseudo-rewards), and the resulting generalisation curves.]

36 of 66

[Figure: latent option activations p(z) over an episode against an R bot. At episode start the R bot collects rock while OPRE moves left; when OPRE sees that all rock is missing, its evidence for an R bot jumps up; OPRE then travels to the paper area, picks up all the paper, searches for the bot, and tags it.]

37 of 66

Summary

  • The Options as Responses framework in MARL, with two levels of credit assignment: (1) which strategy to use, and (2) how well the strategy was executed.

  • Interpretable options.

  • Modular learning enables better generalization in RL.

38 of 66

Predefined Modular Structures

  • In the previous two works, the modular structures emerged from learning.

  • In the following two works on theorem proving, the (primitive) modular structures are given, as the symbolic rules of the proof system.

39 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.*
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.

40 of 66

INT: an INequality Theorem Proving Dataset for Benchmarking Generalization [paper]

Roger Grosse

Jimmy Ba

Albert Jiang

41 of 66

Theorem proving challenge: Learning

  • Learn to generalize to theorems unseen in training.
    • An algorithm that overcomes distribution shift.

  • Systematic benchmarks for evaluation.
    • Our work: INT.

42 of 66

INT: An INequality Theorem Proving Dataset

  • Inequality theorems.
  • Various dimensions of generalization.
  • Proof interface mimics modern interactive theorem provers (ITPs).
  • Lightweight, fast simulation.

43 of 66

Proof system

  • Proof as graph manipulation

[Figure: a proof in Lean, and the same proof in our system.]

44 of 66

Generator: a simple example

Axiom order: (A.C., A.C.), where A.C. = addition_commutativity.

1. Initial conditions: A = A, B = B, C = C.

2. Apply addition_commutativity(A, B): A + B = B + A.

3. Apply addition_commutativity(A + B, C): (A + B) + C = C + (A + B).

4. Substitute 2 into 3: (A + B) + C = C + (B + A).

(A toy generator sketch follows.)
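
A toy forward generator in this spirit, hypothetical and much simplified (the real INT generator supports many axioms and a full proof interface):

```python
import random

def addition_commutativity(a, b):
    """Axiom instance: a + b = b + a, as an (lhs, rhs) equality."""
    return (f"({a}+{b})", f"({b}+{a})")

def generate(symbols, steps, seed=0):
    random.seed(seed)
    proof = [f"{s} = {s}" for s in symbols]      # initial premises A=A, B=B, C=C
    terms = list(symbols)
    for _ in range(steps):
        a, b = random.sample(terms, 2)           # pick arguments for the axiom
        lhs, rhs = addition_commutativity(a, b)
        proof.append(f"{lhs} = {rhs}")
        terms.append(lhs)                        # compound terms can be reused
    return proof                                 # the last line is the theorem

for line in generate(["A", "B", "C"], steps=2):
    print(line)
```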

45 of 66

Six Dimensions of Generalization

  • Randomness in generator algorithm
  • Varying initial premises
  • Varying axiom combinations (AB),(BC) -> (AC)
  • Varying axiom orders (ABC) -> (ACB)
  • Number of unique axioms (K)
  • Varying proof lengths (L)


46 of 66

Proof Lengths


47 of 66

Improving Generalization

  • Does planning help generalize to new theorems? (A search sketch follows.)
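
A compact UCT-style sketch of the idea: generic Monte Carlo tree search over proof states. The `env` interface (`actions`, `step`, `is_proved`) is hypothetical, not INT's actual API, and the root is assumed unproved with at least one action.

```python
import math
import random

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.N, self.W = 0, 0.0            # visit count, total value

def uct_search(env, root_state, n_sims=100, c=1.4,
               value_fn=lambda s: random.random()):
    root = Node(root_state)
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(env.actions(node.state)):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.W / (ch.N + 1e-8)
                       + c * math.sqrt(math.log(parent.N + 1) / (ch.N + 1e-8)))
            path.append(node)
        # Expansion: try one untried axiom application.
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried and not env.is_proved(node.state):
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a))
            node = node.children[a]
            path.append(node)
        # Evaluation and backup: 1 for a finished proof, else a value estimate.
        v = 1.0 if env.is_proved(node.state) else value_fn(node.state)
        for n in path:
            n.N += 1
            n.W += v
    return max(root.children.items(), key=lambda kv: kv[1].N)[0]  # best action
```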

48 of 66

Boosted by planning


49 of 66

Boosted by planning


50 of 66

Summary

  • Introduce a dataset that allows benchmarking six dimensions of generalization.
    • Randomness in generator algorithm
    • Varying initial premises
    • Varying axiom combinations
    • Varying axiom orders
    • Number of unique axioms (K)
    • Varying proof lengths (L)

  • Planning with symbolic actions greatly helps generalization.

51 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
    • MCTS on INT dataset.
  • General Theorem Proving
    • A better symbolic representation: IsarStep Dataset.*

52 of 66

Modelling High-level Mathematical Reasoning in Mechanised Declarative Proofs [paper]

Lei Yu

Laurence Paulson

Wenda Li

53 of 66

Declarative Proof vs. Tactic Proof

54 of 66

IsarStep Dataset

55 of 66

IsarStep Dataset

56 of 66

IsarStep Dataset Statistics

  • We have mined the largest repository of mechanised proofs:
    • Isabelle proofs.
    • 143K lemmas (growing by 10K annually).
  • Combined with the Isabelle library, we have 204K lemmas in total, compared to:
    • 71K in CoqGym, 11K in HOLStep, 29K in HOList, 1.6K in GamePad.
  • Topics include:
    • Foundational logic, advanced analysis, computer algebra, cryptography, data structures.
  • 850K data points for IsarStep.

57 of 66

Training as a sequence-to-sequence (seq2seq) task
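
A minimal seq2seq training step for this task, as an illustrative transformer baseline rather than the paper's exact model; the vocabulary size, model widths, and tokenization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 10000, 256                         # assumed vocab size / model width
emb = nn.Embedding(vocab, d)
model = nn.Transformer(d_model=d, nhead=8, num_encoder_layers=4,
                       num_decoder_layers=4, batch_first=True)
out = nn.Linear(d, vocab)

src = torch.randint(0, vocab, (2, 128))       # context: surrounding proof steps
tgt = torch.randint(0, vocab, (2, 32))        # target: missing intermediate step
mask = model.generate_square_subsequent_mask(tgt.size(1) - 1)  # causal mask

h = model(emb(src), emb(tgt[:, :-1]), tgt_mask=mask)           # teacher forcing
loss = F.cross_entropy(out(h).reshape(-1, vocab), tgt[:, 1:].reshape(-1))
loss.backward()
print(float(loss))
```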

58 of 66

Word Embeddings

59 of 66

Summary

  • The largest theorem proving dataset to date.
  • The declarative proof style enables more efficient search and planning.
  • More detailed examples of the agent performing multi-step reasoning can be found in the paper.

60 of 66

Outline of the Talk

  • Modularity.
    • Scattering Compositional Learner.
    • OPtions as REsponses (OPRE) agent.
  • Planning.
    • MCTS on INT dataset.
    • A better symbolic representation: IsarStep Dataset.*
  • Application
    • GNNs for branching heuristics with SharpSAT. *

61 of 66

Learning Branching Heuristics for Propositional Model Counting [paper]

by Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris Maddison, Roger Grosse, Edward Lee, Sanjit Seshia, Fahiem Bacchus

62 of 66

SharpSAT Solver

Use a learned heuristic!

63 of 66

Training

  • Graph neural network (GNN) for representing formulas.
  • Evolution Strategies (ES).
    • The exploration complexity of RL grows with both the action-space size |A| and the horizon, whereas ES depends on neither. (A minimal ES sketch follows.)
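
A minimal vanilla-ES update for tuning heuristic parameters. Illustrative only: `evaluate` is a hypothetical stand-in for running the solver with the learned branching heuristic and returning a score such as the negative decision count.

```python
import numpy as np

def evaluate(theta):
    """Hypothetical stand-in: higher is better (fewer solver decisions)."""
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(10)                    # heuristic parameters
sigma, lr, pop = 0.1, 0.02, 50          # noise scale, step size, population

for step in range(200):
    eps = rng.standard_normal((pop, theta.size))     # perturbation directions
    rewards = np.array([evaluate(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize
    theta += lr / (pop * sigma) * eps.T @ rewards    # reward-weighted noise

print(theta.round(2))                   # approaches the optimum of `evaluate`
```

Note that each perturbation is scored by a whole solver run, so no per-step actions or horizons enter the update, which is the point of the bullet above.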

64 of 66

Generalizing to problems of much larger size

65 of 66

3 days (SharpSAT) vs. 9 hours (Learned)!!

66 of 66

Future directions

  • There are many opportunities!
  • Improve reasoning architectures for IsarStep.
  • Learning to discover modular structures on top of predefined primitive structures: macro theorems!
  • Theoretical understanding of neural architectures for reasoning.