1 of 27

Neural-Symbolic Learning and Reasoning: A Survey and Interpretation

Tarek R. Besold, Artur d'Avila Garcez, Sebastian Bader, Howard Bowman, Pedro Domingos, Pascal Hitzler, Kai-Uwe Kühnberger et al.

Presented by:

Rohit Sanjay Inamdar

SBU ID: 114504643

2 of 27

Contents

  • Overview
  • Prolegomena of Neural-Symbolic Computation
  • NSCA for Neural-Symbolic Computing
  • NSI for Cognitive Science
  • Binding and First-Order Inference in an NS Framework
  • Connectionist First-Order Logic
  • Markov Logic Networks
  • Developments and Future Work
  • References

3 of 27

Overview

  • Combines the neural (the connectionist property of 'neurons') and the symbolic (logic programming: intuitionistic, modal, temporal) approaches.
  • The integration of the 'symbolic' component builds on the robustness of such neural networks.
  • Aims to address the shortcomings of both the neural-network and the logic paradigms.
  • Neuro-symbolic AI seamlessly integrates statistical learning and logical reasoning to build reliable and robust computational models.
  • The system should respond rapidly to any change or alteration in context based on proper feedback, and revise its analysis with minimal error (especially concerning nonmonotonic reasoning).

Kyle Hamilton, Aparna Nayak, Bojan Božić and Luca Longo

https://arxiv.org/pdf/2202.12205

4 of 27

Prolegomena of N-S Computation

  • Translational Algorithms: used to convert from symbolic to connectionist form (a neural encoding of the logic) or vice versa (a logical description of the neural system), or to build a hybrid system of both.
  • Commonly used systems are feedforward and recurrent neural networks in which knowledge evolves over multiple stages.
  • The idea is to create a hierarchical system of knowledge representation in which higher levels represent more abstract, general concepts and lower levels represent more specific, concrete concepts.
  • Fibring - an approach that combines multiple stages of networks in an upward, recursive manner.
  • This can also be understood as a way of training the network.

5 of 27

Principles and mechanisms

  • Modularity: Independent tasks performed by each component
  • Hierarchical: Raw data from lower layers propagates to higher layers.
  • Four main tasks cyclically performed:

1. Translation of symbolic knowledge into network

2. Gaining additional knowledge from examples

3. Reasoning

4. Symbolic extraction from the network.

  • (Garcez et al.) Different layers/network ensembles perform different tasks; for example (see the sketch after this example):

Network A: P(X, Y)

Network B: Q(Z)

Output: P(X, Y) ∧ Q(Z) → R(X, Y, Z)
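A minimal sketch of this kind of composition, assuming hypothetical toy networks network_A, network_B and network_R and a soft conjunction of their truth degrees (illustrative only, not the actual algorithm of Garcez et al.):

```python
import numpy as np

# Hypothetical stand-ins for trained ensembles: each maps its inputs to a
# truth degree in [0, 1] for the predicate it represents.
def network_A(x, y):          # approximates P(X, Y)
    return 1.0 / (1.0 + np.exp(-(2.0 * x + y - 1.0)))

def network_B(z):             # approximates Q(Z)
    return 1.0 / (1.0 + np.exp(-(3.0 * z - 1.5)))

def network_R(p_truth, q_truth):
    # Encodes the rule P(X,Y) AND Q(Z) -> R(X,Y,Z): the head only fires
    # when both body atoms are (sufficiently) true.
    conjunction = p_truth * q_truth           # soft AND of the body
    return 1.0 if conjunction > 0.5 else 0.0  # thresholded head activation

x, y, z = 0.8, 0.9, 0.7
print(network_R(network_A(x, y), network_B(z)))
```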

6 of 27

Fibring

  • Composition of interconnected neurones
  • Mapping is done vertically among layers of ensembles
  • A fibring function defines how the network behaves and the connectivity among the layers.
  • Diverse reasoning, inferences, knowledge evolution occur simultaneously.
  • A single algorithm is applied throughout the network at every level.

Conceptual overview of Neuro Symbolic System (Garcez et al.)
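A minimal sketch of fibring, assuming a hypothetical multiplicative fibring function in which the activation of a designated neurone of network A modulates the weights of an embedded network B:

```python
import numpy as np

def forward(W, x):
    """One dense layer with sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(W @ x)))

# Network A (the fibring network) and network B (the embedded network).
W_A = np.array([[0.5, -0.3], [0.2, 0.8]])
W_B = np.array([[1.0, 0.4], [-0.6, 0.9]])

def fibring_function(activation_A, W_B):
    # One simple (assumed) choice of fibring function: the activation of a
    # designated neurone of A multiplicatively modulates all weights of B.
    return activation_A * W_B

x_A = np.array([1.0, 0.5])
x_B = np.array([0.2, 0.7])

a_A = forward(W_A, x_A)                      # run A bottom-up
W_B_fibred = fibring_function(a_A[0], W_B)   # A's neurone 0 rewires B
output_B = forward(W_B_fibred, x_B)          # B now behaves differently
print(output_B)
```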

7 of 27

NSCA for Neuro-Symbolic Computing (de Penning et al.)

  • An agent (or cognitive architecture) for integrating neural and symbolic systems.
  • Integrating the two in real-world settings is usually a difficult task.
  • The aim is to integrate and model complex cognitive abilities.
  • ANNs are commonly used in such models to learn and reason about knowledge of the world, represented as symbolic logic
  • Algorithms map logical theories T on a network N.
  • Network N computes the logical consequences of T.
  • T also serves as background knowledge for training the network.

8 of 27

NSCA Mechanisms

  • Two components: a temporal logic theory T and a Restricted Boltzmann Machine (RBM) network N
  • The RBM defines a probability distribution:

P(V = v, H = h), where V is the visible layer and H the hidden layer

v, h - the vectors encoded in those layers

v encodes the data as binary or real values

h encodes the posterior probability P(H | v)

  • Aids in reconstructing data vectors from inconsistent/incomplete data.
  • "Product of Experts": a combination of hidden units that constrains the high-dimensional data, which aids in learning complex data relationships.
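A minimal sketch of the RBM's role, with toy (untrained) parameters assumed for illustration: the hidden layer infers the posterior P(H | v) and the visible layer is reconstructed from it, which is how incomplete data vectors get completed:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM parameters (in practice learned, e.g. with contrastive divergence).
W = rng.normal(scale=0.1, size=(6, 4))   # visible x hidden weights
b_v = np.zeros(6)                        # visible biases
b_h = np.zeros(4)                        # hidden biases

def reconstruct(v):
    """One Gibbs step: infer hidden beliefs, then reconstruct the data vector."""
    p_h = sigmoid(v @ W + b_h)                      # posterior P(H=1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float) # sample hidden states
    p_v = sigmoid(h @ W.T + b_v)                    # reconstruction P(V=1 | h)
    return p_h, p_v

# An incomplete/noisy observation: unknown entries filled with 0.5.
v = np.array([1.0, 0.0, 1.0, 0.5, 0.5, 1.0])
posterior, reconstruction = reconstruct(v)
print(posterior, reconstruction)
```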

9 of 27

Recurrent Temporal RBM (Sutskever, Hinton, Taylor, 2009)

  • Specialized variant of general RBM.
  • Uses recurrent connections between hidden-layer units at successive time steps (t−1 and t) to encode temporal rules as hypotheses over previously applied rules
  • Mechanism used: Bayesian inference

Each hidden unit Hj represents a hypothesis of rule Rj;

rule Rj computes the posterior probability, i.e.:

P(R | B = b, R_{t-1} = r_{t-1}), where B = b are the beliefs observed in the visible layer

  • Using a Gaussian distribution, the most applicable rules r are selected (a small sketch follows below):

r ~ N(P(R | B = b, R_{t-1} = r_{t-1}), σ)

  • Network training is done by monitoring the difference between the observed and inferred beliefs (using Contrastive Divergence and backpropagation).
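A minimal sketch of the rule-selection step, assuming hypothetical posterior values and a noise level sigma (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors P(R_j | B=b, R_{t-1}=r_{t-1}) for three candidate rules.
posterior = np.array([0.9, 0.2, 0.6])
sigma = 0.1  # assumed noise level of the Gaussian selection step

# Sample a rule activation around each posterior and keep the rules that fire.
samples = rng.normal(loc=posterior, scale=sigma)
selected = samples > 0.5
print(selected)  # e.g. [ True False  True]
```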

10 of 27

RTRBM Scenario

Conditions:

(1) (Weather > good) //belief, i.e. a pre-defined data feature

meaning: the weather is at least good

Scenario:

(2) ApproachingIntersection ∧ ◇(ApproachingTraffic = right)

meaning: the car is approaching an intersection and sometime in the future traffic is approaching from the right

(3) ((Speed > 0) ∧ HeadingIntersection) S (DistanceIntersection < x) → ApproachingIntersection

meaning: if the car is moving and heading towards an intersection since it has been deemed close to the intersection, then the car is approaching the intersection.

Assessment:

(4) ApproachingIntersection ∧ (DistanceIntersection = 0) ∧ (ApproachingTraffic = right) ∧ (Speed = 0) → (Evaluation = good)

meaning: if the car is approaching an intersection and arrives at the intersection when traffic is coming from the right and stops, then the trainee gets a good evaluation

  • Here, rule (4) rests on an uncertain notion that depends solely on the assessor's experience; encoding the scenario with the RTRBM aids in learning a more objective value for the distance threshold 'x'.

11 of 27

Neuro-Symbolic Integration in and for Cognitive Science

  • The concept of incorporating Mental Models into a neuro-symbolic system.
  • The focus is on the way the human brain "binds" together different sensory features of objects to create a coherent representation of the external world.
  • Initial research: temporal synchrony (Engel, Singer, 2001), conjunctive codes (O'Reilly, Busby, Soto et al., 2003) and convolution-based approaches (Thagard, Stewart, 2011)
  • Modern cognitive neuroscience also emphasizes the role of emotion/body-state evaluations in reasoning.
  • Emotions such as 'fear' and 'disgust' can be characterized as 'stamped' onto representations, for they produce a rich, affect-colored, high-dimensional state.

12 of 27

Neuro-Symbolic Integration in and for Cognitive Science(contd.)

  • ANN-based computational modelling plays a key role in this area.
  • It explains how cognitive capabilities are generated by the brain; hence it has been described as the 'glue' between cognition and the brain.
  • Two dominant traditional approaches: Symbolic Modelling and Network Modelling
  • Some argue that connectionist methods are limited as a cognitive modelling approach. Four supporting cases (concerning PDP models) are:

1. Rule-guided problem solving: PDP models of complex relational problems map errors that depend on rule preconditions only poorly.

2. Central Executive Function: Symbolic AI is capable of accurately modelling the central control system of the brain and performing tasks such as overriding pre-existing responses to alter the strategy for achieving a goal or outcome.

3. Syntactic Structures: Connectionist models are prone to incorrectly interpreting the syntactic structure of data, and thus only "learn" the pattern without any rule usage.

4. Compositionality: Connectionist models are unable to reflect the representational compositionality of data, and have to learn relationships between entities explicitly.

13 of 27

Binding and First-Order Inference in a Neural Symbolic Framework

  • Aims to address the shortcomings of ANNs by using FOL (First-Order Logic)
  • Approach of integrating predicate logic in a recurrent neural network with symmetric weights.
  • Logic expressions are encoded in the activation, followed by encoding symbolic constraints and rules in the weights.
  • Output is therefore generated as an activation pattern.

14 of 27

Fodor, Pylyshyn Computational Model (1988)

  • Two characteristics deemed essential for modelling cognition:

1. Combinatorial syntax and semantics for mental representations: recursive building of representations from atomic constituents; the semantics of non-atomic representations should be a function of the semantics of their atomic parts.

2. Structure Sensitivity of processes: the operations used should adhere to the syntactic structure of the representations.

  • Limitations:
  • Implementations in connectionist systems are vulnerable to neural damage
  • Lack of robustness
  • Limited learning capability
  • Need to use ad-hoc network engineering

15 of 27

Inference Specifications as Fixed Points of ANNs

  • Top-down approach: decide on the possible results of the reasoning process as stable states of an ANN.
  • Symmetrically weighted ANNs are commonly used, associating problem solutions with global minima.
  • Non-symmetric ANNs are only used as long as the emerging stable states match the assumed solutions.
  • Working (a minimal sketch follows below):
    • First-order rules are compiled into weights, or clamped onto the activations of visible neurones.
    • After configuration, the ANN performs gradient descent to identify the appropriate solution to the logic inference.
    • On reaching a stable state, the output is an FOL inference chain.
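A minimal sketch of the settling process, using a Hopfield-style symmetric network with random weights standing in for weights compiled from first-order rules; asynchronous updates decrease the energy until a stable state is reached:

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric weight matrix: in the real system, the logical constraints would be
# compiled into W and b so that solutions correspond to energy minima.
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def energy(s):
    return -0.5 * s @ W @ s - b @ s

# Asynchronous updates never increase the energy, so the network settles into a
# stable state (a local minimum), which is read off as the inferred solution.
s = rng.choice([-1.0, 1.0], size=n)
for _ in range(100):
    for i in rng.permutation(n):
        s[i] = 1.0 if (W[i] @ s + b[i]) >= 0 else -1.0

print(s, energy(s))
```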

16 of 27

ANN specifications of FOL Inference Chains

  • Two approaches to logical reasoning.
    • Model Theory: Check all possible models satisfying a knowledge base to infer a statement. Commonly used in ANNs for propositional satisfiability
    • Proof Theory: Try to prove a statement by applying inference rules, typically less complex.
  • Model theory requires grounding the knowledge base, which causes an exponential increase in the number of Boolean variables (neurones), making the generated outcomes infeasible to handle.
  • Each clause in the FOL inference chain computed by the ANN is either retrieved from long-term memory (LTM) or obtained by resolving previously derived chain clauses.
  • Variations of the inference chain can be used to arrive at the most likely explanation of a query.

17 of 27

Dynamic Binding

  • An effective mechanism for integrating neurones and logic objects.
  • Spatial (conjunctive) binding is used because it is not sensitive to timing synchronization.
  • Crossbars of neurones are used to bind FOL objects using special ensembles known as GP(General Purpose) Binders.
  • Binding is done in a nested, hierarchical manner resulting in nested FOL terms, clauses and resolution based proofs.
  • Number of binders ∝ size of proof
  • Also known as a directed-graph (digraph) representation of logic objects.
  • Features (Pinkas et al., 2012):
    • Minimal neurone count
    • Fault tolerance
    • High expressive power
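A minimal sketch of spatial (conjunctive) binding as a crossbar, with a hypothetical role/symbol vocabulary; this only illustrates the idea of a binder matrix, not the exact GP-binder circuitry of Pinkas et al.:

```python
import numpy as np

# Hypothetical vocabulary of symbols and argument roles for one binder ensemble.
symbols = ["john", "mary", "book"]
roles = ["arg1", "arg2"]

# A crossbar is a matrix of binder neurones: entry (role, symbol) is active
# when that symbol is bound to that role, e.g. gives(arg1=john, arg2=book).
crossbar = np.zeros((len(roles), len(symbols)))
crossbar[roles.index("arg1"), symbols.index("john")] = 1.0
crossbar[roles.index("arg2"), symbols.index("book")] = 1.0
print(crossbar)
```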

18 of 27

Connectionist First-Order Logic

  • Two main instances w.r.t neural learning:
    • Approximation (generate finite inferences) of TP (immediate consequence operator) associated with FOL program P.
    • Markov Logic as a probabilistic FOL extension combining logic and graphical models.
  • Applying the TP operator to an FOL fact p(X), where the variable 'X' ranges over the Herbrand universe UL, generates an infinite result containing infinitely many ground instances of p(X).
  • Hence need to approximate TP.

19 of 27

Connectionist First-Order Logic (contd.)

  • For approximation, define a homeomorphic embedding from the space of interpretations into a compact set of real numbers.
  • Thus, for some Herbrand interpretation I, the embedding maps I to a single real number, where A ∈ I and I ∈ IL (the space of interpretations of L).
  • Cb denotes the set of all embedded interpretations - a 'binary' representation of interpretation I for base 'b'.
  • Homeomorphism criteria: continuous, bijective, with a continuous inverse.
  • To get a correct embedding of TP, we define fP - a real-valued version of TP that preserves the structural information of the inferences (see the sketch below).
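A concrete form of this embedding, sketched under the assumption of the Hölldobler-Kalinke-Störr / Bader-Hitzler construction (with a bijective level mapping l from the Herbrand base BL to the natural numbers and a suitable base b), would be:

```latex
% Sketch of the assumed embedding: an interpretation I becomes a base-b expansion
% whose digits record which atoms A (of level l(A)) are true in I.
\iota(I) \;=\; \sum_{A \in I} b^{-l(A)} \;\in\; \mathcal{C}_b
% A correct embedding f_P of T_P is then one that commutes with \iota:
f_P\bigl(\iota(I)\bigr) \;=\; \iota\bigl(T_P(I)\bigr) \quad \text{for every interpretation } I
```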

20 of 27

Markov Logic Networks

  • Probabilistic extension of First Order Logic.
  • Key idea: reduce noise/uncertainty susceptibility by combining FOL with graphical models.
  • Expressed as a set of weighted FOL formulae together with a set of constants; each weight indicates the strength of its formula's contribution to the probability distribution.
  • The network is formed such that one node is assigned to each ground atom (a predicate with no free variables).
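For reference, the probability distribution that an MLN defines over possible worlds x (Richardson & Domingos, 2006), where wi is the weight of formula Fi and ni(x) is the number of true groundings of Fi in x:

```latex
% Joint distribution of the ground Markov network induced by an MLN.
P(X = x) \;=\; \frac{1}{Z}\,\exp\!\Bigl(\sum_i w_i\, n_i(x)\Bigr),
\qquad
Z \;=\; \sum_{x'} \exp\!\Bigl(\sum_i w_i\, n_i(x')\Bigr)
```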

21 of 27

Markov Logic Networks (contd.)

22 of 27

Markov Logic Networks - SPNs

  • SPN: Sum Product Networks (Poon and Domingos, 2011)
  • Represented as directed acyclic graphs:

Leaves: variables; sums and products: internal nodes with weighted edges

e.g. a junction tree (fig. a), a Naive Bayes model (fig. b)

  • Hierarchical probability distribution
  • Cost of an inference task ∝ number of links in the graph
  • Arbitrary probability queries are computed in linear time (see the sketch below).
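A minimal sketch of a tiny, hand-specified SPN over two binary variables, showing that both joint and marginal queries are answered in a single bottom-up pass (all probabilities here are assumed for illustration):

```python
# Minimal SPN over two binary variables X1, X2 (a small mixture):
#   root = 0.6 * (P1(X1) * P2(X2)) + 0.4 * (Q1(X1) * Q2(X2))
# Leaves are univariate distributions; marginalising a variable just means
# replacing its leaf value with 1, which keeps inference linear in the links.

def leaf(p_true, x):
    """Univariate Bernoulli leaf; x=None marginalises the variable out."""
    if x is None:
        return 1.0
    return p_true if x == 1 else 1.0 - p_true

def spn(x1, x2):
    left = leaf(0.9, x1) * leaf(0.2, x2)    # product node (component 1)
    right = leaf(0.3, x1) * leaf(0.7, x2)   # product node (component 2)
    return 0.6 * left + 0.4 * right         # weighted sum node (root)

print(spn(1, 0))     # joint probability P(X1=1, X2=0)
print(spn(1, None))  # marginal P(X1=1) in one bottom-up pass
```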

23 of 27

Relational Sum-product Networks (RSPNs) (Nath & Domingos, 2015)

  • Extension over SPNs
  • Designed to handle relational domains such as social networks & cellular pathways.
  • Follows a type of object-oriented approach to define a set of object classes.
  • Each relation can be represented as a sum-product network

24 of 27

Recent Developments and Future Work

Conceptors (Jaeger, 2014)

  • Features:
    • RNN processing modes are treated as state clouds
    • Processing mode is stabilized given that RNN states are filtered to remain in a state cloud
  • Multiplicity of processing modes on a single RNN, i.e. top-down control on a bottom-up connectionist network
  • Selected learnt patterns are stabilized and regenerated by inserting conceptor filters.
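A minimal sketch of computing and applying a conceptor, following Jaeger's formula C = R(R + α⁻²I)⁻¹, with randomly generated states standing in for a real RNN's state cloud:

```python
import numpy as np

rng = np.random.default_rng(0)

# Collected RNN/reservoir states while running in one processing mode
# (each column is one state vector); here random data stands in for them.
X = rng.normal(size=(20, 500))
R = X @ X.T / X.shape[1]       # state correlation matrix of the "state cloud"

alpha = 10.0                   # aperture parameter from Jaeger's formulation
C = R @ np.linalg.inv(R + (alpha ** -2) * np.eye(R.shape[0]))  # conceptor matrix

# Inserting the conceptor as a filter keeps the RNN state inside the learnt
# cloud, stabilising (and allowing regeneration of) the associated pattern.
x = rng.normal(size=20)        # a current RNN state
x_filtered = C @ x
print(np.linalg.norm(x), np.linalg.norm(x_filtered))
```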

25 of 27

Recent Developments (contd.)

Neural Turing Machine (Graves et al., 2014)

  • A connectionist model based on the von Neumann architecture.
  • RNNs coupled with large addressable memory units
  • Produces finite state machines that are end-to-end differentiable and trainable by gradient descent.
  • Probabilistic ('blurry') read and write operations (see the sketch below)
  • Sparse memory interaction, biasing access towards specific locations.
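A minimal sketch of the NTM's content-based addressing (a softmax over cosine similarities, sharpened by a key strength β, as in Graves et al., 2014), with a hypothetical 3-slot memory:

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Content-based addressing: softmax over cosine similarities between the
    key and each memory row, sharpened by the key strength beta."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    similarity = memory @ key / norms
    scores = np.exp(beta * similarity)
    return scores / scores.sum()

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.5, 0.5, 0.0]])
key = np.array([0.9, 0.1, 0.0])

weights = content_addressing(memory, key, beta=5.0)
read_vector = weights @ memory   # 'blurry' read: a weighted sum over locations
print(weights, read_vector)
```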

26 of 27

Future Work

  • Anchoring Knowledge and Interaction in Multi-agent Systems: Learning from the environment, reasoning over internal state, etc. should follow a bidirectional approach, i.e. high-level knowledge formation and low-level interaction and sensing happen simultaneously, with inferences modified by applying high-level logic rules via feedback.
  • Visualizing and Understanding Recurrent Networks: Recent proposals for understanding LSTMs, such as analysing predictions and error types, point to the presence of interpretable cells.

Character-level language models serve as an interpretable testbed for such cells.

  • Identifying and Exploring Differences in Complexity: The claim that the interaction between the neural and symbolic paradigms addresses empirical differences in performance has to be examined in more detail.

27 of 27

References

  • Besold, Tarek R., Artur d'Avila Garcez, Sebastian Bader, Howard Bowman, Pedro Domingos, Pascal Hitzler, Kai-Uwe Kühnberger et al. “Neural-symbolic learning and reasoning: A survey and interpretation.” arXiv preprint arXiv:1711.03902 (2017). https://arxiv.org/abs/1711.03902

  • Kyle Hamilton, Aparna Nayak, Bojan Božić and Luca Longo. "Is Neuro-Symbolic AI meeting its promise in Natural Language Processing? A structured Review." arXiv preprint arXiv:2202.12205. https://arxiv.org/abs/2202.12205