1 of 27

Neural-Symbolic Learning and Reasoning: A Survey and Interpretation

Tarek R. Besold, Artur d'Avila Garcez, Sebastian Bader, Howard Bowman, Pedro Domingos, Pascal Hitzler, Kai-Uwe Kühnberger et al.

Presented by:

Rohit Sanjay Inamdar

SBU ID: 114504643

2 of 27

Contents

  • Overview
  • Prolegomena of Neural-Symbolic Computation
  • NSCA for Neural-Symbolic Computing
  • NSI for Cognitive Science
  • Binding and First-Order Inference in an NS Framework
  • Connectionist First-Order Logic
  • Markov Logic Networks
  • Developments and Future Work
  • References

3 of 27

Overview

  • Combines the neural (the connectionist property of 'neurons') and the symbolic (logic programming: intuitionistic, modal, temporal) approaches.
  • The integration of the 'symbolic' component builds on the robustness of such neural networks.
  • Aims to address the shortcomings of both the neural-network and the logic paradigms.
  • Neuro-symbolic AI seamlessly integrates statistical learning and logical reasoning to build reliable and robust computational models.
  • The system should respond rapidly to any change or alteration in context based on proper feedback, and revise its analysis with minimal error (especially concerning nonmonotonic reasoning).

Kyle Hamilton, Aparna Nayak, Bojan Božić and Luca Longo

https://arxiv.org/pdf/2202.12205

4 of 27

Prolegomena of N-S Computation

  • Translational Algorithms: used to convert from symbolic to connectionist form (a neural encoding of the logic) or vice versa (a logical description of the neural system), or to build a hybrid system of both.
  • Commonly used systems are feedforward and recurrent neural networks in which knowledge evolves over multiple stages.
  • The idea is to create a hierarchical system of knowledge representation in which higher levels represent more abstract, general concepts and lower levels represent more specific, concrete concepts.
  • Fibring - an approach that combines multiple stages of networks in an upward, recursive manner.
  • This can also be understood as a way of training the network.

5 of 27

Principles and mechanisms

  • Modularity: Independent tasks performed by each component
  • Hierarchical: Raw data from lower layers propagates to higher layers.
  • Four main tasks cyclically performed:

1. Translation of symbolic knowledge into network

2. Gaining additional knowledge from examples

3. Reasoning

4. Symbolic extraction from the network.

  • (Garcez et al.) Different layers/network ensembles perform different tasks; for example (see the sketch after this example):

Network A: P(X, Y)

Network B: Q(Z)

Output: P(X, Y) ∧ Q(Z) → R(X, Y, Z)
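A minimal sketch of this kind of composition, assuming hypothetical toy networks network_A, network_B and network_R and a soft conjunction of their truth degrees (illustrative only, not the actual algorithm of Garcez et al.):

```python
import numpy as np

# Hypothetical stand-ins for trained ensembles: each maps its inputs to a
# truth degree in [0, 1] for the predicate it represents.
def network_A(x, y):          # approximates P(X, Y)
    return 1.0 / (1.0 + np.exp(-(2.0 * x + y - 1.0)))

def network_B(z):             # approximates Q(Z)
    return 1.0 / (1.0 + np.exp(-(3.0 * z - 1.5)))

def network_R(p_truth, q_truth):
    # Encodes the rule P(X,Y) AND Q(Z) -> R(X,Y,Z): the head only fires
    # when both body atoms are (sufficiently) true.
    conjunction = p_truth * q_truth           # soft AND of the body
    return 1.0 if conjunction > 0.5 else 0.0  # thresholded head activation

x, y, z = 0.8, 0.9, 0.7
print(network_R(network_A(x, y), network_B(z)))
```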

6 of 27

Fibring

  • Composition of interconnected neurones
  • Mapping is done vertically among layers of ensembles
  • A fibring function defines how the network behaves and the connectivity among the layers.
  • Diverse reasoning, inferences, knowledge evolution occur simultaneously.
  • A single algorithm is applied throughout the network at every level.

Conceptual overview of Neuro Symbolic System (Garcez et al.)
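A minimal sketch of fibring, assuming a hypothetical multiplicative fibring function in which the activation of a designated neurone of network A modulates the weights of an embedded network B:

```python
import numpy as np

def forward(W, x):
    """One dense layer with sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(W @ x)))

# Network A (the fibring network) and network B (the embedded network).
W_A = np.array([[0.5, -0.3], [0.2, 0.8]])
W_B = np.array([[1.0, 0.4], [-0.6, 0.9]])

def fibring_function(activation_A, W_B):
    # One simple (assumed) choice of fibring function: the activation of a
    # designated neurone of A multiplicatively modulates all weights of B.
    return activation_A * W_B

x_A = np.array([1.0, 0.5])
x_B = np.array([0.2, 0.7])

a_A = forward(W_A, x_A)                      # run A bottom-up
W_B_fibred = fibring_function(a_A[0], W_B)   # A's neurone 0 rewires B
output_B = forward(W_B_fibred, x_B)          # B now behaves differently
print(output_B)
```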

7 of 27

NSCA for Neuro-Symbolic Computing (de Penning et al.)

  • An agent (or cognitive architecture) for integrating neural and symbolic systems.
  • Integrating the two in real-world settings is usually a difficult task.
  • The aim is to integrate and model complex cognitive abilities.
  • ANNs are commonly used in such models to learn and reason about knowledge of the world, represented as symbolic logic
  • Algorithms map logical theories T on a network N.
  • Network N computes the logical consequences of T.
  • T also serves as background knowledge for training the network.

8 of 27

NSCA Mechanisms

  • Two components: a temporal logic theory T and a Restricted Boltzmann Machine (RBM) network N
  • The RBM defines a probability distribution:

P(V = v, H = h), where V is the visible layer and H the hidden layer

v, h - the vectors encoded in those layers

v encodes the data as binary or real values

h encodes the posterior probability P(H | v)

  • Aids in reconstructing data vectors from inconsistent/incomplete data.
  • "Product of Experts": a combination of hidden units that constrains the high-dimensional data, which aids in learning complex data relationships.
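A minimal sketch of the RBM's role, with toy (untrained) parameters assumed for illustration: the hidden layer infers the posterior P(H | v) and the visible layer is reconstructed from it, which is how incomplete data vectors get completed:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM parameters (in practice learned, e.g. with contrastive divergence).
W = rng.normal(scale=0.1, size=(6, 4))   # visible x hidden weights
b_v = np.zeros(6)                        # visible biases
b_h = np.zeros(4)                        # hidden biases

def reconstruct(v):
    """One Gibbs step: infer hidden beliefs, then reconstruct the data vector."""
    p_h = sigmoid(v @ W + b_h)                      # posterior P(H=1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float) # sample hidden states
    p_v = sigmoid(h @ W.T + b_v)                    # reconstruction P(V=1 | h)
    return p_h, p_v

# An incomplete/noisy observation: unknown entries filled with 0.5.
v = np.array([1.0, 0.0, 1.0, 0.5, 0.5, 1.0])
posterior, reconstruction = reconstruct(v)
print(posterior, reconstruction)
```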

9 of 27

Recurrent Temporal RBM (Sutskever, Hinton, Taylor, 2009)

  • Specialized variant of general RBM.
  • Uses recurrent connections between hidden-layer units at successive time steps (t−1 and t) to encode temporal rules as hypotheses over previously applied rules
  • Mechanism used: Bayesian inference

Each hidden unit Hj represents a hypothesis of rule Rj;

rule Rj computes the posterior probability, i.e.:

P(R | B = b, R_{t-1} = r_{t-1}), where B = b are the beliefs observed in the visible layer

  • Using a Gaussian distribution, the most applicable rules r are selected (a small sketch follows below):

r ~ N(P(R | B = b, R_{t-1} = r_{t-1}), σ)

  • Network training is done by monitoring the difference between the observed and inferred beliefs (using Contrastive Divergence and backpropagation).
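A minimal sketch of the rule-selection step, assuming hypothetical posterior values and a noise level sigma (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors P(R_j | B=b, R_{t-1}=r_{t-1}) for three candidate rules.
posterior = np.array([0.9, 0.2, 0.6])
sigma = 0.1  # assumed noise level of the Gaussian selection step

# Sample a rule activation around each posterior and keep the rules that fire.
samples = rng.normal(loc=posterior, scale=sigma)
selected = samples > 0.5
print(selected)  # e.g. [ True False  True]
```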

10 of 27

RTRBM Scenario

Conditions:

(1) (Weather > good) //belief, i.e. a pre-defined data feature

meaning: the weather is at least good

Scenario:

(2) ApproachingIntersection ∧ ◇(ApproachingTraffic = right)

meaning: the car is approaching an intersection and sometime in the future traffic is approaching from the right

(3) ((Speed > 0) ∧ HeadingIntersection) S (DistanceIntersection < x) → ApproachingIntersection

meaning: if the car is moving and heading towards an intersection since it has been deemed close to the intersection, then the car is approaching the intersection.

Assessment:

(4) ApproachingIntersection ∧ (DistanceIntersection = 0) ∧ (ApproachingTraffic = right) ∧ (Speed = 0) → (Evaluation = good)

meaning: if the car is approaching an intersection and arrives at the intersection when traffic is coming from the right and stops, then the trainee gets a good evaluation

  • Here, rule (4) rests on an uncertain notion that depends solely on the assessor's experience; encoding the scenario with the RTRBM aids in learning a more objective value for the distance threshold 'x'.

11 of 27

Neuro-Symbolic Integration in and for Cognitive Science

  • The concept of incorporating Mental Models into a neuro-symbolic system.
  • The focus is on the way the human brain "binds" together different sensory features of objects to create a coherent representation of the external world.
  • Initial research: temporal synchrony (Engel, Singer, 2001), conjunctive codes (O'Reilly, Busby, Soto et al., 2003) and convolution-based approaches (Thagard, Stewart, 2011)
  • Modern cognitive neuroscience also emphasizes the role of emotion/body-state evaluations in reasoning.
  • Emotions such as 'fear' and 'disgust' can be characterized as 'stamped' onto representations, for they produce a rich, affect-colored, high-dimensional state.

12 of 27

Neuro-Symbolic Integration in and for Cognitive Science(contd.)

  • ANN-based computational modelling plays a key role in this area.
  • It explains how cognitive capabilities are generated by the brain; hence it has been described as the 'glue' between cognition and the brain.
  • Two dominant traditional approaches: Symbolic Modelling and Network Modelling
  • Some argue that connectionist methods are limited as a cognitive modelling approach. Four supporting cases (concerning PDP models) are:

1. Rule-guided problem solving: PDP models of complex relational problems map errors that depend on rule preconditions only poorly.

2. Central Executive Function: Symbolic AI is capable of accurately modelling the central control system of the brain and performing tasks such as overriding pre-existing responses to alter the strategy for achieving a goal or outcome.

3. Syntactic Structures: Connectionist models are prone to incorrectly interpreting the syntactic structure of data, and thus only "learn" the pattern without any rule usage.

4. Compositionality: Connectionist models are unable to reflect the representational compositionality of data, and have to learn relationships between entities explicitly.

13 of 27

Binding and First-Order Inference in a Neural Symbolic Framework

  • Aims to address the shortcomings of ANNs by using FOL (First-Order Logic)
  • Approach of integrating predicate logic in a recurrent neural network with symmetric weights.
  • Logic expressions are encoded in the activation, followed by encoding symbolic constraints and rules in the weights.
  • Output is therefore generated as an activation pattern.

14 of 27

Fodor, Pylyshyn Computational Model (1988)

  • Two characteristics deemed essential for modelling cognition:

1. Combinatorial syntax and semantics for mental representations: recursive building of representations from atomic constituents; the semantics of non-atomic representations should be a function of the semantics of their atomic parts.

2. Structure Sensitivity of processes: the operations used should adhere to the syntactic structure of the representations.

  • Limitations:
  • Implementations in connectionist systems are vulnerable to neural damage
  • Lack of robustness
  • Limited learning capability
  • Need to use ad-hoc network engineering

15 of 27

Inference Specifications as Fixed Points of ANNs

  • Top-down approach: decide on the possible results of the reasoning process as stable states of an ANN.
  • Symmetrically weighted ANNs are commonly used, associating problem solutions with global minima.
  • Non-symmetric ANNs are only used as long as the emerging stable states match the assumed solutions.
  • Working (a minimal sketch follows below):
    • First-order rules are compiled into weights, or clamped onto the activations of visible neurones.
    • After configuration, the ANN performs gradient descent to identify the appropriate solution to the logic inference.
    • On reaching a stable state, the output is an FOL inference chain.
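A minimal sketch of the settling process, using a Hopfield-style symmetric network with random weights standing in for weights compiled from first-order rules; asynchronous updates decrease the energy until a stable state is reached:

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric weight matrix: in the real system, the logical constraints would be
# compiled into W and b so that solutions correspond to energy minima.
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def energy(s):
    return -0.5 * s @ W @ s - b @ s

# Asynchronous updates never increase the energy, so the network settles into a
# stable state (a local minimum), which is read off as the inferred solution.
s = rng.choice([-1.0, 1.0], size=n)
for _ in range(100):
    for i in rng.permutation(n):
        s[i] = 1.0 if (W[i] @ s + b[i]) >= 0 else -1.0

print(s, energy(s))
```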

16 of 27

ANN specifications of FOL Inference Chains

  • Two approaches to logical reasoning.
    • Model Theory: Check all possible models satisfying a knowledge base to infer a statement. Commonly used in ANNs for propositional satisfiability
    • Proof Theory: Try to prove a statement by applying inference rules, typically less complex.
  • Model theory requires grounding the knowledge base, which causes an exponential increase in the number of Boolean variables (neurones), making the generated outcomes infeasible to handle.
  • Each clause in the FOL inference chain computed by the ANN is either retrieved from long-term memory (LTM) or obtained by resolving previously derived chain clauses.
  • Variations of the inference chain can be used to arrive at the most likely explanation of a query.

17 of 27

Dynamic Binding

  • An effective mechanism for integrating neurones and logic objects.
  • Spatial (conjunctive) binding is used because it is not sensitive to timing synchronization.
  • Crossbars of neurones are used to bind FOL objects using special ensembles known as GP(General Purpose) Binders.
  • Binding is done in a nested, hierarchical manner resulting in nested FOL terms, clauses and resolution based proofs.
  • Number of binders ∝ size of proof
  • Also known as a directed-graph (digraph) representation of logic objects.
  • Features (Pinkas et al., 2012):
    • Minimal neurone count
    • Fault tolerance
    • High expressive power
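A minimal sketch of spatial (conjunctive) binding as a crossbar, with a hypothetical role/symbol vocabulary; this only illustrates the idea of a binder matrix, not the exact GP-binder circuitry of Pinkas et al.:

```python
import numpy as np

# Hypothetical vocabulary of symbols and argument roles for one binder ensemble.
symbols = ["john", "mary", "book"]
roles = ["arg1", "arg2"]

# A crossbar is a matrix of binder neurones: entry (role, symbol) is active
# when that symbol is bound to that role, e.g. gives(arg1=john, arg2=book).
crossbar = np.zeros((len(roles), len(symbols)))
crossbar[roles.index("arg1"), symbols.index("john")] = 1.0
crossbar[roles.index("arg2"), symbols.index("book")] = 1.0
print(crossbar)
```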

18 of 27

Connectionist First-Order Logic

  • Two main instances w.r.t neural learning:
    • Approximation (generate finite inferences) of TP (immediate consequence operator) associated with FOL program P.
    • Markov Logic as a probabilistic FOL extension combining logic and graphical models.
  • Applying the TP operator to an FOL fact p(X), where the variable 'X' ranges over the Herbrand universe UL, generates an infinite result containing infinitely many ground instances of p(X).
  • Hence need to approximate TP.

19 of 27

Connectionist First-Order Logic (contd.)

  • For approximation, define a homeomorphic embedding from the space of interpretations into a compact set of real numbers.
  • Thus, for some Herbrand interpretation I, the embedding maps I to a single real number, where A ∈ I and I ∈ IL (the space of interpretations of L).
  • Cb denotes the set of all embedded interpretations - a 'binary' representation of interpretation I for base 'b'.
  • Homeomorphism criteria: continuous, bijective, with a continuous inverse.
  • To get a correct embedding of TP, we define fP - a real-valued version of TP that preserves the structural information of the inferences (see the sketch below).
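A concrete form of this embedding, sketched under the assumption of the Hölldobler-Kalinke-Störr / Bader-Hitzler construction (with a bijective level mapping l from the Herbrand base BL to the natural numbers and a suitable base b), would be:

```latex
% Sketch of the assumed embedding: an interpretation I becomes a base-b expansion
% whose digits record which atoms A (of level l(A)) are true in I.
\iota(I) \;=\; \sum_{A \in I} b^{-l(A)} \;\in\; \mathcal{C}_b
% A correct embedding f_P of T_P is then one that commutes with \iota:
f_P\bigl(\iota(I)\bigr) \;=\; \iota\bigl(T_P(I)\bigr) \quad \text{for every interpretation } I
```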

20 of 27

Markov Logic Networks

  • Probabilistic extension of First Order Logic.
  • Key idea: reduce noise/uncertainty susceptibility by combining FOL with graphical models.
  • Expressed as a set of weighted FOL formulae together with a set of constants; each weight indicates the strength of its formula's contribution to the probability distribution.
  • The network is formed such that one node is assigned to each ground atom (a predicate with no free variables).
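For reference, the probability distribution that an MLN defines over possible worlds x (Richardson & Domingos, 2006), where wi is the weight of formula Fi and ni(x) is the number of true groundings of Fi in x:

```latex
% Joint distribution of the ground Markov network induced by an MLN.
P(X = x) \;=\; \frac{1}{Z}\,\exp\!\Bigl(\sum_i w_i\, n_i(x)\Bigr),
\qquad
Z \;=\; \sum_{x'} \exp\!\Bigl(\sum_i w_i\, n_i(x')\Bigr)
```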

21 of 27

Markov Logic Networks (contd.)

22 of 27

Markov Logic Networks - SPNs

  • SPN: Sum Product Networks (Poon and Domingos, 2011)
  • Represented as directed acyclic graphs:

Leaves: variables; sums and products: internal nodes with weighted edges

e.g. a junction tree (fig. a), a Naive Bayes model (fig. b)

  • Hierarchical probability distribution
  • Cost of an inference task ∝ number of links in the graph
  • Arbitrary probability queries are computed in linear time (see the sketch below).
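A minimal sketch of a tiny, hand-specified SPN over two binary variables, showing that both joint and marginal queries are answered in a single bottom-up pass (all probabilities here are assumed for illustration):

```python
# Minimal SPN over two binary variables X1, X2 (a small mixture):
#   root = 0.6 * (P1(X1) * P2(X2)) + 0.4 * (Q1(X1) * Q2(X2))
# Leaves are univariate distributions; marginalising a variable just means
# replacing its leaf value with 1, which keeps inference linear in the links.

def leaf(p_true, x):
    """Univariate Bernoulli leaf; x=None marginalises the variable out."""
    if x is None:
        return 1.0
    return p_true if x == 1 else 1.0 - p_true

def spn(x1, x2):
    left = leaf(0.9, x1) * leaf(0.2, x2)    # product node (component 1)
    right = leaf(0.3, x1) * leaf(0.7, x2)   # product node (component 2)
    return 0.6 * left + 0.4 * right         # weighted sum node (root)

print(spn(1, 0))     # joint probability P(X1=1, X2=0)
print(spn(1, None))  # marginal P(X1=1) in one bottom-up pass
```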

23 of 27

Relational Sum-product Networks (RSPNs) (Nath & Domingos, 2015)

  • Extension over SPNs
  • Designed to handle relational domains such as social networks & cellular pathways.
  • Follows a type of object-oriented approach to define a set of object classes.
  • Each relation can be represented as a sum-product network

24 of 27

Recent Developments and Future Work

Conceptors (Jaeger, 2014)

  • Features:
    • RNN processing modes are treated as state clouds
    • Processing mode is stabilized given that RNN states are filtered to remain in a state cloud
  • Multiplicity of processing modes on a single RNN, i.e. top-down control on a bottom-up connectionist network
  • Selected learnt patterns are stabilized and regenerated by inserting conceptor filters.
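A minimal sketch of computing and applying a conceptor, following Jaeger's formula C = R(R + α⁻²I)⁻¹, with randomly generated states standing in for a real RNN's state cloud:

```python
import numpy as np

rng = np.random.default_rng(0)

# Collected RNN/reservoir states while running in one processing mode
# (each column is one state vector); here random data stands in for them.
X = rng.normal(size=(20, 500))
R = X @ X.T / X.shape[1]       # state correlation matrix of the "state cloud"

alpha = 10.0                   # aperture parameter from Jaeger's formulation
C = R @ np.linalg.inv(R + (alpha ** -2) * np.eye(R.shape[0]))  # conceptor matrix

# Inserting the conceptor as a filter keeps the RNN state inside the learnt
# cloud, stabilising (and allowing regeneration of) the associated pattern.
x = rng.normal(size=20)        # a current RNN state
x_filtered = C @ x
print(np.linalg.norm(x), np.linalg.norm(x_filtered))
```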

25 of 27

Recent Developments (contd.)

Neural Turing Machine (Graves et al., 2014)

  • A connectionist model based on the von Neumann architecture.
  • RNNs coupled with large addressable memory units
  • Produces finite state machines that are end-to-end differentiable and trainable by gradient descent.
  • Probabilistic ('blurry') read and write operations (see the sketch below)
  • Sparse memory interaction, biasing access towards specific locations.
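A minimal sketch of the NTM's content-based addressing (a softmax over cosine similarities, sharpened by a key strength β, as in Graves et al., 2014), with a hypothetical 3-slot memory:

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Content-based addressing: softmax over cosine similarities between the
    key and each memory row, sharpened by the key strength beta."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    similarity = memory @ key / norms
    scores = np.exp(beta * similarity)
    return scores / scores.sum()

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.5, 0.5, 0.0]])
key = np.array([0.9, 0.1, 0.0])

weights = content_addressing(memory, key, beta=5.0)
read_vector = weights @ memory   # 'blurry' read: a weighted sum over locations
print(weights, read_vector)
```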

26 of 27

Future Work

  • Anchoring Knowledge and Interaction in Multi-agent Systems: Learning from the environment, reasoning over internal state, etc. should follow a bidirectional approach, i.e. high-level knowledge formation and low-level interaction and sensing happen simultaneously, with inferences modified by applying high-level logic rules via feedback.
  • Visualizing and Understanding Recurrent Networks: Recent proposals for understanding LSTMs, such as analysing predictions and error types, point to the presence of interpretable cells.

Character-level language models serve as an interpretable testbed for such cells.

  • Identifying and Exploring Differences in Complexity: The claim that the interaction between the neural and symbolic paradigms addresses empirical differences in performance has to be examined in more detail.

27 of 27

References

  • Besold, Tarek R., Artur d'Avila Garcez, Sebastian Bader, Howard Bowman, Pedro Domingos, Pascal Hitzler, Kai-Uwe Kühnberger et al. “Neural-symbolic learning and reasoning: A survey and interpretation.” arXiv preprint arXiv:1711.03902 (2017). https://arxiv.org/abs/1711.03902

  • Kyle Hamilton, Aparna Nayak, Bojan Božić and Luca Longo. "Is Neuro-Symbolic AI meeting its promise in Natural Language Processing? A structured Review." arXiv preprint arXiv:2202.12205. https://arxiv.org/abs/2202.12205