1 of 29

DeepProbLog: Neural Probabilistic Logic Programming

Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt

DTAI, KU Leuven (BLG)

2 of 29

Probabilistic reasoning & High representational power

Probabilistic reasoning allows for uncertainty and robustness

BUT! flexiblity may be a problem

Differentiable models have high representational power

BUT! robustness may be a problem

3 of 29

Probabilistic reasoning & High representational power

Probabilistic reasoning allows for uncertainty and robustness

BUT! flexiblity may be a problem

Differentiable models have high representational power

BUT! robustness may be a problem

Why not merge the two?

4 of 29

ProLog

5 of 29

ProLog

Logic programming language, defines:

variables: {v₁, …, v_l}
terms: {t₁, …, t_m}, T := c | v | f(u₁, …, u_k)
atoms: q(t_i, …, t_n)
clauses: h :- b₁, …, b_i

burglary.	hears_alarm(mario).	alarm :- earthquake.
earthquake.	hears_alarm(gianna).	alarm :- burglary.
	calls(X) :- alarm, hears_alarm(X).

6 of 29

ProLog tasks: Inference

Is a given clause h :- b₁, …, b_i true?

bottom-up (from given evidence to target clause)
top-down (from target clause to given evidence)

Two key elements:

grounding realizing variable assignments
compilation generation of the inference space

7 of 29

Top-down inference: an example

burglary.
hearthquake.
hears_alarm(mario).
hears_alarm(gianna).
alarm :- earthquake.
alarm :- burglary.
calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

8 of 29

Pro(b)Log

Now with probabilities

9 of 29

ProbLog

Prolog, with probabilities!

variables: {v₁, …, v_l}
terms: {t₁, …, t_m}, T := c | v | p::f(u₁, …, u_k)
atoms: p::q(t_i, …, t_n)
clauses: p::h :- b₁, …, b_i

0.1::burglary.	0.5::hears_alarm(mario).	alarm :- earthquake.
0.2::earthquake.	0.4::hears_alarm(gianna).	alarm :- burglary.
	calls(X) :- alarm, hears_alarm(X).

10 of 29

ProbLog tasks: Inference

Is a given clause h :- b₁, …, b_i true?

top-down (from target clause to given evidence)
Sentential Decision Diagram¹

inference tree as before, but with attached probabilities

[1] SDD: A new canonical representation of propositional knowledge bases Darwiche and Marquis, IJCAI 2011

* http://bit.ly/2pO9W6m

11 of 29

Top-down inference: an example

0.1::burglary.
0.2::earthquake.
0.5::hears_alarm(mario).
0.4::hears_alarm(gianna).
alarm :- earthquake.
alarm :- burglary.
calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

12 of 29

Top-down inference: an example

0.1::burglary.
0.2::earthquake.
0.5::hears_alarm(mario).
0.4::hears_alarm(gianna).
alarm :- earthquake.
alarm :- burglary.
calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

13 of 29

Top-down inference: an example

0.1::burglary.
0.2::earthquake.
0.5::hears_alarm(mario).
0.4::hears_alarm(gianna).
alarm :- earthquake.
alarm :- burglary.
calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

0.15

14 of 29

(a)Pro(b)Log

Now with semirings

15 of 29

Algebraic ProbLog²

Associate each element in a ProbLog program with a semiring element.

[2] An Algebraic Prolog for Reasoning about Possible Worlds�Kimmig et al, AAAI 2011

(A, ⊕, ⊗, 0^⊕, 0^⊗)

A element of the semiring
⊕ semiring +
⊗ semiring x
0^⊕ semiring null element for ⊕
0^⊗ semiring null element for ⊗

16 of 29

Gradient semiring

Credit assignment on premises:

assign error to each premise
propagate error across the inference model

Gradient => Differentiable!

17 of 29

Gradient semiring: AND

Cross product between probabilities and gradients
Require initial gradients

18 of 29

Gradient semiring: OR

Sum of gradients
Require initial gradients

19 of 29

Gradient semiring: an example

0.1::burglary.
0.2::hearthquake.
0.5::hears_alarm(mario).
0.4::hears_alarm(gianna).
alarm :- earthquake.
alarm :- burglary.
calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

0.15

20 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

???

<0, 0, 1.0>

⊕

???

-

⊗

target clause

gradient

probability

21 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

<0.1, 0.2, 0>

<0, 0, 1.0>

⊕

-

target clause

gradient

probability

???

⊗

22 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

<0.1, 0.2, 0>

<0, 0, 1.0>

⊕

-

target clause

gradient

probability

⊗

<0.5, 0.5, 0.3>

23 of 29

Neural (differentiable) predicates

Gradient opens the door for differentiable predicates!

Neural predicates on probabilistic differentiable models:

nn(M, x, Y)

nn family of predicates
M differentiable probabilistic model
x input record
Y output space

24 of 29

Neural (differentiable) predicates

Gradient opens the door for differentiable predicates!

Neural predicates on probabilistic differentiable models:

nn(M, x, Y)

nn family of predicates
M differentiable probabilistic model
x input record
Y output space

nn(ResNet, , {0, .. 9})

25 of 29

Experiments

Finally no more ProLog

26 of 29

Tasks

Addition

ProbLog program implementing addition (1 digit and 2 digit numbers)

Sort

bubble sort, neural predicate predicts output of bubble step

WAP

{permute, swap, operation}: decide which to take next

27 of 29

Addition

Learn sum of two numbers: ProbLog vs Differentiable model

28 of 29

Sorting

Learn step of bubble sort: DeepProbLog vs Differentiable Forth

29 of 29

Take-home messages

Learning with data + probabilistic logic programming shows�promising results

faster convergence
comparable results

Differentiable logic is a thing

hints at learning probabilities of given clauses