1 of 29

DeepProbLog: Neural Probabilistic Logic Programming

Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt

DTAI, KU Leuven (BLG)

2 of 29

Probabilistic reasoning & High representational power

  • Probabilistic reasoning allows for uncertainty and robustness
    • BUT! flexiblity may be a problem
  • Differentiable models have high representational power
    • BUT! robustness may be a problem

3 of 29

Probabilistic reasoning & High representational power

  • Probabilistic reasoning allows for uncertainty and robustness
    • BUT! flexiblity may be a problem
  • Differentiable models have high representational power
    • BUT! robustness may be a problem

Why not merge the two?

4 of 29

ProLog

5 of 29

ProLog

Logic programming language, defines:

  • variables: {v1, …, vl}
  • terms: {t1, …, tm}, T := c | v | f(u1, …, uk)
  • atoms: q(ti, …, tn)
  • clauses: h :- b1, …, bi

burglary.

hears_alarm(mario).

alarm :- earthquake.

earthquake.

hears_alarm(gianna).

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

6 of 29

ProLog tasks: Inference

Is a given clause h :- b1, …, bi true?

  • bottom-up (from given evidence to target clause)
  • top-down (from target clause to given evidence)

Two key elements:

  • grounding realizing variable assignments
  • compilation generation of the inference space

7 of 29

Top-down inference: an example

burglary.

hearthquake.

hears_alarm(mario).

hears_alarm(gianna).

alarm :- earthquake.

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

8 of 29

Pro(b)Log

Now with probabilities

9 of 29

ProbLog

Prolog, with probabilities!

  • variables: {v1, …, vl}
  • terms: {t1, …, tm}, T := c | v | p::f(u1, …, uk)
  • atoms: p::q(ti, …, tn)
  • clauses: p::h :- b1, …, bi

0.1::burglary.

0.5::hears_alarm(mario).

alarm :- earthquake.

0.2::earthquake.

0.4::hears_alarm(gianna).

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

10 of 29

ProbLog tasks: Inference

Is a given clause h :- b1, …, bi true?

  • top-down (from target clause to given evidence)
  • Sentential Decision Diagram1
    • inference tree as before, but with attached probabilities

[1] SDD: A new canonical representation of propositional knowledge bases Darwiche and Marquis, IJCAI 2011

* http://bit.ly/2pO9W6m

11 of 29

Top-down inference: an example

0.1::burglary.

0.2::earthquake.

0.5::hears_alarm(mario).

0.4::hears_alarm(gianna).

alarm :- earthquake.

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

12 of 29

Top-down inference: an example

0.1::burglary.

0.2::earthquake.

0.5::hears_alarm(mario).

0.4::hears_alarm(gianna).

alarm :- earthquake.

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

13 of 29

Top-down inference: an example

0.1::burglary.

0.2::earthquake.

0.5::hears_alarm(mario).

0.4::hears_alarm(gianna).

alarm :- earthquake.

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

0.15

14 of 29

(a)Pro(b)Log

Now with semirings

15 of 29

Algebraic ProbLog2

Associate each element in a ProbLog program with a semiring element.

[2] An Algebraic Prolog for Reasoning about Possible Worlds�Kimmig et al, AAAI 2011

(A, ⊕, ⊗, 0, 0)

  • A element of the semiring
  • ⊕ semiring +
  • ⊗ semiring x
  • 0 semiring null element for ⊕
  • 0 semiring null element for ⊗

16 of 29

Gradient semiring

Credit assignment on premises:

  • assign error to each premise
  • propagate error across the inference model

Gradient => Differentiable!

17 of 29

Gradient semiring: AND

  • Cross product between probabilities and gradients
  • Require initial gradients

18 of 29

Gradient semiring: OR

  • Sum of gradients
  • Require initial gradients

19 of 29

Gradient semiring: an example

0.1::burglary.

0.2::hearthquake.

0.5::hears_alarm(mario).

0.4::hears_alarm(gianna).

alarm :- earthquake.

alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

|

&

0.2

0.1

0.5

0.3

0.15

20 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

-

???

<0, 0, 1.0>

???

-

target clause

gradient

probability

21 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

-

<0.1, 0.2, 0>

<0, 0, 1.0>

-

target clause

gradient

probability

???

22 of 29

<0, 1.0, 0>

<1.0, 0, 0>

Gradient semiring: an example

calls(mario)

alarm

hears_alarm(mario)

earthquake

burglary

0.2

0.1

0.5

0.3

0.15

-

-

<0.1, 0.2, 0>

<0, 0, 1.0>

-

target clause

gradient

probability

<0.5, 0.5, 0.3>

23 of 29

Neural (differentiable) predicates

Gradient opens the door for differentiable predicates!

Neural predicates on probabilistic differentiable models:

nn(M, x, Y)

  • nn family of predicates
  • M differentiable probabilistic model
  • x input record
  • Y output space

24 of 29

Neural (differentiable) predicates

Gradient opens the door for differentiable predicates!

Neural predicates on probabilistic differentiable models:

nn(M, x, Y)

  • nn family of predicates
  • M differentiable probabilistic model
  • x input record
  • Y output space

nn(ResNet, , {0, .. 9})

25 of 29

Experiments

Finally no more ProLog

26 of 29

Tasks

  • Addition
    • ProbLog program implementing addition (1 digit and 2 digit numbers)
  • Sort
    • bubble sort, neural predicate predicts output of bubble step
  • WAP
    • {permute, swap, operation}: decide which to take next

27 of 29

Addition

Learn sum of two numbers: ProbLog vs Differentiable model

28 of 29

Sorting

Learn step of bubble sort: DeepProbLog vs Differentiable Forth

29 of 29

Take-home messages

  • Learning with data + probabilistic logic programming shows�promising results
    • faster convergence
    • comparable results
  • Differentiable logic is a thing
    • hints at learning probabilities of given clauses