1 of 55

Domain Model Learning in AI Planning

Tutorial in AAAI 2026

2 of 55

Part 2: Learning State Abstractions

Roni Stern

Ben-Gurion University of the Negev

Some slides from: Christian Muise, Masataro Asai

3 of 55

Representation Learning

Learning a Symbolic Representation of the World

In general: hard task, ill-defined, deeply studied (ICLR?)

In particular: a symbolic representation for planning

Still hard, still ill-defined ☺

4 of 55

Learning Planning Domain Models

[Diagram: operator observations from several planning problems feed a Planning Domain Learner, which outputs a formal domain+problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …)]

5 of 55

Learning Planning Domain Models

[Diagram: operator observations from several planning problems feed the Planning Domain Learner]

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

6 of 55

Learning Planning Domain Models

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

(:predicates (on ?x ?y)
             (smaller ?x ?y)
             (clear ?x))

(:action move
  :parameters (?x ?y ?z))

7 of 55

Learning Planning Domain Models

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

(:predicates (on ?x ?y)
             (smaller ?x ?y)
             (clear ?x))

(:action move
  :parameters (?x ?y ?z))

What is the input?

8 of 55

Learning From Action Sequences

[Diagram: a trace of block states A/B/C linked by the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Move(A,B) changes some predicates of A or B

Move(B,C) changes some predicates of B or C

Insight: it’s the same change!

Actually, this is an assumption

The “Object-Centric” approach

  1. Learn how actions affect their parameters
  2. Suggest sufficient predicates to encode these effects

9 of 55

Types of Input for Representation Learning

[Diagram: two copies of the block-world trace (states linked by Move and Load actions), shown as one possible input among several]

Images

Text

Raw set of features

(e.g., sensor data)

Basic set of symbols

(e.g., goal description)

Low-level controllers

10 of 55

Learning From Action Sequences

The “Object-Centric” approach:

  • Learn how actions affect their parameters
  • Suggest sufficient “features” to fit the assumptions

[Diagram: the block-world trace with actions Move(A, B), Load(Pkg, C), Move(B, C)]

Move(A,B) changes some predicates of A or B

Move(B,C) changes some predicates of B or C

Insight: it’s the same change!

Actually, this is an assumption

11 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

Assumptions

  1. Objects have a “state”
  2. Actions affect the “states” of their parameters
  3. The changes in an object’s state form a finite state automaton (FSA)
  4. Actions with the same name transition similarly

11

If Do(x) changes the state of x to “done” (done(x)),
then Do(y) changes the state of y to “done” (done(y))

[FSA: Not Done -Do(x)-> Done; Done -Undo(x)-> Not Done]

Intuition: each FSA state becomes a predicate(x)

12 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

[Diagram: the transitions open(o), fetch-jack(_,o), fetch-wrench(_,o), close(o) over hypothesised states S1..S8, all initially distinct]

13 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: same transitions; states merged into {S2,S3}, {S4,S5}, {S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

14 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: same transitions with merged states {S2,S3}, {S4,S5}, {S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

15 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: the c2 trace forces further merges: {S2,S3} with S5, {S4,S5} with S7, {S6,S7} with S3; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

16 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: merging converges to one class {S2,S3,S4,S5,S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

17 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: merged class {S2,S3,S4,S5,S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

18 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: the c3 trace (close then open) merges S1 with S8, giving classes {S2,S3,S4,S5,S6,S7} and {S1,S8}]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

19 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Final classes: {S2,S3,S4,S5,S6,S7} and {S1,S8}]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

[Diagram: the learned two-state FSA: P1(o) -open(o)-> P2(o), P2(o) -close(o)-> P1(o); fetch-jack(_,o) and fetch-wrench(_,o) stay in P2(o); P1(o) = closed(o)?, P2(o) = opened(o)?]

Additional details and extensions

  • Zero-parameter predicates (e.g., hand-empty)
  • Detecting predicates with multiple parameters (LOCM2)
  • Learning action costs (N-LOCM)
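LOCM's core state-merging step can be sketched with union-find. This is a toy, single-sort version under the assumptions above (each action/argument slot is one canonical transition; same-name actions transition identically); the function and variable names are mine, not from the papers.

```python
from collections import defaultdict

class DSU:
    """Union-find over hypothesised object states."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def locm_states(trace):
    """trace: list of (action_name, args) tuples.
    Each (action, argument position) gets one canonical (start, end)
    state pair; consecutive events on the same object glue the previous
    end state to the next start state (LOCM's core merging step)."""
    dsu, trans, last, fresh = DSU(), {}, {}, iter(range(10**6))
    for name, args in trace:
        for pos, obj in enumerate(args):
            if (name, pos) not in trans:
                trans[(name, pos)] = (next(fresh), next(fresh))
            start, end = trans[(name, pos)]
            if obj in last:                  # object seen before: its current
                dsu.union(last[obj], start)  # state must equal this start state
            last[obj] = end
    classes = defaultdict(set)
    for s, e in trans.values():
        classes[dsu.find(s)].add(s)
        classes[dsu.find(e)].add(e)
    return trans, dict(classes)

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",)),
         ("close", ("c3",)), ("open", ("c3",))]
trans, classes = locm_states(trace)
```

On the container example this merges all "between open and close" states into one class and the pre-open/post-close states into another, mirroring the {S2..S7} and {S1,S8} merge on the slides.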

20 of 55

SIFT [Gosgens et al. ’25]

Assumption: domain is a well-formed STRIPS domain

If opened(door) is an effect of op(door),
then not(opened(door)) is a precondition of op(door)

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

21 of 55

SIFT [Gosgens et al. ’25]

An action pattern represents a predicate that is affected by the action

A feature represents a predicate and all the actions affecting it

  1. Suggest possible predicates and affecting actions

22 of 55

SIFT [Gosgens et al. ’25]

22

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

23 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

24 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

Is it satisfiable? Yes

25 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

UNSAT!

Note: SAT check here is efficient (2-SAT)
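The kind of constraint being checked can be illustrated with a toy consistency test (my own simplification, which ignores argument positions): under the well-formed STRIPS assumption an action that adds P has not-P as a precondition and one that deletes P has P, so along a trace the adds and deletes of a candidate feature must alternate per object.

```python
def consistent(trace, adds, dels):
    """trace: list of (action_name, args). adds/dels: action names that
    add / delete the candidate predicate P for each of their arguments.
    Returns False iff the trace forces P and not-P at the same point."""
    value = {}  # object -> last implied truth value of P(object)
    for name, args in trace:
        for obj in args:
            if name in adds:
                if value.get(obj) is True:
                    return False  # add requires not-P, but P already holds
                value[obj] = True
            elif name in dels:
                if value.get(obj) is False:
                    return False  # delete requires P, but not-P holds
                value[obj] = False
    return True

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",))]
```

Here the candidate "open adds P, close deletes P" is consistent with the trace, while also making fetch-jack an adder is refuted: P would have to be both true and false for c1 at step 2.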

26 of 55

SIFT [Gosgens et al. ’25]

Problem: too many possible “features”

Solution(*): infer object types!

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

27 of 55

SIFT [Gosgens et al. ’25]

Grouping action parameters to infer types

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

open(?x1)               ?x1 : t1?
fetch-jack(?x2 ?x3)     ?x2 : t2?, ?x3 : t1?
fetch-wrench(?x4 ?x5)   ?x4 : t3?, ?x5 : t1?
close(?x6)              ?x6 : t1?

28 of 55

SIFT [Gosgens et al. ’25]

Grouping action parameters to infer types

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

open(?t1)

fetch-jack(?t2 ?t1)

fetch-wrench(?t3 ?t1)

close(?t1)
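The grouping on the last two slides is essentially union-find over parameter slots: two (action, position) slots receive the same type whenever some object fills both. A minimal sketch (the function name `infer_types` is mine):

```python
from collections import defaultdict

def infer_types(trace):
    """Group (action, argument-position) slots into inferred types:
    slots that are ever filled by the same object must share a type."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    filled = {}  # object -> one slot it has filled
    for name, args in trace:
        for pos, obj in enumerate(args):
            slot = (name, pos)
            find(slot)  # register the slot
            if obj in filled:
                union(filled[obj], slot)
            filled[obj] = slot
    groups = defaultdict(list)
    for slot in parent:
        groups[find(slot)].append(slot)
    return [sorted(g) for g in groups.values()]

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",))]
```

On the example this yields exactly the three groups of the slide: the container slots of open, fetch-jack, fetch-wrench, and close as t1, plus singleton types for the jack and wrench slots.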

29 of 55

SIFT [Gosgens et al. ’25]

Many additional details in the paper

  • Handling additional information on traces (extended traces)
  • Static predicates
  • Completeness theorems

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

30 of 55

SIFT

Pros:

  • Learn only from action traces!
  • No supervision required
  • Highly scalable (?)
  • Works reasonably on benchmarks

Discussion:

  • Is the well-formed STRIPS assumption reasonable?
  • Losing explainability?
  • Is not knowing anything about the states practical?

31 of 55

Learning From Text [Lindsay et al. ’17]

[Diagram: the block-world trace Move(A, B), Move(B, C), Load(Pkg, C), paired with the narrated text below]

“He drove the truck from A to B”

“Then he drove from B to C”

“Finally, he picked up the package”

Framer (Lindsay et al. ‘17)

  1. Parse sentences into “action templates”
  2. Cluster sentences by similarity
  3. Run LOCM, treating each cluster as an action
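The clustering step can be sketched with a toy greedy word-overlap (Jaccard) clusterer. This is only illustrative: Framer parses the sentences with NLP machinery rather than comparing bags of words, and the function name and threshold here are my own.

```python
def cluster_sentences(sentences, threshold=0.5):
    """Greedy clustering by Jaccard similarity of word sets - a toy
    stand-in for Framer's sentence-clustering step."""
    clusters = []  # each cluster: list of word sets; the first is the representative
    for sentence in sentences:
        words = set(sentence.lower().split())
        for cluster in clusters:
            rep = cluster[0]
            jaccard = len(words & rep) / len(words | rep)
            if jaccard >= threshold:
                cluster.append(words)
                break
        else:
            clusters.append([words])
    return clusters

sentences = ["He drove the truck from A to B",
             "Then he drove from B to C",
             "Finally, he picked up the package"]
```

On the three narrated sentences above, the two "drove" sentences end up in one cluster (a candidate Move action) and the "picked up" sentence in another.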

32 of 55

Learning From Text [Lindsay et al. ’17]

33 of 55

Learning From Text [Lindsay et al. ’17]

34 of 55

Learning From Text [Lindsay et al. ’17]

LOCM

35 of 55

Learning From Text [Lindsay et al. ’17]

36 of 55

No LLM?!?!

37 of 55

Creating Planning Domain Models with LLMs

  • Zero-shot approaches (prompt engineering) (Oates et al. ’24, Zhang et al. ’24)
  • Generate multiple candidates and merge (Huang et al. ’24)
  • Generate-Test-Revise (Kambhampati et al. ’24)

ACL 2025

38 of 55

Learning From Images

[Diagram: the block-world trace Move(A, B), Load(Pkg, C), Move(B, C), now observed as images]

Images

39 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic state for an image?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning

Solution: the State AutoEncoder (SAE)

40 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic state for an image?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning

Solution: the State AutoEncoder (SAE)

[Diagram: DNN encoder/decoder; the latent state representation = the symbolic state]

Use Gumbel-Softmax so the latent representation is categorical (Boolean)
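The discretisation trick can be sketched without any deep-learning framework. This is a toy, single-sample version (in LatPlan it is applied inside the SAE with temperature annealing); the function name and defaults are mine.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random.Random(0)):
    """Sample a relaxed one-hot vector from unnormalised logits.
    As tau -> 0 the output approaches a hard one-hot choice, which is
    how each latent bit is pushed towards a Boolean value."""
    noise = [-math.log(-math.log(rng.random())) for _ in logits]  # Gumbel(0,1)
    scores = [(l + n) / tau for l, n in zip(logits, noise)]
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Each latent bit uses a pair of logits (true/false); at low temperature the softmax output is nearly one-hot, so rounding it to a Boolean loses almost nothing while training stays differentiable.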

41 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic action?

  1. Encodes the change in a pair of consecutive images
  2. Allows predicting the next state and applicability

Solution: the Action AutoEncoder (AAE)

[Diagram: DNN encoder/decoder; the latent action representation = the symbolic action]

Use Gumbel-Softmax so the latent representation is categorical (Boolean)

42 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability


43 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability
  4. Has propositional preconditions and effects

44 of 55

Cube-Space AutoEncoder [Asai and Muise ’20]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability
  4. Has propositional preconditions and effects

Encodes preconditions and effects “directly”

Many details in the paper

45 of 55

LatPlan and Cube-Space AutoEncoder

Pros:

  • Learn only from images!
  • No supervision required
  • Works reasonably on benchmarks

Discussion:

  • Losing formal guarantees?
  • Losing explainability?
  • Is it practical to not know your agent’s actions?
  • Is it practical to know every frame?

46 of 55

I-ROSAME [Xi et al., ‘24]

  • Input: symbolic representation, image traces
  • Output: domain model

47 of 55

Learning from Low-Level Control

[Diagram: the block-world trace, now observed through raw features and low-level controllers]

Raw set of features

(e.g., sensor data)

Low-level controllers

(AKA Skills, Options, VLA?…)

48 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

Many details in the paper

  • Identify predicates based on affected state variables
  • Learning for probabilistic planning domains

[Diagram: a skill’s initiation set 🡺 preconditions; its effect set 🡺 effects(*)]
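The first bullet (identifying predicates from affected state variables) starts from a skill's "mask": the low-level variables the skill can change, estimated from sampled transitions. A minimal sketch under that framing (the function name and epsilon are mine, not from the paper):

```python
def skill_mask(transitions, eps=1e-6):
    """transitions: list of (s, s_next) pairs of equal-length feature
    vectors sampled while executing one skill. Returns the indices of
    variables the skill ever changes; variables outside the mask can
    be ignored when building the skill's symbolic preconditions and
    effects."""
    n = len(transitions[0][0])
    changed = set()
    for s, s_next in transitions:
        for i in range(n):
            if abs(s[i] - s_next[i]) > eps:
                changed.add(i)
    return sorted(changed)
```

For example, a skill whose samples only ever move variables 1 and 2 gets the mask [1, 2], and candidate predicates are then grounded over those variables.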

49 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

From skills to symbols: Learning symbolic representations for abstract high-level planning. G. Konidaris et al. JAIR 2018

50 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

51 of 55

Predicate Invention for Bi-Level Planning [Silver et al. ’23]

  • Input: plan traces and goal predicate symbols
  • Approach: predicate invention as program synthesis
    • Synthesize symbols, starting from the goal predicates
    • Learn symbolic actions, try to plan for the given traces, optimize
  • How to guide the program synthesis process?
    • Optimize for plans similar in cost to the given traces
    • Optimize for plans that are easier to find (fewer A* search nodes)

52 of 55

VisualPredicator [Liang et al., 2025]

  • Input: plan traces + images, goal predicate symbols
  • Approach: program synthesis + a Vision-Language Model (VLM)
    • Synthesize symbols, also using the VLM
    • Learn symbolic actions, try to plan for the given traces, optimize

53 of 55

VisualPredicator [Liang et al., 2025]

  • Input: plan traces + images, goal predicate symbols
  • Approach: program synthesis + a Vision-Language Model (VLM)
    • Synthesize symbols, also using the VLM
    • Learn symbolic actions, try to plan for the given traces, optimize

Strategy #1 (Discrimination)

“Explain why action a worked in state s and failed in s’”

Strategy #2 (Transition Modeling)

“Explain what has changed after doing action a in state s”

Strategy #3 (Unconditional Generation)

“Suggest useful compositions of existing predicates”

54 of 55

Types of Input for Representation Learning

[Diagram: two copies of the block-world trace (states linked by Move and Load actions), with the input types and methods mapped below]

Images

Text

Raw set of features

(e.g., sensor data)

Basic set of features

(e.g., goal & init description)

LOCM

SIFT

FRAMER

LLM-based

VisualPredicator

Skills2Symbols

LatPlan

Program Synthesis

55 of 55

What’s Next?

Given: symbolic states, actions, and traces

Not given: actions’ preconditions and effects

Time        | Session                               | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–09:45 | Learning State Abstractions           | Roni Stern
09:45–10:30 | Offline Learning Domain Models        | Leonardo Lamanna
10:30–11:00 | Coffee Break                          |
11:00–11:45 | Hands-on Session                      | Leonardo Lamanna
11:45–12:30 | Online Learning and Open Challenges   | Roni Stern