1 of 55

Domain Model Learning in AI Planning

Tutorial in AAAI 2026

2 of 55

Part 2: Learning State Abstractions

Roni Stern

Ben-Gurion University of the Negev

Some slides from: Christian Muise, Masataro Asai

3 of 55

Representation Learning

Learning a Symbolic Representation of the World

In general: hard task, ill-defined, deeply studied (ICLR?)

In particular: a symbolic representation for planning

Still hard, still ill-defined ☺

4 of 55

Learning Planning Domain Models

[Diagram: operator observations from several planning problems feed a Planning Domain Learner, which outputs a formal domain+problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …)]

5 of 55

Learning Planning Domain Models

[Diagram: operator observations from several planning problems feed the Planning Domain Learner]

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

6 of 55

Learning Planning Domain Models

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

(:predicates (on ?x ?y)
             (smaller ?x ?y)
             (clear ?x))

(:action move
  :parameters (?x ?y ?z))

7 of 55

Learning Planning Domain Models

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

(:predicates (on ?x ?y)
             (smaller ?x ?y)
             (clear ?x))

(:action move
  :parameters (?x ?y ?z))

What is the input?

8 of 55

Learning From Action Sequences

[Diagram: a trace of block states A/B/C linked by the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Move(A,B) changes some predicates of A or B

Move(B,C) changes some predicates of B or C

Insight: it’s the same change!

Actually, this is an assumption

The “Object-Centric” approach

  1. Learn how actions affect their parameters
  2. Suggest sufficient predicates to encode these effects

9 of 55

Types of Input for Representation Learning

[Diagram: two copies of the block-world trace (states linked by Move and Load actions), shown as one possible input among several]

Images

Text

Raw set of features

(e.g., sensor data)

Basic set of symbols

(e.g., goal description)

Low-level controllers

10 of 55

Learning From Action Sequences

The “Object-Centric” approach:

  • Learn how actions affect their parameters
  • Suggest sufficient “features” to fit the assumptions

[Diagram: the block-world trace with actions Move(A, B), Load(Pkg, C), Move(B, C)]

Move(A,B) changes some predicates of A or B

Move(B,C) changes some predicates of B or C

Insight: it’s the same change!

Actually, this is an assumption

11 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

Assumptions

  1. Objects have a “state”
  2. Actions affect the “states” of their parameters
  3. The changes in an object’s state form a finite state automaton (FSA)
  4. Actions with the same name transition similarly

11

If Do(x) changes the state of x to “done” (done(x)),
then Do(y) changes the state of y to “done” (done(y))

[FSA: Not Done -Do(x)-> Done; Done -Undo(x)-> Not Done]

Intuition: each FSA state becomes a predicate(x)

12 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

[Diagram: the transitions open(o), fetch-jack(_,o), fetch-wrench(_,o), close(o) over hypothesised states S1..S8, all initially distinct]

13 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: same transitions; states merged into {S2,S3}, {S4,S5}, {S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

14 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: same transitions with merged states {S2,S3}, {S4,S5}, {S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

15 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: the c2 trace forces further merges: {S2,S3} with S5, {S4,S5} with S7, {S6,S7} with S3; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

16 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: merging converges to one class {S2,S3,S4,S5,S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

17 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: merged class {S2,S3,S4,S5,S6,S7}; S1 and S8 still separate]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

18 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Diagram: the c3 trace (close then open) merges S1 with S8, giving classes {S2,S3,S4,S5,S6,S7} and {S1,S8}]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

19 of 55

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

[Final classes: {S2,S3,S4,S5,S6,S7} and {S1,S8}]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

[Diagram: the learned two-state FSA: P1(o) -open(o)-> P2(o), P2(o) -close(o)-> P1(o); fetch-jack(_,o) and fetch-wrench(_,o) stay in P2(o); P1(o) = closed(o)?, P2(o) = opened(o)?]

Additional details and extensions

  • Zero-parameter predicates (e.g., hand-empty)
  • Detecting predicates with multiple parameters (LOCM2)
  • Learning action costs (N-LOCM)
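LOCM's core state-merging step can be sketched with union-find. This is a toy, single-sort version under the assumptions above (each action/argument slot is one canonical transition; same-name actions transition identically); the function and variable names are mine, not from the papers.

```python
from collections import defaultdict

class DSU:
    """Union-find over hypothesised object states."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def locm_states(trace):
    """trace: list of (action_name, args) tuples.
    Each (action, argument position) gets one canonical (start, end)
    state pair; consecutive events on the same object glue the previous
    end state to the next start state (LOCM's core merging step)."""
    dsu, trans, last, fresh = DSU(), {}, {}, iter(range(10**6))
    for name, args in trace:
        for pos, obj in enumerate(args):
            if (name, pos) not in trans:
                trans[(name, pos)] = (next(fresh), next(fresh))
            start, end = trans[(name, pos)]
            if obj in last:                  # object seen before: its current
                dsu.union(last[obj], start)  # state must equal this start state
            last[obj] = end
    classes = defaultdict(set)
    for s, e in trans.values():
        classes[dsu.find(s)].add(s)
        classes[dsu.find(e)].add(e)
    return trans, dict(classes)

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",)),
         ("close", ("c3",)), ("open", ("c3",))]
trans, classes = locm_states(trace)
```

On the container example this merges all "between open and close" states into one class and the pre-open/post-close states into another, mirroring the {S2..S7} and {S1,S8} merge on the slides.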

20 of 55

SIFT [Gosgens et al. ’25]

Assumption: domain is a well-formed STRIPS domain

If opened(door) is an effect of op(door),
then not(opened(door)) is a precondition of op(door)

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

21 of 55

SIFT [Gosgens et al. ’25]

An action pattern represents a predicate that is affected by the action

A feature represents a predicate and all the actions affecting it

  1. Suggest possible predicates and affecting actions

22 of 55

SIFT [Gosgens et al. ’25]

22

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

23 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

24 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

Is it satisfiable? Yes

25 of 55

SIFT [Gosgens et al. ’25]

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

P(?t1)

2. Generate constraints based on traces

UNSAT!

Note: SAT check here is efficient (2-SAT)
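The kind of constraint being checked can be illustrated with a toy consistency test (my own simplification, which ignores argument positions): under the well-formed STRIPS assumption an action that adds P has not-P as a precondition and one that deletes P has P, so along a trace the adds and deletes of a candidate feature must alternate per object.

```python
def consistent(trace, adds, dels):
    """trace: list of (action_name, args). adds/dels: action names that
    add / delete the candidate predicate P for each of their arguments.
    Returns False iff the trace forces P and not-P at the same point."""
    value = {}  # object -> last implied truth value of P(object)
    for name, args in trace:
        for obj in args:
            if name in adds:
                if value.get(obj) is True:
                    return False  # add requires not-P, but P already holds
                value[obj] = True
            elif name in dels:
                if value.get(obj) is False:
                    return False  # delete requires P, but not-P holds
                value[obj] = False
    return True

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",))]
```

Here the candidate "open adds P, close deletes P" is consistent with the trace, while also making fetch-jack an adder is refuted: P would have to be both true and false for c1 at step 2.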

26 of 55

SIFT [Gosgens et al. ’25]

Problem: too many possible “features”

Solution(*): infer object types!

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

27 of 55

SIFT [Gosgens et al. ’25]

Grouping action parameters to infer types

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

open(?x1)               ?x1 : t1?
fetch-jack(?x2 ?x3)     ?x2 : t2?, ?x3 : t1?
fetch-wrench(?x4 ?x5)   ?x4 : t3?, ?x5 : t1?
close(?x6)              ?x6 : t1?

28 of 55

SIFT [Gosgens et al. ’25]

Grouping action parameters to infer types

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

open(?t1)

fetch-jack(?t2 ?t1)

fetch-wrench(?t3 ?t1)

close(?t1)
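The grouping on the last two slides is essentially union-find over parameter slots: two (action, position) slots receive the same type whenever some object fills both. A minimal sketch (the function name `infer_types` is mine):

```python
from collections import defaultdict

def infer_types(trace):
    """Group (action, argument-position) slots into inferred types:
    slots that are ever filled by the same object must share a type."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    filled = {}  # object -> one slot it has filled
    for name, args in trace:
        for pos, obj in enumerate(args):
            slot = (name, pos)
            find(slot)  # register the slot
            if obj in filled:
                union(filled[obj], slot)
            filled[obj] = slot
    groups = defaultdict(list)
    for slot in parent:
        groups[find(slot)].append(slot)
    return [sorted(g) for g in groups.values()]

trace = [("open", ("c1",)), ("fetch-jack", ("j1", "c1")),
         ("fetch-wrench", ("wr1", "c1")), ("close", ("c1",)),
         ("open", ("c2",)), ("fetch-wrench", ("wr2", "c2")),
         ("fetch-jack", ("j2", "c2")), ("close", ("c2",))]
```

On the example this yields exactly the three groups of the slide: the container slots of open, fetch-jack, fetch-wrench, and close as t1, plus singleton types for the jack and wrench slots.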

29 of 55

SIFT [Gosgens et al. ’25]

Many additional details in the paper

  • Handling additional information on traces (extended traces)
  • Static predicates
  • Completeness theorems

SIFT pseudo-code

While not done:

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists a satisfying action model – done!

30 of 55

SIFT

Pros:

  • Learn only from action traces!
  • No supervision required
  • Highly scalable (?)
  • Works reasonably on benchmarks

Discussion:

  • Is the well-formed STRIPS assumption reasonable?
  • Losing explainability?
  • Is not knowing anything about the states practical?

31 of 55

Learning From Text [Lindsay et al. ’17]

[Diagram: the block-world trace Move(A, B), Move(B, C), Load(Pkg, C), paired with the narrated text below]

“He drove the truck from A to B”

“Then he drove from B to C”

“Finally, he picked up the package”

Framer (Lindsay et al. ‘17)

  1. Parse sentences into “action templates”
  2. Cluster sentences by similarity
  3. Run LOCM, treating each cluster as an action
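The clustering step can be sketched with a toy greedy word-overlap (Jaccard) clusterer. This is only illustrative: Framer parses the sentences with NLP machinery rather than comparing bags of words, and the function name and threshold here are my own.

```python
def cluster_sentences(sentences, threshold=0.5):
    """Greedy clustering by Jaccard similarity of word sets - a toy
    stand-in for Framer's sentence-clustering step."""
    clusters = []  # each cluster: list of word sets; the first is the representative
    for sentence in sentences:
        words = set(sentence.lower().split())
        for cluster in clusters:
            rep = cluster[0]
            jaccard = len(words & rep) / len(words | rep)
            if jaccard >= threshold:
                cluster.append(words)
                break
        else:
            clusters.append([words])
    return clusters

sentences = ["He drove the truck from A to B",
             "Then he drove from B to C",
             "Finally, he picked up the package"]
```

On the three narrated sentences above, the two "drove" sentences end up in one cluster (a candidate Move action) and the "picked up" sentence in another.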

32 of 55

Learning From Text [Lindsay et al. ’17]

33 of 55

Learning From Text [Lindsay et al. ’17]

34 of 55

Learning From Text [Lindsay et al. ’17]

LOCM

35 of 55

Learning From Text [Lindsay et al. ’17]

36 of 55

No LLM?!?!

37 of 55

Creating Planning Domain Models with LLMs

  • Zero-shot approaches (prompt engineering) (Oates et al. ’24, Zhang et al. ’24)
  • Generate multiple candidates and merge (Huang et al. ’24)
  • Generate-Test-Revise (Kambhampati et al. ’24)

ACL 2025

38 of 55

Learning From Images

[Diagram: the block-world trace Move(A, B), Load(Pkg, C), Move(B, C), now observed as images]

Images

39 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic state for an image?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning

Solution: the State AutoEncoder (SAE)

40 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic state for an image?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning

Solution: the State AutoEncoder (SAE)

[Diagram: DNN encoder/decoder; the latent state representation = the symbolic state]

Use Gumbel-Softmax so the latent representation is categorical (Boolean)
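The discretisation trick can be sketched without any deep-learning framework. This is a toy, single-sample version (in LatPlan it is applied inside the SAE with temperature annealing); the function name and defaults are mine.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random.Random(0)):
    """Sample a relaxed one-hot vector from unnormalised logits.
    As tau -> 0 the output approaches a hard one-hot choice, which is
    how each latent bit is pushed towards a Boolean value."""
    noise = [-math.log(-math.log(rng.random())) for _ in logits]  # Gumbel(0,1)
    scores = [(l + n) / tau for l, n in zip(logits, noise)]
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Each latent bit uses a pair of logits (true/false); at low temperature the softmax output is nearly one-hot, so rounding it to a Boolean loses almost nothing while training stays differentiable.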

41 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic action?

  1. Encodes the change in a pair of consecutive images
  2. Allows predicting the next state and applicability

Solution: the Action AutoEncoder (AAE)

[Diagram: DNN encoder/decoder; the latent action representation = the symbolic action]

Use Gumbel-Softmax so the latent representation is categorical (Boolean)

42 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability


43 of 55

LatPlan [Asai and Fukunaga ‘18]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability
  4. Has propositional preconditions and effects

44 of 55

Cube-Space AutoEncoder [Asai and Muise ’20]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability
  4. Has propositional preconditions and effects

Encodes preconditions and effects “directly”

Many details in the paper

45 of 55

LatPlan and Cube-Space AutoEncoder

Pros:

  • Learn only from images!
  • No supervision required
  • Works reasonably on benchmarks

Discussion:

  • Losing formal guarantees?
  • Losing explainability?
  • Is it practical to not know your agent’s actions?
  • Is it practical to know every frame?

46 of 55

I-ROSAME [Xi et al., ‘24]

  • Input: symbolic representation, image traces
  • Output: domain model

47 of 55

Learning from Low-Level Control

[Diagram: the block-world trace, now observed through raw features and low-level controllers]

Raw set of features

(e.g., sensor data)

Low-level controllers

(AKA Skills, Options, VLA?…)

48 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

Many details in the paper

  • Identify predicates based on affected state variables
  • Learning for probabilistic planning domains

[Diagram: a skill’s initiation set 🡺 preconditions; its effect set 🡺 effects(*)]
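The first bullet (identifying predicates from affected state variables) starts from a skill's "mask": the low-level variables the skill can change, estimated from sampled transitions. A minimal sketch under that framing (the function name and epsilon are mine, not from the paper):

```python
def skill_mask(transitions, eps=1e-6):
    """transitions: list of (s, s_next) pairs of equal-length feature
    vectors sampled while executing one skill. Returns the indices of
    variables the skill ever changes; variables outside the mask can
    be ignored when building the skill's symbolic preconditions and
    effects."""
    n = len(transitions[0][0])
    changed = set()
    for s, s_next in transitions:
        for i in range(n):
            if abs(s[i] - s_next[i]) > eps:
                changed.add(i)
    return sorted(changed)
```

For example, a skill whose samples only ever move variables 1 and 2 gets the mask [1, 2], and candidate predicates are then grounded over those variables.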

49 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

From skills to symbols: Learning symbolic representations for abstract high-level planning. G. Konidaris et al. JAIR 2018

50 of 55

Learning from Low-Level Skills [Konidaris et al. ’18]

51 of 55

Predicate Invention for Bi-Level Planning [Silver et al. ’23]

  • Input: plan traces and goal predicate symbols
  • Approach: predicate invention as program synthesis
    • Synthesize symbols, starting from the goal predicates
    • Learn symbolic actions, try to plan for the given traces, optimize
  • How to guide the program synthesis process?
    • Optimize for plans similar in cost to the given traces
    • Optimize for plans that are easier to find (fewer A* search nodes)

52 of 55

VisualPredicator [Liang et al., 2025]

  • Input: plan traces + images, goal predicate symbols
  • Approach: program synthesis + a Vision-Language Model (VLM)
    • Synthesize symbols, also using the VLM
    • Learn symbolic actions, try to plan for the given traces, optimize

53 of 55

VisualPredicator [Liang et al., 2025]

  • Input: plan traces + images, goal predicate symbols
  • Approach: program synthesis + a Vision-Language Model (VLM)
    • Synthesize symbols, also using the VLM
    • Learn symbolic actions, try to plan for the given traces, optimize

Strategy #1 (Discrimination)

“Explain why action a worked in state s and failed in s’”

Strategy #2 (Transition Modeling)

“Explain what has changed after doing action a in state s”

Strategy #3 (Unconditional Generation)

“Suggest useful compositions of existing predicates”

54 of 55

Types of Input for Representation Learning

[Diagram: two copies of the block-world trace (states linked by Move and Load actions), with the input types and methods mapped below]

Images

Text

Raw set of features

(e.g., sensor data)

Basic set of features

(e.g., goal & init description)

LOCM

SIFT

FRAMER

LLM-based

VisualPredicator

Skills2Symbols

LatPlan

Program Synthesis

55 of 55

What’s Next?

Given: symbolic states, actions, and traces

Not given: actions’ preconditions and effects

Time        | Session                               | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–09:45 | Learning State Abstractions           | Roni Stern
09:45–10:30 | Offline Learning Domain Models        | Leonardo Lamanna
10:30–11:00 | Coffee Break                          |
11:00–11:45 | Hands-on Session                      | Leonardo Lamanna
11:45–12:30 | Online Learning and Open Challenges   | Roni Stern