Rate This Tutorial @ domain-learning.github.io

Domain Model Learning in AI Planning Tutorial

AAMAS 2026

What’s the Plan for Today?

Next: Learning State Abstraction for Planning

Time         | Session                               | Speaker
08:30–09:15  | Introduction & Domain Learning Basics | Roni Stern
09:15–10:00  | Offline Learning Action Models        | Argaman Mordoch
10:00–10:30  | Coffee Break ☕                        |
10:30–11:00  | Learning State Abstraction            | Roni Stern
11:00–11:45  | Active Learning and Open Challenges   | Argaman Mordoch & Roni Stern
11:45–12:30  | Hands-on Session                      | Leonardo Lamanna

Part 3: Learning State Abstractions

Roni Stern

Ben-Gurion University of the Negev

Some slides from: Christian Muise, Masataro Asai

Representation Learning

Learning a Symbolic Representation of the World

In general: hard task, ill-defined, deeply studied (ICLR?)

In particular: a symbolic representation for planning

Still hard, still ill-defined ☺

Learning Planning Domain Models

[Figure: planning problems and operator observations are fed to a Planning Domain Learner, which produces a formal domain + problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …)]

domain.pddl

  • How to represent a state?
  • What operators do we have?
  • How do they work?

Model of the environment

(:predicates (on ?x ?y)
             (smaller ?x ?y)
             (clear ?x))

(:action move
  :parameters (?x ?y ?z))

What is the input?

Types of Input for Representation Learning

[Figure: example trajectory over locations A, B, C with the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Input types: images; text; raw set of features (e.g., sensor data); basic set of features (e.g., goal & init description)

Methods: LOCM, SIFT, LLM-based, VisualPredicator, Skills2Symbols, LatPlan, Program Synthesis, I-ROSAME

Learning From Action Sequences

[Figure: example trajectory over locations A, B, C with the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Move(A,B) changes some predicates of the modified objects, A or B

Move(B,C) changes some predicates of the modified objects, B or C

Insight: it’s the same change!

Actually, this is an assumption

The “Object-Centric” approach

  1. Learn how actions affect their parameters
  2. Suggest sufficient predicates to encode these effects

[Figure: example trajectories over locations A, B, C (Move(A, B), Load(Pkg, C), Move(B, C)), annotated with the input types: images; text; raw set of features (e.g., sensor data); basic set of symbols (e.g., goal description); low-level controllers]

LOCM [Cresswell and Gregory ’11, Cresswell et al. ’13, …]

Assumptions

  1. Objects have a “state”
  2. Actions affect the “states” of their parameters

  • The changes in the state of an object form a finite-state automaton (FSA)

  • Actions with the same name transition similarly

Example: the action Do(x) changes the state of x to “Done” (Done(x))

If Do(x) changes the state of x to “Done” (Done(x)),
then Do(y) changes the state of y to “Done” (Done(y))

[Figure: two-state FSA with states Not Done and Done and transitions Do(x) and Undo(x)]

Intuition: an FSA state will be a predicate over x

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)
  9. (close c3)
  10. (open c3)

Initially, every action has its own start and end state for the object o:

  S1 --open(o)--> S2
  S3 --fetch-jack(_,o)--> S4
  S5 --fetch-wrench(_,o)--> S6
  S7 --close(o)--> S8

Consecutive actions on the same object merge the end state of the first with the start state of the second:

  • Steps 1 to 4 (c1): S2 = S3, S4 = S5, S6 = S7
  • Steps 5 to 8 (c2): S2 = S5, S6 = S3, S4 = S7, so S2, S3, S4, S5, S6, S7 collapse into one state
  • Steps 9 to 10 (c3): S8 = S1

Result: two states, {S1, S8} and {S2, S3, S4, S5, S6, S7}

[Figure: the resulting two-state FSA over {S1, S8} and {S2, S3, S4, S5, S6, S7}, with transitions open(o), close(o), fetch_jack(_,o), fetch_wrench(_,o)]

The two states become predicates over o:

  • P1(o): closed(o)?
  • P2(o): opened(o)?

Additional details and extensions

  • Zero-parameter predicates (e.g., hand-empty)
  • Detecting predicates with multiple parameters (LOCM2)
  • Learning action costs (N-LOCM)
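
The state merging in the LOCM example can be sketched in a few lines of Python. This is only a minimal illustration of that one step, not the full LOCM algorithm: the function name `locm_state_partition`, the `(action, argument position)` encoding of transitions, and the union-find helper are assumptions made for this sketch, and sort detection and predicate parameters are left out.

```python
from collections import defaultdict

# The ten-step example trace, as (action, arg1, arg2, ...) tuples.
TRACE = [
    ("open", "c1"), ("fetch-jack", "j1", "c1"), ("fetch-wrench", "wr1", "c1"),
    ("close", "c1"), ("open", "c2"), ("fetch-wrench", "wr2", "c2"),
    ("fetch-jack", "j2", "c2"), ("close", "c2"), ("close", "c3"), ("open", "c3"),
]

def locm_state_partition(trace):
    """Merge the start/end states of transitions that follow each other on one object."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # A transition is (action name, argument position); each gets a fresh
    # "start" and "end" state, like S1..S8 in the example.
    per_object = defaultdict(list)
    for name, *args in trace:
        for pos, obj in enumerate(args):
            transition = (name, pos)
            find((transition, "start"))
            find((transition, "end"))
            per_object[obj].append(transition)

    # Consecutive transitions on the same object share a state.
    for transitions in per_object.values():
        for prev, nxt in zip(transitions, transitions[1:]):
            parent[find((nxt, "start"))] = find((prev, "end"))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return {frozenset(group) for group in groups.values()}

partition = locm_state_partition(TRACE)
```

For the container argument, the partition contains exactly the two merged states from the example: the start of `open` joined with the end of `close`, and all remaining container states joined together. The jack and wrench arguments appear only once per object, so their states stay unmerged.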

SIFT [Gosgens et al. ’25]

Assumption: domain is a well-formed STRIPS domain

If Opened(Door) is an effect of Open(Door),
then not(Opened(Door)) is a precondition of Open(Door)

SIFT pseudo-code

While not done

  1. Suggest possible predicates and affecting actions
  2. Generate constraints based on traces
  3. If there exists satisfying action model – done!
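
One consequence of the well-formed assumption is easy to check directly: if adding a predicate requires it to be false beforehand, and deleting it requires it to be true, then for every object the adding and deleting actions must alternate along a trace. The sketch below tests a candidate feature against that consequence. It is a simplified illustration under stated assumptions, not the constraint encoding used in the paper; the function name `feature_is_consistent` and the `(action, argument position)` encoding are hypothetical.

```python
# First eight steps of the tool-box example, as (action, arg1, arg2, ...) tuples.
TRACE = [
    ("open", "c1"), ("fetch-jack", "j1", "c1"), ("fetch-wrench", "wr1", "c1"),
    ("close", "c1"), ("open", "c2"), ("fetch-wrench", "wr2", "c2"),
    ("fetch-jack", "j2", "c2"), ("close", "c2"),
]

def feature_is_consistent(trace, adds, deletes):
    """Check a candidate unary feature against a trace.

    adds / deletes are sets of (action name, argument position) pairs that
    are assumed to make the predicate true / false for that argument.
    Under the well-formed assumption, two adds (or two deletes) in a row
    on the same object are impossible.
    """
    last_effect = {}  # object -> True (last added) or False (last deleted)
    for name, *args in trace:
        for pos, obj in enumerate(args):
            if (name, pos) in adds:
                effect = True
            elif (name, pos) in deletes:
                effect = False
            else:
                continue  # this action does not touch the predicate
            if last_effect.get(obj) == effect:
                return False  # same effect twice in a row: inconsistent
            last_effect[obj] = effect
    return True

# Candidate: P(?c) is added by open and deleted by close.
opened_like = feature_is_consistent(TRACE, {("open", 0)}, {("close", 0)})
# Candidate: P(?c) is added by both open and fetch-jack, deleted by close.
bad_candidate = feature_is_consistent(
    TRACE, {("open", 0), ("fetch-jack", 1)}, {("close", 0)}
)
```

The first candidate passes because open and close alternate on c1 and c2. The second fails because fetch-jack follows open on c1, which would add the predicate twice in a row.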

1. Suggest possible predicates and affecting actions

  • An action pattern represents a predicate that is affected by the action
  • A feature represents a predicate and all the actions affecting it

2. Generate constraints based on traces

Example

  1. (open c1)
  2. (fetch-jack j1 c1)
  3. (fetch-wrench wr1 c1)
  4. (close c1)
  5. (open c2)
  6. (fetch-wrench wr2 c2)
  7. (fetch-jack j2 c2)
  8. (close c2)

Candidate predicate: P(?t1)

[Formulas not recovered from the slides. The build first shows a constraint set that is satisfiable ("Is satisfiable? Yes"), then one that is not ("UNSAT!").]

Note: SAT check here is efficient (2-SAT)

Problem: too many possible “features”

Solution(*): infer object types!

Grouping action parameters to infer types

Before grouping, every action parameter is a separate variable of unknown type:

open(?x1)
fetch-jack(?x2 ?x3)
fetch-wrench(?x4 ?x5)
close(?x6)

In the example trace (steps 1 to 8 of the tool-box example), the parameters ?x1, ?x3, ?x5 and ?x6 are all instantiated with the same objects (c1, c2), so they are grouped into one type, t1. The parameters ?x2 and ?x4 share no objects with the others and get their own types, t2 and t3.

After grouping:

open(?t1)

fetch-jack(?t2 ?t1)

fetch-wrench(?t3 ?t1)

close(?t1)
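
The grouping step can be sketched as follows, assuming that two parameter slots receive the same type whenever some object appears in both. The function name `infer_parameter_types` and the `(action, argument position)` slot encoding are hypothetical names for this illustration; the paper may treat types differently in detail.

```python
# First eight steps of the tool-box example, as (action, arg1, arg2, ...) tuples.
TRACE = [
    ("open", "c1"), ("fetch-jack", "j1", "c1"), ("fetch-wrench", "wr1", "c1"),
    ("close", "c1"), ("open", "c2"), ("fetch-wrench", "wr2", "c2"),
    ("fetch-jack", "j2", "c2"), ("close", "c2"),
]

def infer_parameter_types(trace):
    """Group (action, position) slots that are ever filled by the same object."""
    parent = {}

    def find(slot):
        # Union-find lookup with path halving.
        parent.setdefault(slot, slot)
        while parent[slot] != slot:
            parent[slot] = parent[parent[slot]]
            slot = parent[slot]
        return slot

    first_slot = {}  # object -> first slot it was seen in
    for name, *args in trace:
        for pos, obj in enumerate(args):
            slot = (name, pos)
            root = find(slot)
            if obj in first_slot:
                # Same object seen in another slot: the two slots share a type.
                parent[root] = find(first_slot[obj])
            else:
                first_slot[obj] = slot

    groups = {}
    for slot in list(parent):
        groups.setdefault(find(slot), set()).add(slot)
    return {frozenset(group) for group in groups.values()}

types = infer_parameter_types(TRACE)
```

On this trace the result has three groups, matching t1, t2 and t3 above: the container slots of all four actions, the first slot of fetch-jack, and the first slot of fetch-wrench.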

Many additional details in the paper

  • Handling additional information on traces (extended traces)
  • Static predicates
  • Completeness theorems

Pros:

  • Learn only from action traces!
  • No supervision required
  • Highly scalable (?)
  • Works reasonably on benchmarks

Discussion:

  • Is the well-formed STRIPS assumption reasonable?
  • Losing explainability?
  • Is not knowing anything about the states practical?

Learning From Images

What would be a good symbolic state?

What would be a good symbolic action?

LatPlan [Asai and Fukunaga ’18]

What would be a good symbolic state for an image?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning

Solution: the State AutoEncoder (SAE)

[Figure: the SAE, built from DNNs; its latent state representation = the symbolic state]

Use Gumbel-Softmax so the latent representation is propositional (Booleans)
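
As a reminder of what the Gumbel-Softmax relaxation computes, here is a minimal pure-Python sketch of the sampling formula: softmax((logits + Gumbel noise) / tau). It only illustrates the formula and is not LatPlan's implementation. The function name and the explicit `uniforms` argument (the uniform random numbers that would normally be drawn inside the function) are assumptions made so that the example is deterministic.

```python
import math

def gumbel_softmax(logits, tau, uniforms):
    """Relaxed one-hot sample: softmax((logits + g) / tau), g = -log(-log(u)).

    logits:   unnormalized log-probabilities, one per category
    tau:      temperature; small values push the output towards one-hot
    uniforms: one number in (0, 1) per category (normally sampled at random)
    """
    gumbels = [-math.log(-math.log(u)) for u in uniforms]
    scores = [(logit + g) / tau for logit, g in zip(logits, gumbels)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One Boolean latent bit viewed as a two-category variable (true / false).
nearly_discrete = gumbel_softmax([2.0, 0.0], tau=0.1, uniforms=[0.5, 0.5])
smooth = gumbel_softmax([2.0, 0.0], tau=10.0, uniforms=[0.5, 0.5])
```

With the same noise, a low temperature gives an output that is almost exactly one-hot, while a high temperature gives a soft distribution. Training typically starts soft and lowers the temperature so the latent bits become effectively Boolean.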

What would be a good symbolic action?

  1. Encodes the change in a pair of consecutive images
  2. Allows predicting the next state and applicability

Solution: the Action AutoEncoder (AAE)

[Figure: the AAE, built from DNNs; its latent action representation = the symbolic action]

Use Gumbel-Softmax so the latent representation is categorical (Booleans)

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability

[Figure: the LatPlan architecture, composed of several DNNs]

Cube-Space AutoEncoder [Asai and Muise ’20]

What would be a good symbolic representation?

  1. Propositional (i.e., Boolean vector)
  2. Encodes relevant information for planning
  3. Allows predicting the next state and applicability
  4. Has propositional preconditions and effects

Encodes preconditions and effects “directly”

Many details in the paper

LatPlan and Cube-Space AutoEncoder

Pros:

  • Learn only from images!
  • No supervision required
  • Works reasonably on benchmarks

Discussion:

  • Losing formal guarantees?
  • Losing explainability?
  • Is it practical to not know your agent’s actions?
  • Is it practical to know every frame?

I-ROSAME [Xi et al., ‘24]

  • Input: symbolic representation, image & action traces
  • Output: action model + CV model

Learning from Low-Level Control

[Figure: example trajectory over locations A, B, C with the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Inputs: raw set of features (e.g., sensor data) and low-level controllers (AKA Skills, Options, VLA?…)

Learning from Low-Level Skills [Konidaris et al. ’18]

[Slide content not recovered: two items, one mapped to preconditions and one to effects(*)]

Many details in the paper

  • Identify predicates based on affected state variables
  • Learning for probabilistic planning domains
  • …

[Figures from: From skills to symbols: Learning symbolic representations for abstract high-level planning. G. Konidaris et al. JAIR 2018]

Predicate Invention for Bi-Level Planning [Silver et al. ’23]

  • Input: plan traces, and goal predicate symbols
  • Approach: predicate invention as program synthesis
    • Synthesize symbols, starting from goal predicates
    • Learn symbolic action, try to plan for given traces, optimize
  • How to guide the program synthesis process?
    • Optimize for plans similar in cost to the given traces
    • Optimize for plans that were easier to find (A* search nodes)

VisualPredicator [Liang et al., 2025]

  • Input: plan traces+images, goal predicate symbols
  • Approach: program synthesis + a Vision Language Model
    • Synthesize symbols also using VLM
    • Learn symbolic action, try to plan for given traces, optimize

Strategy #1 (Discrimination)

“Explain why action a worked in state s and failed in s’”

Strategy #2 (Transition Modeling)

“Explain what has changed after doing action a in state s”

Strategy #3 (Unconditional Generation)

“Suggest useful compositions of existing predicates”

Learning From Text

[Figure: example trajectory over locations A, B, C with the actions Move(A, B), Load(Pkg, C), Move(B, C)]

“He drove the truck from A to B”

“Then he drove from B to C”

“Finally, he picked up the package”

Framer (Lindsay et al. ‘17)

  1. Parse sentences to “action templates”
  2. Cluster sentences by similarities
  3. Run LOCM considering each cluster as an action

Ancient Pre-Historical Era

(i.e., pre-LLM)

[Figures from Lindsay et al. ’17]

Creating Planning Domain Models with LLMs

ACL 2025

LLMs as Planning Formalizers

On the Limits of Language Models as Planning Formalizers (Huang & Zhang, ACL ’25)

Prior work suggests that using LLMs as formalizers is more robust than using them directly as planners

(but results are somewhat inconclusive)

How Natural is the Natural Language?

[Figure from: On the Limits of Language Models as Planning Formalizers (Huang & Zhang, ACL ’25)]

LLMs as Planning Formalizers

  • Zero shot approaches (Oates et al. ‘24, Zhang et al. ‘24)
  • Generate multiple candidates and merge (Huang et al., 2024)
  • Generate-Test-Revise (Kambhampati et al., 2024)

….

LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models (Tantakoun et al., ACL ‘25)
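
The generate-test-revise pattern listed above can be sketched as a small control loop. Everything in this sketch is hypothetical: `generate`, `test` and `revise` are placeholder callables standing in for an LLM call, a validator (for example a PDDL parser or a planner run) and a second LLM call that receives the feedback. The toy stand-ins below only balance parentheses so that the loop can be run without any external service.

```python
def generate_test_revise(description, generate, test, revise, max_rounds=3):
    """Return a candidate domain that passes `test`, or None if none is found.

    generate(description)       -> candidate text
    test(candidate)             -> list of problems (empty list means it passes)
    revise(candidate, problems) -> new candidate text
    """
    candidate = generate(description)
    for _ in range(max_rounds):
        problems = test(candidate)
        if not problems:
            return candidate
        candidate = revise(candidate, problems)
    # The last revision has not been tested yet.
    return candidate if not test(candidate) else None

# Toy stand-ins (hypothetical): the "LLM" forgets closing parentheses,
# the "validator" counts them, and each revision adds one back.
def toy_generate(description):
    return "(define (domain d) (:action move"

def toy_test(candidate):
    missing = candidate.count("(") - candidate.count(")")
    return [f"{missing} unclosed parentheses"] if missing else []

def toy_revise(candidate, problems):
    return candidate + ")"

fixed = generate_test_revise("toy domain", toy_generate, toy_test, toy_revise)
gave_up = generate_test_revise(
    "toy domain", toy_generate, toy_test, toy_revise, max_rounds=1
)
```

Bounding the number of rounds matters in practice, since each round costs an LLM call and revision is not guaranteed to converge.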

Summary: Types of Input for Representation Learning

[Figure: example trajectories over locations A, B, C with the actions Move(A, B), Load(Pkg, C), Move(B, C)]

Input types: images; text; raw set of features (e.g., sensor data); basic set of features (e.g., goal & init description)

Methods: LOCM, SIFT, FRAMER, LLM-based, VisualPredicator, Skills2Symbols, LatPlan, Program Synthesis, I-ROSAME

What’s the Plan for Today?

Next: Active Learning and Open Challenges

Argaman Mordoch

BGU
