1 of 42


Domain Model Learning for AI Planning

Tutorial at AAAI 2026

2 of 42

What’s the Plan for Today?

Time        | Session                               | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–09:45 | Learning State Abstractions           | Roni Stern
09:45–10:30 | Offline Learning Domain Models        | Leonardo Lamanna
10:30–11:00 | Coffee Break                          |
11:00–11:45 | Hands-on Session                      | Leonardo Lamanna
11:45–12:30 | Online Learning and Open Challenges   | Roni Stern

3 of 42

What’s the Plan for Today?


1. What is planning?
   • Symbolic
   • Domain-independent
2. Why learn a domain for planning?
3. The domain model learning problem

4 of 42

Domain Model Learning in AI Planning
Part 1: Introduction & Basics

Roni Stern

Ben-Gurion University of the Negev

5 of 42

Background: Planning

What is Planning?

[Diagram: planning as search. From the Initial State S, operators generate successor states (A, B, C, D, E); each is tested with "Is goal?" until the answer is "Yes!"; the operator sequence leading there is a plan to the Goal.]

6 of 42

Background: Planning

Why Planning?


7 of 42

Why Plan if you can Chat?


8 of 42

Why Planning?

9 of 42

Why Planning?

10 of 42

Why Planning?

11 of 42

Why Planning?

12 of 42

Background: Planning

Why is Planning Hard?


Key challenge in planning: Combinatorial search
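To make the combinatorial point concrete, a back-of-the-envelope sketch (our illustrative numbers, not the slides'): a brute-force search tree with branching factor b and depth d enumerates on the order of b**d action sequences.

```python
# Back-of-the-envelope search-tree size: branching factor b = number of
# applicable operators per state, depth d = plan length. Brute force
# enumerates on the order of b**d action sequences.
def search_tree_leaves(b: int, d: int) -> int:
    return b ** d

print(search_tree_leaves(3, 5))    # 243: a toy puzzle, trivial to search
print(search_tree_leaves(10, 20))  # 100000000000000000000: hopeless without heuristics
```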

13 of 42

Background: Domain-Independent Planning

To build a general planning algorithm, we need a way to explain our problem to the AI.

Shakey the robot

14 of 42

Planning Domain-Definition Language (PDDL)

domain.pddl: a model of the environment
  • How to represent a state?
  • What operators do we have?
  • How do they work?

problem.pddl: a model of the current task
  • What is the initial state?
  • What is the goal?
  • (optional) What do I want to optimize?

One domain to rule them all

15 of 42

PDDL Example: Tower of Hanoi

domain.pddl

(define (domain HANOI)
  (:requirements :strips)
  (:predicates (on ?x ?y)
               (smaller ?x ?y)
               (clear ?x))
  (:action move
    :parameters (?x ?y ?z)
    :precondition (and (on ?x ?y)
                       (clear ?z) (clear ?x)
                       (smaller ?x ?z))
    :effect (and (on ?x ?z)
                 (clear ?y)
                 (not (on ?x ?y))
                 (not (clear ?z)))))
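As a minimal illustration of the STRIPS semantics of move (a Python sketch of ours, not part of the tutorial materials): a state is a set of ground facts, the action is applicable when its preconditions are a subset of the state, and its effects add and delete facts.

```python
# STRIPS semantics sketch: states are sets of ground facts (tuples).
# The grounded action move(x, y, z) mirrors the :action move above.

def applicable_move(state, x, y, z):
    # :precondition (and (on ?x ?y) (clear ?z) (clear ?x) (smaller ?x ?z))
    return {("on", x, y), ("clear", z), ("clear", x),
            ("smaller", x, z)} <= state

def apply_move(state, x, y, z):
    assert applicable_move(state, x, y, z)
    # :effect adds (on ?x ?z) and (clear ?y); deletes (on ?x ?y) and (clear ?z)
    return (state - {("on", x, y), ("clear", z)}) | {("on", x, z), ("clear", y)}

# Toy check: disc A sits on B; A and peg P are clear; A is smaller than P.
s = {("on", "A", "B"), ("clear", "A"), ("clear", "P"), ("smaller", "A", "P")}
s2 = apply_move(s, "A", "B", "P")
print(("on", "A", "P") in s2, ("clear", "B") in s2)  # True True
```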

16 of 42

PDDL Example: Tower of Hanoi

domain.pddl: as on the previous slide

problem.pddl

(define (problem HANOI-4-0)
  (:domain HANOI)
  (:objects A B C D E1 E2 E3)
  (:init
    (ON A B) (ON B C) (ON C D) (ON D E1)
    (CLEAR A) (CLEAR E2) (CLEAR E3)
    (SMALLER A B) (SMALLER A C) (SMALLER A D)
    (SMALLER B C) (SMALLER B D) (SMALLER C D)
    (SMALLER A E1) (SMALLER A E2) (SMALLER A E3)
    (SMALLER B E1) (SMALLER B E2) (SMALLER B E3)
    (SMALLER C E1) (SMALLER C E2) (SMALLER C E3)
    (SMALLER D E1) (SMALLER D E2) (SMALLER D E3))
  (:goal (AND (ON A B) (ON B C) (ON C D)
              (ON D E3))))
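The two files fully determine a search problem. As a sanity check, here is a self-contained breadth-first-search sketch (illustrative code of ours, not a real planner) that grounds move over the listed objects and solves HANOI-4-0; the optimal plan for four discs has 2**4 - 1 = 15 moves.

```python
from itertools import permutations
from collections import deque

objects = ["A", "B", "C", "D", "E1", "E2", "E3"]
discs, pegs = ["A", "B", "C", "D"], ["E1", "E2", "E3"]

# Static facts: each disc is smaller than every larger disc and every peg.
smaller = {(a, b) for i, a in enumerate(discs) for b in discs[i + 1:]}
smaller |= {(d, p) for d in discs for p in pegs}

init = frozenset({("on", "A", "B"), ("on", "B", "C"), ("on", "C", "D"),
                  ("on", "D", "E1"), ("clear", "A"), ("clear", "E2"),
                  ("clear", "E3")})
goal = {("on", "A", "B"), ("on", "B", "C"), ("on", "C", "D"), ("on", "D", "E3")}

def successors(state):
    # Ground move(?x ?y ?z) over all distinct object triples and keep the
    # applicable ones; yield (action, next_state) pairs.
    for x, y, z in permutations(objects, 3):
        if (("on", x, y) in state and ("clear", z) in state
                and ("clear", x) in state and (x, z) in smaller):
            nxt = frozenset((state - {("on", x, y), ("clear", z)})
                            | {("on", x, z), ("clear", y)})
            yield ("move", x, y, z), nxt

def bfs(init, goal):
    frontier, seen = deque([(init, [])]), {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))

plan = bfs(init, goal)
print(len(plan))  # 15: optimal for four discs
```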

17 of 42

Classical Planning and Beyond

Classical planning (PDDL)

    • A single acting agent
    • State is a set of predicates (i.e., Booleans)
    • Deterministic effects
    • Full observability

Beyond classical planning

    • Numeric planning (PDDL 2.1)
    • Temporal planning (PDDL 2.1)
    • Exogenous events (PDDL 2.1, 3, +)
    • Multi-agent planning (MA-PDDL)
    • Probabilistic planning (PPDDL, RDDL)
    • Contingent planning …


18 of 42

The Dream: One Planner to Rule them All

[Pipeline: Planning Problem → Formal Domain+Problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …) → AI Planner → Plan]

How to find a plan in a large state space?

Which formal language to use?

And a dangling arrow remains: Planning Problem → ? (where does the formal model come from?)

19 of 42

Learning Planning Domain Models

[Pipeline: Planning Problem → Formal Domain+Problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …) → AI Planner → Plan. The missing step is filled in: observations of operators and their outcomes feed an AI Learner, which produces the formal domain+problem.]

20 of 42

Learning Planning Domain Models

[Same pipeline as the previous slide; the AI Learner here is, specifically, a Planning Domain Learner.]

21 of 42

Why Plan if you can just Act?


22 of 42

Planning vs. Reinforcement Learning (RL)

Symbolic Planning | Reinforcement Learning
Achieve goal, minimize cost | Maximize reward
Declaratively described state space, requires strong assumptions | Unstructured state space, few assumptions needed
Explicitly designed to exploit domain structure (predicates, signatures, etc.) | Implicit use of domain structure (learned online)
Offline planning, no errors ;) | Online, requires trial-and-error
No training; long, problem-specific planning | Long, problem-specific training
Formal guarantees on solution quality | Hard to obtain guarantees on solution

(RL caveats that soften these contrasts: Offline RL, Meta RL, RL+shielding, DNN verifiers.)

23 of 42

Why Planning?

24 of 42

Why Planning?

25 of 42

Why Planning?

26 of 42

Why Planning?

27 of 42

Why Learn a Planning Domain

instead of just Learning to Act?


28 of 42

Learning Planning Domain vs. RL

Learning Planning Domains | Model-Based RL | Model-Free RL
? | Fewer than model-free | Usually data-intensive
? | Runtime scales well with data |
Produces a reusable, composable domain model that planners can exploit | Produces a policy optimizing a fixed reward (can retrain offline) | Produces a policy optimizing a fixed reward

(RL caveats: Lifelong RL, Transfer learning, Offline RL, Meta RL, RL+shielding, DNN verifiers.)

Learning symbolic planning can be viewed as a special case of MBRL.

29 of 42

Why Learn a Symbolic Planning Domain?

  • Powerful domain-independent planners
    • Search algorithms
    • Heuristics
    • Symmetry detection

  • Formal solution quality guarantees
  • Easier to explain
  • Easier to diagnose and debug

See more on this: Vallati et al. '25

30 of 42

Learning Planning Domain Models

[Pipeline: observations of operators and their outcomes feed an AI Learner, which produces the formal domain+problem for the AI Planner. Example domain learners: FAMA (Aineto et al.), ARMS (Yang et al.), LOCM (Cresswell et al.), NOLAM (Lamanna & Serafini), …]

Common pushback: "Deep RL (e.g., PPO, DQN, …) can do it better!" and "Learning the domain is too hard."

31 of 42

Learning Planning Domain Models?

"Deep RL (e.g., PPO, DQN, …) can do it better!"

"Learning the domain is too hard."

Is it really? How far can we go?

32 of 42

What’s the Plan for Today?


1. What is planning?
   • Symbolic
   • Domain-independent
2. Why learn a domain for planning?
3. The domain model learning problem

33 of 42

Offline vs. Online Domain Model Learning

Offline learning: an external agent chooses the actions; the learning agent only processes the resulting observations (action/state sequences) and learns a domain model.

Online learning: the learning agent itself chooses actions, processes observations, and learns/updates the domain model while acting.

Active learning: online learning in which actions are chosen specifically to improve the learned model.
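The offline/online distinction is mainly a difference in the learner's interface to the environment. A hypothetical sketch (class and method names are ours, purely illustrative) of the contract each setting implies:

```python
# Hypothetical interfaces contrasting offline and online domain learning;
# the names and signatures are ours, not from the tutorial.

class OfflineDomainLearner:
    """Receives observations pre-collected by an external agent."""
    def learn(self, traces):
        # Process all observations at once and return a domain model.
        raise NotImplementedError

class OnlineDomainLearner:
    """Acts in the environment while learning."""
    def choose_action(self, state):
        # Decide what to try next (may trade exploration for progress).
        raise NotImplementedError

    def observe(self, state, action, next_state):
        # Update the current domain model from the latest transition.
        raise NotImplementedError
```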

34 of 42

Types of Domain Model Learning Problems

  • Assumption about the environment
    • Deterministic? single-agent? propositional? …
  • What is the objective?
    • Learn an accurate model?
    • Learn a useful model? Useful for what?
    • How to measure success? (more on this later)
  • Assumptions over the given observations


35 of 42

What Can We Observe?

[Diagram: offline learning from an example trace in a logistics-style domain: state, Move(A, B), state, Load(Pkg, C), state, Move(B, C), state.]

AKA: Trace, trajectory, episode, execution, transitions ….
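Such traces are exactly what a learner consumes. As a toy sketch (ours, deliberately simpler than the algorithms covered in the offline-learning session): with full observability and grounded actions, estimate an action's effects from the observed state differences and its preconditions as the intersection of the states it was executed in.

```python
# Toy offline action-model learner for fully observed, grounded traces.
# A trace alternates states (sets of facts) and action names:
# [s0, a1, s1, a2, s2, ...]. Noise, lifting, and partial observability
# are deliberately ignored in this sketch.

def learn_model(traces):
    pre, add, delete = {}, {}, {}
    for trace in traces:
        for i in range(1, len(trace), 2):
            s, a, s_next = trace[i - 1], trace[i], trace[i + 1]
            # Preconditions: facts that held in *every* pre-state of a.
            pre[a] = set(s) if a not in pre else pre[a] & s
            # Effects: facts that appeared (adds) or disappeared (deletes).
            add.setdefault(a, set()).update(s_next - s)
            delete.setdefault(a, set()).update(s - s_next)
    return pre, add, delete

# A tiny hand-made trace (the facts and action names are hypothetical):
trace = [{"at-A", "hands-free"}, "move-A-B", {"at-B", "hands-free"},
         "load-Pkg",             {"at-B", "holding-Pkg"}]
pre, add, delete = learn_model([trace])
print(pre["load-Pkg"], add["load-Pkg"], delete["load-Pkg"])
```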

36 of 42

Partial Observability

[Diagram: the same trace, but some facts in each observed state are hidden from the learner.]
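One common way a learner can cope with partially observed states (a sketch of ours, not a specific algorithm from the tutorial): represent each observation as observed-true and observed-false fact sets, and shrink an action's candidate preconditions only when a fact is explicitly observed false in one of its pre-states; unobserved facts rule nothing out.

```python
def candidate_preconditions(universe, pre_observations):
    """pre_observations: list of (action, obs_true, obs_false) taken just
    before the action was applied. Facts outside obs_true | obs_false are
    unknown. A fact stays a candidate precondition of an action unless it
    was explicitly observed false in some pre-state of that action."""
    cand = {}
    for action, obs_true, obs_false in pre_observations:
        cand.setdefault(action, set(universe))
        cand[action] -= obs_false  # only observed-false facts are ruled out
    return cand

universe = {"p", "q", "r"}
obs = [("act", {"p"}, {"q"}),   # p observed true, q observed false, r unknown
       ("act", {"p"}, set())]   # only p observed this time
print(sorted(candidate_preconditions(universe, obs)["act"]))  # ['p', 'r']
```

Under full observability, obs_false is the complement of obs_true, and this rule reduces to the usual intersection of pre-states.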


39 of 42

Noisy Observations

[Diagram: the same trace, but some observed facts may be wrong.]

40 of 42

Raw Observations

[Diagram: the same trace, but states arrive as raw observations rather than symbolic facts.]

Images?

Text?

Sensor data?

41 of 42

Types of Domain Model Learning Problems

domain.pddl

(define (domain HANOI)
  (:requirements :strips)
  (:predicates (on ?x ?y) (smaller ?x ?y) (clear ?x))
  (:action move
    :parameters (?x ?y ?z)
    :precondition (and (on ?x ?y) (clear ?z)
                       (clear ?x) (smaller ?x ?z))
    :effect (and (on ?x ?z) (clear ?y)
                 (not (on ?x ?y)) (not (clear ?z)))))

Representation learning: learning the predicates, i.e., the state representation.

Action model learning: learning the actions' preconditions and effects.

42 of 42

What’s Next

Time        | Session                               | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–09:45 | Learning State Abstractions           | Roni Stern
09:45–10:30 | Offline Learning of Action Models     | Leonardo Lamanna
10:30–11:00 | Coffee Break                          |
11:00–11:45 | Hands-on Session                      | Leonardo Lamanna
11:45–12:30 | Online Learning and Open Challenges   | Roni Stern