2 of 51

What’s the Plan for Today?

Time | Session | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–10:00 | Offline Learning Action Models | Leonardo Lamanna
10:00–10:30 | Coffee Break ☕ |
10:30–11:00 | Learning State Abstraction | Roni Stern
11:00–11:45 | Active Learning and Open Challenges | Roni Stern
11:45–12:30 | Hands-on Session | Leonardo Lamanna

3 of 51

What’s the Plan for Today?

Time | Session | Speaker
08:30–09:15 | Introduction & Domain Learning Basics | Roni Stern
09:15–10:00 | Offline Learning Action Models | Argaman Mordoch
10:00–10:30 | Coffee Break ☕ |
10:30–11:00 | Learning State Abstraction | Roni Stern
11:00–11:45 | Active Learning and Open Challenges | Argaman Mordoch & Roni Stern
11:45–12:30 | Hands-on Session | Leonardo Lamanna

4 of 51

What’s the Plan for Today?

1. What is planning?
  • Symbolic
  • Domain-independent
2. Why learn a domain for planning?
3. The domain model learning problem

5 of 51

Domain Model Learning in AI Planning

Part 1: Introduction & Basics

Roni Stern

Ben Gurion University of the Negev

6 of 51

Background: Planning

What is Planning?

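[Figure: a search tree rooted at the initial state S. Operators generate successor states (A, B, C, D, E), and each one is tested with “Is goal?” until the answer is “Yes!”. The path from the initial state to the goal is a plan.]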

7 of 51

Background: Planning

Why Planning?

[Figure: a search tree with initial state S, operators, successor states A, B, C, D, E, “Is goal?” tests, and the goal.]

8 of 51

Why Plan if you can Chat?

13 of 51

4 months have passed…

(since the AAAI tutorial)

16 of 51

Symbolic Planning

The World Still Needs Symbolic Planners

  • The world is described by symbols (predicates, objects, functions, …)
  • The world dynamics are well understood
  • Planners reason over how actions change the state

Reasoning trace can be explained

Designed for long-term reasoning

Can output “no solution” (instead of hallucinating)

Formal validation of each step

18 of 51

Background: Domain-Independent Planning

To build a general planning algorithm, we need a way to explain our problem to the AI

[Image: Shakey the robot]

19 of 51

Planning Domain-Definition Language (PDDL)

domain.pddl: model of the environment

  • How to represent a state?
  • What operators do we have?
  • How do they work?

problem.pddl: model of the current task

  • What is the initial state?
  • What is the goal?
  • (optional) What do I want to optimize?

One domain to rule them all

20 of 51

PDDL Example: Tower of Hanoi

domain.pddl

(define (domain HANOI)
  (:requirements :strips)
  (:predicates (on ?x ?y)        ; ?x sits directly on ?y
               (smaller ?x ?y)   ; ?x is smaller than ?y
               (clear ?x))       ; nothing sits on ?x
  ; Move ?x from ?y onto ?z
  (:action move
    :parameters (?x ?y ?z)
    :precondition (and (on ?x ?y)
                       (clear ?z) (clear ?x)
                       (smaller ?x ?z))
    :effect (and (on ?x ?z)
                 (clear ?y)
                 (not (on ?x ?y))
                 (not (clear ?z)))))

21 of 51

PDDL Example: Tower of Hanoi

problem.pddl

(define (problem HANOI-4-0)
  (:domain HANOI)
  (:objects A B C D E1 E2 E3)
  (:init
    (ON A B) (ON B C) (ON C D) (ON D E1)
    (CLEAR A) (CLEAR E2) (CLEAR E3)
    (SMALLER A B) (SMALLER A C) (SMALLER A D)
    (SMALLER B C) (SMALLER B D) (SMALLER C D)
    (SMALLER A E1) (SMALLER A E2) (SMALLER A E3)
    (SMALLER B E1) (SMALLER B E2) (SMALLER B E3)
    (SMALLER C E1) (SMALLER C E2) (SMALLER C E3)
    (SMALLER D E1) (SMALLER D E2) (SMALLER D E3))
  (:goal (AND (ON A B) (ON B C) (ON C D)
              (ON D E3))))
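To make the semantics of the Hanoi domain and problem above concrete, here is a minimal Python sketch that encodes the same facts, applies the `move` action with STRIPS add/delete semantics, and finds a plan with breadth-first search. The function names (`applicable`, `apply_move`, `bfs_plan`) are illustrative assumptions, and breadth-first search stands in for a real domain-independent planner, which would read the PDDL files directly. Fact names are lowercased, which is harmless because PDDL is case-insensitive.

```python
from collections import deque
from itertools import permutations

DISCS = ["A", "B", "C", "D"]          # A is the smallest disc
PEGS = ["E1", "E2", "E3"]
OBJECTS = DISCS + PEGS

# Static facts from the problem file: (smaller x y) for every legal stacking.
SMALLER = {(x, y) for i, x in enumerate(DISCS) for y in DISCS[i + 1:]}
SMALLER |= {(d, p) for d in DISCS for p in PEGS}

INIT = frozenset({
    ("on", "A", "B"), ("on", "B", "C"), ("on", "C", "D"), ("on", "D", "E1"),
    ("clear", "A"), ("clear", "E2"), ("clear", "E3"),
})
GOAL = {("on", "A", "B"), ("on", "B", "C"), ("on", "C", "D"), ("on", "D", "E3")}


def applicable(state, x, y, z):
    """Precondition of (move ?x ?y ?z) from the domain file."""
    return (("on", x, y) in state and ("clear", z) in state
            and ("clear", x) in state and (x, z) in SMALLER)


def apply_move(state, x, y, z):
    """Effect of (move ?x ?y ?z): remove the delete list, then add the add list."""
    delete = {("on", x, y), ("clear", z)}
    add = {("on", x, z), ("clear", y)}
    return frozenset((state - delete) | add)


def bfs_plan(init, goal):
    """Breadth-first search over states; returns a shortest plan or None."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                       # "Is goal?"
            return plan
        for x, y, z in permutations(OBJECTS, 3):
            if applicable(state, x, y, z):
                nxt = apply_move(state, x, y, z)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [("move", x, y, z)]))
    return None                                 # "no solution"


plan = bfs_plan(INIT, GOAL)
print(len(plan))  # prints 15, i.e. 2**4 - 1 moves for four discs
```

Returning `None` when the frontier is exhausted mirrors the earlier point that a symbolic planner can report "no solution" instead of guessing.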

22 of 51

Classical Planning and Beyond

Classical planning (PDDL)

    • Single acting agent
    • State is a set of predicates (i.e., Booleans)
    • Deterministic effects
    • Full observability

Beyond classical planning

    • Numeric planning (PDDL 2.1)
    • Temporal planning (PDDL 2.1)
    • Exogenous events (PDDL 2.1, PDDL 3, PDDL+)
    • Multi-agent planning (MA-PDDL)
    • Probabilistic planning (PPDDL, RDDL)
    • Contingent planning
    • …

23 of 51

The Dream: One Planner to Rule them All

[Diagram: Planning Problem → Formal Domain+Problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …) → AI Planner → Plan, with a “?” marking an open step]

Why is Planning Hard?

How to find a plan in a large state space?

Which formal language to use?

25 of 51

Learning Planning Domain Models

[Diagram, built up over slides 25–28: Planning Problem → Formal Domain+Problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …) → AI Planner → Plan, now with an AI Learner (labeled Planning Domain Learner in alternate builds) fed by Observations. An Operator and a “?” (dropped in the later builds) are also shown.]

29 of 51

Why Plan if you can just Act?


30 of 51

Planning vs. Reinforcement Learning (RL)

Symbolic Planning | Reinforcement Learning
Achieve goal, minimize cost | Maximize reward
Declaratively described state space, requires strong assumptions | Unstructured state space, few assumptions needed
Explicitly designed to exploit domain structure (predicates, signatures, etc.) | Implicit use of domain structure (learned online)
Offline planning, no errors ;) | Online, requires trial-and-error
No training; long, problem-specific planning | Long, problem-specific training
Formal guarantees on solution quality | Hard to obtain guarantees on solution

[Callouts: Offline RL, Meta RL, RL+shielding, DNN Verifiers]

35 of 51

Why Learn a Planning Domain instead of just Learning to Act?

36 of 51

Learning Planning Domain vs. RL

Learning Planning Domains | Reinforcement Learning: Model-Based | Reinforcement Learning: Model-Free
? | Fewer than model-free | Usually data-intensive
? | Runtime scales well with data |
Produces reusable composable domain model that planners can exploit | Produces policy optimizing a fixed reward (can retrain offline) | Produces policy optimizing a fixed reward

[Callouts: Lifelong RL, Transfer learning, Offline RL, Meta RL, RL+shielding, DNN Verifiers]

Learning symbolic planning can be viewed as a special case of model-based RL (MBRL)

37 of 51

Why Learn a Symbolic Planning Domain?

  • Powerful domain-independent planners
    • Search algorithms
    • Heuristics
    • Symmetry detection

  • Formal solution quality guarantees
  • Easier to explain
  • Easier to diagnose and debug

See more on this: Vallati et al. ’25

38 of 51

Learning Planning Domain Models

[Diagram: Observations → AI Learner → Formal Domain+Problem (STRIPS, PDDL, PDDL+, RDDL, fSTRIPS, …) → AI Planner → Plan]

FAMA (Aineto et al.), ARMS (Yang et al.), LOCM (Cresswell et al.), NOLAM (Lamanna & Serafini), …

“Deep RL (e.g., PPO, DQN, …) can do it better!”

“Learning the domain is too hard”

39 of 51

Learning Planning Domain Models?

“Deep RL (e.g., PPO, DQN, …) can do it better!”

“Learning the domain is too hard”

Is it really?

How far can we go?

40 of 51

What’s the Plan for Today?

1. What is planning?
  • Symbolic
  • Domain-independent
2. Why learn a domain for planning?
3. The domain model learning problem

41 of 51

Types of Domain Model Learning Problems

domain.pddl

(define (domain HANOI)
  (:requirements :strips)
  (:predicates (on ?x ?y) (smaller ?x ?y) (clear ?x))
  (:action move
    :parameters (?x ?y ?z)
    :precondition
      (and (on ?x ?y) (clear ?z)
           (clear ?x) (smaller ?x ?z))
    :effect
      (and (on ?x ?z) (clear ?y)
           (not (on ?x ?y)) (not (clear ?z)))))

[Callouts on the domain file: Representation Learning, Action Model Learning]

42 of 51

Offline vs. Online Domain Model Learning

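Offline Learning

[Diagram: an External Agent acts; its Observations (actions and states) are passed to the Learning Agent.]

Learning Agent:
  • Process observation
  • Learn domain model

Online Learning / Active Learning

[Diagram: the Learning Agent itself acts and receives Observations (actions and states).]

Learning Agent:
  • Choose actions
  • Process observations
  • Learn/update domain model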
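The three online-learning steps listed on this slide (choose actions, process observations, learn/update the domain model) can be sketched as a loop. This is a toy illustration under stated assumptions, not a method from the tutorial: `toggle_env` is a hypothetical two-action environment, actions are chosen uniformly at random, and the "model" only records observed add and delete effects.

```python
import random


def toggle_env(state, action):
    """Hypothetical environment: a light that can be switched on or off.
    An action that is not applicable leaves the state unchanged."""
    if action == "switch_on" and "light" not in state:
        return state | {"light"}
    if action == "switch_off" and "light" in state:
        return state - {"light"}
    return state


def online_learning(env, init, actions, steps, rng):
    """Interleave acting and learning; returns observed effects per action."""
    model = {a: {"add": set(), "del": set()} for a in actions}
    state = init
    for _ in range(steps):
        action = rng.choice(actions)            # 1. choose an action
        nxt = env(state, action)                # 2. process the observation
        model[action]["add"] |= nxt - state     # 3. update the domain model
        model[action]["del"] |= state - nxt
        state = nxt
    return model


model = online_learning(toggle_env, frozenset(),
                        ["switch_on", "switch_off"], 50, random.Random(0))
```

An offline learner would run only steps 2 and 3, over transitions produced by an external agent. An active learner would replace the random choice in step 1 with one aimed at informative transitions.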

43 of 51

Types of Domain Model Learning Problems

  • Assumptions about the environment
    • Deterministic? single-agent? propositional? …
  • What is the objective?
    • Learn an accurate model?
    • Learn a useful model? Useful for what?
    • How to measure success? (more on this later)
  • Assumptions about the given observations

44 of 51

What Can We Observe?

[Diagram (Offline Learning): an External Agent acts; the Learning Agent processes the observations and learns a domain model. The example observation sequence alternates states, each drawn over A, B, C, with the actions Move(A, B), Load(Pkg, C), Move(B, C).]

AKA: trace, trajectory, episode, execution, transitions, …
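As a rough illustration of what a learner can extract from such a trace under full observability, the sketch below lifts each observed transition to the action's parameters, takes state differences as effects, and intersects the pre-states to get candidate preconditions. This is a naive learner written for this example, not the algorithm of any system named in the tutorial. It assumes deterministic effects, noise-free states, and pairwise distinct action arguments. The trace is a hypothetical 2-disc Tower of Hanoi run generated by a ground-truth `step` function, and all function names are assumptions.

```python
def lift(fact, binding):
    """Rewrite a ground fact over parameter names, or return None if some
    argument is not one of the action's arguments."""
    name, *args = fact
    if all(a in binding for a in args):
        return (name,) + tuple(binding[a] for a in args)
    return None


def learn_action(transitions, params):
    """transitions: (state, action_args, next_state) triples for ONE action."""
    pre, add, delete = None, set(), set()
    for state, args, next_state in transitions:
        binding = dict(zip(args, params))
        lifted_state = {lift(f, binding) for f in state} - {None}
        # Keep only facts that held before EVERY observed execution.
        pre = lifted_state if pre is None else pre & lifted_state
        add |= {lift(f, binding) for f in next_state - state} - {None}
        delete |= {lift(f, binding) for f in state - next_state} - {None}
    return pre, add, delete


# Hypothetical trace: three `move` steps with discs A, B and pegs P1..P3.
STATIC = {("smaller", "A", "B")} | {
    ("smaller", d, p) for d in "AB" for p in ("P1", "P2", "P3")}
s0 = frozenset(STATIC | {("on", "A", "B"), ("on", "B", "P1"),
                         ("clear", "A"), ("clear", "P2"), ("clear", "P3")})


def step(state, x, y, z):
    """Ground-truth effect of move, used only to generate the example trace."""
    return frozenset((state - {("on", x, y), ("clear", z)})
                     | {("on", x, z), ("clear", y)})


trace, state = [], s0
for args in [("A", "B", "P2"), ("B", "P1", "P3"), ("A", "P2", "B")]:
    nxt = step(state, *args)
    trace.append((state, args, nxt))
    state = nxt

pre, add, delete = learn_action(trace, ("?x", "?y", "?z"))
```

On this trace the learned effects match the hand-written `move` action. The learned preconditions contain the four true ones plus `(smaller ?x ?y)`, which happened to hold before every observed step. A learner of this kind can therefore return preconditions that are stricter than the real ones.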

45 of 51

Partial Observability

[Diagram: the offline-learning setting (an External Agent acts; the Learning Agent processes observations and learns a domain model) with the example trace of states over A, B, C and actions Move(A, B), Load(Pkg, C), Move(B, C), illustrating partial observability.]

48 of 51

Noisy Observations

[Diagram: the offline-learning setting (an External Agent acts; the Learning Agent processes observations and learns a domain model) with the example trace of states over A, B, C and actions Move(A, B), Load(Pkg, C), Move(B, C), illustrating noisy observations.]
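One way to picture the difference between partial and noisy observations is to degrade a fully observed state in two ways: hide some facts (their value is unknown) or flip some facts (their value is reported wrongly). The sketch below is an illustrative assumption, not a model taken from the tutorial. The function `observe`, the probabilities, and the `at truck …` facts are all hypothetical.

```python
import random


def observe(state, all_facts, hide_prob, flip_prob, rng):
    """Return a dict fact -> True / False / None, where None means the
    fact was not observed."""
    observation = {}
    for fact in sorted(all_facts):
        value = fact in state
        if rng.random() < hide_prob:
            observation[fact] = None          # partial observability
        elif rng.random() < flip_prob:
            observation[fact] = not value     # noise: wrong value reported
        else:
            observation[fact] = value
    return observation


ALL_FACTS = {("at", "truck", "A"), ("at", "truck", "B"), ("at", "truck", "C")}
STATE = {("at", "truck", "B")}

full = observe(STATE, ALL_FACTS, 0.0, 0.0, random.Random(0))
hidden = observe(STATE, ALL_FACTS, 1.0, 0.0, random.Random(0))
flipped = observe(STATE, ALL_FACTS, 0.0, 1.0, random.Random(0))
```

With a hidden fact the learner knows that it lacks information. With a flipped fact it does not, which is why the two settings are treated separately.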

49 of 51

Raw Observations

[Diagram: the offline-learning setting (an External Agent acts; the Learning Agent processes observations and learns a domain model) with the example trace of states over A, B, C and actions Move(A, B), Load(Pkg, C), Move(B, C), illustrating raw observations.]

Images? Text? Sensor data?

50 of 51

What’s the Plan for Today?

Next: Offline Learning of Action Models

Argaman Mordoch

BGU

51 of 51

Rate This Tutorial @ domain-learning.github.io
