1 of 27

Planning Under Uncertainty

2 of 27

Classical Planning is Unrealistic

  • Assumptions in classical planning
    • Determinism: Actions have deterministic effects
    • Full Observability: The current state is always known exactly
    • Reachability Goals: Goals are sets of states
      • Build a plan that leads to one of the goal states
  • Nondeterminism
    • Classical planning assumes a perfect model (only the nominal behavior of each action)
      • e.g., the throw of a die would have to be fully determined (not realistic)
    • It is impossible to predict everything with certainty

3 of 27

Nondeterminism is the Rule of Nature

  • Deterministic behavior does not hold in many applications
    • It is important to model the possible failures of an action that sets the signals of a railway crossing.
      • The plan should handle such contingent situations
    • Sometimes deterministic outcomes do not exist.
      • e.g., when we throw a die

4 of 27

Planning with Non-Deterministic Domains

  • A plan may result in different execution paths
  • Planning algorithms should analyze all possible outcomes and generate plans that have conditional behaviors
  • How to model non-determinism?
    • Associate probabilities with the outcomes of actions
    • Some outcomes are more likely than others
    • Marginal outcomes have lower probabilities than frequent outcomes
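As a minimal sketch of this idea, using the die example from the earlier slide and assuming a fair die, each outcome of a nondeterministic action can be assigned a probability:

```python
import random

# Outcomes of the nondeterministic "throw die" action, modeled as a
# probability distribution; a fair die is assumed for illustration.
throw_die = {face: 1 / 6 for face in range(1, 7)}

# The probabilities over all possible outcomes must sum to 1.
assert abs(sum(throw_die.values()) - 1.0) < 1e-9

# Sampling simulates one nondeterministic execution of the action.
outcome = random.choices(list(throw_die), weights=list(throw_die.values()))[0]
print(outcome)  # one of 1..6
```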

5 of 27

Planning with Markov Decision Processes

  • A planning domain is modelled as a stochastic system
    • Nondeterministic state-transition system that assigns probabilities to state transitions
    • Uncertainty about action outcomes is thus modeled with a probability distribution function
  • Goals are represented by means of utility functions
    • numeric functions that give preferences to states to be traversed and/or actions to be performed
    • Utility functions can express preferences on the entire execution path of a plan

6 of 27

Planning with Markov Decision Processes

  • Plans are represented as policies
    • specify the action to execute in each state
    • Execution of a policy results in conditional behavior
  • The planning problem is seen as an optimization problem
    • Planning algorithms search for a plan that maximizes the utility function
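A policy of this kind can be sketched as a simple mapping from states to actions; the state names below are illustrative placeholders:

```python
# A plan represented as a policy: one action per state.
# States s1..s3 and the move actions are illustrative assumptions.
policy = {
    "s1": "move(r1,l1,l2)",
    "s2": "move(r1,l2,l3)",
    "s3": "move(r1,l3,l4)",
}

def next_action(state):
    """During execution, the policy is consulted in whatever state the
    nondeterministic outcome produced -- hence conditional behavior."""
    return policy[state]

print(next_action("s2"))  # move(r1,l2,l3)
```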

7 of 27

Planning with Fully Observable Domains


8 of 27

Domain as Stochastic System

Nondeterministic Actions

  • move(r1,l2,l3)
  • move(r1,l1,l4)
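The nondeterministic actions above can be modeled as transition distributions P(s′ | s, a); the successor states and probability values here are assumptions for illustration:

```python
# P maps (state, action) to a distribution over successor states.
# All states and probabilities below are illustrative assumptions.
P = {
    ("s2", "move(r1,l2,l3)"): {"s3": 0.8, "s5": 0.2},  # may slip into s5
    ("s1", "move(r1,l1,l4)"): {"s4": 0.5, "s1": 0.5},  # may fail, staying in s1
}

# Each distribution over successor states must sum to 1.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```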

9 of 27

Plans as Policies

10 of 27

Example Policies

All three policies try to move the robot toward state s4.

11 of 27

Policy Execution

  • Policy executions correspond to infinite sequences of states, called histories
    • Markov Chain: a sequence of random states in which the probability of each state depends only on the previous state, not on the entire history

12 of 27

Probability of a History
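In the standard formulation, the probability of a history h = ⟨s0, s1, s2, …⟩ under a policy π is the product of the transition probabilities of its steps, P(h | π) = ∏ᵢ P(sᵢ₊₁ | sᵢ, π(sᵢ)). A sketch, with an illustrative transition model and policy (all names and numbers are assumptions):

```python
from math import prod  # Python 3.8+

# Illustrative transition model P(s' | s, a) and policy.
P = {("s1", "a1", "s2"): 0.9, ("s2", "a2", "s4"): 0.7}
policy = {"s1": "a1", "s2": "a2"}

def history_probability(history):
    """P(h | pi) = product over consecutive states of P(s' | s, pi(s))."""
    return prod(
        P.get((s, policy[s], s_next), 0.0)
        for s, s_next in zip(history, history[1:])
    )

print(history_probability(["s1", "s2", "s4"]))  # 0.9 * 0.7
```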

13 of 27

Goals as Utility Functions


14 of 27

Goals as Utility Functions

  • Goals are no longer sets of desired final states
  • A quantitative way to represent preferred executions
    • e.g., we prefer the robot to move to state s4 rather than s5
  • Costs and rewards provide an easy way to express competing criteria
    • e.g., the criterion of shortest path versus avoiding certain locations
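The preference for s4 over s5 and the shortest-path criterion can be sketched with illustrative numbers (all values below are assumptions):

```python
# Rewards on states: the robot should end up in s4, not s5.
reward = {"s1": 0, "s2": 0, "s3": 0, "s4": 100, "s5": -100}

# A uniform cost per move: fewer steps means higher total utility,
# expressing the shortest-path criterion alongside location avoidance.
move_cost = 1
```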

15 of 27

Utility Function

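In a common cost/reward formulation (an assumption here, based on the costs and rewards introduced on the previous slide), a utility function is specified by a reward on states and a cost on state-action pairs:

```latex
r : S \to \mathbb{R}, \qquad c : S \times A \to \mathbb{R}
```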

16 of 27

Utility of a History
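In a common discounted formulation (assumed here, with discount factor 0 < γ < 1 and reward/cost functions as on the earlier slides), the utility of a history h = ⟨s0, s1, …⟩ accumulates rewards minus costs along the path:

```latex
U(h \mid \pi) \;=\; \sum_{i \ge 0} \gamma^{\,i} \bigl( r(s_i) - c(s_i, \pi(s_i)) \bigr)
```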

17 of 27

Expected Utility of a Policy

Example expected utility
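In the standard formulation, the expected utility of a policy π weights the utility of each history by the probability of that history occurring:

```latex
E(\pi) \;=\; \sum_{h} P(h \mid \pi)\, U(h \mid \pi)
```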

18 of 27

Planning as an Optimization Problem

  • Search for a policy π that maximizes the expected utility E(π)

19 of 27

Expected Utility without Reward

When there are no rewards associated with the states, the expected utility depends only on the costs of the actions.

20 of 27

Bellman Equation
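A common form of the Bellman equation (assumed here, matching the discounted cost/reward utilities of the earlier slides): the optimal expected utility of a state is obtained by choosing the action that maximizes the immediate utility plus the discounted expected utility of the successor states:

```latex
V(s) \;=\; \max_{a} \Bigl\{ r(s) - c(s, a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Bigr\}
```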

21 of 27

Planning Algorithm

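One standard planning algorithm for this setting is value iteration, which repeatedly applies the Bellman update until the values converge. A self-contained sketch on a tiny illustrative MDP (all states, actions, and numbers are assumptions):

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Repeat the Bellman update until no value changes by more than eps.

    P[(s, a)] is a dict {successor: probability}; R[(s, a, s2)] is the
    reward of that transition (0 if absent).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in P[(s, a)].items())
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Tiny illustrative MDP: from s1, action "go" reaches the goal s2
# with reward 1; s2 is absorbing with no further reward.
states = ["s1", "s2"]
actions = {"s1": ["go"], "s2": ["stay"]}
P = {("s1", "go"): {"s2": 1.0}, ("s2", "stay"): {"s2": 1.0}}
R = {("s1", "go", "s2"): 1.0}

V = value_iteration(states, actions, P, R)
print(V)  # V["s1"] converges to 1.0, V["s2"] to 0.0
```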

22 of 27

23 of 27

24 of 27

Policy evaluation: solve the system of equations for the expected utility of the current policy

25 of 27

Improve the policy by choosing, in each state, an action that maximizes the expected utility

Optimal Policy:

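The two steps above (evaluate the current policy, then improve it) form policy iteration. A sketch on an illustrative MDP; instead of solving the linear system directly, this version evaluates the policy iteratively, which converges to the same values (all names and numbers are assumptions):

```python
def evaluate(policy, P, R, gamma=0.9, eps=1e-8):
    """Policy evaluation: expected utility of following `policy` forever.
    Done iteratively here rather than by solving the linear system."""
    V = {s: 0.0 for s in policy}
    while True:
        delta = 0.0
        for s, a in policy.items():
            v = sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in P[(s, a)].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            return V

def policy_iteration(actions, P, R, gamma=0.9):
    policy = {s: acts[0] for s, acts in actions.items()}  # arbitrary start
    while True:
        V = evaluate(policy, P, R, gamma)
        # Improvement step: in each state pick an action maximizing
        # the expected utility under the current value estimates.
        improved = {
            s: max(acts, key=lambda a: sum(
                p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                for s2, p in P[(s, a)].items()))
            for s, acts in actions.items()
        }
        if improved == policy:  # no change: the policy is optimal
            return policy, V
        policy = improved

# Illustrative MDP: in s1, "good" earns reward 1 while "bad" earns nothing.
actions = {"s1": ["bad", "good"], "s2": ["stay"]}
P = {("s1", "bad"): {"s2": 1.0}, ("s1", "good"): {"s2": 1.0},
     ("s2", "stay"): {"s2": 1.0}}
R = {("s1", "good", "s2"): 1.0}

pol, V = policy_iteration(actions, P, R)
print(pol)  # the improvement step selects "good" in s1
```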
26 of 27

27 of 27

Possible iterations for state s1 at the first step