1 of 27

Planning Under Uncertainty

2 of 27

Classical Planning is Unrealistic

  • Assumptions in classical planning
    • Determinism: Actions have deterministic effects
    • Full Observability: The current state is always known exactly
    • Reachability Goals: Goals are sets of states
      • Build a plan that leads to one of the goal states
  • Nondeterminism
    • Classical planning assumes a perfect model (only the nominal behavior of each action)
      • e.g., the throw of a die would have to be fully determined (not realistic)
    • It is impossible to predict everything with certainty

3 of 27

Nondeterminism is the Rule of Nature

  • Deterministic behavior does not hold in many applications
    • It is important to model the possible failures of an action that sets the signals of a railway crossing.
      • The plan should handle such contingent situations
    • Sometimes deterministic outcomes do not exist.
      • e.g., when we throw a die

4 of 27

Planning with Non-Deterministic Domains

  • A plan may result in different execution paths
  • Planning algorithms should analyze all possible outcomes and generate plans that have conditional behaviors
  • How to model non-determinism?
    • Associate probabilities with the outcomes of actions
    • Some outcomes are more likely than others
    • Marginal outcomes have lower probabilities than frequent outcomes
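As a minimal sketch of this idea, using the die example from the earlier slide and assuming a fair die, each outcome of a nondeterministic action can be assigned a probability:

```python
import random

# Outcomes of the nondeterministic "throw die" action, modeled as a
# probability distribution; a fair die is assumed for illustration.
throw_die = {face: 1 / 6 for face in range(1, 7)}

# The probabilities over all possible outcomes must sum to 1.
assert abs(sum(throw_die.values()) - 1.0) < 1e-9

# Sampling simulates one nondeterministic execution of the action.
outcome = random.choices(list(throw_die), weights=list(throw_die.values()))[0]
print(outcome)  # one of 1..6
```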

5 of 27

Planning with Markov Decision Processes

  • A planning domain is modelled as a stochastic system
    • Nondeterministic state-transition system that assigns probabilities to state transitions
    • Uncertainty about action outcomes is thus modeled with a probability distribution function
  • Goals are represented by means of utility functions
    • numeric functions that give preferences to states to be traversed and/or actions to be performed
    • Utility functions can express preferences on the entire execution path of a plan

6 of 27

Planning with Markov Decision Processes

  • Plans are represented as policies
    • specify the action to execute in each state
    • Execution of a policy results in conditional behavior
  • The planning problem is seen as an optimization problem
    • Planning algorithms search for a plan that maximizes the utility function
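A policy of this kind can be sketched as a simple mapping from states to actions; the state names below are illustrative placeholders:

```python
# A plan represented as a policy: one action per state.
# States s1..s3 and the move actions are illustrative assumptions.
policy = {
    "s1": "move(r1,l1,l2)",
    "s2": "move(r1,l2,l3)",
    "s3": "move(r1,l3,l4)",
}

def next_action(state):
    """During execution, the policy is consulted in whatever state the
    nondeterministic outcome produced -- hence conditional behavior."""
    return policy[state]

print(next_action("s2"))  # move(r1,l2,l3)
```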

7 of 27

Planning with Fully Observable Domains


8 of 27

Domain as Stochastic System

Nondeterministic Actions

  • move(r1,l2,l3)
  • move(r1,l1,l4)
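The nondeterministic actions above can be modeled as transition distributions P(s′ | s, a); the successor states and probability values here are assumptions for illustration:

```python
# P maps (state, action) to a distribution over successor states.
# All states and probabilities below are illustrative assumptions.
P = {
    ("s2", "move(r1,l2,l3)"): {"s3": 0.8, "s5": 0.2},  # may slip into s5
    ("s1", "move(r1,l1,l4)"): {"s4": 0.5, "s1": 0.5},  # may fail, staying in s1
}

# Each distribution over successor states must sum to 1.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```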

9 of 27

Plans as Policies

10 of 27

Example Policies

All three policies try to move the robot toward state s4.

11 of 27

Policy Execution

  • Policy executions correspond to infinite sequences of states, called histories
    • Markov Chain: a sequence of random states in which the probability of each state depends only on the previous state, not on the entire history

12 of 27

Probability of a History
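In the standard formulation, the probability of a history h = ⟨s0, s1, s2, …⟩ under a policy π is the product of the transition probabilities of its steps, P(h | π) = ∏ᵢ P(sᵢ₊₁ | sᵢ, π(sᵢ)). A sketch, with an illustrative transition model and policy (all names and numbers are assumptions):

```python
from math import prod  # Python 3.8+

# Illustrative transition model P(s' | s, a) and policy.
P = {("s1", "a1", "s2"): 0.9, ("s2", "a2", "s4"): 0.7}
policy = {"s1": "a1", "s2": "a2"}

def history_probability(history):
    """P(h | pi) = product over consecutive states of P(s' | s, pi(s))."""
    return prod(
        P.get((s, policy[s], s_next), 0.0)
        for s, s_next in zip(history, history[1:])
    )

print(history_probability(["s1", "s2", "s4"]))  # 0.9 * 0.7
```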

13 of 27

Goals as Utility Functions


14 of 27

Goals as Utility Functions

  • Goals are no longer sets of desired final states
  • A quantitative way to represent preferred executions
    • e.g., we prefer the robot to move to state s4 rather than s5
  • Costs and rewards provide an easy way to express competing criteria
    • e.g., the criterion of shortest path versus avoiding certain locations
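The preference for s4 over s5 and the shortest-path criterion can be sketched with illustrative numbers (all values below are assumptions):

```python
# Rewards on states: the robot should end up in s4, not s5.
reward = {"s1": 0, "s2": 0, "s3": 0, "s4": 100, "s5": -100}

# A uniform cost per move: fewer steps means higher total utility,
# expressing the shortest-path criterion alongside location avoidance.
move_cost = 1
```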

15 of 27

Utility Function

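In a common cost/reward formulation (an assumption here, based on the costs and rewards introduced on the previous slide), a utility function is specified by a reward on states and a cost on state-action pairs:

```latex
r : S \to \mathbb{R}, \qquad c : S \times A \to \mathbb{R}
```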

16 of 27

Utility of a History
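In a common discounted formulation (assumed here, with discount factor 0 < γ < 1 and reward/cost functions as on the earlier slides), the utility of a history h = ⟨s0, s1, …⟩ accumulates rewards minus costs along the path:

```latex
U(h \mid \pi) \;=\; \sum_{i \ge 0} \gamma^{\,i} \bigl( r(s_i) - c(s_i, \pi(s_i)) \bigr)
```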

17 of 27

Expected Utility of a Policy

Example expected utility
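In the standard formulation, the expected utility of a policy π weights the utility of each history by the probability of that history occurring:

```latex
E(\pi) \;=\; \sum_{h} P(h \mid \pi)\, U(h \mid \pi)
```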

18 of 27

Planning as an Optimization Problem

  • Search for a policy π that maximizes the expected utility E(π)

19 of 27

Expected Utility without Reward

When there are no rewards associated with the states, the expected utility depends only on the costs of the actions.

20 of 27

Bellman Equation
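A common form of the Bellman equation (assumed here, matching the discounted cost/reward utilities of the earlier slides): the optimal expected utility of a state is obtained by choosing the action that maximizes the immediate utility plus the discounted expected utility of the successor states:

```latex
V(s) \;=\; \max_{a} \Bigl\{ r(s) - c(s, a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Bigr\}
```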

21 of 27

Planning Algorithm

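One standard planning algorithm for this setting is value iteration, which repeatedly applies the Bellman update until the values converge. A self-contained sketch on a tiny illustrative MDP (all states, actions, and numbers are assumptions):

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Repeat the Bellman update until no value changes by more than eps.

    P[(s, a)] is a dict {successor: probability}; R[(s, a, s2)] is the
    reward of that transition (0 if absent).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in P[(s, a)].items())
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Tiny illustrative MDP: from s1, action "go" reaches the goal s2
# with reward 1; s2 is absorbing with no further reward.
states = ["s1", "s2"]
actions = {"s1": ["go"], "s2": ["stay"]}
P = {("s1", "go"): {"s2": 1.0}, ("s2", "stay"): {"s2": 1.0}}
R = {("s1", "go", "s2"): 1.0}

V = value_iteration(states, actions, P, R)
print(V)  # V["s1"] converges to 1.0, V["s2"] to 0.0
```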

22 of 27

23 of 27

24 of 27

Policy evaluation: solve the system of equations for the expected utility of the current policy

25 of 27

Improve the policy by choosing, in each state, an action that maximizes the expected utility

Optimal Policy:

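The two steps above (evaluate the current policy, then improve it) form policy iteration. A sketch on an illustrative MDP; instead of solving the linear system directly, this version evaluates the policy iteratively, which converges to the same values (all names and numbers are assumptions):

```python
def evaluate(policy, P, R, gamma=0.9, eps=1e-8):
    """Policy evaluation: expected utility of following `policy` forever.
    Done iteratively here rather than by solving the linear system."""
    V = {s: 0.0 for s in policy}
    while True:
        delta = 0.0
        for s, a in policy.items():
            v = sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                    for s2, p in P[(s, a)].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            return V

def policy_iteration(actions, P, R, gamma=0.9):
    policy = {s: acts[0] for s, acts in actions.items()}  # arbitrary start
    while True:
        V = evaluate(policy, P, R, gamma)
        # Improvement step: in each state pick an action maximizing
        # the expected utility under the current value estimates.
        improved = {
            s: max(acts, key=lambda a: sum(
                p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                for s2, p in P[(s, a)].items()))
            for s, acts in actions.items()
        }
        if improved == policy:  # no change: the policy is optimal
            return policy, V
        policy = improved

# Illustrative MDP: in s1, "good" earns reward 1 while "bad" earns nothing.
actions = {"s1": ["bad", "good"], "s2": ["stay"]}
P = {("s1", "bad"): {"s2": 1.0}, ("s1", "good"): {"s2": 1.0},
     ("s2", "stay"): {"s2": 1.0}}
R = {("s1", "good", "s2"): 1.0}

pol, V = policy_iteration(actions, P, R)
print(pol)  # the improvement step selects "good" in s1
```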
26 of 27

27 of 27

Possible iterations for state s1 at the first step