Lecture 02: Sequential Decisions under Uncertainty - Markov Decision Processes

Scan me:

Quizzes

Anonymous Questions (during or after the lecture)

The lecture will be recorded

vevox.app 161-834-334

DTU Compute

2 February 2021

Welcome to 02435 Decision-making under uncertainty

Course Plan

without Uncertainty

Agenda for today

  1. Small example
  2. Defining a problem of Sequential Decisions under Uncertainty
  3. The student’s problem as a Sequential Decisions problem
  4. Applying what we learned on the Lead Example
  5. How to code an Evaluation Framework
  6. Quiz with 1.5 bonus points for the top 3

The process of designing “Decision-making” frameworks

Define

Design

Evaluate

Markov Decision Process

The Bike-to-University problem

Stochastic Dynamics

State Variables

The Bike-to-University problem as a Markov Decision Process (MDP)

“The Markov property”
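In symbols, the Markov property says that the distribution of the next state depends only on the current state and action, not on the rest of the history:

```latex
% Markov property: conditioning on the full history adds nothing
% beyond the current state s_t and action a_t.
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```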

Solution Concept: …?

Policy

State Variables
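A policy is a rule that maps the observed state variables to an action. As a toy sketch (the state variable and the 0.5 threshold below are hypothetical illustrations, not taken from the slides):

```python
# Toy policy sketch: a policy maps the observed state to an action.
# Here the (hypothetical) state is a rain forecast and the action is
# the mode of transport, in the spirit of the Bike-to-University problem.

def policy(state):
    return "bus" if state["rain_probability"] > 0.5 else "bike"

action = policy({"rain_probability": 0.8})   # "bus"
```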

Markov Decision Process (MDP)

a.k.a. “actions” or “controls” or “manipulated variables”

a.k.a. “Reward” function

Are all sequential decision frameworks Markovian?

No, but they can be made Markovian with State Augmentation

Can you think of examples?

a.k.a. Control Law or Agent
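As a hypothetical illustration of State Augmentation: if the next observation depends on the last two observations, then the latest observation alone is not a Markovian state, but the pair of the two most recent ones is.

```python
# State augmentation sketch (illustrative example, not from the slides):
# suppose tomorrow's weather depends on both today's and yesterday's
# weather. "Today's weather" alone is then not a Markovian state, but
# the augmented state (yesterday, today) is.

def augment(history, k=2):
    """Augmented state: the k most recent observations."""
    return tuple(history[-k:])

history = ["sunny", "rainy", "rainy", "sunny"]
state = augment(history)   # ("rainy", "sunny")
```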

Defining an MDP out of verbal explanations

Define

Design

Evaluate

Markov Decision Process

How do we extract this information from the stakeholder?

Defining an MDP out of verbal explanations

  1. What is our performance metric? → Cost Function
  2. What do we get to decide / control? → Actions (+ constraints)
  3. How often do we get/need to make a decision? → Time granularity
  4. What are the sources of uncertainty / What can be observed*? → State Variables
  5. How exactly are these connected? → Dynamics (+ constraints)
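The five questions above map one-to-one onto the ingredients of an MDP. A minimal sketch of how one might collect them in code (all names are illustrative, not part of the course codebase):

```python
# Sketch of an MDP "container" matching the five questions on the slide.
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class MDP:
    cost: Callable[[Any, Any], float]        # 1. performance metric
    actions: Callable[[Any], List[Any]]      # 2. what we control (+ constraints)
    horizon: int                             # 3. time granularity / number of stages
    states: List[Any]                        # 4. uncertainty / what can be observed
    dynamics: Callable[[Any, Any, Any], Any] # 5. how it all connects: s' = f(s, a, noise)

# A trivial instance, just to show the shape:
toy = MDP(
    cost=lambda s, a: float(a),
    actions=lambda s: [0, 1],
    horizon=7,
    states=[0],
    dynamics=lambda s, a, w: s + a,
)
```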

The student’s problem

Let’s simplify it so that we don’t carry too much luggage

The student’s problem simplified

The student’s problem under uncertainty

The student’s problem as a sequential decision problem

Deterministic Optimization Problem vs Sequential Decision Problem

Cost:

Actions:

States:

Transition Dynamics:


Exogenous vs Endogenous state variables

Endogenous or Exogenous?


The student’s problem including stress

What is “stressed”?


Is this formulation complete?

I must specify the dynamics for every state variable


The Lead Example

The Lead Example: Deterministic Optimization


Back to the Big Picture

Define

Design

Evaluate

MDP

Agent

Environment

Agent vs Environment

The environment is a simulator of the real system, i.e. the agent’s action is applied to it (input) and it returns the next state and the reward (output).

The agent may internally model the environment (namely, the state dynamics) to account for the future. But it does not have perfect foresight over the future.

DTU Compute

24 May 2023
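The simulator view above is commonly coded as a step function: action in, next state and reward out. A minimal sketch with made-up dynamics (nothing below is the actual course environment):

```python
# Environment as a simulator: the agent's action goes in, the next
# state and the reward come out. Dynamics are made up for illustration.
import random

class Environment:
    def __init__(self, state=0.0):
        self.state = state

    def step(self, action):
        # reward depends on the current state and the applied action
        reward = -abs(self.state) - 0.1 * action ** 2
        # next state mixes the dynamics, the action, and exogenous noise
        noise = random.gauss(0.0, 1.0)
        self.state = 0.9 * self.state + action + noise
        return self.state, reward

env = Environment()
next_state, reward = env.step(action=1.0)   # reward is -0.1 from state 0.0
```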

The student’s problem

You observe your state

You reason about how your action will affect your reward, and perhaps also how it may affect your next state and beyond

You apply an action

Next week, you observe your new (realized) state, which is generally different from the one you might have predicted

The student’s problem

Where do these live?

Reward

Action

State

Constraints

Dynamics

Effort

The student’s problem

You apply an action: work 100 hours each and every week

What is the cost of this action? What is the next state?

  1. Detect infeasible actions, kill the simulation and print an error
  2. Map any action into the feasible space
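Option 2 can be as simple as projecting the requested action onto its bounds. A sketch, assuming a hypothetical weekly limit of 60 working hours (the bounds are illustrative, not from the slides):

```python
# Option 2 from the slide: instead of crashing on an infeasible action,
# project it onto the feasible space. The [0, 60] hour bounds are a
# hypothetical example.

def map_to_feasible(hours, lo=0.0, hi=60.0):
    return max(lo, min(hi, hours))

safe_action = map_to_feasible(100.0)   # 60.0
```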

The Lead Example


The Lead Example

How many State variables do we need?


The Lead Example as an MDP

The Lead Example’s Environment

Decision → Environment → Cost, Next state

Check Feasibility

Map to Feasible Actions

If infeasible, apply dummy actions

Apply Transition
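The four blocks above can be sketched as a single environment step. Every helper below is a placeholder stub standing in for the actual Lead Example model:

```python
# Environment step following the four blocks on the slide:
# check feasibility -> (if infeasible) substitute dummy actions ->
# apply the transition -> return cost and next state. All helpers are stubs.

def check_feasibility(state, decision):
    return decision >= 0                      # stub feasibility rule

def dummy_policy(state):
    return 0.0                                # safe fallback action

def apply_transition(state, decision):
    return state + decision                   # stub dynamics

def cost_of(state, decision, price=1.0):
    return price * decision                   # stub cost

def env_step(state, decision):
    if not check_feasibility(state, decision):
        decision = dummy_policy(state)        # map to a feasible action
    next_state = apply_transition(state, decision)
    return cost_of(state, decision), next_state

cost, next_state = env_step(state=5.0, decision=-3.0)  # infeasible -> dummy
```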

Evaluating a policy

Decision → Environment → Cost, Next state

EvaluateDailyCost(day):
    for t in range(T):
        Decisions = policy(state)
        Infeasible = CheckFeasibility(state, Decisions)
        if Infeasible == 1:
            Decisions = dummy_policy(state)
        cost[t] = current_price * Decisions[power_from_grid]
        next_state = apply_dynamics(state, Decisions)
        state = next_state

Exercise

for day in range(Days):
    EvaluateDailyCost(day)

Implement the Python function that evaluates the average cost of a policy for the Energy Hub problem
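A minimal sketch of such an evaluation framework, assembling the loops from the previous slides. The callables passed in are stand-ins for the actual Energy Hub policy, dynamics, and prices, not the course's own implementation:

```python
# Evaluation framework sketch: average per-day cost of a policy over
# Days simulated days, with a feasibility check and dummy fallback.
import random

def evaluate_average_cost(policy, dummy_policy, check_feasibility,
                          apply_dynamics, price_of, T=24, Days=30, seed=0):
    """Average daily cost of `policy` over `Days` simulated days."""
    random.seed(seed)            # reproducible runs if the stubs use randomness
    total = 0.0
    for day in range(Days):
        state = 0.0              # (re)initialize the state each day
        cost = 0.0
        for t in range(T):
            decision = policy(state)
            if not check_feasibility(state, decision):
                decision = dummy_policy(state)   # fall back to a feasible action
            cost += price_of(t) * decision
            state = apply_dynamics(state, decision)
        total += cost
    return total / Days

avg = evaluate_average_cost(
    policy=lambda s: 1.0,
    dummy_policy=lambda s: 0.0,
    check_feasibility=lambda s, d: True,
    apply_dynamics=lambda s, d: s,
    price_of=lambda t: 2.0,
)   # 48.0 with the default T=24, Days=30
```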

Questions

vevox.app 161-834-334

Quiz

vevox.app 123-410-417