Lecture 02: Sequential Decisions under Uncertainty - Markov Decision Processes

Scan me:

Quizzes

Anonymous Questions (during or after the lecture)

The lecture will be recorded

vevox.app 161-834-334

DTU Compute

2 February 2021

Welcome to 02435 Decision-making under uncertainty

Course Plan

without Uncertainty

Agenda for today

  1. Small example
  2. Defining a problem of Sequential Decisions under Uncertainty
  3. The student’s problem as a Sequential Decisions problem
  4. Applying what we learned on the Lead Example
  5. How to code an Evaluation Framework
  6. Quiz with 1.5 bonus points for the top 3

The process of designing “Decision-making” frameworks

Define

Design

Evaluate

Markov Decision Process

The Bike-to-University problem

Stochastic Dynamics

State Variables

The Bike-to-University problem as a Markov Decision Process (MDP)

“The Markov property”
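In symbols, the Markov property says that the distribution of the next state depends only on the current state and action, not on the rest of the history:

```latex
% Markov property: conditioning on the full history adds nothing
% beyond the current state s_t and action a_t.
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```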

Solution Concept: …?

Policy

State Variables
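A policy is a rule that maps the observed state variables to an action. As a toy sketch (the state variable and the 0.5 threshold below are hypothetical illustrations, not taken from the slides):

```python
# Toy policy sketch: a policy maps the observed state to an action.
# Here the (hypothetical) state is a rain forecast and the action is
# the mode of transport, in the spirit of the Bike-to-University problem.

def policy(state):
    return "bus" if state["rain_probability"] > 0.5 else "bike"

action = policy({"rain_probability": 0.8})   # "bus"
```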

Markov Decision Process (MDP)

a.k.a. “actions” or “controls” or “manipulated variables”

a.k.a. “Reward” function

Are all sequential decision frameworks Markovian?

No, but they can be made Markovian with State Augmentation

Can you think of examples?

a.k.a. Control Law or Agent
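As a hypothetical illustration of State Augmentation: if the next observation depends on the last two observations, then the latest observation alone is not a Markovian state, but the pair of the two most recent ones is.

```python
# State augmentation sketch (illustrative example, not from the slides):
# suppose tomorrow's weather depends on both today's and yesterday's
# weather. "Today's weather" alone is then not a Markovian state, but
# the augmented state (yesterday, today) is.

def augment(history, k=2):
    """Augmented state: the k most recent observations."""
    return tuple(history[-k:])

history = ["sunny", "rainy", "rainy", "sunny"]
state = augment(history)   # ("rainy", "sunny")
```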

Defining an MDP out of verbal explanations

Define

Design

Evaluate

Markov Decision Process

How do we extract this information from the stakeholder?

Defining an MDP out of verbal explanations

  1. What is our performance metric? → Cost Function
  2. What do we get to decide / control? → Actions (+ constraints)
  3. How often do we get/need to make a decision? → Time granularity
  4. What are the sources of uncertainty / What can be observed*? → State Variables
  5. How exactly are these connected? → Dynamics (+ constraints)
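The five questions above map one-to-one onto the ingredients of an MDP. A minimal sketch of how one might collect them in code (all names are illustrative, not part of the course codebase):

```python
# Sketch of an MDP "container" matching the five questions on the slide.
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class MDP:
    cost: Callable[[Any, Any], float]        # 1. performance metric
    actions: Callable[[Any], List[Any]]      # 2. what we control (+ constraints)
    horizon: int                             # 3. time granularity / number of stages
    states: List[Any]                        # 4. uncertainty / what can be observed
    dynamics: Callable[[Any, Any, Any], Any] # 5. how it all connects: s' = f(s, a, noise)

# A trivial instance, just to show the shape:
toy = MDP(
    cost=lambda s, a: float(a),
    actions=lambda s: [0, 1],
    horizon=7,
    states=[0],
    dynamics=lambda s, a, w: s + a,
)
```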

The student’s problem

Let’s simplify it so that we don’t carry too much luggage

The student’s problem simplified

The student’s problem under uncertainty

The student’s problem as a sequential decision problem

Deterministic Optimization Problem vs Sequential Decision Problem

Cost:

Actions:

States:

Transition Dynamics:


Exogenous vs Endogenous state variables

Endogenous or Exogenous?


The student’s problem including stress

What is “stressed”?


Is this formulation complete?

I must specify the dynamics for every state variable


The Lead Example

The Lead Example: Deterministic Optimization


Back to the Big Picture

Define

Design

Evaluate

MDP

Agent

Environment

Agent vs Environment

The environment is a simulator of the real system, i.e. the agent’s action is applied to it (input) and it returns the next state and the reward (output).

The agent may internally model the environment (namely, the state dynamics) to account for the future. But it does not have perfect foresight over the future.

DTU Compute

24 May 2023
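The simulator view above is commonly coded as a step function: action in, next state and reward out. A minimal sketch with made-up dynamics (nothing below is the actual course environment):

```python
# Environment as a simulator: the agent's action goes in, the next
# state and the reward come out. Dynamics are made up for illustration.
import random

class Environment:
    def __init__(self, state=0.0):
        self.state = state

    def step(self, action):
        # reward depends on the current state and the applied action
        reward = -abs(self.state) - 0.1 * action ** 2
        # next state mixes the dynamics, the action, and exogenous noise
        noise = random.gauss(0.0, 1.0)
        self.state = 0.9 * self.state + action + noise
        return self.state, reward

env = Environment()
next_state, reward = env.step(action=1.0)   # reward is -0.1 from state 0.0
```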

The student’s problem

You observe your state

You reason about how your action will affect your reward, and perhaps also how it may affect your next state and beyond

You apply an action

Next week, you observe your new (realized) state, which is generally different from the one you might have predicted

The student’s problem

Where do these live?

Reward

Action

State

Constraints

Dynamics

Effort

The student’s problem

You apply an action: work 100 hours each and every week

What is the cost of this action? What is the next state?

  1. Detect infeasible actions, kill the simulation and print an error
  2. Map any action into the feasible space
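Option 2 can be as simple as projecting the requested action onto its bounds. A sketch, assuming a hypothetical weekly limit of 60 working hours (the bounds are illustrative, not from the slides):

```python
# Option 2 from the slide: instead of crashing on an infeasible action,
# project it onto the feasible space. The [0, 60] hour bounds are a
# hypothetical example.

def map_to_feasible(hours, lo=0.0, hi=60.0):
    return max(lo, min(hi, hours))

safe_action = map_to_feasible(100.0)   # 60.0
```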

The Lead Example


The Lead Example

How many State variables do we need?


The Lead Example as an MDP

The Lead Example’s Environment

Decision → Environment → Cost, Next state

Check Feasibility

Map to Feasible Actions

If infeasible, apply dummy actions

Apply Transition
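The four blocks above can be sketched as a single environment step. Every helper below is a placeholder stub standing in for the actual Lead Example model:

```python
# Environment step following the four blocks on the slide:
# check feasibility -> (if infeasible) substitute dummy actions ->
# apply the transition -> return cost and next state. All helpers are stubs.

def check_feasibility(state, decision):
    return decision >= 0                      # stub feasibility rule

def dummy_policy(state):
    return 0.0                                # safe fallback action

def apply_transition(state, decision):
    return state + decision                   # stub dynamics

def cost_of(state, decision, price=1.0):
    return price * decision                   # stub cost

def env_step(state, decision):
    if not check_feasibility(state, decision):
        decision = dummy_policy(state)        # map to a feasible action
    next_state = apply_transition(state, decision)
    return cost_of(state, decision), next_state

cost, next_state = env_step(state=5.0, decision=-3.0)  # infeasible -> dummy
```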

Evaluating a policy

Decision → Environment → Cost, Next state

EvaluateDailyCost(day):
    for t in range(T):
        Decisions = policy(state)
        Infeasible = CheckFeasibility(state, Decisions)
        if Infeasible == 1:
            Decisions = dummy_policy(state)
        cost[t] = current_price * Decisions[power_from_grid]
        next_state = apply_dynamics(state, Decisions)
        state = next_state

Exercise

for day in range(Days):
    EvaluateDailyCost(day)

Implement the Python function that evaluates the average cost of a policy for the Energy Hub problem
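A minimal sketch of such an evaluation framework, assembling the loops from the previous slides. The callables passed in are stand-ins for the actual Energy Hub policy, dynamics, and prices, not the course's own implementation:

```python
# Evaluation framework sketch: average per-day cost of a policy over
# Days simulated days, with a feasibility check and dummy fallback.
import random

def evaluate_average_cost(policy, dummy_policy, check_feasibility,
                          apply_dynamics, price_of, T=24, Days=30, seed=0):
    """Average daily cost of `policy` over `Days` simulated days."""
    random.seed(seed)            # reproducible runs if the stubs use randomness
    total = 0.0
    for day in range(Days):
        state = 0.0              # (re)initialize the state each day
        cost = 0.0
        for t in range(T):
            decision = policy(state)
            if not check_feasibility(state, decision):
                decision = dummy_policy(state)   # fall back to a feasible action
            cost += price_of(t) * decision
            state = apply_dynamics(state, decision)
        total += cost
    return total / Days

avg = evaluate_average_cost(
    policy=lambda s: 1.0,
    dummy_policy=lambda s: 0.0,
    check_feasibility=lambda s, d: True,
    apply_dynamics=lambda s, d: s,
    price_of=lambda t: 2.0,
)   # 48.0 with the default T=24, Days=30
```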

Questions

vevox.app 161-834-334

Quiz

vevox.app 123-410-417