Lecture 02: Sequential Decisions under Uncertainty - Markov Decision Processes
2
Scan me:
Quizzes
Anonymous Questions (during or after the lecture)
The Lecture will be recorded
vevox.app 161-834-334
DTU Compute
2 February 2021
Welcome to 02435 Decision-making under uncertainty
Course Plan
3
without Uncertainty
Agenda for today
4
The process of designing “Decision-making” frameworks
5
Define
Design
Evaluate
Markov Decision Process
The Bike-to-University problem
7
Stochastic Dynamics
9
State Variables
10
The Bike-to-University problem as a Markov Decision Process (MDP)
11
“The Markov property”
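In symbols, the Markov property says that the distribution of the next state depends on the full history only through the current state and action (notation assumed, not taken from the slides):

```latex
\Pr(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = \Pr(s_{t+1} \mid s_t, a_t)
```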
Solution Concept
13
Solution Concept: …?
Policy
14
State Variables
Markov Decision Process (MDP)
a.k.a. “actions” or “controls” or “manipulated variables”
a.k.a. “Reward” function
Are all sequential decision frameworks Markovian?
No, but they can be made Markovian with State Augmentation
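A minimal sketch of state augmentation in Python (all names and dynamics invented for illustration): the next observation of a second-order process depends on the last two observations, so a single observation is not a Markov state, but the pair of the two most recent observations is.

```python
import random

def next_obs(y_t, y_tm1):
    # Second-order dynamics: the next observation depends on the last TWO
    # observations, so y alone does not satisfy the Markov property.
    return 0.6 * y_t + 0.3 * y_tm1 + random.gauss(0.0, 0.1)

def step_augmented(state):
    # Augmented state: stack the two most recent observations. The new
    # state again contains everything needed to predict the next step.
    y_t, y_tm1 = state
    return (next_obs(y_t, y_tm1), y_t)

state = (1.0, 0.5)
for _ in range(3):
    state = step_augmented(state)
```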
Can you think of examples?
a.k.a. Control Law or Agent
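The ingredients named above (states, actions/controls, transition probabilities, a reward function, and a policy/control law) can be written out explicitly. A toy sketch in Python; every state, action, number, and name here is invented for illustration:

```python
# A tiny illustrative MDP, written out explicitly.
mdp = {
    "states": ["rested", "stressed"],
    "actions": ["work", "relax"],  # a.k.a. controls / manipulated variables
    # P[s][a] -> list of (next_state, probability) pairs
    "P": {
        "rested":   {"work":  [("stressed", 0.7), ("rested", 0.3)],
                     "relax": [("rested", 1.0)]},
        "stressed": {"work":  [("stressed", 1.0)],
                     "relax": [("rested", 0.8), ("stressed", 0.2)]},
    },
    # r[s][a] -> immediate reward
    "r": {
        "rested":   {"work": 2.0, "relax": 0.5},
        "stressed": {"work": 1.0, "relax": 0.0},
    },
}

def policy(state):
    # A policy (a.k.a. control law) maps the observed state to an action.
    return "relax" if state == "stressed" else "work"
```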
Defining an MDP out of verbal explanations
23
Define
Design
Evaluate
Markov Decision Process
How do we extract this information from the stakeholder?
Cost Function
Actions (+ constraints)
Time granularity
State Variables
Dynamics (+ constraints)
The student’s problem
Let’s simplify it so that we don’t carry too much luggage
The student’s problem simplified
The student’s problem under uncertainty
Deterministic Optimization Problem
The student’s problem as a sequential decision problem
Deterministic Optimization Problem
Sequential Decision Problem
Cost:
Actions:
States:
Transition Dynamics:
The student’s problem as a sequential decision problem
Exogenous vs Endogenous state variables
Endogenous or Exogenous?
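The distinction can be made concrete in code. In this sketch (an energy-flavoured example with invented names and dynamics), the price is exogenous — it evolves regardless of our decision — while the storage level is endogenous, driven directly by the action:

```python
import random

def step(state, action):
    price, storage = state
    # Exogenous: the price moves on its own, unaffected by our action.
    next_price = max(0.0, price + random.gauss(0.0, 1.0))
    # Endogenous: the storage level is changed directly by our action
    # (action = energy charged if positive, drawn if negative).
    next_storage = storage + action
    return (next_price, next_storage)

state = step((50.0, 10.0), action=-2.0)
```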
The student’s problem including stress
What is “stressed”?
Is this formulation complete?
I must specify the dynamics for every state variable
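That requirement can be illustrated in code: one transition equation per state variable, nothing left implicit. The state components (a work backlog and a stress level) and the dynamics below are invented for this sketch:

```python
def apply_dynamics(state, action, noise):
    backlog, stress = state
    hours = action
    # One explicit update per state variable: omitting either line would
    # leave the formulation incomplete.
    next_backlog = max(0.0, backlog - hours + noise)                # work reduces the backlog
    next_stress = min(1.0, max(0.0, stress + 0.01 * hours - 0.05))  # working raises stress
    return (next_backlog, next_stress)

state = apply_dynamics((10.0, 0.5), action=5.0, noise=2.0)
```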
The Lead Example
The Lead Example: Deterministic Optimization
Agent vs Environment
43
Back to the Big Picture
44
Define
Design
Evaluate
MDP
Agent
Environment
Agent vs Environment
48
The environment is a simulator of the real system, i.e. the agent’s action is applied to it (input) and it returns the next state and the reward (output).
DTU Compute
24 May 2023
The agent may internally model the environment (namely, the state dynamics) to account for the future, but it does not have perfect foresight of the future.
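This input/output view of the environment can be sketched as a small simulator class (gym-style `step` interface; the dynamics and reward below are invented for illustration):

```python
import random

class Environment:
    """Simulator of the real system: takes an action, returns the next
    state and the reward."""

    def __init__(self, state=0.0):
        self.state = state

    def step(self, action):
        noise = random.gauss(0.0, 0.1)   # the agent cannot foresee this realization
        next_state = 0.9 * self.state + action + noise
        reward = -abs(next_state)        # e.g. we want to keep the state near zero
        self.state = next_state
        return next_state, reward

env = Environment(state=1.0)
next_state, reward = env.step(action=-0.5)
```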
The student’s problem
You observe your state
You reason about how your action will affect your reward and perhaps also how it may affect your next state and beyond
You apply an action
Next week, you observe your new (realized) state, which is generally different from the one you might have predicted.
Reward
Action
State
Constraints
Dynamics
Where do these live?
Effort
Work 100 hours each and every week
What is the cost of this action?
What is the next state?
Lead Example
59
How many State variables do we need?
The Lead Example as an MDP
68
The Lead Example’s Environment
Decision
Cost / Next state
Environment
Check Feasibility
Map to Feasible Actions
If infeasible, apply dummy actions
Apply Transition
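The four steps above can be sketched as an environment-side wrapper. All names, the capacity bound, and the price are assumptions for illustration:

```python
def check_feasibility(action, capacity):
    # Feasible actions lie between 0 and the capacity bound.
    return 0.0 <= action <= capacity

def dummy_action(state):
    # Safe fallback applied when the policy's decision is infeasible.
    return 0.0

def env_step(state, action, capacity=10.0, price=2.0):
    # 1) check feasibility; 2) if infeasible, substitute the dummy action
    if not check_feasibility(action, capacity):
        action = dummy_action(state)
    # 3) apply the transition and return the cost and the next state
    next_state = state + action
    cost = price * action
    return cost, next_state

cost, state = env_step(state=5.0, action=50.0)  # infeasible -> dummy action
```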
Evaluating a policy
77
Exercise
86
For day in range(Days):
    For t in range(T):
        Decisions = policy.py(state)
        Infeasible = CheckFeasibility.py( )
        if Infeasible == 1:
            Decisions = dummy_policy.py(state)
        cost[t] = current_price * Decisions[power_from_grid]
        next_state = apply_dynamics(state, Decisions)
        state = next_state
    EvaluateDailyCost.py(day)
Implement the Python function that evaluates the average cost of a policy for the Energy Hub problem
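One possible shape for that function, as a sketch only: the helper names (`policy`, `apply_dynamics`) and the cost model are assumptions, not the course’s reference solution.

```python
def evaluate_policy(policy, apply_dynamics, initial_state,
                    days=100, T=24, price=2.0):
    # Average daily cost of a policy over a number of simulated days.
    daily_costs = []
    for _ in range(days):
        state = initial_state
        total = 0.0
        for _ in range(T):
            action = policy(state)
            total += price * action          # cost incurred in this time step
            state = apply_dynamics(state, action)
        daily_costs.append(total)
    return sum(daily_costs) / len(daily_costs)

# Smoke test with trivial dynamics and a constant policy: 24 steps at
# price 2.0 and action 1.0 give a daily (and hence average) cost of 48.0.
avg = evaluate_policy(policy=lambda s: 1.0,
                      apply_dynamics=lambda s, a: s,
                      initial_state=0.0, days=10, T=24)
```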
Questions
87
vevox.app
161-834-334
Quiz
88
vevox.app
123-410-417