Reinforcement Learning
Monte Carlo Method
Keywords
Reinforcement Learning?
Easier way
Agent
Action: left, right, jump …
Reward: (score, coins)
State:
Map info
Enemy location
Time left
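The agent–environment loop above can be sketched in code. Everything here (`MarioEnv`, its state fields, the reward rule) is a made-up placeholder for the actual game, kept only to show how agent, action, reward, and state fit together:

```python
# Minimal sketch of the RL loop for the platformer example.
# MarioEnv and its dynamics are hypothetical stand-ins, not the real game.
import random

ACTIONS = ["left", "right", "jump"]

class MarioEnv:
    """Toy environment: state bundles map info, enemy location, time left."""
    def __init__(self):
        self.state = {"map_info": 0, "enemy_location": 5, "time_left": 100}

    def step(self, action):
        # Reward is a stand-in for score/coins collected.
        self.state["time_left"] -= 1
        reward = 1 if action == "right" else 0
        done = self.state["time_left"] <= 0
        return self.state, reward, done

def choose_action(state):
    # A random policy: the agent's whole job is to learn a better one.
    return random.choice(ACTIONS)

env = MarioEnv()
total_reward, done = 0, False
while not done:
    action = choose_action(env.state)
    state, reward, done = env.step(action)
    total_reward += reward
```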
Let's build an agent that can play blackjack
Basic Rules
This is the policy
Dealer’s first card
Player’s cards
Policy: deciding the action (hit or stand) based on the state (dealer's first card and player's cards)
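A policy can be written as a plain function from state to action. The threshold rule below is only an illustrative baseline (the function name and state layout are assumptions, not part of the original slides):

```python
# Hypothetical sketch: a blackjack policy maps state -> action.
# State: (player_sum, dealer_first_card, usable_ace); action: "hit" or "stand".
def simple_policy(player_sum, dealer_card, usable_ace):
    # A common naive baseline: hit until the hand totals 20 or 21, then stand.
    return "stand" if player_sum >= 20 else "hit"
```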
Monte Carlo
Monte Carlo methods vary, but tend to follow a particular pattern:
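That pattern, sketched for blackjack: simulate many episodes under a fixed policy, record the return seen from each visited state, and average. The sketch below is a simplified first-visit Monte Carlo evaluation under assumed rules (infinite deck, no splits or doubles); all names are illustrative:

```python
# First-visit Monte Carlo evaluation for simplified blackjack.
import random
from collections import defaultdict

def draw():
    # Infinite deck: 1 = ace, face cards folded into 10.
    return min(random.randint(1, 13), 10)

def hand_value(cards):
    total = sum(cards)
    usable = 1 in cards and total + 10 <= 21
    return (total + 10 if usable else total), usable

def play_episode(policy):
    """Simulate one hand; return (visited player states, final reward)."""
    player = [draw(), draw()]
    dealer_card = draw()
    states = []
    while True:
        total, usable = hand_value(player)
        if total > 21:
            return states, -1              # player busts
        states.append((total, dealer_card, usable))
        if policy(total, dealer_card, usable) == "stand":
            break
        player.append(draw())
    dealer = [dealer_card, draw()]
    while hand_value(dealer)[0] < 17:      # dealer hits until 17+
        dealer.append(draw())
    d_total = hand_value(dealer)[0]
    p_total = hand_value(player)[0]
    if d_total > 21 or p_total > d_total:
        return states, 1
    return states, -1 if d_total > p_total else 0

def mc_evaluate(policy, n_episodes=50_000):
    returns = defaultdict(list)
    for _ in range(n_episodes):
        states, g = play_episode(policy)
        for s in set(states):              # first-visit averaging
            returns[s].append(g)
    return {s: sum(g) / len(g) for s, g in returns.items()}

policy = lambda total, dealer, usable: "stand" if total >= 20 else "hit"
V = mc_evaluate(policy)
```

With enough episodes, `V` approximates the value of each state under the policy, which is exactly what the per-state result charts on the following slides visualize.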
Example: episode 1
More episodes…
Results: Ace card in hand
(Chart: Player's cards vs. Dealer's first card)
Results: Ace not in hand
Application: Multi-Agent
(Figure labels: Hider, Seeker, Object)
Agent: Hider and Seeker
Environment: walls, floor, objects
Action: move around, push and pull objects
Hiders can 'lock' objects
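The competition between the two teams can be sketched as a zero-sum reward signal: seekers are rewarded while a hider is visible, hiders while none are. This is an assumption about the reward shaping, not a reproduction of the original experiment:

```python
# Hypothetical zero-sum team reward for hide-and-seek.
def team_rewards(any_hider_visible):
    """Return (hider_reward, seeker_reward) for one timestep."""
    seeker_reward = 1 if any_hider_visible else -1
    hider_reward = -seeker_reward   # zero-sum: one team's gain is the other's loss
    return hider_reward, seeker_reward
```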
Hider uses cube objects to block the entrance
Seeker uses ramp objects to climb over the wall
Hider takes away ramp objects and blocks the entrance
Hiders build a wall and lock the ramps so that seekers cannot use them
Seekers find a glitch: they climb onto a cube and move with it
Seekers use a cube like a vehicle and move with it (a glitch)
Personal thoughts
Hiders build a wall and lock every object from then on :(
Application: Bio
Environment: FDA-approved UVA/Padova simulator
Agent: insulin pump
Action: release insulin or not
Goal: maintain a normal blood-glucose state
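The control loop can be sketched as follows. `GlucoseSim` and its dynamics are invented stand-ins for illustration only; the UVA/Padova simulator itself is not reproduced here, and the threshold rule is a baseline, not a learned policy:

```python
# Toy glucose-control loop; GlucoseSim is a hypothetical stand-in.
class GlucoseSim:
    """Toy model: insulin lowers glucose, background drift raises it."""
    def __init__(self, glucose=180.0):
        self.glucose = glucose  # mg/dL

    def step(self, release_insulin):
        self.glucose += -20.0 if release_insulin else 5.0
        # Reward the agent for staying in the normal range (70-180 mg/dL).
        in_range = 70 <= self.glucose <= 180
        return self.glucose, (1 if in_range else -1)

sim = GlucoseSim()
# Baseline policy: release insulin whenever glucose is high.
for _ in range(20):
    glucose, reward = sim.step(release_insulin=sim.glucose > 120)
```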
Traditional MSA (multiple sequence alignment) algorithms work, but their computational complexity needs to be improved.
The agent (RL model) gets rewarded if its MSA result is similar to that of a traditional algorithm (e.g., dynamic programming)
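One way such a reward could be realized, as a hedged sketch: score the agent's alignment by its column-wise agreement with a reference alignment produced by a classical method. The function, the gapped-string representation, and the similarity measure are all assumptions for illustration:

```python
# Hypothetical reward: fraction of columns matching a reference alignment.
def alignment_reward(agent_alignment, reference_alignment):
    """Compare two gapped strings column by column; '-' marks a gap."""
    matches = sum(a == r for a, r in zip(agent_alignment, reference_alignment))
    return matches / max(len(reference_alignment), 1)

# Example: the agent's alignment vs. a dynamic-programming reference.
r = alignment_reward("AC-GT", "ACGT-")
```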