Michigan Data Science Team
Week 7
Final Project Expo
Motivation:
Monte-Carlo Tree Search (AlphaGo Zero)
Initial State
Prior (P) is already computed
No search, so mean value (Q) is empty
[Diagram: example search tree. Root: N=15, W=1.8, P=0.5; children: (N=3, W=0.3, P=0.0), (N=1, W=0.9, P=0.7), (N=7, W=0.4, P=0.1), (N=4, W=0.2, P=0.2).]
Monte-Carlo Tree Search (AlphaGo Zero)
[Diagram: the same tree as above; the leaf that will be selected is the child with N=1, W=0.9, P=0.7 (its statistics are updated on the next slide).]
Select
Start from the current position
Keep selecting the “best” child (the one with the highest Q + U score under the PUCT rule) until we reach a leaf
We select the leaf with N=1, W=0.9, P=0.7 to continue
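Below is a minimal Python sketch of the selection step, assuming a hypothetical Node class that holds the statistics from the reference page; the exploration constant and all names are illustrative, not taken from the AlphaGo Zero paper or code.

import math
from dataclasses import dataclass, field

C_PUCT = 1.5  # exploration constant; the exact value is a tunable assumption

@dataclass
class Node:
    prior: float = 0.0                 # P, from the policy net
    visits: int = 0                    # N
    total_value: float = 0.0           # W
    children: dict = field(default_factory=dict)  # action -> Node

    @property
    def q(self) -> float:
        # Q = W / N, defined as 0 for an unvisited node
        return self.total_value / self.visits if self.visits else 0.0

def puct_score(parent: Node, child: Node) -> float:
    # U favors high-prior moves that have been visited rarely
    u = C_PUCT * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + u

def select_leaf(root: Node) -> list:
    # Walk down from the root, always taking the best-scoring child,
    # until we reach a node with no children (a leaf)
    path = [root]
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: puct_score(node, c))
        path.append(node)
    return path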
Monte-Carlo Tree Search (AlphaGo Zero)
[Diagram: the tree after backup. Root: N=16, W=2.7, P=0.5; children: (N=3, W=0.3, P=0.0), (N=2, W=1.8, P=0.7), (N=7, W=0.4, P=0.1), (N=4, W=0.2, P=0.2).]
Evaluate & Backup
At the selected node, evaluate the position with the value net
This counts as a visit and adds the value to the total value of every node on the selected path (the backup)
The priors for the leaf should also be saved here
Note: It is ideal to evaluate this leaf AND its dihedral transformations, but this can be implemented later
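A minimal sketch of this step, continuing the hypothetical Node class from the selection sketch; policy_net and value_net stand in for the two network heads, and pending_priors is an invented attribute for stashing the leaf's child priors.

def evaluate_and_backup(path, state, policy_net, value_net):
    # path: the list of nodes from the root to the selected leaf
    leaf = path[-1]
    leaf.pending_priors = policy_net(state)  # save the children's priors for Expand
    value = value_net(state)                 # scalar evaluation of the leaf position
    # Backup: every node on the selected path counts one more visit and
    # adds the value to its running total W (sign flipping per ply in a
    # two-player game is omitted for brevity)
    for node in path:
        node.visits += 1
        node.total_value += value
    return value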
Monte-Carlo Tree Search (AlphaGo Zero)
Expand
From the selected node, we expand the tree by adding all possible children
Record the priors we already computed during evaluation (sketched in code below the diagram)
[Diagram: the selected node (N=2, W=1.8, P=0.7) gains four new children, each with N=0, W=0 and priors P=0.2, 0.1, 0.6, 0.1.]
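A minimal sketch of the expansion step under the same assumptions: each new child starts with zero visits and zero total value, carrying only the prior saved when the leaf was evaluated (pending_priors is assumed to map each legal action to its prior).

def expand(leaf):
    # Add one child per legal move, initialized with N = 0, W = 0, and the
    # prior recorded during evaluation
    for action, prior in leaf.pending_priors.items():
        leaf.children[action] = Node(prior=prior)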
Monte-Carlo Tree Search (AlphaGo Zero)
Repeat
Search multiple times
AlphaGo Zero runs 1600 of these simulations before making each move (the full loop is sketched after the diagram below)
[Diagram: the search tree grows outward from the root (“Begin search here”) as the search is repeated many times.]
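Tying the steps together, a minimal sketch of the outer search loop, reusing the hypothetical helpers from the earlier sketches; leaf_state is an assumed function that reconstructs the board position at the end of a path.

def run_search(root, leaf_state, policy_net, value_net, num_simulations=1600):
    for _ in range(num_simulations):
        path = select_leaf(root)                   # Select
        state = leaf_state(path)                   # position at the selected leaf
        evaluate_and_backup(path, state, policy_net, value_net)  # Evaluate & Backup
        expand(path[-1])                           # Expand
    # After the simulations, the move is chosen from the root's visit counts
    # (greedily here; self-play training samples proportionally instead)
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]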
Monte-Carlo Tree Search Reference Page
Edge (action) Statistics:
N = # times this move was considered
W = sum of values
Q = W/N if N != 0 else 0
P = prior probability
Parent Node (prev state) Statistics:
List[P] = priors of all children (an optional runtime optimization)
Child Node (state) Statistics:
N, W, Q of the action leading to this node (that action exists and is unique by the tree invariants)
Implementation Note: Both the AlphaGo Zero and AlphaZero papers describe these statistics in terms of edges, but we store them on nodes because the implementation is more natural that way
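One way to realize that note, as a sketch: the statistics of the unique incoming edge live on the child node it leads to, and the parent may optionally cache its children's priors as a list; the names here are illustrative.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SearchNode:
    # Statistics of the (unique) edge leading into this node
    prior: float = 0.0        # P
    visits: int = 0           # N
    total_value: float = 0.0  # W
    children: Dict[int, "SearchNode"] = field(default_factory=dict)
    child_priors: Optional[List[float]] = None  # optional cache of the children's P

    @property
    def mean_value(self) -> float:
        # Q = W / N if N != 0 else 0
        return self.total_value / self.visits if self.visits else 0.0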