Michigan Data Science Team
Week 7
Final Project Expo
Motivation:
Monte-Carlo Tree Search (AlphaGo Zero)
Initial State
Prior (P) is already computed
No search, so mean value (Q) is empty
[Diagram: example search tree. Root: N=15, W=1.8, P=0.5; children: (N=3, W=0.3, P=0.0), (N=1, W=0.9, P=0.7), (N=7, W=0.4, P=0.1), (N=4, W=0.2, P=0.2).]
Monte-Carlo Tree Search (AlphaGo Zero)
[Diagram: the same tree as above; the leaf that will be selected is the child with N=1, W=0.9, P=0.7 (its statistics are updated on the next slide).]
Select
Start from the current position
Keep selecting the “best” child (the one with the highest Q + U score under the PUCT rule) until we reach a leaf
We select the leaf with N=1, W=0.9, P=0.7 to continue
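Below is a minimal Python sketch of the selection step, assuming a hypothetical Node class that holds the statistics from the reference page; the exploration constant and all names are illustrative, not taken from the AlphaGo Zero paper or code.

import math
from dataclasses import dataclass, field

C_PUCT = 1.5  # exploration constant; the exact value is a tunable assumption

@dataclass
class Node:
    prior: float = 0.0                 # P, from the policy net
    visits: int = 0                    # N
    total_value: float = 0.0           # W
    children: dict = field(default_factory=dict)  # action -> Node

    @property
    def q(self) -> float:
        # Q = W / N, defined as 0 for an unvisited node
        return self.total_value / self.visits if self.visits else 0.0

def puct_score(parent: Node, child: Node) -> float:
    # U favors high-prior moves that have been visited rarely
    u = C_PUCT * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + u

def select_leaf(root: Node) -> list:
    # Walk down from the root, always taking the best-scoring child,
    # until we reach a node with no children (a leaf)
    path = [root]
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: puct_score(node, c))
        path.append(node)
    return path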
Monte-Carlo Tree Search (AlphaGo Zero)
[Diagram: the tree after backup. Root: N=16, W=2.7, P=0.5; children: (N=3, W=0.3, P=0.0), (N=2, W=1.8, P=0.7), (N=7, W=0.4, P=0.1), (N=4, W=0.2, P=0.2).]
Evaluate & Backup
At the selected node, evaluate the position with the value net
This counts as a visit and adds the value to the total value of every node on the selected path (the backup)
The priors for the leaf should also be saved here
Note: It is ideal to evaluate this leaf AND its dihedral transformations, but this can be implemented later
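A minimal sketch of this step, continuing the hypothetical Node class from the selection sketch; policy_net and value_net stand in for the two network heads, and pending_priors is an invented attribute for stashing the leaf's child priors.

def evaluate_and_backup(path, state, policy_net, value_net):
    # path: the list of nodes from the root to the selected leaf
    leaf = path[-1]
    leaf.pending_priors = policy_net(state)  # save the children's priors for Expand
    value = value_net(state)                 # scalar evaluation of the leaf position
    # Backup: every node on the selected path counts one more visit and
    # adds the value to its running total W (sign flipping per ply in a
    # two-player game is omitted for brevity)
    for node in path:
        node.visits += 1
        node.total_value += value
    return value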
Monte-Carlo Tree Search (AlphaGo Zero)
Expand
From the selected node, we expand the tree by adding all possible children
Record the priors we already computed during evaluation (sketched in code below the diagram)
[Diagram: the selected node (N=2, W=1.8, P=0.7) gains four new children, each with N=0, W=0 and priors P=0.2, 0.1, 0.6, 0.1.]
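A minimal sketch of the expansion step under the same assumptions: each new child starts with zero visits and zero total value, carrying only the prior saved when the leaf was evaluated (pending_priors is assumed to map each legal action to its prior).

def expand(leaf):
    # Add one child per legal move, initialized with N = 0, W = 0, and the
    # prior recorded during evaluation
    for action, prior in leaf.pending_priors.items():
        leaf.children[action] = Node(prior=prior)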
Monte-Carlo Tree Search (AlphaGo Zero)
Repeat
Search multiple times
AlphaGo Zero runs 1600 of these simulations before making each move (the full loop is sketched after the diagram below)
[Diagram: the search tree grows outward from the root (“Begin search here”) as the search is repeated many times.]
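Tying the steps together, a minimal sketch of the outer search loop, reusing the hypothetical helpers from the earlier sketches; leaf_state is an assumed function that reconstructs the board position at the end of a path.

def run_search(root, leaf_state, policy_net, value_net, num_simulations=1600):
    for _ in range(num_simulations):
        path = select_leaf(root)                   # Select
        state = leaf_state(path)                   # position at the selected leaf
        evaluate_and_backup(path, state, policy_net, value_net)  # Evaluate & Backup
        expand(path[-1])                           # Expand
    # After the simulations, the move is chosen from the root's visit counts
    # (greedily here; self-play training samples proportionally instead)
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]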
Monte-Carlo Tree Search Reference Page
Edge (action) Statistics:
N = # times this move was considered
W = sum of values
Q = W/N if N != 0 else 0
P = prior probability
Parent Node (prev state) Statistics:
List[P] = priors of all children (an optional runtime optimization)
Child Node (state) Statistics:
N, W, Q of the action leading to this node (that action exists and is unique by the tree invariants)
Implementation Note: Both the AlphaGo Zero and AlphaZero papers describe these statistics in terms of edges, but we store them on nodes because the implementation is more natural that way
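One way to realize that note, as a sketch: the statistics of the unique incoming edge live on the child node it leads to, and the parent may optionally cache its children's priors as a list; the names here are illustrative.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SearchNode:
    # Statistics of the (unique) edge leading into this node
    prior: float = 0.0        # P
    visits: int = 0           # N
    total_value: float = 0.0  # W
    children: Dict[int, "SearchNode"] = field(default_factory=dict)
    child_priors: Optional[List[float]] = None  # optional cache of the children's P

    @property
    def mean_value(self) -> float:
        # Q = W / N if N != 0 else 0
        return self.total_value / self.visits if self.visits else 0.0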