1 of 15

CSC 384 MCTS for Othello

2 of 15

In these slides, we walk through how MCTS works in Othello from a high-level perspective. To keep the demonstration simple, we have reduced the number of options (possible legal moves) at each step. For instance, there should be 4 children instead of 2 in Iter1.

These slides are based on the example from this link (https://www.analyticsvidhya.com/blog/2019/01/monte-carlo-tree-search-introduction-algorithm-deepmind-alphago/). Note, however, that the original contains some errors in both the diagrams and the description.

3 of 15

Iter1

  1. Here we use a 4x4 board, with the dark color as our AI player, to walk through the beginning phases of the search.
  2. We begin with an initial state S0. Inside each state, “reward” means the number of wins from the related simulations, and “total” indicates the number of simulations run for the node itself and its children.
  3. We use (row, column) coordinates to identify board positions. In this initial state, we have two options: (0,1) and (1,0).
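The per-state bookkeeping described above can be sketched as a small Python class (the name `MCTSNode` and its fields are illustrative, not part of any starter code):

```python
class MCTSNode:
    """One state in the MCTS search tree (illustrative sketch)."""

    def __init__(self, move=None, parent=None):
        self.move = move      # (row, column) that led here; None for the root S0
        self.parent = parent  # needed later for backpropagation
        self.children = []
        self.reward = 0       # number of wins from related simulations
        self.total = 0        # number of simulations for this node and its children

s0 = MCTSNode()               # the initial state: reward = 0, total = 0
print(s0.reward, s0.total)    # 0 0
```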

4 of 15

Iter1 Selection

UCB_Sn = reward_Sn / total_Sn + bias*sqrt(ln(total_Sn_parent) / total_Sn)

  1. We calculate the UCB value for both S1 and S2 to decide which child to select.
  2. The bias can be any constant, and it does not have to stay the same for every node in the search tree. In this walkthrough we use sqrt(2), but you can pick your own value.
  3. However, since neither node has been visited yet, both have infinite UCB values. We break the tie arbitrarily and pick S1.
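The UCB formula above translates directly into code. Here is a minimal sketch in which an unvisited node gets an infinite value, so it is always tried before any visited sibling; the default bias of sqrt(2) matches this walkthrough:

```python
import math

def ucb(reward, total, parent_total, bias=math.sqrt(2)):
    """UCB value of a node; infinite when the node is unvisited."""
    if total == 0:
        return math.inf
    return reward / total + bias * math.sqrt(math.log(parent_total) / total)

# Iter1: neither S1 nor S2 has been visited, so both values are infinite
# and we break the tie by picking S1.
print(ucb(0, 0, 0))                  # inf
# Iter3: S1 (reward 1, total 1) beats S2 (reward 0, total 1) under S0 (total 2).
print(ucb(1, 1, 2) > ucb(0, 1, 2))   # True
```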

5 of 15

Iter1 Expansion & Simulation

  1. Since S1 is a leaf node that has not been visited before, we do not expand it. Instead, we run a simulation from S1 until the game is over.
  2. The game results in a win for the dark player.
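A simulation is just a random playout to the end of the game. The sketch below keeps it generic: the four callables stand in for your Othello helpers (the names are hypothetical, not a fixed API), and a tiny countdown stub is used only to show the control flow:

```python
import random

def simulate(state, legal_moves, apply_move, game_over, dark_wins):
    """Random playout from `state` until the game ends; returns 1 if dark wins, else 0.
    The four callables are assumed hooks into your Othello code (hypothetical names)."""
    while not game_over(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return 1 if dark_wins(state) else 0

# Stub "game" just to exercise the loop: count down from 3 to 0, dark always wins.
result = simulate(
    3,
    legal_moves=lambda s: [1],
    apply_move=lambda s, m: s - m,
    game_over=lambda s: s == 0,
    dark_wins=lambda s: True,
)
print(result)   # 1
```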

6 of 15

Iter1 Backpropagation

  1. We update the reward and total in S1.
  2. Since S0 is the parent of S1, its reward and total are also updated. (In a deeper tree, you repeat this step recursively until you reach the root.)
  3. Now we have finished the first iteration.
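The backpropagation step can be sketched as a loop that follows parent pointers up to the root (a minimal sketch, assuming each node stores `reward`, `total`, and `parent` as in the slides):

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.reward = 0
        self.total = 0

def backpropagate(node, win):
    """Update reward/total from the simulated node up to the root S0."""
    while node is not None:
        node.reward += win   # win = 1 for a dark win, 0 otherwise
        node.total += 1
        node = node.parent

# Iter1: the simulation from S1 is a dark win.
s0 = Node()
s1 = Node(parent=s0)
backpropagate(s1, 1)
print(s1.reward, s1.total, s0.reward, s0.total)   # 1 1 1 1
```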

7 of 15

Iter2 Selection

  1. For the second iteration, we go back to the initial state and calculate the UCB values for the children again.
  2. Since UCB_S1 < UCB_S2, we pick S2.

8 of 15

Iter2 Expansion & Simulation

  1. Similar to S1 in Iter1, S2 is a leaf node that has not been visited before. We skip expansion and perform a simulation here.
  2. The simulated result shows that the light player wins.

9 of 15

Iter2 Backpropagation

  1. We update the reward and total in S2 and its parent S0.
  2. Now we have finished the second iteration.

10 of 15

Iter3 Selection

  1. We go back to the initial state and calculate the UCB values for the children again.
  2. Since UCB_S1 > UCB_S2, we pick S1 this time.

11 of 15

Iter3 Expansion

  1. Unlike Iter1, we have already visited S1 before, so we need to determine whether we can expand this node.
  2. Since S1 is not a terminal node (it still has legal moves and the game is not over), we perform an expansion on S1 before we continue.
  3. We call the API function for legal moves and get 2 options*, namely (0,0) and (0,2), for the light player.

*: We reduced it to 2 for simplicity here.
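The expansion step can be sketched as follows (a minimal sketch; `Node` and `expand` are illustrative names, and the legal moves would come from the game API mentioned above):

```python
class Node:
    def __init__(self, move=None, parent=None):
        self.move = move
        self.parent = parent
        self.children = []
        self.reward = 0
        self.total = 0

def expand(node, legal_moves):
    """Add one child per legal move; moves are (row, column) pairs."""
    node.children = [Node(move=m, parent=node) for m in legal_moves]

# Iter3: expanding S1 with the two (reduced) light-player options.
s1 = Node(move=(0, 1))
expand(s1, [(0, 0), (0, 2)])
print([c.move for c in s1.children])   # [(0, 0), (0, 2)]
```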

12 of 15

Iter3 Expansion (Selection)

  1. We add two new nodes, S3 and S4, as children of S1. Since neither has been visited, both have infinite UCB values, just like the situation in the Iter1 selection, so we pick S3 here.

13 of 15

Iter3 Simulation

  1. The simulated result shows that the light player wins, which gives a reward of 0.

14 of 15

Iter3 Backpropagation

  1. We update the reward and total in S3 first.
  2. As S1 is the parent of S3 and S0 is the parent of S1, we update their records as well.
  3. Now we have finished the third iteration.
  4. Now suppose that time is up and we can no longer perform more iterations. We can still decide which move is better using the current search tree: for example, we can simply assess S1 and S2 by reward/total.
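Picking the final move by reward/total can be sketched like this (a minimal sketch; it assumes S1 corresponds to move (0,1) and S2 to (1,0), as listed on slide 3, with the counts the tree holds after Iter3):

```python
class Node:
    def __init__(self, move=None):
        self.move = move
        self.reward = 0
        self.total = 0
        self.children = []

def best_move(root):
    """Pick the root child with the highest win rate reward/total."""
    best = max(root.children,
               key=lambda c: c.reward / c.total if c.total else 0.0)
    return best.move

# Tree after Iter3: S1 has reward/total = 1/2, S2 has 0/1.
s1, s2 = Node(move=(0, 1)), Node(move=(1, 0))
s1.reward, s1.total = 1, 2
s2.reward, s2.total = 0, 1
root = Node()
root.children = [s1, s2]
print(best_move(root))   # (0, 1)
```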

15 of 15

Iter4 Selection

  1. However, if we still have time left, we can continue this loop. More iterations bring more information into the search tree and help the algorithm make more reliable, less biased decisions.
  2. To continue, we go back to S0 and calculate the UCB values for both children again. Since UCB_S1 > UCB_S2, we take S1 again.
  3. The above illustration should be enough to help you get started.