1 of 11

Improvements over AlphaZero in a general-purpose game AI library

Jakob Hain

2 of 11

Example Games

Chess

Connect-4

Connect-4 (3-player)

Tic-tac-toe

3 of 11

What?

Why?

A JavaScript game AI library based on AlphaZero, with improvements / differences:

  • Different games
  • Multiple players
  • Swappable evaluation function (instead of neural network)
  • Separate neural network for each player and action type

Performant: MCTS training done on multiple cores, model fitting uses CUDA (GPU)

Easier to use and extend vs. existing open-source JavaScript implementations

Faster training: AlphaZero takes a long time to train; if we can inject a custom evaluation function, can we speed up training?

Ease-of-use: most developers don't understand AlphaZero and other game AIs. Existing JavaScript implementations are neither libraries nor generalized.

Further generalization: AlphaZero on non-classical games, games with more players, and games with unusual rules (e.g. multiple actions per turn, uneven players, multiple turns, multiple teams)

4 of 11


Different games

A common interface, AIGame, can be implemented by each game.

export interface AIGame<GameState>

Once AIGame is implemented, you can use MCTSNNetTrainer to train a neural network for the game.

class MCTSNNetTrainer<GameState>

constructor (game: AIGame<GameState>, nnet: TrainableNNet<GameState>, args: Hyperparameters)
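To make this concrete, here is a minimal sketch of what implementing such an interface could look like for tic-tac-toe. The method names below are assumptions for illustration, not the library's actual AIGame surface:

```typescript
// Hypothetical sketch of a game interface in the spirit of AIGame.
// Method names are assumed; the real library's interface differs.
interface AIGameSketch<GameState> {
  initialState(): GameState;
  validActions(state: GameState): number[];
  applyAction(state: GameState, action: number, player: number): GameState;
  currentPlayer(state: GameState): number;
  winner(state: GameState): number | null; // null = game not over
}

// Tic-tac-toe: 9 cells, each holding a player index or null.
type TTTState = (number | null)[];

const ticTacToe: AIGameSketch<TTTState> = {
  initialState: () => Array(9).fill(null),
  validActions: (s) => s.flatMap((c, i) => (c === null ? [i] : [])),
  applyAction: (s, a, p) => s.map((c, i) => (i === a ? p : c)),
  currentPlayer: (s) => s.filter((c) => c !== null).length % 2,
  winner: (s) => {
    const lines = [
      [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
      [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
      [0, 4, 8], [2, 4, 6],            // diagonals
    ];
    for (const [a, b, c] of lines)
      if (s[a] !== null && s[a] === s[b] && s[b] === s[c]) return s[a];
    return null;
  },
};
```

Once a game exposes state, legal actions, and transitions like this, a generic trainer such as MCTSNNetTrainer never needs game-specific code.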

5 of 11


Multiple Players

Generalizes the 2-player algorithm:

  • Opponent = any other player (e.g. in MCTS, we invert the value of nodes where it is any other player's turn)
  • We keep track of the player instead of just inverting the score
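The generalized backpropagation step can be sketched as follows (a hypothetical illustration, not the library's actual code): instead of negating the value at every ply, each node credits the leaf evaluation from the perspective of the player to move at that node.

```typescript
// Hypothetical sketch: multi-player MCTS backpropagation.
// The 2-player sign-flip is the special case leafValues = [v, -v].
interface MctsNode {
  player: number;     // player to move at this node
  visits: number;
  totalValue: number; // accumulated value from `player`'s perspective
  parent: MctsNode | null;
}

// leafValues[p] = estimated outcome for player p at the evaluated leaf.
function backpropagate(leaf: MctsNode, leafValues: number[]): void {
  for (let node: MctsNode | null = leaf; node !== null; node = node.parent) {
    node.visits += 1;
    // Credit the value for whichever player acts at this node,
    // rather than alternating signs up the tree.
    node.totalValue += leafValues[node.player];
  }
}
```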

6 of 11


Swappable evaluation function

MCTS in AlphaZero uses the neural network to predict the initial policy and value. We can pass in a different heuristic instead.

export class MCTS<GameState>

constructor (game: AIGame<GameState>, heuristic: Heuristic<GameState>, args: MCTSHyperparameters)
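The key observation is that the search only needs something mapping a state to a prior policy and a value estimate. The shape below is an assumption about what Heuristic looks like, but it shows how a hand-written evaluator can stand in for the network:

```typescript
// Hypothetical shape of a swappable evaluator: MCTS only needs a
// (policy over actions, value estimate) pair per state.
type HeuristicFn<GameState> = (state: GameState) => {
  policy: number[]; // prior probability per action, sums to 1
  value: number;    // estimated outcome for the player to move, in [-1, 1]
};

// Example: a trivial hand-written heuristic for a 7-column
// Connect-4-style game. Uniform priors, no value opinion, so the
// search behaves like plain MCTS with no learned guidance.
const uniformConnect4Heuristic: HeuristicFn<number[]> = (_columnHeights) => ({
  policy: Array(7).fill(1 / 7),
  value: 0,
});
```

A domain-specific heuristic (e.g. counting open three-in-a-rows) plugged in the same way is what could cut down training time, per the motivation above.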

7 of 11


Separate neural network for each player and action type

E.g. in chess, Black's strategies are slightly different from White's.

Not really an "improvement" in most games, because it halves the training speed. However, in some games it matters more.

Support for different action types (e.g. when player moves are multi-part, or when there are distinct game phases); each action type gets its own network.
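The routing itself is simple; a hypothetical sketch (not the library's actual code) of dispatching evaluation to a separate model per (player, action type) pair:

```typescript
// Hypothetical sketch: one evaluator per (player, action type) pair.
type Evaluator = (input: number[]) => { policy: number[]; value: number };

class PerPlayerActionNets {
  private nets = new Map<string, Evaluator>();

  register(player: number, actionType: string, net: Evaluator): void {
    this.nets.set(`${player}:${actionType}`, net);
  }

  // Look up the model trained for exactly this player and action type.
  evaluate(player: number, actionType: string, input: number[]) {
    const net = this.nets.get(`${player}:${actionType}`);
    if (!net) throw new Error(`no net for player ${player}, action type ${actionType}`);
    return net(input);
  }
}
```

The trade-off stated above falls out of this design: each registered network only ever sees the training examples for its own key, so with two players each network trains on half the data.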

8 of 11


Performant

Uses Node's cluster module and ZMQ to coordinate MCTS training across multiple cores

Uses @tensorflow/tfjs-node-gpu to train on CUDA when available

This was a significant speedup: over 5×
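A rough sketch of the work-splitting idea. The actual coordination (forking workers with Node's cluster module and shipping examples back over ZMQ sockets) is omitted; only the episode partitioning, which is an assumed detail, is shown:

```typescript
// Hypothetical sketch: divide N self-play episodes as evenly as
// possible across W worker processes. Real code would fork workers
// via Node `cluster` and collect results over ZMQ; this is only the
// load-balancing arithmetic.
function partitionEpisodes(totalEpisodes: number, workers: number): number[] {
  const base = Math.floor(totalEpisodes / workers);
  const extra = totalEpisodes % workers;
  // The first `extra` workers each take one additional episode.
  return Array.from({ length: workers }, (_, i) => base + (i < extra ? 1 : 0));
}
```

Self-play episodes are independent of each other, which is why MCTS training parallelizes so cleanly across cores.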

9 of 11


Easier to use and extend

Users can just add the dependency from npm and implement AIGame.

Existing JavaScript projects on GitHub are not general.

10 of 11

Challenges

Overfitting to strategies, e.g. trying to get four in a row as fast as possible without blocking the opponent

Training takes a very long time despite multicore, and runs out of RAM despite 10 GB+ being available

Generalizing AlphaZero is not particularly hard (though it's unclear whether the more general versions perform as well as the original). For example, generalizing to multiple players is very straightforward

Sample tic-tac-toe evaluation game (new network vs. old):

next: X

| |

-----------

| |

-----------

| |

Turn 1. Player 1

next: O

| X |

-----------

| |

-----------

| |

Turn 2. Player 0

next: X

| X |

-----------

| |

-----------

O | |

Turn 3. Player 1

next: O

X | X |

-----------

| |

-----------

O | |

Turn 4. Player 0

next: X

X | X | O

-----------

| |

-----------

O | |

Turn 5. Player 1

next: O

X | X | O

-----------

| X |

-----------

O | |

Turn 6. Player 0

next: X

X | X | O

-----------

| X | O

-----------

O | |

Turn 7. Player 1

next: X

X | X | O

-----------

| X | O

-----------

O | X |

Game over: Turn 7.

Opponent won

NEW/OLD WINS : 1/0

DRAWS : 0

11 of 11

Future work

Teams: multiple players optimized so that any player on the same team winning counts as a win

Instead of completely separate neural networks, train one neural network initially and then split for the last few iterations

Predicting hidden game information with another neural network (generating examples from played games)

Further performance improvements, further training, more generalization