Improvements over AlphaZero in a general-purpose game AI library
Jakob Hain
Example Games
Chess
Connect-4
3-player Connect-4
Tic-tac-toe
What?
Why?
A JavaScript game AI library based on AlphaZero, with improvements / differences:
Performant: MCTS training done on multiple cores, model fitting uses CUDA (GPU)
Easier to use and extend vs. existing open-source JavaScript implementations
Faster training: AlphaZero takes a long time to train. If we can inject a custom evaluation function, can we speed up training?
Ease of use: most developers don't understand AlphaZero and other game AIs. Existing JavaScript implementations are not libraries and are not generalized.
Further generalization: AlphaZero on non-classical games, games with more players, games with unusual rules (e.g. multi-part actions, uneven players, multiple turns, multiple teams)
Different games
A common interface, AIGame, can be implemented by each game.
export interface AIGame<GameState>
Once AIGame is implemented, you can use MCTSNNetTrainer to train a neural network for the game.
class MCTSNNetTrainer<GameState>
constructor (game: AIGame<GameState>, nnet: TrainableNNet<GameState>, args: Hyperparameters)
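As a sketch of how this might fit together for tic-tac-toe — the method names on AIGame below are assumptions for illustration; only the interface and class names above come from the library:

```typescript
// Hypothetical sketch: the full AIGame interface isn't shown here, so these
// method names are illustrative assumptions, not the library's real API.
interface AIGame<GameState> {
  initialState(): GameState;
  legalActions(state: GameState): number[];
  nextState(state: GameState, action: number): GameState;
  winner(state: GameState): number | null; // winning player index, or null
}

type TTTState = { board: (0 | 1 | null)[]; player: 0 | 1 };

const ticTacToe: AIGame<TTTState> = {
  initialState: () => ({ board: Array(9).fill(null), player: 0 }),
  // Any empty cell index is a legal action.
  legalActions: (s) =>
    s.board.map((c, i) => (c === null ? i : -1)).filter((i) => i >= 0),
  nextState: (s, a) => {
    const board = s.board.slice();
    board[a] = s.player;
    return { board, player: s.player === 0 ? 1 : 0 } as TTTState;
  },
  winner: (s) => {
    const lines = [
      [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
      [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
      [0, 4, 8], [2, 4, 6],            // diagonals
    ];
    for (const [a, b, c] of lines) {
      if (s.board[a] !== null && s.board[a] === s.board[b] && s.board[b] === s.board[c]) {
        return s.board[a];
      }
    }
    return null;
  },
};
```

With an implementation like this, `new MCTSNNetTrainer(ticTacToe, nnet, args)` would be the entry point for training.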
Multiple Players
Generalizes the 2-player algorithm:
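The core change can be sketched as follows. This is an illustrative assumption about how the generalization works, not the library's actual code: in 2-player AlphaZero the value is a single scalar negated at each backup step, while with N players the value becomes a vector with one entry per player.

```typescript
// Illustrative sketch of generalizing the value backup from 2 to N players.

// 2-player AlphaZero: a single scalar value, negated at every backup step,
// because what is good for the child's player is bad for the parent's.
function backupTwoPlayer(childValue: number): number {
  return -childValue;
}

// N-player generalization: predict a vector of values, one per player;
// nothing is negated, and each node reads its own player's entry.
function valueForPlayer(values: number[], player: number): number {
  return values[player];
}

// A 3-player terminal position where player 2 won and players 0, 1 lost.
// The 2-player case is recovered when values = [v, -v].
const terminalValues = [-1, -1, 1];
```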
Swappable evaluation function
MCTS in AlphaZero uses the neural network to predict the initial policy and value. We can pass in a different heuristic instead.
export class MCTS<GameState>
constructor (game: AIGame<GameState>, heuristic: Heuristic<GameState>, args: MCTSHyperparameters)
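As an illustration of a swappable evaluator, here is a sketch of a hand-written heuristic. The shape of Heuristic (an evaluate method returning a policy/value pair, mirroring the network's two heads) is an assumption for illustration; the library's real interface may differ.

```typescript
// Assumed interface: a heuristic returns the same (policy, value) pair the
// neural network would, so it can replace the network inside MCTS.
interface Heuristic<GameState> {
  evaluate(state: GameState): { policy: number[]; value: number };
}

// A toy Connect-4-style state: column heights plus a hand-counted number of
// threats for the player to move (positive) vs. the opponent (negative).
type C4State = { heights: number[]; threats: number };

const uniformThreatHeuristic: Heuristic<C4State> = {
  evaluate(state) {
    // Uniform policy over non-full columns (height < 6).
    const legal = state.heights.filter((h) => h < 6).length;
    const policy = state.heights.map((h) => (h < 6 ? 1 / legal : 0));
    // Squash the threat count into [-1, 1], like a network's value head.
    return { policy, value: Math.tanh(state.threats) };
  },
};
```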
Separate neural network for each player and action type
e.g. in Chess, Black's strategies are slightly different from White's.
Not really an "improvement" in most games, because it halves the training speed. However, in some games it matters more.
Support for different action types (e.g. if player moves are multi-part, or if there are different game phases), and each action type has a different network
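One way to organize per-player, per-action-type networks is a keyed lookup. This sketch is an assumption about the approach, not the library's actual code; NNet here is a stand-in for the trained model type.

```typescript
// Stand-in for a trained model.
type NNet = { predict: (input: number[]) => number[] };

// Hypothetical sketch: one network per (player, action type) pair,
// dispatched on whose turn it is and which phase of the game applies.
class NNetBank {
  private nets = new Map<string, NNet>();

  set(player: number, actionType: string, net: NNet): void {
    this.nets.set(`${player}:${actionType}`, net);
  }

  // Pick the network trained for this player and this kind of action.
  get(player: number, actionType: string): NNet {
    const net = this.nets.get(`${player}:${actionType}`);
    if (!net) throw new Error(`no network for ${player}:${actionType}`);
    return net;
  }
}
```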
Performant
Uses Node's cluster module and ZMQ to coordinate MCTS self-play across multiple cores
Uses @tensorflow/tfjs-node-gpu to train on CUDA when available
This was a significant speedup: over 5×
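The actual coordination uses the cluster module and ZMQ; as a simplified illustration of the work-splitting involved, episodes can be partitioned evenly across cores (partitionEpisodes is a hypothetical helper, not the library's API):

```typescript
import * as os from "node:os";

// Hypothetical helper: split totalEpisodes self-play games as evenly as
// possible across the available workers.
function partitionEpisodes(totalEpisodes: number, workers: number): number[] {
  const base = Math.floor(totalEpisodes / workers);
  const extra = totalEpisodes % workers;
  // The first `extra` workers take one additional episode each.
  return Array.from({ length: workers }, (_, i) => base + (i < extra ? 1 : 0));
}

// e.g. partition 100 self-play games over this machine's cores:
const counts = partitionEpisodes(100, os.cpus().length);
```

In the real setup each worker would run its share of MCTS self-play games and report examples back to the primary process for model fitting.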
Easier to use and extend
Users can just add the dependency from npm and implement AIGame
Existing JavaScript projects on GitHub are not general:
Challenges
Overfitting to narrow strategies, e.g. the Connect-4 agent tries to connect four as fast as possible without blocking the opponent
Training still takes a very long time despite multicore, and runs out of RAM despite 10 GB+ being available
Generalizing AlphaZero is not particularly hard (though it is unclear whether the more general versions perform as well as the original). For example, generalizing to multiple players is very straightforward
next: X
  |   |  
-----------
  |   |  
-----------
  |   |  
Turn 1. Player 1
next: O
  | X |  
-----------
  |   |  
-----------
  |   |  
Turn 2. Player 0
next: X
  | X |  
-----------
  |   |  
-----------
O |   |  
Turn 3. Player 1
next: O
X | X |  
-----------
  |   |  
-----------
O |   |  
Turn 4. Player 0
next: X
X | X | O
-----------
  |   |  
-----------
O |   |  
Turn 5. Player 1
next: O
X | X | O
-----------
  | X |  
-----------
O |   |  
Turn 6. Player 0
next: X
X | X | O
-----------
  | X | O
-----------
O |   |  
Turn 7. Player 1
next: X
X | X | O
-----------
  | X | O
-----------
O | X |  
Game over: Turn 7.
Opponent won
NEW/OLD WINS : 1/0
DRAWS : 0
Future work
Teams: multiple players optimized so that any player on the same team winning counts as a win
Instead of completely separate neural networks, train one neural network initially and then split for the last few iterations
Predicting hidden input with another neural network (generating examples from games)
Further performance improvements, further training, more generalization