Your Own AlphaGo
Advanced Play using only a Laptop
By Niall Cardin
niallc@gmail.com
A Machine that Plays Games
When I was a teenager I found a program called Igowin.
It was so cool that it could play Go!
How can it do that?
Back in 1998 though, computers weren’t good at Go.
(People had tried neural nets!)
Author: David Fotland
Smart Games LLC
AlphaGo
2016: DeepMind’s AlphaGo surpassed humans.
I bumped into David Fotland (Igowin’s author) at a party!
He told me it wasn’t that hard: just read the paper.
Can a hobbyist make “an AlphaGo”?
It works fine if you just try, but it might disappoint after a while.
I want to touch on one important detail (I only have 5 minutes, so I can’t cover everything I’d like to; there’s really a lot more to say!).
Using one laptop you can produce advanced, if not superhuman, play.
What can a Neural Network do?
For our purposes a neural network is a “powerful function fitter”.
Step 1: Train on many ‘labeled’ examples of Ai → Bi.
Step 2: Given an unlabeled A*, ask the network for B*.
How: Neural nets are like stacked regression models that insert ‘hidden’ intermediate nodes. We then update the weights to find good intermediate values via “backpropagation”, using “automatic differentiation”.
This approach turns out to be hugely flexible, so it can learn to map many different types of A → B.
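A minimal sketch of the idea (illustrative only, assuming NumPy): a one-hidden-layer network fitted to labeled (A, B) pairs, with the backpropagation gradients written out by hand to show the “stacked regression” structure. Real code would let a framework’s automatic differentiation do this.

```python
# Minimal sketch: a one-hidden-layer network as a "powerful function fitter".
# Gradients are written out by hand; real code would use a framework's
# automatic differentiation instead.
import numpy as np

rng = np.random.default_rng(0)

# Labeled examples A_i -> B_i (here: fit y = sin(x)).
A = rng.uniform(-np.pi, np.pi, size=(256, 1))
B = np.sin(A)

# One hidden layer of 32 units: two stacked "regressions" with a nonlinearity.
W1 = rng.normal(0, 0.5, size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, size=(32, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass: the hidden values H are the "intermediate nodes".
    H = np.tanh(A @ W1 + b1)
    pred = H @ W2 + b2
    loss = np.mean((pred - B) ** 2)

    # Backward pass: backpropagate the mean-squared-error loss.
    d_pred = 2 * (pred - B) / len(A)
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(0)
    dH = d_pred @ W2.T
    d_pre = dH * (1 - H ** 2)        # derivative of tanh
    dW1 = A.T @ d_pre
    db1 = d_pre.sum(0)

    for param, grad in [(W1, dW1), (b1, db1), (W2, dW2), (b2, db2)]:
        param -= lr * grad           # gradient-descent update

# Step 2: ask the trained network about an unlabeled A*.
A_star = np.array([[0.5]])
B_star = np.tanh(A_star @ W1 + b1) @ W2 + b2
print(B_star, np.sin(0.5))           # should be close
```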
How does this help us play games?
[Image examples: Bluebird, Blackbird, Not a bird, ???]
Playing Games with Neural Networks – Policy
What is playing a game? Why is it mapping A → B?
You repeatedly make a move given your situation.
So we're mapping (game-state → move)
Given some games, we can feed them to a neural network and it will learn to copy the moves.
This lets us effectively imitate moves from a dataset (sketched below).
The full design lets us play better than the training data.
If there’s time (unlikely), we can talk about the fuller design
[Diagram: A → B → ?]
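A minimal sketch of the policy-imitation step, assuming PyTorch, a 9x9 board flattened to 81 inputs, and stand-in data. The encoding, network sizes, and dataset are illustrative assumptions, not the talk’s actual setup.

```python
# Minimal policy-network sketch, assuming PyTorch and a 9x9 board encoded as a
# flat vector (own stones / opponent stones / empty = +1 / -1 / 0).
# The (position, move) data below is a random stand-in for recorded games.
import torch
import torch.nn as nn

BOARD = 9 * 9

policy = nn.Sequential(
    nn.Linear(BOARD, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, BOARD),            # one logit per board point
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(positions, moves):
    """positions: (N, 81) float tensor; moves: (N,) long tensor of point indices."""
    logits = policy(positions)
    loss = loss_fn(logits, moves)     # learn to copy the recorded move
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Random stand-in batch; a real run would feed positions and moves from games.
positions = torch.randn(32, BOARD)
moves = torch.randint(0, BOARD, (32,))
print(train_step(positions, moves))
```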
Is Training Going Well?
Training minimizes a loss function. Over time, loss usually improves.
But lower loss doesn’t always mean stronger play.
Better evaluation: Do we win more games?
Hold tournaments between training checkpoints (a rough version is sketched below).
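A rough sketch of such a tournament; play_game() is a hypothetical helper that plays one full game and reports the winner. Note that with fully deterministic players this naive version produces only two distinct games, which is exactly the problem the next slides address.

```python
# Sketch of a checkpoint tournament. play_game() is a hypothetical helper that
# plays one full game between the two models and returns the winner ("A" or "B").
def tournament(model_a, model_b, n_games, play_game):
    wins_a = 0
    for g in range(n_games):
        # Alternate who moves first so neither side gets the first-move edge.
        if g % 2 == 0:
            wins_a += play_game(model_a, model_b) == "A"
        else:
            wins_a += play_game(model_b, model_a) == "B"
    return wins_a / n_games            # model_a's win rate
```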
But computers, by default, mostly do the same thing every time.
Randomness
Our network roughly produces a probability for each possible next move:
We could play by choosing each move with that probability.
There are other (better) schemes for randomness, but this will do here.
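A minimal sketch of that sampling step, assuming NumPy; the function and argument names are illustrative, not from the talk.

```python
# Minimal sketch of randomized play: sample the next move in proportion to the
# network's probability for each legal move. Names here are illustrative.
import numpy as np

def sample_move(move_probs, legal_moves, rng=None):
    """move_probs: probability per board point; legal_moves: list of point indices."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.array([move_probs[m] for m in legal_moves])
    p = p / p.sum()                      # renormalize over legal moves only
    return rng.choice(legal_moves, p=p)  # each move chosen with its probability
```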
Which model is better?
Suppose typical move values are as below:

Model 1
| Move | Model Score | Real Value |
| Move 1 | 90% | 7/10 |
| Move 2 | 5% | 5/10 |
| Move 3 … | 2% | 3/10 |
| Move 13 | 0.01% | 10/10 |

Model 2
| Move | Model Score | Real Value |
| Move 1 | 35% | 10/10 |
| Move 2 | 15% | 7/10 |
| Move 3 … | 10% | 5/10 |
| Move 13 | 0.1% | 1/10 |
With randomness, Model 2 will play many bad moves.
Without randomness, Model 2 plays great moves; Model 1 plays decent but not great moves.
Intuitively, I’d actually prefer Model 2 to Model 1.
A Better Evaluation
Start from a library of existing games.
Sample a collection of distinct openings.
Play deterministically from those openings.
Have each contestant play first once per opening.
Winner: the model that’s best at picking the best move (see the sketch below).
[Diagram: four sampled openings, each played twice — once with Model 1 first, once with Model 2 first]
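A rough sketch of this evaluation, with hypothetical helpers: sample_openings() draws distinct openings from the game library, and play_from() plays one deterministic game from a given opening with the stated model moving first.

```python
# Sketch of opening-based evaluation. sample_openings() and play_from() are
# hypothetical helpers: draw distinct openings from the library, then play one
# deterministic game from each opening with the given model moving first.
def evaluate(model_1, model_2, game_library, n_openings, sample_openings, play_from):
    score_1 = 0
    for opening in sample_openings(game_library, n_openings):
        # Each contestant plays first once per opening, so both face the same
        # positions and neither benefits from the first-move advantage.
        score_1 += play_from(opening, first=model_1, second=model_2) is model_1
        score_1 += play_from(opening, first=model_2, second=model_1) is model_1
    return score_1 / (2 * n_openings)    # Model 1's win rate
```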
Extra Time: Value Network
Learning to play better than the training data – the really cool part
A 2nd “value” network maps position → probability of a win.
This allows reading ahead to evaluate moves with tree search. Efficient tree search can produce games better than the training data (a minimal lookahead is sketched below).
Iterate training → self-play → training.
[Diagram: Value Network assigning win probabilities (50%, 52%, 54%, 50%) to candidate positions A and B]
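A minimal sketch of a value network plus a one-ply lookahead that uses it, again assuming PyTorch and the same illustrative board encoding. apply_move() and legal_moves_fn() are hypothetical game helpers, and a real system would use a much deeper tree search (e.g. MCTS) rather than one ply.

```python
# Minimal value-network sketch plus a one-ply lookahead that uses it.
# apply_move() and legal_moves_fn() are hypothetical game helpers.
import torch
import torch.nn as nn

BOARD = 9 * 9   # same illustrative 9x9 encoding as the policy sketch

value_net = nn.Sequential(
    nn.Linear(BOARD, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),   # position -> win probability for the player to move
)

def best_move(position, legal_moves_fn, apply_move):
    """Pick the move whose resulting position the value network likes most."""
    best, best_p = None, -1.0
    for move in legal_moves_fn(position):
        next_pos = apply_move(position, move)    # position after playing `move`
        # The next position is evaluated from the opponent's point of view,
        # so our winning probability is one minus its value.
        p_opp = value_net(torch.as_tensor(next_pos, dtype=torch.float32)).item()
        if 1.0 - p_opp > best_p:
            best, best_p = move, 1.0 - p_opp
    return best
```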
Extra Time 2 – Returning to Evaluation
To turn self-play into new training data, we need to use randomness again.
Does that mean evaluating deterministic play isn’t quite right?
Ultimately, the ideal solution is to evaluate with the same randomized play you will actually use, over very many games.
But this is computationally expensive. So without infinite compute we need to take shortcuts and use intuition. I’ve had good results using randomized openings.
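A rough sketch of the iterate-training-and-self-play loop under those shortcuts; self_play_game() and retrain() are hypothetical helpers, and the temperature argument stands in for whatever randomness scheme is injected into self-play moves.

```python
# Sketch of the training -> self-play -> training loop. self_play_game() and
# retrain() are hypothetical helpers; `temperature` stands in for whatever
# randomness is used when choosing self-play moves.
def improve(policy, value_net, n_rounds, n_games, self_play_game, retrain):
    dataset = []
    for _ in range(n_rounds):
        for _ in range(n_games):
            # Each game yields (position, move played, final result) records.
            dataset.extend(self_play_game(policy, value_net, temperature=1.0))
        # Fit the networks on the growing dataset, then self-play again.
        policy, value_net = retrain(policy, value_net, dataset)
    return policy, value_net
```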
Thanks for Listening!