Computers playing the game of Go
Google-Deepmind vs Fan Hui, the machine wins 5 - 0.
Notes on the performance “Alpha-Go” (Google DeepMind) vs “Fan Hui” (pro 2p).
at last, a computer won against a pro human at the game of Go, in a fair game (no handicap), in 19x19. Up to now, computers needed at least 4 handicap stones.
The algorithm uses several tools from artificial intelligence:
We see that the algorithm is “multiple”, and that most human forms of learning are included (learning by studying games, by pattern recognition, by self-play, by mental analysis of possible futures).
Whereas many games are either solved (i.e. the machine plays optimally, as in Draughts) or quasi-solved (in the sense that the machine is stronger than all humans, as in Chess), the game of Go was until recently untractable by computers, where humans were still unbeatable. The game of Go comes from Asia.
Go board (source: Wikipedia).
The tool used by Google Deepmind uses several artificial intelligence tools: deep neural networks, Monte Carlo tree search.
A neuron computes a linear combination z of its entries x1,x2,...,xk, z=w0+w1x1+w2x2+...+wkxk, then transformed by a sigmoid function z → f(z), for example f(z)=1/(1+exp(-z)). The neuron, when it receives an input x, outputs 1/(1+exp(-w0-w1x1-w2x2-...-wkxk)).
A neural network combines a large number of neurons, where, say, the information arrives on the left and is propagated/processed to the right where an output is given. Below, y is the output obtained when an input x is proposed.
Such a network has a huge expressivity; thanks to nonlinearities (sigmoids), each layer drastically increases the expressivity. A challenge, however, is to find the coefficients w such that the network computes something interesting.
Some solutions have been proposed long ago, when data x1,y1,...,xm,ym are available: we compute the w such that the network outputs (on the right) y1 when it sees the input x1 on the left (and yn for the input xn). The classical technique for that is called “gradient backpropagation”. However, this method is slow when there are many layers. For this reason, the artificial intelligence community has a little bit forgotten neural networks with many layers (“deep” neural nets). However, two key tools can save the situation:
We have seen that neural networks are a powerful tool, but that it’s difficult to handle the w coefficients (millions of neurons). An approach consists in creating successive layers in an unsupervised manner, i.e. without using the output (the yi) part. In the case of Go, it means that the network first studies positions only, and not the move proposed by human experts. This can be done layerwise, making the problem much easier. Then, the last layer, and only this one, uses the output information (the yi). Possibly, later, all the network is updated, using the whole data.
But for images, and a Go board is an image (one pixel per interesction of the board), another trick brings a huge improvement. The first layers, as well as in the human visual cortex, are low level features; they are the same everywhere in the picture; therefore we can have far less coefficients and use the same coefficients everywhere. Even for high level features, such as recognizing a cat, a cat in the top of an image is the same as a cat in the bottom of an image. With such a trick, what is learnt in a part of the board is learnt also for everywhere on the board.
Illustration of deep neural network by Google, used by Cypher. The first layers distinguish some simple features (such as “diagonal line”). The later layers recognize high level concepts such as “cats” or “faces”.
Deep neural networks have outperformed all other approaches for many areas of image recognition and are used in many domains - including Atari games and the game of Go.
Using convolutional network learning, the neural network is ready for associated some outputs y (which are moves) to some inputs x (which are Go positions). For this, it is given plenty of boards equipped with an excellent move (this is obtained on games from strong players). This is the first step of the program. Please note that the network proposes several possible moves, with different probabilities. This neural net is said “actor”: it is a network which provides legal moves in given situations. This is imitation learning, because it is based on mimicking human moves.
Learning by imitation might be disappointed at first view, because it is aimed at reproducing the behavior of an existing expert; but it is a good first step for a learning (in the case of Go, the neural network is later trained by self-play). In this picture, table tennis learning at Tuebingen MPI-IS.
In a second step, the deep network plays against itself or earlier versions of itself. The neural learns using the REINFORCE algorithm. Then, using gradients (again) it increases the probability of moves which lead to more victories. In addition, the games are stored; they will be used for a second deep convolutional network (value network, also known as critic network), which evaluates positions (for each position, the network provides probabilities).
Contrarily to Chess or many games, Go, as well as some games which are hard for computers, has no simple evaluation function: there is no simple function which, given a board with stones, outputs an approximate probability of winning. For these games, some pioneering works have shown that the simple methodologie in which random games are played, and the frequency of games won by Black is an estimate of the quality of the position for Black, is not that bad: this is termed the Monte Carlo approach. B. Bouzy developped various works in this direction, originally proposed by B. Bruegman. T. Cazenave still uses it for very hard games, such as Phantom Go, for which it is still the state of the art.
Left: a naive Monte Carlo. Right: a smarter Monte Carlo. Y. Wang has been a pioneer in the art of using “smart” Monte Carlo in MCTS.
This approach, nonetheless, lacks a critical component, at least for fully observable games: a good combination with tree search.
Artificial intelligence algorithms for games are usually either alpha-beta (with as a central components the tree search and the evaluation function; the latter is used as a bias in the former), or Monte Carlo; R. Coulom’s article in 2006 provided a new tool, MCTS. The principle of MCTS is to introduce, dynamically, a bias in the Monte Carlo simulations; moves leading to more victories are more likely to be played. Therefore, the simulations become smarter and smarter.
This approach has been critical for the first milestones of Computer Go against humans, with the first wins against pros in 9x9 (without handicap) and in 19x19 (with handicap), thanks to parallel machines.
The first win in 9x9 against a top pro, without handicap, with the disadvantageous side (komi 7.5, Computer is black). This was realized by MoGo (2009).
Schéma de l’utilisation de réseaux profonds pour biaiser la fouille d’arbre Monte Carlo.
The combination is realized as follows:
AI designed by Google Deepmind, combining the last technologies from artificial intelligence (deep neural networks, Monte Carlo tree search).
Fan Hui (https://fr.wikipedia.org/wiki/Fan_Hui) is second Dan pro (2p). He is usually considered as the best Go player in Europe. The best players in the world are however in Asia.
5 games have been played formally, with time settings chosen by the human player; the machine won 5 out of 5 games. Then, the human requested blitz games (3 byoyomis of 30s); the machine won 3 games out of 5.
1 2 3 4 5
 In the game of Go, stones are put on line intersections.
 Inspired by the pioneering work by Yann Lecun (1998).