2 of 50

Index

Games
OPTIMAL DECISIONS IN GAMES

The minimax algorithm
Optimal decisions in multiplayer games

ALPHA–BETA PRUNING
IMPERFECT REAL-TIME DECISIONS
STOCHASTIC GAMES
PARTIALLY OBSERVABLE GAMES

Kriegspiel: Partially observable chess
Card games

3 of 50

Games

In this chapter we cover competitive environments, in which the agents’ goals are in conflict, giving rise to adversarial search problems often known as games.
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is “significant,” regardless of whether the agents are cooperative or competitive.
In AI, the most common games are of a rather specialized kind what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information (such as chess).
In our terminology, this means deterministic, fully observable environments in which two agents act alternately and in which the utility values at the end of the game are always equal and opposite.

4 of 50

Cont…

For example, if one player wins a game of chess, the other player necessarily loses. It is this opposition between the agents’ utility functions that makes the situation adversarial.
Games have engaged the intellectual faculties of humans—sometimes to an alarming degree for as long as civilization has existed.
For AI researchers, the abstract nature of games makes them an appealing subject for study. The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules.
Physical games, such as croquet and ice hockey, have much more complicated descriptions, a much larger range of possible actions, and rather imprecise rules defining the legality of actions.
With the exception of robot soccer, these physical games have notattracted much interest in the AI community.

5 of 50

Cont…

We begin with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited.
Pruning allows us to ignore portions of the search tree that make no difference to the final choice, and heuristic evaluation functions allow us to approximate the true utility of a state without doing a complete search.
Games such as backgammon that include an element of chance; we also discuss bridge, which includes elements of imperfect information because not all cards are visible to each player.
Finally, we look at how state-of-the-art game-playing programs fare against human opposition and at directions for future developments.
We first consider games with two players, whom we call MAX and MIN for reasons that will soon become obvious. MAX moves first, and then they take turns moving until the game is over. At the end of the game, points are awarded to the winning player and penalties are given to the loser.

6 of 50

Cont…

A game can be formally defined as a kind of search problem with the following elements:

7 of 50

Cont…

The initial state, ACTIONS function, and RESULT function define the game tree for the game—a tree where the nodes are game states and the edges are moves. Figure 5.1 shows part of the game tree for tic-tac-toe (noughts and crosses).

9 of 50

OPTIMAL DECISIONS IN GAMES

In a normal search problem, the optimal solution would be a sequence of actions leading to a goal state a terminal state that is a win.
In adversarial search, MIN has something to say about it. MAX therefore must find a contingent strategy, which specifies MAX’s move in the initial state, then MAX’s moves in the states resulting from every possible response by MIN, then MAX’s moves in the states resulting from every possible response by MIN to those moves, and so on.
This is exactly analogous to the AND–OR search algorithm with MAX playing the role of OR and MIN equivalent to AND. Roughly speaking, an optimal strategy leads to outcomes at least as good as any other strategy when one is playing an infallible opponent. We begin by showing how to find this optimal strategy.

10 of 50

Cont…

Even a simple game like tic-tac-toe is too complex for us to draw the entire game tree on one page, so we will switch to the trivial game in Figure 5.2.
The possible moves for MAX at the root node are labeled a1, a2, and a3. The possible replies to a1 for MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN.
(In game parlance, we say that this tree is one move deep, consisting of two half-moves, each of which is called a ply.) The utilities of PLY the terminal states in this game range from 2 to 14.

13 of 50

Cont…

Let us apply these definitions to the game tree in Figure 5.2. The terminal nodes on the bottom level get their utility values from the game’s UTILITY function.
The first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2.
The root node is a MAX node; its successor states have minimax values 3, 2, and 2; so it has a minimax value of 3. We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax value.
This definition of optimal play for MAX assumes that MIN also plays optimally it maximizes the worst-case outcome for MAX. What if MIN does not play optimally? Then it is easy to show (Exercise 5.7) that MAX will do even better. Other strategies against suboptimal opponents may do better than the minimax strategy, but these strategies necessarily do worse against optimal opponents.

14 of 50

The minimax algorithm

The minimax algorithm (Figure 5.3) computes the minimax decision from the current state.
It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations.
The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.
For example, in Figure 5.2, the algorithm first recurses down to the three bottomleft nodes and uses the UTILITY function on them to discover that their values are 3, 12, and 8, respectively. Then it takes the minimum of these values, 3, and returns it as the backedup value of node B.
A similar process gives the backed-up values of 2 for C and 2 for D.
Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node.

17 of 50

Optimal decisions in multiplayer games

Many popular games allow more than two players. Let us examine how to extend the minimax idea to multiplayer games. This is straightforward from the technical viewpoint, but raises some interesting new conceptual issues.
First, we need to replace the single value for each node with a vector of values. For example, in a three-player game with players A, B, and C, a vector vA, vB, vC is associated with each node.
For terminal states, this vector gives the utility of the state from each player’s viewpoint. (In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.)
The simplest way to implement this is to have the UTILITY function return a vector of utilities.

19 of 50

Cont…

Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made and broken as the game proceeds.
How are we to understand such behavior? Are alliances a natural consequence of optimal strategies for each player in a multiplayer game? It turns out that they can be.
For example, suppose A and B are in weak positions and C is in a stronger position. Then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, collaboration emerges from purely selfish behavior.
Of course, as soon as C weakens under the joint onslaught, the alliance loses its value, and either A or B could violate the agreement. In some cases, explicit alliances merely make concrete what would have happened anyway.
In other cases, a social stigma attaches to breaking an alliance, so players must balance the immediate advantage of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy.

20 of 50

ALPHA–BETA PRUNING

Consider again the two-ply game tree from Figure 5.2. Let’s go through the calculation of the optimal decision once more, this time paying careful attention to what we know at each point in the process. The steps are explained in Figure 5.5.
The outcome is that we can identify the minimax decision without ever evaluating two of the leaf nodes.
Another way to look at this is as a simplification of the formula for MINIMAX. Let the two unevaluated successors of node C in Figure 5.5 have values x and y. Then the value of the root node is given by

26 of 50

Move ordering

The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined.
For example, in Figure 5.5(e) and (f), we could not prune any successors of D at all because the worst successors (from the point of view of MIN) were generated first.
If the third successor of D had been generated first, we would have been able to prune the other two.
This suggests that it might be worthwhile to try to examine first the successors that are likely to be best.

27 of 50

IMPERFECT REAL-TIME DECISIONS

The minimax algorithm generates the entire game search space, whereas the alpha–beta algorithm allows us to prune large parts of it. However, alpha–beta still has to search all the way to terminal states for at least a portion of the search space.
This depth is usually not practical, because moves must be made in a reasonable amount of time—typically a few minutes at most. Claude Shannon’s paper Programming a Computer for Playing Chess (1950) proposed instead that programs should cut off the search earlier and apply a heuristic evaluation function to states in the search, effectively turning nonterminal nodes into terminal leaves.
In other words, the suggestion is to alter minimax or alpha–beta in two ways: replace the utility function by a heuristic evaluation function EVAL, which estimates the position’s utility, and replace the terminal test by a cutoff test that decides when to apply EVAL. That gives us the following for heuristic minimax for state s and maximum depth d:

29 of 50

Evaluation functions

An evaluation function returns an estimate of the expected utility of the game from a given position. The idea of an estimator was not new when Shannon proposed it.
For centuries, chess players (and aficionados of other games) have developed ways of judging the value of a position because humans are even more limited in the amount of search they can do than are computer programs.
It should be clear that the performance of a game-playing program depends strongly on the quality of its evaluation function.
An inaccurate evaluation function will guide an agent toward positions that turn out to be lost.
How exactly do we design good evaluation functions?

30 of 50

Cont…

First, the evaluation function should order the terminal states in the same way as the true utility function: states that are wins must evaluate better than draws, which in turn must be better than losses.
Otherwise, an agent using the evaluation function might err even if it can see ahead all the way to the end of the game.
Second, the computation must not take too long! (The whole point is to search faster.) Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

31 of 50

Cont…

One might well wonder about the phrase “chances of winning.” After all, chess is not a game of chance: we know the current state with certainty, and no dice are involved.
But if the search must be cut off at nonterminal states, then the algorithm will necessarily be uncertain about the final outcomes of those states.
This type of uncertainty is induced by computational, rather than informational, limitations.
Given the limited amount of computation that the evaluation function is allowed to do for a given state, the best it can do is make a guess about the final outcome.

32 of 50

Cont…

Let us make this idea more concrete. Most evaluation functions work by calculating various features of the state for example, in chess, we would have features for the number of white pawns, black pawns, white queens, black queens, and so on.
The features, taken together, define various categories or equivalence classes of states: the states in each category have the same values for all the features.
For example, one category contains all two-pawn vs. one-pawn endgames.
Any given category, generally speaking, will contain some states that lead to wins, some that lead to draws, and some that lead to losses.
The evaluation function cannot know which states are which, but it can return a single value that reflects the proportion of states with each outcome.
For example, suppose our experience suggests that 72% of the states encountered in the two-pawns vs. one-pawn category lead to a win (utility
+1); 20% to a loss (0), and 8% to a draw (1/2).

33 of 50

Cont…

Then a reasonable evaluation for states in the category is the expected value:(0.72 × +1) + (0.20 × 0) + (0.08 × 1/2) = 0.76.
In principle, the expected value can be determined for each category, resulting in an evaluation function that works for any state. As with terminal states, the evaluation function need not return actual expected values as long as the ordering of the states is the same.
In practice, this kind of analysis requires too many categories and hence too much experience to estimate all the probabilities of winning. Instead, most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value.
For example, introductory chess books give an approximate material value
for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9.
Other features such as “good pawn structure” and “king safety” might be worth half a pawn, say. These feature values are then simply added up to obtain the evaluation of the position.

34 of 50

Cont…

A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and a secure advantage equivalent to three pawns should give almost certain victory, as illustrated in Figure 5.8(a). Mathematically, this kind of evaluation function is called a weighted linear function because it can be expressed as

36 of 50

Cutting off search

The next step is to modify ALPHA-BETA-SEARCH so that it will call the heuristic EVAL function when it is appropriate to cut off the search. We replace the two lines in Figure 5.7 that mention TERMINAL-TEST with the following line:

We also must arrange for some bookkeeping so that the current depth is incremented on each recursive call.
The most straightforward approach to controlling the amount of search is to set a fixed depth limit so that CUTOFF-TEST(state, depth) returns true for all depth greater than some fixed depth d.
It must also return true for all terminal states, just as TERMINAL-TEST
did.
The depth d is chosen so that a move is selected within the allocated time. A more robust approach is to apply iterative deepening. (See Chapter 3.)

37 of 50

Cont…

When time runs out, the program returns the move selected by the deepest completed search. As a bonus, iterative deepening also helps with move ordering.
These simple approaches can lead to errors due to the approximate nature of the evaluation function. Consider again the simple evaluation function for chess based on material advantage.
Suppose the program searches to the depth limit, reaching the position in Figure 5.8(b), where Black is ahead by a knight and two pawns.
It would report this as the heuristic value of the state, thereby declaring that the state is a probable win by Black.
But White’s next move captures Black’s queen with no compensation. Hence, the position is really won for White, but this can be seen only by looking ahead one more ply.

38 of 50

Cont…

Obviously, a more sophisticated cutoff test is needed. The evaluation function should be applied only to positions that are quiescent that is, unlikely to exhibit wild swings in value in the near future. In chess, for example, positions in which favorable captures can be made are not quiescent for an evaluation function that just counts material.
Nonquiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search; sometimes it is restricted to consider only certain types of moves, such as capture moves, that will quickly resolve the uncertainties in the position.

39 of 50

Cont…

The horizon effect is more difficult to eliminate. It arises when the program is facing an opponent’s move that causes serious damage and is ultimately unavoidable, but can be temporarily avoided by delaying tactics. Consider the chess game in Figure 5.9. It is clear that there is no way for the black bishop to escape.

1 of 50

2 of 50

3 of 50

4 of 50

5 of 50

6 of 50

7 of 50

8 of 50

9 of 50

10 of 50

11 of 50

12 of 50

13 of 50

14 of 50

15 of 50

16 of 50

17 of 50

18 of 50

19 of 50

20 of 50

21 of 50

22 of 50

23 of 50

24 of 50

25 of 50

26 of 50

27 of 50

28 of 50

29 of 50

30 of 50

31 of 50

32 of 50

33 of 50

34 of 50

35 of 50

36 of 50

37 of 50

38 of 50

39 of 50

40 of 50

41 of 50

42 of 50

43 of 50

44 of 50

45 of 50

46 of 50

47 of 50

48 of 50

49 of 50

50 of 50