The games that we’ve looked at so far are the simplest case of
basic games. In these games, we’ve got a payoff matrix, and both players
can see the whole matrix – the players have equal information,
and nothing is secret. The players move simultaneously – so neither player can wait to see what his opponent does before making his own decision. Finally, the game is played exactly once – there are no repetitions.
The first complication we can add is to make it an iterative game – that is, instead of each player going once, they’ll repeat the game 10 times in sequence. Everything else stays the same: perfect, equal information, simultaneous moves, etc.
This creates an interesting added dimension. Now, we’ve got two layers
of strategy: each iteration, each player selects a strategy; and then there’s an over-arching strategy that they use to select their strategy each iteration. That over-arching strategy is called a grand strategy
In an iterated game, the optimal solution for players can be different than in the single iteration. The goal is to maximize your total utility at the end of a series of iterations. It’s OK if some iterations result in a loss of utility, so long as by the end of the series of iterations, the final
sum of the utility of the series is maximal.
For example, look at the following game. (This example is copied
from the textbook The Compleat Strategyst (Complete Strategist): Being A Primer On The Theory Of Games Of Strategy.)
Player 1 needs to choose either 1 or 2; player 2 needs to choose A or B. For simplicity, this is set up so that player 2 always pays player 1, so the utility values in the game grid are written as payoffs to player 1. (Originally, I didn’t include an explanation of the game grid payoffs. This confused a lot of readers. Sorry!)
There’s no saddle point here. The maximin for player 1 is 4 (at 2,B); the
minimax for player 2 is 3 (at 1,A). So the simple solution formula from before
doesn’t work. If we’re only playing the game once then there’s no way for
either player to create an idea outcome for themselves. Look at it from the
perspective of player 1. If player 2 picks strategy A, then the best thing for
player 2 to do is to pick strategy 2. But if player 2 picks B, then player 1
is better off picking A. Since the players move simultaneously, player
1 cannot pick the best strategy.
If the game is played once, then there’s no way to play optimally. It’s
basically random. But if you repeat the game multiple times – that
is, you play it as an iterated sequence of games – then you can find an
optimal strategy for selecting strategies. That’s called a grand
strategy: in a game where you make multiple moves, each move is a
strategy, and the process of selecting strategies for each move is a
There is a successful grand strategy in this kind of iterated
zero-sum game. What’s interesting about it is that our idea of
strategy is not really what works as a successful grand strategy: the successful grand strategy is random.
After all: if player 1 can figure out player 2’s grand strategy,
then he can easily pick an optimal strategy. So the best grand strategy
is one that the other player can’t figure out. If you’re playing randomly, then there’s no way that the other player can predict
what strategy you choose.
So, let’s look at an example of a series of iterations, with player 1 selecting randomly, with a uniform probability distribution, and player
2 playing minimax:
- Iteration one: Player one selects 1, player two selects A. Player one takes 3.
- Iteration two: Player one selects 2, player two selects A. Player one
It will continue like that. Player one will win 3 half the time, and 5
the other half – for an average of 4 – which is his maximin value. So
playing this way is at least as good for player 1 as maximin.
This is a pretty trivial game: for player 2, he’s guaranteed to
lose, and the only question is how quickly he’s going to lose. And
there’s not any strategy that’s going to really improve things much for player 2. There’s not much that 2 can do: playing strategy B is pretty much a lose for
him. So why would he ever do it?
If player 1 knows that player 2 is always going to play A, then he
could play strategy 2, and always win 5. So player two is motivated to
play B just often enough to make player one try 1 once in a while. The
key is, how often should he do that? What’s the best distribution for him,
to minimize his losses?
As you can see from the example, the key to finding the optimal strategy is
probability. You need to assign a probability distribution to the different
strategies. Each iteration, you pick a strategy randomly, according to the
distribution. If you can find the right probability distribution for your grand strategy, then you can optimize your winnings.
But is there always an optimal distribution? And how can you find it?
There is always an optimum. John vonNeumann (who managed to have his
fingers in an astonishing number of pies) proved that for every two player,
simultaneous move zero-sum iterated game, there is an optimal grand strategy based
on a probability distribution. And it’s computable. It took a while after
JvN to figure out the fast way of doing it – but it’s doable, and it’s doable
quickly, through a technique called linear programming.
Linear programming is a fascinating topic in itself. And it’s complicated
enough that it deserves a post of its own. (Or, more likely, several.)