When I last wrote about game theory, we were working up to how to find
the general solution to an iterated two-player zero-sum game. Since it’s been
a while, I’ll take a moment and refresh your memory a bit.
A zero-sum game is described as a matrix. One player picks a row, and one player picks a column. Those selections are called strategies. The intersection
of the strategies in the game matrix describes what one player pays to the other. The
matrix is generally written from the point of view of one player. So if we call
our two players A and B, and the matrix is written from the viewpoint of A, then
an entry of $50 means that B has to pay $50 to A; an entry of $-50 means that A has to pay $50 to B.
In an iterated game, you’re not going to just play once – you’re going to play it repeatedly. To maximize your winnings, you may want to change strategies. It turns out that the optimal strategy is probabilistic – you’ll assign probabilities to your
strategy choices, and use that probability assignment to randomly select your
strategy each iteration. That probability assignment that dictates how you’ll
choose a strategy each iteration is called a grand strategy.
Jon Von Neumann proved that for any two-player zero-sum game, there is an optimal grand strategy based on probability. In a game where both players know exactly what the payoffs/penalties are, the best strategy is based on constrained randomness – because any deliberate system for choosing strategies can be cracked by your opponent, resulting in his countering you. The best outcome comes from assessing potential wins and losses, and developing a probabilistic scheme for optimizing the way that you play.
Once you make that fundamental leap, realizing that it’s a matter
of probability, it’s not particularly difficult to find the grand strategy: it’s just a simple optimization task, solveable via linear programming. The solution is very elegant: once you see it, once you see how to
formulate the game as a linear optimization process, it just seems completely obvious.
The basic idea is: take the game matrix. Work out the expected payoff for each
strategy for one player in terms of an unknown probability assignment. Then, using linear programming, optimize for the maximum value of the minimum expected payoff,
subject to constraints that basically describe a probability assignment.
As usual, that sounds terribly confusing until you see an example.
Let’s get specific. Here’s a game. We’ll call the two players H (for the player who picks a strategy that’s a horizontal line across the graph), and V. The grid is set up from the viewpoint of player H: the entry in a position is what V needs to pay to H if that pair of strategies is selected:
Then the linear programming problem works out as follows:
- Let E(H1) = 3p1 + 1p2 + 3p3
- Let E(H2) = 2p1 + 4p2 + 5p3
- Let E(H3) = 3p1 + 6p2 + 1p3
Maximize: min(E(H1), E(H2), E(H3)), where Σpi=1,
and ∀i: 0≤pi≤1.
When we run that through our linear programming tool, we get the following probability assignment (rounded to two figures): p1=0.57, p2=0.17, p3=0.26.
If we transpose, and look at things from the perspective of V, we get that the
optimal strategy is (0.70, 0.09, 0.22).
That example is an unfair game, because V always pays H. But the
same basic construction of the solution will work. It’s implicit in the names of
the basic strategies: maximin; maximize the minimum. It’s just that now, instead
of maximizing the minimum of a single strategy selection, we’re maximizing the minimum of the expected outcomes based on the probability assignments in the grand strategy.
There’s one more question about the game: what’s the expected payoff? In game theory terms, we call that the value of the game: the value of the game to a player is the amount that the player will win/lose if both they and their opponent play optimally. It’s very easy to figure: the probability assignments work out so that the expectation is the same for all three strategies. So to find the value for H, you pick one strategy for V, and work out the expectation for H given the probabilities in his grand strategy.
So, suppose that player V chose strategy 1. Then the expectation for H would be:
0.57*3 + 0.17*2 + 0.26*3 = 2.83. So the value of the game to player H is 2.83. If you work it out from the transpose matrix to get the value of the game to player V, it works out to -2.83.
When you see this, it’s pretty obvious that it works. In fact, most people see it,
and react, roughly, “Of course! Maximize the minimum!” It seems so simple! The real fundamental achievement by Jon Von Neumann wasn’t figuring out how to describe the probabilistic grand strategies as an optimization problem – that’s actually easy. The real key was realizing that the optimal strategy was based on probability – that the whole key to optimizing a zero-sum game lies in randomness constrained by a probability distribution.