{"id":637,"date":"2008-05-05T10:09:36","date_gmt":"2008-05-05T10:09:36","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2008\/05\/05\/iterated-zero-sum-games\/"},"modified":"2008-05-05T10:09:36","modified_gmt":"2008-05-05T10:09:36","slug":"iterated-zero-sum-games","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2008\/05\/05\/iterated-zero-sum-games\/","title":{"rendered":"Iterated Zero-Sum Games"},"content":{"rendered":"<p> The games that we&#8217;ve looked at so far are the simplest case of<br \/>\nbasic games. In these games, we&#8217;ve got a payoff matrix, and both players<br \/>\ncan see the whole matrix &#8211; the players have equal information,<br \/>\nand nothing is secret. The players move simultaneously &#8211; so neither player can wait to see what his opponent does before making his own decision. Finally, the game is played exactly once &#8211; there are no repetitions.<\/p>\n<p> The first complication we can add is to make it an <em>iterative game<\/em> &#8211; that is, instead of each player going once, they&#8217;ll repeat the game 10 times in sequence. Everything else stays the same: perfect, equal information, simultaneous moves, etc.<\/p>\n<p> This creates an interesting added dimension. Now, we&#8217;ve got two layers<br \/>\nof strategy: each iteration, each player selects a strategy; and then there&#8217;s an over-arching strategy that they use to select their strategy each iteration. That over-arching strategy is called a <em>grand strategy<\/em><\/p>\n<p> In an iterated game, the optimal solution for players can be different than in the single iteration. The goal is to maximize your total utility at the end of a series of iterations. It&#8217;s OK if some iterations result in a loss of utility, so long as by the end of the series of iterations, the final<br \/>\nsum of the utility of the series is maximal.<\/p>\n<p><!--more--><\/p>\n<p> For example, look at the following game. (This example is copied<br \/>\nfrom the textbook <a href=\"http:\/\/www.amazon.com\/gp\/product\/0070703965?ie=UTF8&amp;tag=goodmathbadma-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0070703965\">The Compleat Strategyst (Complete Strategist): Being A Primer On The Theory Of Games Of Strategy<\/a><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.assoc-amazon.com\/e\/ir?t=goodmathbadma-20&amp;l=as2&amp;o=1&amp;a=0070703965\" width=\"1\" height=\"1\" border=\"0\" alt=\"\" style=\"border:none !important;margin:0px !important\" \/>.)  <\/p>\n<table border=\"1\">\n<tr>\n<th><\/th>\n<th>A<\/th>\n<th>B<\/th>\n<\/tr>\n<tr>\n<th>1<\/th>\n<td>3<\/td>\n<td>6<\/td>\n<\/tr>\n<tr>\n<th>2<\/th>\n<td>5<\/td>\n<td>4<\/td>\n<\/tr>\n<\/table>\n<p> Player 1 needs to choose either 1 or 2; player 2 needs to choose A or B. For simplicity, this is set up so that player 2 always pays player 1, so the utility values in the game grid are written as payoffs to player 1. <em>(Originally, I didn&#8217;t include an explanation of the game grid payoffs. This confused a lot of readers. Sorry!)<\/em><\/p>\n<p> There&#8217;s no saddle point here. The maximin for player 1 is 4 (at 2,B); the<br \/>\nminimax for player 2 is 3 (at 1,A). So the simple solution formula from before<br \/>\ndoesn&#8217;t work. If we&#8217;re only playing the game once then there&#8217;s no way for<br \/>\neither player to create an idea outcome for themselves. Look at it from the<br \/>\nperspective of player 1. If player 2 picks strategy A, then the best thing for<br \/>\nplayer 2 to do is to pick strategy 2. But if player 2 picks B, then player 1<br \/>\nis better off picking A. Since the players move simultaneously, player<br \/>\n1 cannot pick the best strategy.<\/p>\n<p> If the game is played once, then there&#8217;s no way to play optimally. It&#8217;s<br \/>\nbasically random. But if you <em>repeat<\/em> the game multiple times &#8211; that<br \/>\nis, you play it as an iterated sequence of games &#8211; then you can find an<br \/>\noptimal strategy for selecting strategies. That&#8217;s called a <em>grand<br \/>\nstrategy<\/em>: in a game where you make multiple moves, each move is a<br \/>\nstrategy, and the process of selecting strategies for each move is a<br \/>\n<em>grand strategy<\/em>. <\/p>\n<p> There is a successful grand strategy in this kind of iterated<br \/>\nzero-sum game. What&#8217;s interesting about it is that our idea of<br \/>\nstrategy is not really what works as a successful grand strategy: the successful grand strategy is <em>random<\/em>. <\/p>\n<p> After all: if player 1 can figure out player 2&#8217;s grand strategy,<br \/>\nthen he can easily pick an optimal strategy. So the best grand strategy<br \/>\nis one that the other player <em>can&#8217;t<\/em> figure out. If you&#8217;re playing randomly, then there&#8217;s no way that the other player can predict<br \/>\nwhat strategy you choose. <\/p>\n<p> So, let&#8217;s look at an example of a series of iterations, with player 1 selecting randomly, with a uniform probability distribution, and player<br \/>\n2 playing minimax:<\/p>\n<ul>\n<li> Iteration one: Player one selects 1, player two selects A. Player one takes 3.<\/li>\n<li> Iteration two: Player one selects 2, player two selects A. Player one<br \/>\nwins 5.<\/li>\n<\/ul>\n<p> It will continue like that. Player one will win 3 half the time, and 5<br \/>\nthe other half &#8211; for an average of 4 &#8211; which is his maximin value. So<br \/>\nplaying this way is at least as good for player 1 as maximin. <\/p>\n<p> This is a pretty trivial game: for player 2, he&#8217;s guaranteed to<br \/>\nlose, and the only question is how quickly he&#8217;s going to lose. And<br \/>\nthere&#8217;s not any strategy that&#8217;s going to really improve things much for player 2. There&#8217;s not much that 2 can do: playing strategy B is pretty much a lose for<br \/>\nhim. So why would he ever do it?<\/p>\n<p> If player 1 <em>knows<\/em> that player 2 is always going to play A, then he<br \/>\ncould play strategy 2, and always win 5. So player two is motivated to<br \/>\nplay B just often enough to make player one try 1 once in a while. The<br \/>\nkey is, how often should he do that? What&#8217;s the best distribution for him,<br \/>\nto minimize his losses?<\/p>\n<p> As you can see from the example, the key to finding the optimal strategy is<br \/>\nprobability. You need to assign a probability distribution to the different<br \/>\nstrategies. Each iteration, you pick a strategy randomly, according to the<br \/>\ndistribution. If you can find the right probability distribution for your grand strategy, then you can optimize your winnings.<\/p>\n<p> But is there always an optimal distribution? And how can you find it?<\/p>\n<p> There <em>is<\/em> always an optimum. John vonNeumann (who managed to have his<br \/>\nfingers in an astonishing number of pies) proved that for every two player,<br \/>\nsimultaneous move zero-sum iterated game, there is an optimal grand strategy based<br \/>\non a probability distribution. And it&#8217;s computable. It took a while after<br \/>\nJvN to figure out the fast way of doing it &#8211; but it&#8217;s doable, and it&#8217;s doable<br \/>\nquickly, through a technique called linear programming.<\/p>\n<p><p> Linear programming is a fascinating topic in itself. And it&#8217;s complicated<br \/>\nenough that it deserves a post of its own. (Or, more likely, several.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The games that we&#8217;ve looked at so far are the simplest case of basic games. In these games, we&#8217;ve got a payoff matrix, and both players can see the whole matrix &#8211; the players have equal information, and nothing is secret. The players move simultaneously &#8211; so neither player can wait to see what his [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[88],"tags":[],"class_list":["post-637","post","type-post","status-publish","format-standard","hentry","category-game-theory"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-ah","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/637","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=637"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/637\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=637"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}