{"id":2256,"date":"2013-11-18T10:00:08","date_gmt":"2013-11-18T15:00:08","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/?p=2256"},"modified":"2016-07-31T16:02:42","modified_gmt":"2016-07-31T20:02:42","slug":"the-birthday-paradox","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2013\/11\/18\/the-birthday-paradox\/","title":{"rendered":"The Birthday Paradox"},"content":{"rendered":"<p> To me, the thing that makes probability fun is that the results are frequently surprising. We&#8217;ve got very strong instincts about how we expect numbers to work. But when you do anything that involves a lot of computations with big numbers, our intuition goes out the window &#8211; nothing works the way we expect it to. A great example of that is something called the birthday paradox.<\/p>\n<p> Suppose you&#8217;ve got a classroom full of people. What&#8217;s the probability that there are two people with the same birthday? Intuitively, most people expect that it&#8217;s pretty unlikely. It seems like it shouldn&#8217;t be likely &#8211; 365 possible birthdays, and 20 or 30 people in a classroom? Very little chance, right?<\/p>\n<p> Let&#8217;s look at it, and figure out how to compute the probability.<\/p>\n<p> Interesting probability problems are all about finding out how to put things together. You&#8217;re looking at things where there are huge numbers of possible outcomes, and you want to determine the odds of a specific <em>class<\/em> of outcomes. Finding the solutions is all about figuring out how to structure the problem.<\/p>\n<p> A great example of this is something called the birthday paradox. This is a problem with a somewhat surprising outcome. It&#8217;s also a problem where finding the right way to structure the problem is has a dramatic result.<\/p>\n<p> Here&#8217;s the problem: you&#8217;ve got a group of 30 people. What&#8217;s the probability that two people out of that group of thirty have the same birthday?<\/p>\n<p> We&#8217;ll look at it with some simplifying assumptions. We&#8217;ll ignore leap year &#8211; so we&#8217;ve got 365 possible birthdays. We&#8217;ll assume that all birthdays are equally likely &#8211; no variation for weekdays\/weekends, no variation for seasons, and no holidays, etc. Just 365 equally probable days.<\/p>\n<p> How big is the space? That is, how many different ways are there to assign birthdays to 30 people? It&#8217;s 365<sup>30<\/sup> or something in the vicinity of 7.4*10<sup>76<\/sup>.<\/p>\n<p> To start off, we&#8217;ll reverse the problem. It&#8217;s easier to structure the problem if we try to ask &#8220;What&#8217;s the probability that no two people share a birthday&#8221;. If P(B) is the probability that no two people share a birthday, then 1-P(B) is the probability that at least two people share a birthday.<\/p>\n<p> So let&#8217;s look at a couple of easy cases. Suppose we&#8217;ve got two people? What&#8217;s the odds that they&#8217;ve got the same birthday? 1 in 365: there are 365<sup>2<\/sup> possible pairs of birthdays; there are 365 possible pairs. So there&#8217;s a probability of 365\/365<sup>2<\/sup> that the two people have the same birthday. For just two people, it&#8217;s pretty easy. In the reverse form, there&#8217;s a 364\/365 chance that the two people have different birthdays.<\/p>\n<p> What about 3 people? It&#8217;s the probability of the first two having different birthdays, <em>and<\/em> the probability of the third person having a different birthday that either of those first two. There are 365 possible birthdays for the third person, and 363 possible days that don&#8217;t overlay with the first two. So for N people, the probability of having distinct birthdays is <img src='http:\/\/l.wordpress.com\/latex.php?latex=1%20%5Ctimes%20%281%20-%201%2F365%29%20%5Ctimes%20%281%20-%202%2F365%29%20%5Ctimes%20%5Cdots%20%281%20-%20%28n%2F365%29%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='1 \\times (1 - 1\/365) \\times (1 - 2\/365) \\times \\dots (1 - (n\/365))' style='vertical-align:1%' class='tex' alt='1 \\times (1 - 1\/365) \\times (1 - 2\/365) \\times \\dots (1 - (n\/365))' \/>.<\/p>\n<p> At this point, we&#8217;ve got a nice recursive definition. Let&#8217;s say that <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%28N%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f(N)' style='vertical-align:1%' class='tex' alt='f(N)' \/> is the probability of <img src='http:\/\/l.wordpress.com\/latex.php?latex=N&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='N' style='vertical-align:1%' class='tex' alt='N' \/> people having distinct birthdays. Then: <\/p>\n<ol>\n<li> For 2 people, the probability of distinct birthdays is 364\/365. (<img src='http:\/\/l.wordpress.com\/latex.php?latex=f%282%29%20%3D%20%5Cfrac%7B364%7D%7B365%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f(2) = \\frac{364}{365}' style='vertical-align:1%' class='tex' alt='f(2) = \\frac{364}{365}' \/>)<\/li>\n<li> For N>2 people, the probability of distinct birthdays is<br \/>\n<img src='http:\/\/l.wordpress.com\/latex.php?latex=%5Cfrac%7B365-%28N-1%29%7D%7B365%7D%20times%20f%28n-1%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='\\frac{365-(N-1)}{365} times f(n-1)' style='vertical-align:1%' class='tex' alt='\\frac{365-(N-1)}{365} times f(n-1)' \/>. <\/li>\n<\/ol>\n<p> Convert that to a closed form, and you get: <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%28n%29%20%3D%20%5Cfrac%7B365%21%7D%7B%28365-%28n-1%29%29%21365%5En%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f(n) = \\frac{365!}{(365-(n-1))!365^n}' style='vertical-align:1%' class='tex' alt='f(n) = \\frac{365!}{(365-(n-1))!365^n}' \/>. For 30 people, that&#8217;s<br \/>\n<img src='http:\/\/l.wordpress.com\/latex.php?latex=%5Cfrac%7B365%21%7D%7B%28365-29%29%21%2A365%5E%7B30%7D%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='\\frac{365!}{(365-29)!*365^{30}}' style='vertical-align:1%' class='tex' alt='\\frac{365!}{(365-29)!*365^{30}}' \/>. Work it out, and that&#8217;s<br \/>\n0.29 &#8211; so the probability of everyone having distinct<br \/>\nbirthdays is 29% &#8211; which means that the probability of at least<br \/>\ntwo people in a group of 30 having the same birthday is 71%!<\/p>\n<p> You can see why our intuitions are so bad? We&#8217;re talking about something where one factor in the computation is the factorial of 365!<\/p>\n<p> Let&#8217;s look a bit further: how many people do you need to have, before there&#8217;s a 50% chance of 2 people sharing a birthday? Use the formulae we wrote up above, and it turns out to be 23. Here&#8217;s the numbers &#8211; remember that this is the reverse probability, the probability of all birthdays being distinct. <\/p>\n<table border=\"1\">\n<tr>\n<td>1<\/td>\n<td>1<\/dt>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>0.997260273973<\/dt>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>0.991795834115<\/dt>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>0.983644087533<\/dt>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>0.9728644263<\/dt>\n<\/tr>\n<tr>\n<td>6<\/td>\n<td>0.959537516351<\/dt>\n<\/tr>\n<tr>\n<td>7<\/td>\n<td>0.943764296904<\/dt>\n<\/tr>\n<tr>\n<td>8<\/td>\n<td>0.925664707648<\/dt>\n<\/tr>\n<tr>\n<td>9<\/td>\n<td>0.905376166111<\/dt>\n<\/tr>\n<tr>\n<td>10<\/td>\n<td>0.883051822289<\/dt>\n<\/tr>\n<tr>\n<td>11<\/td>\n<td>0.858858621678<\/dt>\n<\/tr>\n<tr>\n<td>12<\/td>\n<td>0.832975211162<\/dt>\n<\/tr>\n<tr>\n<td>13<\/td>\n<td>0.805589724768<\/dt>\n<\/tr>\n<tr>\n<td>14<\/td>\n<td>0.776897487995<\/dt>\n<\/tr>\n<tr>\n<td>15<\/td>\n<td>0.747098680236<\/dt>\n<\/tr>\n<tr>\n<td>16<\/td>\n<td>0.716395994747<\/dt>\n<\/tr>\n<tr>\n<td>17<\/td>\n<td>0.684992334703<\/dt>\n<\/tr>\n<tr>\n<td>18<\/td>\n<td>0.653088582128<\/dt>\n<\/tr>\n<tr>\n<td>19<\/td>\n<td>0.620881473968<\/dt>\n<\/tr>\n<tr>\n<td>20<\/td>\n<td>0.588561616419<\/dt>\n<\/tr>\n<tr>\n<td>21<\/td>\n<td>0.556311664835<\/dt>\n<\/tr>\n<tr>\n<td>22<\/td>\n<td>0.524304692337<\/dt>\n<\/tr>\n<tr>\n<td>23<\/td>\n<td>0.492702765676<\/dt>\n<\/tr>\n<tr>\n<td>24<\/td>\n<td>0.461655742085<\/dt>\n<\/tr>\n<tr>\n<td>25<\/td>\n<td>0.431300296031<\/dt>\n<\/tr>\n<tr>\n<td>26<\/td>\n<td>0.401759179864<\/dt>\n<\/tr>\n<tr>\n<td>27<\/td>\n<td>0.373140717737<\/dt>\n<\/tr>\n<tr>\n<td>28<\/td>\n<td>0.345538527658<\/dt>\n<\/tr>\n<tr>\n<td>29<\/td>\n<td>0.319031462522<\/dt>\n<\/tr>\n<tr>\n<td>30<\/td>\n<td>0.293683757281<\/dt>\n<\/tr>\n<tr>\n<td>31<\/td>\n<td>0.269545366271<\/dt>\n<\/tr>\n<tr>\n<td>32<\/td>\n<td>0.24665247215<\/dt>\n<\/tr>\n<tr>\n<td>33<\/td>\n<td>0.225028145824<\/dt>\n<\/tr>\n<tr>\n<td>34<\/td>\n<td>0.20468313538<\/dt>\n<\/tr>\n<tr>\n<td>35<\/td>\n<td>0.185616761125<\/dt>\n<\/tr>\n<tr>\n<td>36<\/td>\n<td>0.16781789362<\/dt>\n<\/tr>\n<tr>\n<td>37<\/td>\n<td>0.151265991784<\/dt>\n<\/tr>\n<tr>\n<td>38<\/td>\n<td>0.135932178918<\/dt>\n<\/tr>\n<tr>\n<td>39<\/td>\n<td>0.121780335633<\/dt>\n<\/tr>\n<tr>\n<td>40<\/td>\n<td>0.108768190182<\/dt>\n<\/tr>\n<tr>\n<td>41<\/td>\n<td>0.0968483885183<\/dt>\n<\/tr>\n<tr>\n<td>42<\/td>\n<td>0.0859695284381<\/dt>\n<\/tr>\n<tr>\n<td>43<\/td>\n<td>0.0760771443439<\/dt>\n<\/tr>\n<tr>\n<td>44<\/td>\n<td>0.0671146314486<\/dt>\n<\/tr>\n<tr>\n<td>45<\/td>\n<td>0.0590241005342<\/dt>\n<\/tr>\n<tr>\n<td>46<\/td>\n<td>0.0517471566327<\/dt>\n<\/tr>\n<tr>\n<td>47<\/td>\n<td>0.0452255971667<\/dt>\n<\/tr>\n<tr>\n<td>48<\/td>\n<td>0.0394020271206<\/dt>\n<\/tr>\n<tr>\n<td>49<\/td>\n<td>0.0342203906773<\/dt>\n<\/tr>\n<tr>\n<td>50<\/td>\n<td>0.029626420422<\/dt>\n<\/tr>\n<\/table>\n<p> With just 23 people, there&#8217;s a greater than 50% chance that two people will have the same birthday. By the time you get to just 50 people, there&#8217;s a greater than 97% chance that two people have the same birthday!<\/p>\n<p> As an amusing aside, the first time I saw this problem worked through was in an undergraduate discrete probability theory class, with 37 people in the class, and <em>no<\/em> duplicate birthdays!<\/p>\n<p> Now &#8211; remember at the beginning, I said that the trick to working probability problems is all about how you formulate the problem. There&#8217;s a much, much better way to formulate this.<\/p>\n<p> Think of the assignment of birthdays as a function from people to birthdays: <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%3A%20P%20%5Crightarrow%20B&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f: P \\rightarrow B' style='vertical-align:1%' class='tex' alt='f: P \\rightarrow B' \/>. The number of ways of assigning birthdays to people is the size of the set of functions from people to birthdays. How many possible functions are there? <img src='http:\/\/l.wordpress.com\/latex.php?latex=%7C%20B%20%7C%20%5E%7B%7C%20P%20%7C%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='| B | ^{| P |}' style='vertical-align:1%' class='tex' alt='| B | ^{| P |}' \/>. <img src='http:\/\/l.wordpress.com\/latex.php?latex=%7C%20B%20%7C&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='| B |' style='vertical-align:1%' class='tex' alt='| B |' \/> is the number of days in the year &#8211; 365, and <img src='http:\/\/l.wordpress.com\/latex.php?latex=%7C%20P%20%7C&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='| P |' style='vertical-align:1%' class='tex' alt='| P |' \/> is the number of people in the group. <\/p>\n<p> The set of assignments to unique birthdays is the number of  <em>injective<\/em> functions. (An injective function is a function where <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%28x%29%20%3D%20f%28y%29%20%5CLeftrightarrow%20x%20%3D%20y&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f(x) = f(y) \\Leftrightarrow x = y' style='vertical-align:1%' class='tex' alt='f(x) = f(y) \\Leftrightarrow x = y' \/>.) How many injective functions are there? <img src='http:\/\/l.wordpress.com\/latex.php?latex=%5Cfrac%7B%7C%20B%20%7C%21%7D%7B%28%7C%20B%20%7C%20-%20%7C%20P%20%7C%29%21%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='\\frac{| B |!}{(| B | - | P |)!}' style='vertical-align:1%' class='tex' alt='\\frac{| B |!}{(| B | - | P |)!}' \/>. <\/p>\n<p> The probability of all birthdays being unique is the size of the set of injective functions divided by the size of the set of all assignments: <img src='http:\/\/l.wordpress.com\/latex.php?latex=%5Cfrac%7B%5Cfrac%7B%7C%20B%20%7C%21%7D%7B%28%7C%20B%20%7C%20-%20%7C%20P%20%7C%29%21%7D%7D%7B%20%7C%20B%20%7C%20%5E%7B%7C%20P%20%7C%7D%20%7D%20%3D%20%5Cfrac%7B365%21%7D%7B365%5EP%5Ctimes%20%28365%20-%20P%29%21%7D&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='\\frac{\\frac{| B |!}{(| B | - | P |)!}}{ | B | ^{| P |} } = \\frac{365!}{365^P\\times (365 - P)!}' style='vertical-align:1%' class='tex' alt='\\frac{\\frac{| B |!}{(| B | - | P |)!}}{ | B | ^{| P |} } = \\frac{365!}{365^P\\times (365 - P)!}' \/>.<\/p>\n<p> So we&#8217;ve got the exact same result &#8211; but it&#8217;s a whole lot easier in term of the discrete functions!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To me, the thing that makes probability fun is that the results are frequently surprising. We&#8217;ve got very strong instincts about how we expect numbers to work. But when you do anything that involves a lot of computations with big numbers, our intuition goes out the window &#8211; nothing works the way we expect it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[53],"tags":[],"class_list":["post-2256","post","type-post","status-publish","format-standard","hentry","category-probability"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-Ao","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/2256","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=2256"}],"version-history":[{"count":2,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/2256\/revisions"}],"predecessor-version":[{"id":3293,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/2256\/revisions\/3293"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=2256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=2256"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=2256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}