{"id":627,"date":"2008-04-14T13:19:43","date_gmt":"2008-04-14T13:19:43","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2008\/04\/14\/probability-distributions\/"},"modified":"2008-04-14T13:19:43","modified_gmt":"2008-04-14T13:19:43","slug":"probability-distributions","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2008\/04\/14\/probability-distributions\/","title":{"rendered":"Probability Distributions"},"content":{"rendered":"<p> I&#8217;d like to start with a quick apology. Sorry that both the abstract algebra and the new game theory posts have been moving so slowly. I&#8217;ve been a bit overwhelmed lately with things that need doing <em>right away<\/em>, and<br \/>\nby the time I&#8217;m done with all that, I&#8217;m too tired to write anything that<br \/>\nrequires a lot of care. I&#8217;ve known the probability theory stuff for so long,<br \/>\nand the parts I&#8217;m writing about so far are so simple that it really doesn&#8217;t take nearly as much effort.<\/p>\n<p> With that out of the way, today, I&#8217;m going to write about probability distributions.<\/p>\n<p><!--more--><\/p>\n<p> Take a probabilistic event. That&#8217;s an event where we don&#8217;t know how<br \/>\nit&#8217;s going to turn out. But we know enough to describe the set of possible outcomes. We can formulate a random variable for the basic idea of the event, or we can consider it in isolation, and use our knowledge of it to describe<br \/>\nthe possible outcomes. A probability distribution describes the set of outcomes of some probabilistic event &#8211; and how likely each one is. A<br \/>\nprobability distribution always has the property that if S in the set<br \/>\nof possible outcomes, and &forall;s&isin;S:P(s) is the probability of outcome s, then &Sigma;<sub>s&isin;S<\/sub>P(s)=1 &#8211; which is really just another way<br \/>\nof saying that the probability distribution covers all possible outcomes.<\/p>\n<p> In the last post, about random variables, I described a formal abstract<br \/>\nversion of what we mean by &#8220;how likely&#8221; a given outcome is. From here on,<br \/>\nwe&#8217;ll mainly use the informal version, where an outcome O with a probability<br \/>\nP(O)=1\/N means that if watched the event N times, we&#8217;d expect to see O occur<br \/>\nonce.<\/p>\n<p> Probability distributions are incredibly important to understand statistics. If we know the type of distribution, we can make much stronger<br \/>\nstatements about what a given statistic <em>means<\/em>. We can also do all<br \/>\nsorts of interesting things if we understand what the distributions look like.<\/p>\n<p> For example, when people do work in computer networks, some of the key<br \/>\nconcerns are called bandwidth and utilization. One of the really<br \/>\ninteresting things about networks is that they&#8217;re not generally as fast<br \/>\nas we think they are. The bandwidth of a channel is the maximum amount of<br \/>\ninformation that can be transmitted through that channel in a given time. The utilization is the percentage of bandwidth actually being used at a particular moment. The utilization virtually never reaches 100%. In fact, on most networks, it&#8217;s <em>much<\/em> lower. The higher utilization gets, the harder it is for anyone else to use what&#8217;s left.<\/p>\n<p> For different network protocols, the peak utilization &#8211; the amount of the bandwidth  that can be used before performance starts to drop off &#8211; varies. There&#8217;s often a complex tradeoff. To give two extreme cases, you can build<br \/>\na protocol in which there is a &#8220;token&#8221; associated with the right to transmit on the wire, and the token is passed around the machines in the network. This<br \/>\nguarantees that everyone will get a turn to transmit, and prevents more than one machine from trying to send at the same time. On the other hand, there&#8217;s a family of protocols (including ethernet) called &#8220;Aloha&#8221; protocols, where you<br \/>\njust transmit whenever you want to &#8211; but if someone else happens to be<br \/>\ntransmitting at the same time, both of your messages will be corrupted, and will need to be resent.<\/p>\n<p> In the token-ring case, you&#8217;re increasing the amount of time it takes<br \/>\nto get a message onto the wire during low utilization, and you&#8217;re eating up a<br \/>\nslice of the bandwidth with all of those token-maintenance messages. But as capacity increases, in theory, you can scale smoothly up to the point where all of the bandwidth is being used by either the token maintenance or by real<br \/>\ntraffic. In the case of Aloha protocols, there&#8217;s no delay, and no token maintenance overhead &#8211; but when the utilization goes up, so does the chance<br \/>\nof two machines transmitting simultaneously. So in an Aloha network, you<br \/>\nreally <em>can&#8217;t<\/em> effectively use all of the bandwidth, because as<br \/>\nutilization goes up, the amount of bandwidth dedicated to actual messages<br \/>\ndecreases, because so much is being used by messages that had to be discarded due to collisions.<\/p>\n<p> From that brief overview, you can see why it&#8217;s important to be able to<br \/>\naccurately assess what the <em>effective<\/em> bandwidth of a network is. To<br \/>\ndo that, you get into something called <em>queueing theory<\/em>, which<br \/>\nuses probability distributions to determine how much of your bandwidth will<br \/>\nlikely be taken up by collisions\/token maintenance under various utilization scenarios. Without the probability distribution, you can&#8217;t do it; and if the probability distribution doesn&#8217;t match the real pattern of usage, it will<br \/>\ngive you results that won&#8217;t match reality.<\/p>\n<p> There are a collection of basic distributions that we like to work with,<br \/>\nand it&#8217;s worth taking a few moments to run through and describe<br \/>\nthem.<\/p>\n<ul>\n<li> Uniform: the simplest distribution. In a uniform distribution,<br \/>\nthere are N outcomes, o<sub>1<\/sub>,&#8230;,o<sub>n<\/sub>, and<br \/>\nthe probability of any one of them, P(o<sub>i<\/sub>)=1\/N. Rolling<br \/>\na fair die, picking a card from a well-shuffled deck, or picking a ticket in<br \/>\na raffle are all events with uniform probability distributions.<\/li>\n<li> Bernoulli: the Bernoulli distribution covers a binary<br \/>\nevent &#8211; that is, and event that has exactly two possible<br \/>\noutcomes, &#8220;1&#8221; and &#8220;0&#8221;, and there&#8217;s a fixed probability &#8220;p&#8221;<br \/>\nof an outcome of &#8220;1&#8221; &#8211; so P(1)=p, P(0)=1-p.<\/li>\n<li> Binomial: the binomial distribution describes the probability of<br \/>\na particular number of &#8220;1&#8221; outcomes  in a limited series of<br \/>\nindependent trials of an event with a Bernoulli distribution.<br \/>\nSo, for example, a binomial probability distribution will say<br \/>\nsomething like &#8220;Given 100 trials, there is a 1% probability of<br \/>\none success, a 3% probability of 2 successes, &#8230;&#8221;, etc.<br \/>\nThe binomial distribution is one of the sources of the<br \/>\nperfect bell curve. As the number of data points increases,<br \/>\na histogram of the Bernoulli distribution becomes closer and closer<br \/>\nto the perfect bell.<\/li>\n<li> Zipf distribution. As a guy married to a computational linguist,<br \/>\nI&#8217;m familiar with this one. It&#8217;s a <em>power law<\/em> distribution:<br \/>\ngiven an ordered set of outcomes, o<sub>i<\/sub>, &#8230;, o<sub>n<\/sub>,<br \/>\nthe probabilities work out so that P(o<sub>1<\/sub>)=2&times;P(o<sub>2<\/sub>); P(o<sub>2<\/sub>)=2&times;P(o<sub>3<\/sub>), and so on. The most<br \/>\nexample of this is if you take a huge volume of english text, you&#8217;ll<br \/>\nfind that the most common word is &#8220;the&#8221;; randomly picking<br \/>\na work from english text, the probability of &#8220;the&#8221; is about 7%. The<br \/>\nnumber 2 word is &#8220;of&#8221;, which has a probability of about 3.5%. And so<br \/>\non. If you plot a Zipf distribution on a log-log scale, with the<br \/>\nvertical axis descending powers of 10, and the horizontal axis<br \/>\nis ordered o<sub>i<\/sub>s, you&#8217;ll get a descending line.<\/li>\n<li> Poisson. The poisson distribution is the one used in computer<br \/>\nnetworks, as I described above. The poisson distribution describes<br \/>\nthe probability of a given number of events happening in a particular<br \/>\nlength of time, given that (a) the events have a fixed average<br \/>\nrate of occurence, and (b) the time to the next event is independent of<br \/>\nthe time since the previous event. It turns out that this is a pretty<br \/>\ngood model of network traffic: on a typical network, taken over<br \/>\na long period of time, each computer will generate traffic at a<br \/>\nparticular average rate. There&#8217;s an enormous amount of variation &#8211;<br \/>\nsometimes it&#8217;ll go hours without sending anything, and sometimes it&#8217;ll<br \/>\nsend a thousand messages in a second. But overall, it averages out<br \/>\nto something regular. And there&#8217;s no way of knowing what pattern it will<br \/>\nfollow: just because you&#8217;ve just seen 1,000 messages per second for 3<br \/>\nseconds, you can&#8217;t predict that it will be 1\/1000th of a second before<br \/>\nthe next message. <\/li>\n<li> Zeta. The zeta distribution is basically the Zipf distribution over<br \/>\nan infinite number of possible discrete outcomes.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;d like to start with a quick apology. Sorry that both the abstract algebra and the new game theory posts have been moving so slowly. I&#8217;ve been a bit overwhelmed lately with things that need doing right away, and by the time I&#8217;m done with all that, I&#8217;m too tired to write anything that requires [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[61],"tags":[],"class_list":["post-627","post","type-post","status-publish","format-standard","hentry","category-statistics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-a7","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/627","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=627"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/627\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=627"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=627"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=627"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}