{"id":288,"date":"2007-01-25T14:35:31","date_gmt":"2007-01-25T14:35:31","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/01\/25\/basics-correlation\/"},"modified":"2007-01-25T14:35:31","modified_gmt":"2007-01-25T14:35:31","slug":"basics-correlation","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2007\/01\/25\/basics-correlation\/","title":{"rendered":"Basics: Correlation"},"content":{"rendered":"<p>Correlation and Causation<\/p>\n<p> Yet another of the most abused mathematical concepts is the concept of <em>correlation<\/em>, along with the related (but different) concept of <em>causation<\/em>.<\/p>\n<p> Correlation is actually a remarkably simple concept, which makes it all the more frustrating<br \/>\nto see the nonsense constantly spewed in talking about it. Correlation is a <em>linear relationship<\/em> between two random variables.<\/p>\n<p><!--more--><\/p>\n<p> Let&#8217;s look at the pieces of that, to really nail down the meaning of the concept precisely.<\/p>\n<p> What&#8217;s a <em>random variable<\/em>? <\/p>\n<p> The name &#8220;random variable&#8221; is a rather dreadful misnomer &#8211; because it&#8217;s not actually a variable. A random variable is a <em>function<\/em> which maps the outcome of an experiment to a numeric value. That can be as simple as mapping, say, &#8220;3 milliliters&#8221; (a measured outcome) to the number 3. It can also<br \/>\nbe much more complicated &#8211; for example, the methods used in assigning a numeric value to the<br \/>\nquality of a chessboard position. But the thing to remember is that it&#8217;s a mapping from the outcome of an experiment to a numeric value.<\/p>\n<p> A random variable is typically described in terms of a probability distribution (to be more precise, a probability distribution <em>or<\/em> a probability density function). 
The details<br \/>\naren&#8217;t appropriate for this post, but the basic idea is just that to understand a measured quantity<br \/>\nfrom an experiment, you need to understand how that quantity varies. The things you can meaningfully<br \/>\nconclude from measurements of a random variable with a normal distribution are quite different from the things you can conclude from measurements of a random variable with a logarithmic distribution.<\/p>\n<p> So given two random variables, X and Y, what does it mean to say that X and Y are <em>correlated<\/em>? It means that given a change in X, you can with high probability predict<br \/>\nthe corresponding change in Y using a linear calculation. In informal use, we frequently drop the linear part: if when X increases, Y also increases; when X decreases, Y also decreases; and when X stays the same, then Y stays the same; then we say X and Y are correlated.<\/p>\n<p> As usual, an example helps. Suppose I&#8217;m measuring the shoe sizes and heights of a group of men. In general (not always, but in general), measurements have shown that the size of your feet is<br \/>\nproportional to your maximum adult height. So if I compared foot size against height in adult men, I would find a strong correlation between height and foot size.<\/p>\n<p> On the other hand, if I were to measure the size of the vocabulary that people use in the course<br \/>\nof a normal day, and compare it to the length of their ring fingers, I would find that there&#8217;s no<br \/>\ncorrelation to speak of. (Yes, I have actually seen a study that measured this. 
The hypothesis was<br \/>\nthat the length of ring fingers (at least, I think it was the ring finger) is related to the quantity<br \/>\nof certain hormones that a fetus is exposed to in the womb, and that that hormone level also affects<br \/>\nthe development of the speech centers of the brain.)<\/p>\n<p> If you&#8217;re reading carefully, you&#8217;ll notice that I started with a definition of when two<br \/>\nthings are correlated, but then in the example, I said &#8220;strong&#8221; correlation. The reason is that in<br \/>\npractice, things are rarely quite as clear as they are in theory. Even for things that really do<br \/>\ncorrelate perfectly, if we&#8217;re measuring them in a series of experiments, experimental error is going<br \/>\nto create enough variation that in the measured data, the correlation won&#8217;t be perfect. So we have a<br \/>\nmeasure, called the correlation coefficient, of the relation between two random variables X and Y (usually abbreviated <em>c<sub>x,y<\/sub><\/em> or <em>r<sub>x,y<\/sub><\/em>), which gives the<br \/>\n<em>strength<\/em> of the correlation between X and Y. <em>c<sub>x,y<\/sub><\/em> varies from -1 to +1. c<sub>x,y<\/sub>=0 indicates no linear correlation at all between X and Y; c<sub>x,y<\/sub>=+1 means that X and Y are perfectly correlated; c<sub>x,y<\/sub>=-1 means that X and -Y are perfectly correlated.<\/p>\n<p> The way that we compute the correlation coefficient of a set of data is as follows. 
Given a set of data with two random variables X and Y, where there is a set of measurements {x<sub>1<\/sub>,&#8230;,x<sub>n<\/sub>} of X with mean <span style=\"text-decoration: overline\">x<\/span> and standard deviation &sigma;<sub>x<\/sub>, and a set of measurements {y<sub>1<\/sub>,&#8230;,y<sub>n<\/sub>} of Y with mean <span style=\"text-decoration: overline\">y<\/span> and standard deviation &sigma;<sub>y<\/sub>, the correlation coefficient is:<\/p>\n<p><!-- insert equation image here --><br \/>\n<img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" alt=\"corr.mmf.jpg\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_138.jpg?resize=144%2C32\" width=\"144\" height=\"32\" \/><\/p>\n<p> The closer the correlation coefficient is to either +1 or -1, the stronger the linear relationship between the two random variables. If it&#8217;s close to +1, then it&#8217;s called a <em>strong positive correlation<\/em>; if it&#8217;s close to -1, it&#8217;s called a <em>strong negative correlation<\/em>.<\/p>\n<p> One thing you&#8217;ll constantly hear in discussions is &#8220;correlation does not imply causation&#8221;. Causation isn&#8217;t really a mathematical notion &#8211; and that&#8217;s the root of that confusion. Correlation means that as one value changes, another value changes in the same way. Causation means that when one value changes, it <em>causes<\/em> the other to change. There is a very big difference between<br \/>\ncausation and correlation. To give a rather commonly abused example: the majority of children with autism are diagnosed between the ages of 18 months and three years old. That&#8217;s also the <em>same<\/em> period of time when children receive a large number of immunizations. So people see the <em>correlation<\/em> between receiving immunizations and the diagnosis of autism, and assume<br \/>\nthat the immunizations <em>cause<\/em> autism. But in fact, there is no causal linkage. 
The causal factor in both cases is age: there is a particular age when a child&#8217;s intellectual development reaches a stage where autism becomes obvious; and there is a particular age when certain vaccinations are traditionally given. It just happens that they&#8217;re <em>roughly<\/em> the same age. <\/p>\n<p> The catch &#8211; and it&#8217;s a big one &#8211; is that correlation does <em>strongly suggest<\/em> a causal relationship. (There&#8217;s a Yale professor of statistics who&#8217;s famous for saying something close to &#8220;Correlation is not the same thing as causation &#8211; but it&#8217;s a darn good hint!&#8221;) It may not be the case that X causes Y or Y causes X &#8211; but if there&#8217;s a strong correlation between them, you <em>should<\/em> suspect that there&#8217;s a causal relationship. It may be that both X and Y are dependent on some third factor (as in the vaccine case). But too often, the mantra &#8220;correlation does not imply causation&#8221; is a hand-waving way of dismissing data that the speaker doesn&#8217;t feel like dealing with.<\/p>\n<p> On the other hand, you constantly see people waving around statistics that show correlations, and<br \/>\ninsisting that the correlation implies causation. (Again, look at that vaccine example!) To show causation, you need to show a <em>mechanism<\/em> for the cause, and demonstrate that mechanism<br \/>\nexperimentally. So when someone shows you a correlation, what you <em>should<\/em> do is look for a plausible causal mechanism, and see if there&#8217;s any experimental data to support it. Without<br \/>\na demonstrable causal mechanism, you can&#8217;t be sure that there&#8217;s a causal relationship &#8211; it&#8217;s just<br \/>\na correlation.<\/p>\n<p> Here&#8217;s one more really interesting example, which I read about when my daughter was little. 
There was a study published a few years ago showing a pretty strong correlation between leaving a night-light on in a child&#8217;s room, and that child developing nearsightedness. This caused a big stir at the time; it got written up in newspapers, reported on by various television programs, etc. But other studies were unable to reproduce that correlation. Finally a group from Ohio State found that while they could not consistently reproduce a direct linkage between night-lights and nearsightedness, they <em>could<\/em> find a correlation between nearsighted parents and nearsighted children, and that there was <em>also<\/em> a correlation between nearsighted parents and the likelihood of having a night-light in their child&#8217;s room. In other words, parents who are nearsighted are more likely to put a night-light in their children&#8217;s rooms, and children are also likely to inherit nearsightedness from their parents. Correlation, but not causation: both the night-lights and the children&#8217;s nearsightedness are caused by the parents&#8217; nearsightedness; there&#8217;s no causal connection between the night-lights and the nearsightedness.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Correlation and Causation Yet another of the most abused mathematical concepts is the concept of correlation, along with the related (but different) concept of causation. Correlation is actually a remarkably simple concept, which makes it all the more frustrating to see the nonsense constantly spewed in talking about it. 
Correlation is a linear relationship between [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[74],"tags":[],"class_list":["post-288","post","type-post","status-publish","format-standard","hentry","category-basics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-4E","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=288"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/288\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=288"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=288"}],"c
uries":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}