{"id":536,"date":"2007-10-25T12:15:00","date_gmt":"2007-10-25T12:15:00","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/10\/25\/tag-teaming-with-orac-bad-bad-breast-cancer-math-in-jpands\/"},"modified":"2007-10-25T12:15:00","modified_gmt":"2007-10-25T12:15:00","slug":"tag-teaming-with-orac-bad-bad-breast-cancer-math-in-jpands","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2007\/10\/25\/tag-teaming-with-orac-bad-bad-breast-cancer-math-in-jpands\/","title":{"rendered":"Tag-Teaming with Orac: Bad, Bad Breast Cancer Math in JPANDS"},"content":{"rendered":"<p> My friend, fellow ScienceBlogger, and BlogFather Orac asked me to take a look at <a href=\"http:\/\/www.jpands.org\/vol12no3\/carroll.pdf\">a paper<\/a> that purportedly shows that abortion is a<br \/>\ncausative risk factor for breast cancer, which he <a href=\"http:\/\/scienceblogs.com\/insolence\/2007\/10\/abortion_and_breast_cancer_the_chicago_t.php\">posted about<br \/>\nthis morning<\/a>. When the person who motivated me to start what&#8217;s turned out to be a shockingly<br \/>\nsuccessful blog asks for something, how could I possibly say no? Especially when it&#8217;s such a great example<br \/>\nof the misuse of mathematics for political purposes?<\/p>\n<p><!--more--><\/p>\n<p> The paper is &#8220;The Breast Cancer Epidemic: Modeling and Forecasts Based on Abortion and Other Risk<br \/>\nFactors&#8221;, by Patrick S. Carroll, published in the Journal of American Physicians and Surgeons (JPANDS).<br \/>\nBefore getting to the meat of the paper, there&#8217;s a couple of preliminary things to say about it.<\/p>\n<p> In an ideal world, every time you read a paper, you&#8217;d study every bit of it in great, absorbing detail. But in the real world, you can&#8217;t do that. 
There are too many papers; if you tried to give every paper a full and carefully detailed reading, even if you never stopped to eat and sleep,<br \/>\nyou&#8217;d be falling further behind every day. So a major skill that you acquire when you learn<br \/>\nto do research is how to triage, and decide how much attention to give to different kinds of papers.<\/p>\n<p> One thing that you should always consider when you set out to look at a paper is<br \/>\nits conclusions. In general, there are a few basic kinds of papers. There are papers presenting<br \/>\nentirely new information; there are papers that are adding something new to an established<br \/>\nconsensus; there are papers that are just piling more data onto an established consensus; and there<br \/>\nare papers that are refuting an established consensus. The way that you read a paper depends on<br \/>\nwhat kind of paper it is.<\/p>\n<p> If a paper is just piling on more evidence, you look at the data that it presents &#8211; and don&#8217;t pay much attention to anything else, because they&#8217;re rehashing what&#8217;s already been said. The only really interesting thing in a paper like that is the data. So you focus your attention on the data, how it<br \/>\nwas gathered and how it was analyzed, and what (if anything) it adds to what we already know.<\/p>\n<p> If a paper adds something new to a consensus, then you give it more careful attention. You&#8217;re<br \/>\nstill focused primarily on the data, but you also want to carefully look at how the data was<br \/>\ngathered and analyzed, to see if the new information that they&#8217;re adding is valid.<\/p>\n<p> The first and last kinds of paper, the ones that present something totally new and the ones that refute something for which there is a lot of strongly supported data, you read with<br \/>\nmuch greater care and attention to detail. 
These are the papers that make the strongest claims,<br \/>\nand which haven&#8217;t been carefully looked at by many different people yet, so they require the most<br \/>\ncareful attention and analysis. This paper is a member of that last class: it&#8217;s claiming to find<br \/>\na statistical link which many careful studies have <em>not<\/em> found. So it&#8217;s in line for a very<br \/>\ncareful reading.<\/p>\n<p> So what&#8217;s the source? Well, it&#8217;s published in JPANDS. JPANDS is a <em>terrible<\/em> journal. In fact, the first post on Good Math\/Bad Math was a critique of a JPANDS paper that used some of the worst statistics that I&#8217;ve ever seen published. That&#8217;s bad &#8211; the paper is appearing in a very low-credibility journal with a history of not carefully reviewing statistical analysis. That&#8217;s certainly not enough<br \/>\nto justify ignoring the paper &#8211; but the quality of the journal is a valid consideration. A paper about<br \/>\nthis topic that appears in a prestigious cancer or epidemiology journal has more credibility than<br \/>\na paper that appears in a journal known for publishing garbage. It, quite naturally, brings<br \/>\nto mind the question &#8220;Why publish this work in a non-MEDLINE indexed, low quality journal?&#8221; Like I said, it&#8217;s not enough to ignore the paper, but it does raise red flags right away: this is a paper where you&#8217;re going to have to give the data and its analysis a very careful read.<\/p>\n<p> So, on to the paper. What the paper does is select a set of potential risk factors for breast cancer, and then compare the incidence of those risk factors in a group of populations with the incidence of<br \/>\nbreast cancer in those same populations. That&#8217;s a sort-of strange approach. 
At best, that approach can<br \/>\nshow a statistical correlation, but it&#8217;s going to be a weak one &#8211; because it doesn&#8217;t maintain any link<br \/>\nbetween <em>individuals<\/em> with risk factors and the incidence of disease. In general, you use<br \/>\na correlative study like that when you <em>can&#8217;t<\/em> associate risk factors and incidences with specific<br \/>\nindividuals. The author does address this point: he says that it&#8217;s difficult for epidemiologists to<br \/>\nobtain information about whether a particular woman had an abortion. So that addresses that criticism, but<br \/>\nthe fact remains that it&#8217;s going to be much harder to establish a causal link rather than a correlative link using this methodology.<\/p>\n<p> To try to build a model, he selects a list of 7 risk factors: abortion, higher age at first live<br \/>\nbirth, childlessness, number of children, breastfeeding, hormonal contraceptive use, and hormone<br \/>\nreplacement therapy. This list raises some red flags. It omits a large number of well-known risk factors<br \/>\nwhich could easily outweigh the factors that are included in the list: smoking, alcohol, genetic risk,<br \/>\nrace. (Orac has more to say about that.) But what&#8217;s also important to notice is that these factors are<br \/>\n<em>not<\/em> independent. The number of women who breastfeed is, obviously, strongly correlated with the<br \/>\nnumber who&#8217;ve had children. The women who have a large number of children are much more likely to have<br \/>\ntheir first child at a younger age than the women who had only one or two children. And it ignores<br \/>\nimportant correlative factors: higher income women tend to have fewer children, later age at first birth,<br \/>\nand higher rates of breastfeeding. This list looks fishy. <\/p>\n<p> But what comes next is where things just totally go off the rails. 
He takes the 7 risk factors,<br \/>\nand using information from public health services, does a linear regression of risk factor versus cancer incidence over time. If the linear regression doesn&#8217;t produce a strong positive correlation, <em>he throws it away<\/em>. The fact that this means that he&#8217;s asserting that well-known and well-supported<br \/>\ncorrelations should be discarded as invalid isn&#8217;t even mentioned. But what&#8217;s worse is, it&#8217;s clearly quite<br \/>\ndeliberate.<\/p>\n<p> On page two, he shows a graph of the data for &#8220;mean age of first live birth&#8221; plotted against breast<br \/>\ncancer risk. How does he assemble the graph for the linear regression? For each year, he takes the<br \/>\ncomplete set of women born that year. Then he computes the average age of first birth for all women born<br \/>\nthat year, and tries to correlate it with the breast cancer incidence for women born in that year. That&#8217;s<br \/>\n<em>ridiculous<\/em>. It is a <em>completely<\/em> unacceptable and invalid use of statistics. Anyone who&#8217;s<br \/>\neven taken a college freshman course in stats should know that that is absolutely ridiculous. It&#8217;s very<br \/>\ndeliberately ignoring independence from other variables, in obviously foolish ways. I just don&#8217;t even<br \/>\nknow how to mock this, because it&#8217;s so off-the-wall ridiculous.<\/p>\n<p> There&#8217;s another obvious problem with the whole methodology, which pales in comparison to<br \/>\nthe dreadful way that they selected data. But I&#8217;ll mention it anyway. Linear regression and the correlation<br \/>\ncoefficient measure how well a <em>linear relationship<\/em> matches the data. They don&#8217;t test for<br \/>\nanything else. But there are numerous correlative and\/or causal relationships that don&#8217;t show a<br \/>\nsimple linear relationship. 
For example, if you look at alcohol consumption plotted against<br \/>\nvarious diseases, there&#8217;s often an initial <em>decrease<\/em> in risk, which bottoms out and is followed by a large <em>increase<\/em> in risk. There are often threshold effects, where something doesn&#8217;t start<br \/>\nto have an impact until beyond a minimum threshold. And so on. There&#8217;s a lot more to things than<br \/>\njust linear correlation. But all that the author considers is linear correlation. He gives no reason<br \/>\nfor that, and makes no attempt to justify it. It&#8217;s just presented as if it&#8217;s beyond question.<\/p>\n<p> Based on those linear regressions, he totally discards everything without a strong linear correlation<br \/>\nas being irrelevant factors that don&#8217;t need to be included in the model. That<br \/>\nleaves him with only two factors: fertility (number of live births) and abortion. So then, once<br \/>\nagain building on the assumption that linear relationships are the only things that matter, he says<br \/>\nthat they can model the breast cancer incidence via a simple linear equation:<\/p>\n<p>Y<sub>i<\/sub> = a + b<sub>1<\/sub>x<sub>1i<\/sub> + b<sub>2<\/sub>x<sub>2i<\/sub> + e<sub>i<\/sub><\/p>\n<p> In this, Y<sub>i<\/sub> is the breast cancer incidence in the group of women of age <em>i<\/em>;<br \/>\nx<sub>1i<\/sub> is a measure of the number of abortions; x<sub>2i<\/sub> is a measure of fertility.<br \/>\nThey then do another linear regression using this equation to come up with coefficients for<br \/>\nthe two measured quantities. The coefficient for fertility is -0.0047, with a 95% confidence interval<br \/>\nranging from -0.0135 to 0.0041. In other words, according to their measure, fertility &#8211; the rate<br \/>\nof live birth &#8211; is <em>not<\/em> a significant factor in breast cancer rates compared to<br \/>\nabortion.<\/p>\n<p> Right there, we can stop looking at the paper. 
When a mathematical model generates an<br \/>\nincredibly ridiculous result, something which is in direct and blatant contradiction with<br \/>\nthe known data, you throw the model right out the window, because it&#8217;s worthless. The notion that<br \/>\nabortion as a risk factor for breast cancer completely dwarfs the reduction in risk after<br \/>\nchildbirth &#8211; when we know that having children causes a dramatic decrease in the risk of<br \/>\nbreast cancer &#8211; is unquestionably wrong. If it were true, what it would mean is that the<br \/>\nnumber of cases of breast cancer among women who had no children but had an abortion (which,<br \/>\nfrom what I can estimate from data from a variety of websites, is somewhere around 15%) is<br \/>\n<em>so high<\/em> that it can completely dwarf the risk reduction among women who did<br \/>\nhave children (&gt;80%). If that were the case, it would be incredibly obvious in the statistics<br \/>\nof breast cancer rates &#8211; you&#8217;d have a small sub-population causing an inordinately huge<br \/>\nportion of the breast cancer rates. We know that things like that are easily visible: that&#8217;s how<br \/>\nwe discovered the so-called &#8220;breast cancer genes&#8221; &#8211; a small group of women were<br \/>\ndramatically more likely to have breast cancer than the population at large.<\/p>\n<p> So we&#8217;ve got a model which doesn&#8217;t fit reality. What a real scientist does when this happens<br \/>\nis to say &#8220;Damn, I was wrong. Back to the ol&#8217; drawing board&#8221;, and try to find a new model that<br \/>\n<em>does<\/em> fit with reality.<\/p>\n<p> But not this intrepid author. He tries to handwave his way past the fact that his model is<br \/>\nwrong, by saying &#8220;The coefficient of fertility is rather small, with the 95% confidence interval straddling zero. 
Some improvement in breastfeeding may be offsetting fertility decline.&#8221; No, sorry, you can&#8217;t say &#8220;My mathematical model has absolutely no relation with reality, but that&#8217;s probably because one of the factors that I excluded is probably important, and so now I&#8217;m going to go on pretending that the model works.&#8221;<\/p>\n<p> The model is <em>wrong<\/em>. Invalid models do not produce valid results. Stop. Do not pass go. Do not collect $200. Do not get your paper published in a decent journal. <em>Do<\/em> get laughed at by people who aren&#8217;t clueless jackasses.<\/p>\n<p> At this point, we can see just why this paper appeared in a journal like JPANDS. Because it&#8217;s<br \/>\ncrap that&#8217;s just attempting to justify a political position using incredibly sloppy math; math so<br \/>\nbad that a college freshman should be able to see what&#8217;s wrong with it. But for the &#8220;reviewers&#8221; at JPANDS, apparently a college freshman level of knowledge of statistics isn&#8217;t necessary for reviewing<br \/>\na paper on statistical epidemiology.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My friend, fellow ScienceBlogger, and BlogFather Orac asked me to take a look at a paper that purportedly shows that abortion is a causative risk factor for breast cancer, which he posted about this morning. 
When the person who motivated me to start what&#8217;s turned out to be a shockingly successful blog asks for something, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[8],"tags":[],"class_list":["post-536","post","type-post","status-publish","format-standard","hentry","category-bad-statistics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-8E","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/536","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=536"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/536\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=536"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=536"},{"taxonomy":"post_tag","embeddable"
:true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=536"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}