Least Square Linear Regression

There’s one topic I’ve been asked about multiple times, but which I’ve never gotten around to writing about. It happens to be one of the first math things that my dad taught me about: linear regression.

Here’s the problem: you’re doing an experiment. You’re measuring one quantity as you vary another. You’ve got a good reason to believe that there’s a linear relationship between the two quantities. But your measurements are full of noise, so when you plot the data on a graph, you get a scattershot. How can you figure out what line is the best match to your data, and how can you measure how good the match is?

When my dad taught me this, he was working for RCA manufacturing semiconductor chips for military and satellite applications. The focus of his work was building chips that would survive in the high-radiation environment of space – in the jargon, he was building radiation hard components. They’d put together a set of masks for an assembly line, and do a test run. Then they’d take the chips from that run, and they’d expose them to gamma radiation until they failed. That would give them a good way of estimating the actual radiation hardness of the run, and whether it was good enough for their customers. Based on a combination of theory and experience, they knew that the relationship they cared about was nearly linear: for a given amount of radiation, the number of specific circuitry failures was proportional to the amount of gamma exposure.

graph For example, here’s a graph that I generated semi-randomly of data points. The distribution of the points isn’t really what you’d get from real observations, but it’s good enough for demonstration.

The way that we’d usually approach this is called least square linear regression. The idea is that what we want do do is create a line where the square of the vertical distance between the chosen line and the measured data points is a minimum.

For the purposes of this, we’ll say that one quantity is the independent value, and we’ll call that x, and the other quantity is the dependent variable, and we’ll call that y. In theory, the dependent variable, as its name suggests depends on the independent variable. In fact, we don’t always really know which value depends on the other, so we do our best to make an intelligent guess.

So what we want to do is find a linear equation, y = mx + b where the mean-square distance is minimal. All we need to do is find values for m (the slope of the line) and b (the point where the line crosses the y axis, also called the y intercept). And, in fact, b is relatively easy to compute once we know the slope of the line. So the real trick is to find the slope of the line.

The way that we do that is: first we compute the means of x and y, which we’ll call overline{x} and overline{y}. Then using those, we compute the slope as:

 m = frac{Sigma_{i=1}^n (x-hat{x})(y-hat{y})}{Sigma_{i=1}^{n} (x-hat{x})^2}

Then for the y intercept: b = hat{y} - mhat{x}.

In the case of this data: I set up the script so that the slope would be about 2.2 +/- 0.5. The slop in the figure is 2.54, and the y-intercept is 18.4.

Now, we want to check how good the linear relationship is. There’s several different ways of doing that. The simplest is called the correlation coefficient, or r.

 r = 	frac{left(Sigma (x-hat{x})right) left(Sigma (y - hat{y})right)}{	sqrt{ left(Sigma (x-hat{x})^2right) left(Sigma (y - hat{y})^2right)  }}

If you look at this, it’s really a check of how well the variation between the measured values and the expected values (according to the regression) match. On the top, you’ve got a set of products; on the bottom, you’ve got the square root of the same thing squared. The bottom is, essentially, just stripping the signs away. The end result is that if the correlation is perfect – that is, if the dependent variable increases linearly with the independent, then the correlation will be 1. If the dependency variable decreases linearly in opposition to the dependent, then the correlation will be -1. If there’s no relationship, then the correlation will be 0.

For this particular set of data, I generated it with a linear equation with a little bit of random noise. The correlation coefficient is slighly greater than 0.95, which is exctly what you’d expect.

When you see people use linear regression, there are a few common errors that you’ll see all the time.

  • No matter what your data set looks like, linear regression will find a line. It won’t tell you “Oops, I couldn’t find a match”. So the fact that you fit a line means absolutely nothing by itself. If you’re doing it right, you start off with a hypothesis based on prior plausibility for a linear relation, and you’re using regression as part of a process to test that hypothesis.
  • You don’t get to look at the graph before you do the analysis. What I mean by that is, if you look at the data, you’ll naturally notice some patterns. Humans are pattern seekers – we’re really good at noticing them. And almost any data set that you look at carefully enough will contain some patterns purely by chance. If you look at the data, and there’s a particular pattern that you want to see, you’ll probably find a way to look at the data that produces that pattern. For example, in the first post on this blog, I was looking at a shoddy analysis by some anti-vaxxers, who were claiming that they’d found an inflection point in the rate of autism diagnoses, and used linear regression to fit two lines – one before the inflection, one after. But that wasn’t supported in the data. It was random – the data was very noisy. You could fit different lines to different sections by being selective. If you picked one time, you’d get a steeper slope before that time, and a shallower one after. But by picking different points, you could get a steeping slope after. The point is, when you’re testing the data, you need to design the tests before you’ve seen the data, in order to keep your bias out!

  • A strong correlation doesn’t imply linear correlation. If you fit a line to a bunch of data that’s not really linear, you can still get a strong positive (or negative) correlation. Correlation is really testing whether the data is increasing the way you’d expect it to, not whether it’s truly linear. Random data will have a near-zero correlation. Data where the dependent variable doesn’t vary consistently with the independent will have near-zero correlation. But there are plenty of ways of getting data where the dependent and independent variables increase together that produce a strong correlation. You need to do other things to judge the strength of the fit. (I might do some more posts on this kind of thing to cover some of that.)

Not Really Free Energy from Silicon and Water

A reader sent me a link to an article about a new technique for producing hydrogen. His original request was for me to debunk it, because it looked like another free energy scam. But it isn’t, so I’m going to attempt to explain why.

BUFFALO, N.Y. — Super-small particles of silicon react with water to produce hydrogen almost instantaneously, according to University at Buffalo researchers.

In a series of experiments, the scientists created spherical silicon particles about 10 nanometers in diameter. When combined with water, these particles reacted to form silicic acid (a nontoxic byproduct) and hydrogen — a potential source of energy for fuel cells.

The reaction didn’t require any light, heat or electricity, and also created hydrogen about 150 times faster than similar reactions using silicon particles 100 nanometers wide, and 1,000 times faster than bulk silicon, according to the study.

Producing hydrogen from water, with no light, heat, or electricity! I can’t blame you if you read that, and think that it sounds like a free-energy scam. We’re talking about producing hydrogen from water without using any energy!

But here’s the catch: it’s a chemical reaction. It consumes the silicon, producing silicic acid.

There are all sorts of chemical reactions that can release energy. That’s why we like gasoline so much: it’s a chemical that reacts easily with oxygen, and releases lots of energy when it does. There’s no cheating going on there. As thermodynamics predicts, everything trends towards the lowest energy state – if you look at the thermodynamic system containing a reaction that releases energy, it properly ends up in a lower energy state.

In the case of this silicon bead technique, all that we’re seeing is a typical entropy increasing chemical reaction.

If the silicon wasn’t consumed by the reaction, but we were still able to break water molecules producing free hydrogen, which we could then burn to produce energy, that would be a free energy system: you could take water, dump in the silicon, produce hydrogen, burn the hydrogen, releasing energy and producing water, and then take the resulting water, and dump it back into the silicon mixture. But that’s not what’s going on here. Instead, the silicon is getting consumed, and if you wanted to re-use it, you would need to use energy to extract it from the silicic acid – which would be a typical lossy thermodynamic process.

Recipe: Kimchi-Marinated Steak Sandwich

Last weekend, I was watching cooking shows on PBS. One of my favorite chefs, Ming Tsai, came on. Along with a guest, he ad-libbed a dish that looked absolutely phenomenal: a kimchi marinated cheesesteak sandwich.

I didn’t write down the recipe; I just took the idea, and ran with it. I mean come on, what could be better than a well-seared medium-rare flank steak with kimchi and Tallegio? My version is, I think, a bit lighter than what they made, but it’s not exactly a light snack!

For my tastes, this is an amazing combination. Start with flank steak, which is one of my favorite cuts of beef. It’s not the tenderest piece of beef, so it needs some work, but it’s got an absolutely fantastic flavor, and as long as you prepare it properly, don’t overcook it, and slice it thin, it comes out great.

Then you’ve got kimchi. I only discovered kimchi fairly recently, and I absolutely love the stuff, and I’m always looking for more things to eat it with.

Finally, tallegio cheese. I adore soft cheeses, and tallegio is probably the nicest soft cheese that I’ve had from Italy. It’s very creamy, with a nice funkiness to it, but not an overpowering stinkiness.

Put all three of those together in a sandwich, and I’m in heaven.


  • One largish (1 1/2 pound) flank steak.
  • 1/2 cup kimchi, finely minced, plus 1 tablespoon of
    the liquid from the kimchi.
  • Some more kimchi, left whole.
  • salt and pepper
  • one teaspoon sugar.
  • one half of a large onion, cut into thin slices.
  • 1/2 pound tallegio cheese.
  • Bagette.

One note about the ingredients: I recently made homemade kimchi for the first time. It’s really easy to make, and it is really a totally different thing from the stuff out of a jar. I don’t see why it’s so different: kimchi is a fermented preserved cabbage, which is exactly the kind of thing that should be really easy to make in quantity, and then sell in jars. But the fact is, nothing from the store remotely compares to homemade. I basically used this recipe, except that I couldn’t get any daikon, so I left that out.


  1. Take a couple of forks, and go up and down the steak, poking holes. Don’t be lazy: the more you do this, the better. You’re tenderizing the meat, and making a way for the marinade to help penetrate.
  2. Salt the meat generously, then coat both sides with the minced kimchi. Refrigerate overnight.
  3. Scrape the kimchi off the steak, but do not throw it away. We’re going to cook it, and use it as a topping.
  4. Heat a cast-iron pan on high heat until it’s smoking hot.
  5. Sprinkle half the sugar and some more salt and pepper onto the top of the steak, and then put the sugared side down in the pan, for about 3 minutes. While it’s cooking on that side, put the rest of the sugar and salt onto the other side. Then turn it over, and cook the other side for 3 minutes.
  6. Transfer the steak into a 350 degree oven for 8-10 minutes. (I did eight; it was exactly the way I like it, nice and medium rare, but I think a lot of people would prefer it a bit more well done. Take it out after the ten minutes, and leave it alone for a while. Don’t slice it immediately!
  7. While the steak is resting, heat up a pan, and add some oil. Toss in the onions with a bit of salt, and stir them around. When they just start to turn brown, add in the kimchi and any liquid left from the marinade, and turn down the heat. Cook it for about five minutes, adding a bit of water if it gets too dry.
  8. Slice the break in half, and spread a thin layer of the taleggio cheese over both halves.
  9. Slice the steak on a bias against the grain into thin slices.
  10. Take any drippings from slicing the steak, and pour them into the onions and kimchi.
  11. Put a heap of steak slices onto the bread, and then put a pile of the cooked onions and kimchi, and then a piece of fresh uncooked kimchi on top.
  12. Close up the sandwich, and eat!

Philadelphia, eat your heart out. Your cheesesteaks got nothing on this!

Back to an old topic: Bad Vaccine Math

The very first Good Math/Bad Math post ever was about an idiotic bit of antivaccine rubbish. I haven’t dealt with antivaccine stuff much since then, because the bulk of the antivaccine idiocy has nothing to do with math. But the other day, a reader sent me a really interesting link from what my friend Orac calls a “wretched hive of scum and quackery”, naturalnews.com, in which they try to argue that the whooping cough vaccine is an epic failure:

(NaturalNews) The utter failure of the whooping cough (pertussis) vaccine to provide any real protection against disease is once again on display for the world to see, as yet another major outbreak of the condition has spread primarily throughout the vaccinated community. As it turns out, 90 percent of those affected by an ongoing whooping cough epidemic that was officially declared in the state of Vermont on December 13, 2012, were vaccinated against the condition — and some of these were vaccinated two or more times in accordance with official government recommendations.

As reported by the Burlington Free Press, at least 522 cases of whooping cough were confirmed by Vermont authorities last month, which was about 10 times the normal amount from previous years. Since that time, nearly 100 more cases have been confirmed, bringing the official total as of January 15, 2013, to 612 cases. The majority of those affected, according to Vermont state epidemiologist Patsy Kelso, are in the 10-14-year-old age group, and 90 percent of those confirmed have already been vaccinated one or more times for pertussis.

Even so, Kelso and others are still urging both adults and children to get a free pertussis shot at one of the free clinics set up throughout the state, insisting that both the vaccine and the Tdap booster for adults “are 80 to 90 percent effective.” Clearly this is not the case, as evidenced by the fact that those most affected in the outbreak have already been vaccinated, but officials are apparently hoping that the public is too naive or disengaged to notice this glaring disparity between what is being said and what is actually occurring.

It continues in that vein. The gist of the argument is:

  1. We say everyone needs to be vaccinated, which will protect them from getting the whooping cough.
  2. The whooping cough vaccine is, allagedly, 80 to 90% effective.
  3. 90% of the people who caught whooping cough were properly vaccinated.
  4. Therefore the vaccine can’t possibly work.

What they want you to do is look at that 80 to 90 percent effective rate, and see that only 10-20% of vaccinated people should be succeptible to the whooping cough, and compare that 10-20% to the 90% of actual infected people that were vaccinated. 20% (the upper bound of the succeptible portion of vaccinated people according to the quoted statistic) is clearly much smaller than 90% – therefore it’s obvious that the vaccine doesn’t work.

Of course, this is rubbish. It’s a classic apple to orange-grove comparison. You’re comparing percentages, when those percentages are measuring different groups – groups with wildly difference sizes.

Take a pool of 1000 people, and suppose that 95% are properly vaccinated (the current DTAP vaccination rate in the US is around 95%). That gives you 950 vaccinated people and 50 unvaccinated people who are unvaccinated.

In the vaccinated pool, let’s assume that the vaccine was fully effective on 90% of them (that’s the highest estimate of effectiveness, which will result in the lowest number of succeptible vaccinated – aka the best possible scenario for the anti-vaxers). That gives us 95 vaccinated people who are succeptible to the whooping cough.

There’s the root of the problem. Using numbers that are ridiculously friendly to the anti-vaxers, we’ve still got a population of twice as many succeptible vaccinated people as unvaccinated. so we’d expect, right out of the box, that better than 2/3rds of the cases of whooping cough would be among the vaccinated people.

In reality, the numbers are much worse for the antivax case. The percentage of people who were ever vaccinated is around 95%, because you need the vaccination to go to school. But that’s just the childhood dose. DTAP is a vaccination that needs to be periodically boosted or the immunity wanes. And the percentage of people who’ve had boosters is extremely low. Among adolescents, according to the CDC, only a bit more than half have had DTAP boosters; among adults, less that 10% have had a booster within the last 5 years.

What’s your succeptibility if you’ve gone more than 5 years without vaccination? Somewhere 40% of people who didn’t have boosters in the last five years are succeptible.

So let’s just play with those numbers a bit. Assume, for simplicity, than 50% of the people are adults, and 50% children, and assume that all of the children are fully up-to-date on the vaccine. Then you’ve got 10% of the children (10% of 475), 10% of the adults that are up-to-date (10% of 10% of 475), and 40% of the adults that aren’t up-to-date (40% of 90% of 475) is the succeptible population. That works out to 266 succeptible people among the vaccinated, which is 85%: so you’d expect 85% of the actual cases of whooping cough to be among people who’d been vaccinated. Suddenly, the antivaxers case doesn’t look so good, does it?

Consider, for a moment, what you’d expect among a non-vaccinated population. Pertussis is highly contagious. If someone in your household has pertussis, and you’re succeptible, you’ve got a better than 90% chance of catching it. It’s that contagious. Routine exposure – not sharing a household, but going to work, to the store, etc., with people who are infected still gives you about a 50% chance of infection if you’re succeptible.

In the state of Vermont, where NaturalNews is claiming that the evidence shows that the vaccine doesn’t work, how many cases of Pertussis have they seen? Around 600, out of a state population of 600,000 – an infection rate of one tenth of one percent. 0.1 percent, from a virulently contagious disease.

That’s the highest level of Pertussis that we’ve seen in the US in a long time. But at the same time, it’s really a very low number for something so contagious. To compare for a moment: there’s been a huge outbreak of Norovirus in the UK this year. Overall, more than one million people have caught it so far this winter, out of a total population of 62 million, for a rate of about 1.6% or sixteen times the rate of infection of pertussis.

Why is the rate of infection with this virulently contagious disease so different from the rate of infection with that other virulently contagious disease? Vaccines are a big part of it.

Hensel's Lemma: Newton's Method for p-Adic numbers

This is, sadly, the last part of my posts about P-adic arithmetic. It’s not for lack of information, but rather for lack of competence on my part. The area of math that I’m worst at is analysis, and an awful lot of the interesting things that you can do with p-adic numbers is analysis.

For an explanation of what p-adic numbers are, and how arithmetic on them works, you can refer to my first post on the p-adics, and for an explanation of p-adic metrics and distance, you can look at this post.

Today, we’re going to look at Hensel’s lemma. This is really cool. Hensel’s lemma shows you a simple way of finding polynomial roots in P-adic arithmetic. It’s sometimes called Newton’s method for p-adics, which is both a good description of how it’s used (but a complete misnomer because the lemma is a description of of a property, and Newton’s method is a description of a procedure).

Hensel’s lemma says that if you’ve got a polynomial, and for a prime number p, you can find a root using modulo arithmetic using base p, then you can use the base-p root to find the corresponding root using modulo base p2, and given the root using base p2, you can find it for modulo p3, and so on!

So far, this is just a statement about modulo arithmetic – not about p-adic arithmetic. But what ends up happening is that the result modulo p gives you a first approximation of the root in p-adic. Using the process defined by Hensel’s lemma, you can take that root and extend it to a better approximation. It ends up being the case that each time you lift the result to a higher exponent of p, you’re performing the exactly analog of one step in Newton’s method of finding roots! But it’s actually better than that. In you look at Hensel’s lemma in the p-adic numbers, then each step extends the approximation by exactly one digit!

Let’s look at the lemma in formal terms:

Suppose that you have a polynomial f(x).

If there is an integer, i and a prime number p such that for f(i) = 0 in mod p, j such that f(j) = 0 in mod p^2. In general, if there’s an exponent k where f(i) = 0 in mod p^k, then there’s a j where f(j) = 0 in mod p^{k+1}.

In fact, we can do even better than that! If we know the root i for an exponent k, then we can easily compute j using a process called lifting:

j = i + (p^k)-frac{f(i)}{p^k times f

(In this, f is the first derivative of f at i.)

You can see, looking at that equation, that the derivative of f at i can’t be zero mod p^k, or else the denominator of the fraction would be zero, and everything would go kablooie in your face!

In P-adic arithemetic, this becomes absolutely amazing. If i is the root mod p^k, then the root mod p^{k+1} is:

 j = i - frac{f(i)}{f

It can’t be simpler. And you can clearly see the relation to Newton’s method.

For fun, let’s try looking at an example: let’s take the square root of 2 in 7-adic. In base-10, the square root of 2 is roughly 1.4142135624. In p-adic, it’s going to bo something completely different. We’ll start by phrasing it as a polynomial: x^2 - 2 = 0. So, what’s the solution to that mod-7? 3: in mod-7, 3*3=2.

So, our first approximation of a square root of 2 is 3.

We can extend to the next digit using Hensel’s lemma, and we get j = 3 - 7/6 =   (9-2)/6 = 7/6. We’re looking at mod 7 arithmetic here, so 6 = -1. So j = 3 + 7 = 10 = 13.

Repeated, we’d get 213, 6213, and so on.

Good Math: the Book

Ladies and Gentlemen! I’ve got a very big announcement!

mcmath At long last, I’ve written a book based on this blog. It’s being published by the Pragmatic Bookshelf. The way that the Prags work, you can pre-order the book now, and immediately download the current draft. As I update the rest of the book, you’ll get all of the updates, up to the final printed version of the book.

I’m very proud of this thing. It’s the result of a whole lot of work, and I think it should be a lot of fun for mathematically curious folks to read.

For this book, I’ve collected up a bunch of my favorite posts from the history of this blog, and revised/rewritten them in a much more polished, coherent form. The book includes:

  1. Numbers: some basic number theory, what numbers mean, how they’re constructed.
  2. Funny numbers: a collection of strange but interesting numbers: 0, e, i, …
  3. Writing Numbers: Roman numerals, egyptian fractions, continued fractions, …
  4. Logic: first order predicate logic, proofs, temporal logic, and prolog.
  5. Set theory
  6. Computation and computing machines

Aside from being a fun read, buying it will support Scientopia. From the time that Scientopia got started, I’ve been paying all of the hosting bills. That’s about $250 a month, for the last three years. Making some money from this book will make it possible for me to continue supporting it, and maybe even pay for some help with technical support!

Not to mention the fact that if this book does well, there will be more volumes down the line. In particular, I’ve got my eye on doing some stuff about the math of programming: type theory, category theory, algebraic data structures, etc.


Vortex Garbage

A reader who saw my earlier post on the Vortex math talk at a TEDx conference sent me a link to an absolutely dreadful video that features some more crackpottery about the magic of vortices.

It says:

The old heliocentric model of our solar system,
planets rotating around the sun, is not only boring,
but also incorrect.

Our solar system moves through space at 70,000km/hr.
Now picture this instead:

(Image of the sun with a rocket/comet trail propelling
it through space, with the planets swirling around it.)

The sun is like a comet, dragging the planets in its wake.
Can you say “vortex”?

The science of this is terrible. The sun is not a rocket. It does not propel itself through space. It does not have a tail. It does not leave a significant “wake”. (There is interstellar material, and the sun moving through it does perturb it, but it’s not a wake: the interstellar material is orbiting the galactic center just like the sun. Gravitational effects do cause pertubations, but it’s not like a boat moving through still water, producing a wake.) Even if you stretch the definition of “wake”, the sun certainly does not leave a wake large enough to “drag” the planets. In fact, if you actually look at the solar system, the plane the ecliptic – the plane where the planets orbit the sun – is at a roughly 60 degree angle to the galactic ecliptic. If planetary orbits were a drag effect, then you would expect the orbits to be perpendicular to the galactic ecliptic. But they aren’t.

If you look at it mathematically, it’s even worse. The video claims to be making a distinction between the “old heliocentric” model of the solar system, and their new “vortex” model. But in fact, mathematically, they’re exactly the same thing. Look at it from a heliocentric point of view, and you’ve got the heliocentric model. Look at the exact same system from point that’s not moving relative to galactic center, and you get the vortex. They’re the same thing. The only difference is how you look at it.

And that’s just the start of the rubbish. Once they get past their description of their “vortex” model, they go right into the woo. Vortex is life! Vortex is sprirituality! Oy.

If you follow their link to their website, it gets even sillier, and you can start to see just how utterly clueless the author of this actually is:

(In reference to a NASA image showing the interstellar “wind” and the heliopause)

Think about this for a minute. In this diagram it seems the Solar System travel to the left. When the Earth is also traveling to the left (for half a year) it must go faster than the Sun. Then in the second half of the year, it travels in a “relative opposite direction” so it must go slower than the Sun. Then, after completing one orbit, it must increase speed to overtake the Sun in half a year. And this would go for all the planets. Just like any point you draw on a frisbee will not have a constant speed, neither will any planet.

See, it’s a problem that the planets aren’t moving at a constant speed. They speed up and slow down! Oh, the horror! The explanation is that they’re caught by the sun’s wake! So they speed up when they get dragged, until they pass the sun (how does being dragged by the sun ever make them faster than the sun? Who knows!), and then they’re not being dragged anymore, so they slow down.

This is ignorance of physics and of the entire concept of frame of reference and symmetry that is absolutely epic.

There’s quite a bit more nonsense, but that’s all I can stomach this evening. Feel free to point out more in the comments!