# Least Square Linear Regression

There’s one topic I’ve been asked about multiple times, but which I’ve never gotten around to writing about. It happens to be one of the first math things that my dad taught me about: linear regression.

Here’s the problem: you’re doing an experiment. You’re measuring one quantity as you vary another. You’ve got a good reason to believe that there’s a linear relationship between the two quantities. But your measurements are full of noise, so when you plot the data on a graph, you get a scattershot. How can you figure out what line is the best match to your data, and how can you measure how good the match is?

When my dad taught me this, he was working for RCA manufacturing semiconductor chips for military and satellite applications. The focus of his work was building chips that would survive in the high-radiation environment of space – in the jargon, he was building radiation hard components. They’d put together a set of masks for an assembly line, and do a test run. Then they’d take the chips from that run, and they’d expose them to gamma radiation until they failed. That would give them a good way of estimating the actual radiation hardness of the run, and whether it was good enough for their customers. Based on a combination of theory and experience, they knew that the relationship they cared about was nearly linear: for a given amount of radiation, the number of specific circuitry failures was proportional to the amount of gamma exposure.

For example, here’s a graph that I generated semi-randomly of data points. The distribution of the points isn’t really what you’d get from real observations, but it’s good enough for demonstration.

The way that we’d usually approach this is called least square linear regression. The idea is that what we want do do is create a line where the square of the vertical distance between the chosen line and the measured data points is a minimum.

For the purposes of this, we’ll say that one quantity is the independent value, and we’ll call that x, and the other quantity is the dependent variable, and we’ll call that y. In theory, the dependent variable, as its name suggests depends on the independent variable. In fact, we don’t always really know which value depends on the other, so we do our best to make an intelligent guess.

So what we want to do is find a linear equation, $y = mx + b$ where the mean-square distance is minimal. All we need to do is find values for $m$ (the slope of the line) and $b$ (the point where the line crosses the $y$ axis, also called the y intercept). And, in fact, $b$ is relatively easy to compute once we know the slope of the line. So the real trick is to find the slope of the line.

The way that we do that is: first we compute the means of $x$ and $y$, which we’ll call $\overline{x}$ and $\overline{y}$. Then using those, we compute the slope as:

$m = \frac{\Sigma_{i=1}^n (x-\hat{x})(y-\hat{y})}{\Sigma_{i=1}^{n} (x-\hat{x})^2}$

Then for the y intercept: $b = \hat{y} - m\hat{x}$.

In the case of this data: I set up the script so that the slope would be about 2.2 +/- 0.5. The slop in the figure is 2.54, and the y-intercept is 18.4.

Now, we want to check how good the linear relationship is. There’s several different ways of doing that. The simplest is called the correlation coefficient, or $r$.

$r = \frac{\left(\Sigma (x-\hat{x})\right) \left(Sigma (y - \hat{y})\right)}{\sqrt{ \left(\Sigma (x-\hat{x})^2\right) \left(\Sigma (y - \hat{y})^2\right) }}$

If you look at this, it’s really a check of how well the variation between the measured values and the expected values (according to the regression) match. On the top, you’ve got a set of products; on the bottom, you’ve got the square root of the same thing squared. The bottom is, essentially, just stripping the signs away. The end result is that if the correlation is perfect – that is, if the dependent variable increases linearly with the independent, then the correlation will be 1. If the dependency variable decreases linearly in opposition to the dependent, then the correlation will be -1. If there’s no relationship, then the correlation will be 0.

For this particular set of data, I generated it with a linear equation with a little bit of random noise. The correlation coefficient is slighly greater than 0.95, which is exctly what you’d expect.

When you see people use linear regression, there are a few common errors that you’ll see all the time.

• No matter what your data set looks like, linear regression will find a line. It won’t tell you “Oops, I couldn’t find a match”. So the fact that you fit a line means absolutely nothing by itself. If you’re doing it right, you start off with a hypothesis based on prior plausibility for a linear relation, and you’re using regression as part of a process to test that hypothesis.
• You don’t get to look at the graph before you do the analysis. What I mean by that is, if you look at the data, you’ll naturally notice some patterns. Humans are pattern seekers – we’re really good at noticing them. And almost any data set that you look at carefully enough will contain some patterns purely by chance. If you look at the data, and there’s a particular pattern that you want to see, you’ll probably find a way to look at the data that produces that pattern. For example, in the first post on this blog, I was looking at a shoddy analysis by some anti-vaxxers, who were claiming that they’d found an inflection point in the rate of autism diagnoses, and used linear regression to fit two lines – one before the inflection, one after. But that wasn’t supported in the data. It was random – the data was very noisy. You could fit different lines to different sections by being selective. If you picked one time, you’d get a steeper slope before that time, and a shallower one after. But by picking different points, you could get a steeping slope after. The point is, when you’re testing the data, you need to design the tests before you’ve seen the data, in order to keep your bias out!
• A strong correlation doesn’t imply linear correlation. If you fit a line to a bunch of data that’s not really linear, you can still get a strong positive (or negative) correlation. Correlation is really testing whether the data is increasing the way you’d expect it to, not whether it’s truly linear. Random data will have a near-zero correlation. Data where the dependent variable doesn’t vary consistently with the independent will have near-zero correlation. But there are plenty of ways of getting data where the dependent and independent variables increase together that produce a strong correlation. You need to do other things to judge the strength of the fit. (I might do some more posts on this kind of thing to cover some of that.)

# Not Really Free Energy from Silicon and Water

A reader sent me a link to an article about a new technique for producing hydrogen. His original request was for me to debunk it, because it looked like another free energy scam. But it isn’t, so I’m going to attempt to explain why.

BUFFALO, N.Y. — Super-small particles of silicon react with water to produce hydrogen almost instantaneously, according to University at Buffalo researchers.

In a series of experiments, the scientists created spherical silicon particles about 10 nanometers in diameter. When combined with water, these particles reacted to form silicic acid (a nontoxic byproduct) and hydrogen — a potential source of energy for fuel cells.

The reaction didn’t require any light, heat or electricity, and also created hydrogen about 150 times faster than similar reactions using silicon particles 100 nanometers wide, and 1,000 times faster than bulk silicon, according to the study.

Producing hydrogen from water, with no light, heat, or electricity! I can’t blame you if you read that, and think that it sounds like a free-energy scam. We’re talking about producing hydrogen from water without using any energy!

But here’s the catch: it’s a chemical reaction. It consumes the silicon, producing silicic acid.

There are all sorts of chemical reactions that can release energy. That’s why we like gasoline so much: it’s a chemical that reacts easily with oxygen, and releases lots of energy when it does. There’s no cheating going on there. As thermodynamics predicts, everything trends towards the lowest energy state – if you look at the thermodynamic system containing a reaction that releases energy, it properly ends up in a lower energy state.

In the case of this silicon bead technique, all that we’re seeing is a typical entropy increasing chemical reaction.

If the silicon wasn’t consumed by the reaction, but we were still able to break water molecules producing free hydrogen, which we could then burn to produce energy, that would be a free energy system: you could take water, dump in the silicon, produce hydrogen, burn the hydrogen, releasing energy and producing water, and then take the resulting water, and dump it back into the silicon mixture. But that’s not what’s going on here. Instead, the silicon is getting consumed, and if you wanted to re-use it, you would need to use energy to extract it from the silicic acid – which would be a typical lossy thermodynamic process.

# Recipe: Kimchi-Marinated Steak Sandwich

Last weekend, I was watching cooking shows on PBS. One of my favorite chefs, Ming Tsai, came on. Along with a guest, he ad-libbed a dish that looked absolutely phenomenal: a kimchi marinated cheesesteak sandwich.

I didn’t write down the recipe; I just took the idea, and ran with it. I mean come on, what could be better than a well-seared medium-rare flank steak with kimchi and Tallegio? My version is, I think, a bit lighter than what they made, but it’s not exactly a light snack!

For my tastes, this is an amazing combination. Start with flank steak, which is one of my favorite cuts of beef. It’s not the tenderest piece of beef, so it needs some work, but it’s got an absolutely fantastic flavor, and as long as you prepare it properly, don’t overcook it, and slice it thin, it comes out great.

Then you’ve got kimchi. I only discovered kimchi fairly recently, and I absolutely love the stuff, and I’m always looking for more things to eat it with.

Finally, tallegio cheese. I adore soft cheeses, and tallegio is probably the nicest soft cheese that I’ve had from Italy. It’s very creamy, with a nice funkiness to it, but not an overpowering stinkiness.

Put all three of those together in a sandwich, and I’m in heaven.

Ingredients

• One largish (1 1/2 pound) flank steak.
• 1/2 cup kimchi, finely minced, plus 1 tablespoon of
the liquid from the kimchi.
• Some more kimchi, left whole.
• salt and pepper
• one teaspoon sugar.
• one half of a large onion, cut into thin slices.
• 1/2 pound tallegio cheese.
• Bagette.

One note about the ingredients: I recently made homemade kimchi for the first time. It’s really easy to make, and it is really a totally different thing from the stuff out of a jar. I don’t see why it’s so different: kimchi is a fermented preserved cabbage, which is exactly the kind of thing that should be really easy to make in quantity, and then sell in jars. But the fact is, nothing from the store remotely compares to homemade. I basically used this recipe, except that I couldn’t get any daikon, so I left that out.

Instructions

1. Take a couple of forks, and go up and down the steak, poking holes. Don’t be lazy: the more you do this, the better. You’re tenderizing the meat, and making a way for the marinade to help penetrate.
2. Salt the meat generously, then coat both sides with the minced kimchi. Refrigerate overnight.
3. Scrape the kimchi off the steak, but do not throw it away. We’re going to cook it, and use it as a topping.
4. Heat a cast-iron pan on high heat until it’s smoking hot.
5. Sprinkle half the sugar and some more salt and pepper onto the top of the steak, and then put the sugared side down in the pan, for about 3 minutes. While it’s cooking on that side, put the rest of the sugar and salt onto the other side. Then turn it over, and cook the other side for 3 minutes.
6. Transfer the steak into a 350 degree oven for 8-10 minutes. (I did eight; it was exactly the way I like it, nice and medium rare, but I think a lot of people would prefer it a bit more well done. Take it out after the ten minutes, and leave it alone for a while. Don’t slice it immediately!
7. While the steak is resting, heat up a pan, and add some oil. Toss in the onions with a bit of salt, and stir them around. When they just start to turn brown, add in the kimchi and any liquid left from the marinade, and turn down the heat. Cook it for about five minutes, adding a bit of water if it gets too dry.
8. Slice the break in half, and spread a thin layer of the taleggio cheese over both halves.
9. Slice the steak on a bias against the grain into thin slices.
10. Take any drippings from slicing the steak, and pour them into the onions and kimchi.
11. Put a heap of steak slices onto the bread, and then put a pile of the cooked onions and kimchi, and then a piece of fresh uncooked kimchi on top.
12. Close up the sandwich, and eat!

# Back to an old topic: Bad Vaccine Math

The very first Good Math/Bad Math post ever was about an idiotic bit of antivaccine rubbish. I haven’t dealt with antivaccine stuff much since then, because the bulk of the antivaccine idiocy has nothing to do with math. But the other day, a reader sent me a really interesting link from what my friend Orac calls a “wretched hive of scum and quackery”, naturalnews.com, in which they try to argue that the whooping cough vaccine is an epic failure:

(NaturalNews) The utter failure of the whooping cough (pertussis) vaccine to provide any real protection against disease is once again on display for the world to see, as yet another major outbreak of the condition has spread primarily throughout the vaccinated community. As it turns out, 90 percent of those affected by an ongoing whooping cough epidemic that was officially declared in the state of Vermont on December 13, 2012, were vaccinated against the condition — and some of these were vaccinated two or more times in accordance with official government recommendations.

As reported by the Burlington Free Press, at least 522 cases of whooping cough were confirmed by Vermont authorities last month, which was about 10 times the normal amount from previous years. Since that time, nearly 100 more cases have been confirmed, bringing the official total as of January 15, 2013, to 612 cases. The majority of those affected, according to Vermont state epidemiologist Patsy Kelso, are in the 10-14-year-old age group, and 90 percent of those confirmed have already been vaccinated one or more times for pertussis.

Even so, Kelso and others are still urging both adults and children to get a free pertussis shot at one of the free clinics set up throughout the state, insisting that both the vaccine and the Tdap booster for adults “are 80 to 90 percent effective.” Clearly this is not the case, as evidenced by the fact that those most affected in the outbreak have already been vaccinated, but officials are apparently hoping that the public is too naive or disengaged to notice this glaring disparity between what is being said and what is actually occurring.

It continues in that vein. The gist of the argument is:

1. We say everyone needs to be vaccinated, which will protect them from getting the whooping cough.
2. The whooping cough vaccine is, allagedly, 80 to 90% effective.
3. 90% of the people who caught whooping cough were properly vaccinated.
4. Therefore the vaccine can’t possibly work.

What they want you to do is look at that 80 to 90 percent effective rate, and see that only 10-20% of vaccinated people should be succeptible to the whooping cough, and compare that 10-20% to the 90% of actual infected people that were vaccinated. 20% (the upper bound of the succeptible portion of vaccinated people according to the quoted statistic) is clearly much smaller than 90% – therefore it’s obvious that the vaccine doesn’t work.

Of course, this is rubbish. It’s a classic apple to orange-grove comparison. You’re comparing percentages, when those percentages are measuring different groups – groups with wildly difference sizes.

Take a pool of 1000 people, and suppose that 95% are properly vaccinated (the current DTAP vaccination rate in the US is around 95%). That gives you 950 vaccinated people and 50 unvaccinated people who are unvaccinated.

In the vaccinated pool, let’s assume that the vaccine was fully effective on 90% of them (that’s the highest estimate of effectiveness, which will result in the lowest number of succeptible vaccinated – aka the best possible scenario for the anti-vaxers). That gives us 95 vaccinated people who are succeptible to the whooping cough.

There’s the root of the problem. Using numbers that are ridiculously friendly to the anti-vaxers, we’ve still got a population of twice as many succeptible vaccinated people as unvaccinated. so we’d expect, right out of the box, that better than 2/3rds of the cases of whooping cough would be among the vaccinated people.

In reality, the numbers are much worse for the antivax case. The percentage of people who were ever vaccinated is around 95%, because you need the vaccination to go to school. But that’s just the childhood dose. DTAP is a vaccination that needs to be periodically boosted or the immunity wanes. And the percentage of people who’ve had boosters is extremely low. Among adolescents, according to the CDC, only a bit more than half have had DTAP boosters; among adults, less that 10% have had a booster within the last 5 years.

What’s your succeptibility if you’ve gone more than 5 years without vaccination? Somewhere 40% of people who didn’t have boosters in the last five years are succeptible.

So let’s just play with those numbers a bit. Assume, for simplicity, than 50% of the people are adults, and 50% children, and assume that all of the children are fully up-to-date on the vaccine. Then you’ve got 10% of the children (10% of 475), 10% of the adults that are up-to-date (10% of 10% of 475), and 40% of the adults that aren’t up-to-date (40% of 90% of 475) is the succeptible population. That works out to 266 succeptible people among the vaccinated, which is 85%: so you’d expect 85% of the actual cases of whooping cough to be among people who’d been vaccinated. Suddenly, the antivaxers case doesn’t look so good, does it?

Consider, for a moment, what you’d expect among a non-vaccinated population. Pertussis is highly contagious. If someone in your household has pertussis, and you’re succeptible, you’ve got a better than 90% chance of catching it. It’s that contagious. Routine exposure – not sharing a household, but going to work, to the store, etc., with people who are infected still gives you about a 50% chance of infection if you’re succeptible.

In the state of Vermont, where NaturalNews is claiming that the evidence shows that the vaccine doesn’t work, how many cases of Pertussis have they seen? Around 600, out of a state population of 600,000 – an infection rate of one tenth of one percent. 0.1 percent, from a virulently contagious disease.

That’s the highest level of Pertussis that we’ve seen in the US in a long time. But at the same time, it’s really a very low number for something so contagious. To compare for a moment: there’s been a huge outbreak of Norovirus in the UK this year. Overall, more than one million people have caught it so far this winter, out of a total population of 62 million, for a rate of about 1.6% or sixteen times the rate of infection of pertussis.

Why is the rate of infection with this virulently contagious disease so different from the rate of infection with that other virulently contagious disease? Vaccines are a big part of it.

# Hensel's Lemma: Newton's Method for p-Adic numbers

This is, sadly, the last part of my posts about P-adic arithmetic. It’s not for lack of information, but rather for lack of competence on my part. The area of math that I’m worst at is analysis, and an awful lot of the interesting things that you can do with p-adic numbers is analysis.

For an explanation of what p-adic numbers are, and how arithmetic on them works, you can refer to my first post on the p-adics, and for an explanation of p-adic metrics and distance, you can look at this post.

Today, we’re going to look at Hensel’s lemma. This is really cool. Hensel’s lemma shows you a simple way of finding polynomial roots in P-adic arithmetic. It’s sometimes called Newton’s method for p-adics, which is both a good description of how it’s used (but a complete misnomer because the lemma is a description of of a property, and Newton’s method is a description of a procedure).

Hensel’s lemma says that if you’ve got a polynomial, and for a prime number p, you can find a root using modulo arithmetic using base p, then you can use the base-p root to find the corresponding root using modulo base p2, and given the root using base p2, you can find it for modulo p3, and so on!

So far, this is just a statement about modulo arithmetic – not about p-adic arithmetic. But what ends up happening is that the result modulo p gives you a first approximation of the root in p-adic. Using the process defined by Hensel’s lemma, you can take that root and extend it to a better approximation. It ends up being the case that each time you lift the result to a higher exponent of $p$, you’re performing the exactly analog of one step in Newton’s method of finding roots! But it’s actually better than that. In you look at Hensel’s lemma in the p-adic numbers, then each step extends the approximation by exactly one digit!

Let’s look at the lemma in formal terms:

Suppose that you have a polynomial $f(x)$.

If there is an integer, $i$ and a prime number $p$ such that for $f(i) = 0$ in mod $p$, $j$ such that $f(j) = 0$ in mod $p^2$. In general, if there’s an exponent $k$ where $f(i) = 0$ in mod $p^k$, then there’s a $j$ where $f(j) = 0$ in mod $p^{k+1}$.

In fact, we can do even better than that! If we know the root $i$ for an exponent $k$, then we can easily compute $j$ using a process called lifting:

(In this, is the first derivative of $f$ at $i$.)

You can see, looking at that equation, that the derivative of $f$ at $i$ can’t be zero mod $p^k$, or else the denominator of the fraction would be zero, and everything would go kablooie in your face!

In P-adic arithemetic, this becomes absolutely amazing. If $i$ is the root mod $p^k$, then the root mod $p^{k+1}$ is:

It can’t be simpler. And you can clearly see the relation to Newton’s method.

For fun, let’s try looking at an example: let’s take the square root of 2 in 7-adic. In base-10, the square root of 2 is roughly 1.4142135624. In p-adic, it’s going to bo something completely different. We’ll start by phrasing it as a polynomial: $x^2 - 2 = 0$. So, what’s the solution to that mod-7? 3: in mod-7, 3*3=2.

So, our first approximation of a square root of 2 is 3.

We can extend to the next digit using Hensel’s lemma, and we get $j = 3 - 7/6 = (9-2)/6 = 7/6$. We’re looking at mod 7 arithmetic here, so 6 = -1. So $j = 3 + 7 = 10 = 13$.

Repeated, we’d get 213, 6213, and so on.

# Good Math: the Book

Ladies and Gentlemen! I’ve got a very big announcement!

At long last, I’ve written a book based on this blog. It’s being published by the Pragmatic Bookshelf. The way that the Prags work, you can pre-order the book now, and immediately download the current draft. As I update the rest of the book, you’ll get all of the updates, up to the final printed version of the book.

I’m very proud of this thing. It’s the result of a whole lot of work, and I think it should be a lot of fun for mathematically curious folks to read.

For this book, I’ve collected up a bunch of my favorite posts from the history of this blog, and revised/rewritten them in a much more polished, coherent form. The book includes:

1. Numbers: some basic number theory, what numbers mean, how they’re constructed.
2. Funny numbers: a collection of strange but interesting numbers: 0, e, i, …
3. Writing Numbers: Roman numerals, egyptian fractions, continued fractions, …
4. Logic: first order predicate logic, proofs, temporal logic, and prolog.
5. Set theory
6. Computation and computing machines

Aside from being a fun read, buying it will support Scientopia. From the time that Scientopia got started, I’ve been paying all of the hosting bills. That’s about \$250 a month, for the last three years. Making some money from this book will make it possible for me to continue supporting it, and maybe even pay for some help with technical support!

Not to mention the fact that if this book does well, there will be more volumes down the line. In particular, I’ve got my eye on doing some stuff about the math of programming: type theory, category theory, algebraic data structures, etc.

Enjoy!

# Vortex Garbage

A reader who saw my earlier post on the Vortex math talk at a TEDx conference sent me a link to an absolutely dreadful video that features some more crackpottery about the magic of vortices.

It says:

The old heliocentric model of our solar system,
planets rotating around the sun, is not only boring,
but also incorrect.

Our solar system moves through space at 70,000km/hr.

(Image of the sun with a rocket/comet trail propelling
it through space, with the planets swirling around it.)

The sun is like a comet, dragging the planets in its wake.
Can you say “vortex”?

The science of this is terrible. The sun is not a rocket. It does not propel itself through space. It does not have a tail. It does not leave a significant “wake”. (There is interstellar material, and the sun moving through it does perturb it, but it’s not a wake: the interstellar material is orbiting the galactic center just like the sun. Gravitational effects do cause pertubations, but it’s not like a boat moving through still water, producing a wake.) Even if you stretch the definition of “wake”, the sun certainly does not leave a wake large enough to “drag” the planets. In fact, if you actually look at the solar system, the plane the ecliptic – the plane where the planets orbit the sun – is at a roughly 60 degree angle to the galactic ecliptic. If planetary orbits were a drag effect, then you would expect the orbits to be perpendicular to the galactic ecliptic. But they aren’t.

If you look at it mathematically, it’s even worse. The video claims to be making a distinction between the “old heliocentric” model of the solar system, and their new “vortex” model. But in fact, mathematically, they’re exactly the same thing. Look at it from a heliocentric point of view, and you’ve got the heliocentric model. Look at the exact same system from point that’s not moving relative to galactic center, and you get the vortex. They’re the same thing. The only difference is how you look at it.

And that’s just the start of the rubbish. Once they get past their description of their “vortex” model, they go right into the woo. Vortex is life! Vortex is sprirituality! Oy.

If you follow their link to their website, it gets even sillier, and you can start to see just how utterly clueless the author of this actually is:

(In reference to a NASA image showing the interstellar “wind” and the heliopause)

Think about this for a minute. In this diagram it seems the Solar System travel to the left. When the Earth is also traveling to the left (for half a year) it must go faster than the Sun. Then in the second half of the year, it travels in a “relative opposite direction” so it must go slower than the Sun. Then, after completing one orbit, it must increase speed to overtake the Sun in half a year. And this would go for all the planets. Just like any point you draw on a frisbee will not have a constant speed, neither will any planet.

See, it’s a problem that the planets aren’t moving at a constant speed. They speed up and slow down! Oh, the horror! The explanation is that they’re caught by the sun’s wake! So they speed up when they get dragged, until they pass the sun (how does being dragged by the sun ever make them faster than the sun? Who knows!), and then they’re not being dragged anymore, so they slow down.

This is ignorance of physics and of the entire concept of frame of reference and symmetry that is absolutely epic.

There’s quite a bit more nonsense, but that’s all I can stomach this evening. Feel free to point out more in the comments!

# Types Gone Wild! SKI at Compile-Time

Over the weekend, a couple of my Foursquare coworkers and I were chatting on twitter, and one of my smartest coworkers, a great guy named Jorge Ortiz, pointed out that type inference in Scala (the language we use at Foursquare, and also pretty much my favorite language) is Turing complete.

Somehow, I hadn’t seen this before, and it absolutely blew my mind. So I asked Jorge for a link to the proof. The link he sent me is a really beautiful blog post. It doesn’t just prove that Scala type inference is Turing complete, but it does it in a remarkably beautiful way.

Before I get to the proof, what does this mean?

A system is Turing complete when it can perform any possible computation that could be performed on any other computing device. The Turing machine is, obviously, Turing complete. So is lambda calculus, the Minsky machine, the Brainfuck computing model, and the Scala programming language itself.

If type inference is Turing complete, then that means that you can write a Scala program where, in order to type-check the program, the compiler has to run an arbitrary program to completion. It means that there are, at least theoretically, Scala programs where the compiler will take forever – literally forever – to determine whether or not a given program contains a type error. Needless to say, I consider this to be a bad thing. Personally, I’d really prefer to see the type system be less flexible. In fact, I’d go so far as to say that this is a fundamental error in the design of Scala, and a big strike against it as a language. Having a type-checking system which isn’t guaranteed to terminate is bad.

But let’s put that aside: Scala is pretty entrenched in the community that uses it, and they’ve accepted this as a tradeoff. How did the blog author, Michael Dürig, prove that Scala type checking is Turing complete? By showing how to implement a variant of lambda calculus called SKI combinator calculus entirely with types.

SKI calculus is seriously cool. We know that lambda calculus is Turing complete. It turns out that for any lambda calculus expression, there’s a way rewriting it without any variables, and without any lambdas at all, using three canonical master functions. If you’ve got those three, then you can write anything, anything at all. The three are called S, K, and I.

• The S combinator is: $S x y z = x z (y z)$.
• The K combinator is: $K x y = x$.
• The I combinator is: $I x = x$.

They come from intuitionistic logic, where they’re fundamental axioms that describe how intuitionistic implication works. K is the rule $A Rightarrow (B Rightarrow A)$; S is the rule $(A Rightarrow (B Rightarrow C)) Rightarrow ((A Rightarrow B) Rightarrow C)$; and I is $(A Rightarrow A)$.

Given any lambda calculus expression, you can rewrite it as a chain of SKIs. (If you’re interested in seeing how, please just ask in the comments; if enough people are interested, I’ll write it up.) What the author of the post id is show how to implement the S, K, and I combinators in Scala types.

trait Term {
type ap[x <: Term] <: Term
type eval <: Term
}

He’s created a type Term, which is the supertype of any computable fragment written in this type-SKI. Since everything is a function, all terms have to have two methods: one of them is a one-parameter “function” which applies the term to a parameter, and the second is a “function” which simplifies the term into canonical form.

He implements the S, K, and I combinators as traits that extend Term. We’ll start with the simplest one, the I combinator.

trait I extends Term {
type ap[x <: Term] = I1[x]
type eval = I
}

trait I1[x <: Term] extends Term {
type ap[y <: Term] = eval#ap[y]
type eval = x#eval
}

I needs to take a parameter, so its apply type-function takes a parameter x, and returns a new type I1[x] which has the parameter encoded into it. Evaluating I1[x] does exactly what you’d want from the I combinator with its parameter – it returns it.

The apply “method” of I1 looks strange. What you have to remember is that in lambda calculus (and in the SKI combinator calculus), everything is a function – so even after evaluating I.ap[x] to some other type, it’s still a type function. So it still needs to be applicable. Applying it is exactly the same thing as applying its parameter.

So if have any type A, if you write something like var a : I.ap[A].eval, the type of a will evaluate to A. If you apply I.ap[A].ap[Z], it’s equivalent to taking the result of evaluating I.ap[A], giving you A, and then applying that to Z.

The K combinator is much more interesting:

// The K combinator
trait K extends Term {
type ap[x <: Term] = K1[x]
type eval = K
}

trait K1[x <: Term] extends Term {
type ap[y <: Term] = K2[x, y]
type eval = K1[x]
}

trait K2[x <: Term, y <: Term] extends Term {
type ap[z <: Term] = eval#ap[z]
type eval = x#eval
}

It's written in curried form, so it's a type trait K, which returns a type trait K1, which takes a parameter and returns a type trait K2.

The implementation is a whole lot trickier, but it's the same basic mechanics. Applying K.ap[X] gives you K1[X]. Applying that to Y with K1[X].ap[Y] gives you K2[K, Y]. Evaluating that gives you X.

The S combinator is more of the same.

// The S combinator
trait S extends Term {
type ap[x <: Term] = S1[x]
type eval = S
}

trait S1[x <: Term] extends Term {
type ap[y <: Term] = S2[x, y]
type eval = S1[x]
}

trait S2[x <: Term, y <: Term] extends Term {
type ap[z <: Term] = S3[x, y, z]
type eval = S2[x, y]
}

trait S3[x <: Term, y <: Term, z <: Term] extends Term {
type ap[v <: Term] = eval#ap[v]
type eval = x#ap[z]#ap[y#ap[z]]#eval
}

Michid then goes on to show examples of how to use these beasts. He implements equality testing, and then shows how to test if different type-expressions evaluate to the same thing. And all of this happens at compile time. If the equality test fails, then it's a type error at compile time!

It's a brilliant little proof. Even if you can't read Scala syntax, and you don't really understand Scala type inference, as long as you know SKI, you can look at the equality comparisons, and see how it works in SKI. It's really beautiful.

# Static Typing: Give me a break!

I’m a software engineer. I write code for a living. I’m also a programming language junkie. I love programming languages. I’m obsessed with programming languages. I’ve taught myself more programming languages than any sane person has any reason to know.

Learning that many languages, I’ve developed some pretty strong opinions about what makes a good language, and what kind of things I really want to see in the languages that I use.

My number one preference: strong static typing. That’s part of a more general preference, for preserving information. When I’m programming, I know what kind of thing I expect in a parameter, and I know what I expect to return. When I’m programming in a weakly typed language, I find that I’m constantly throwing information away, because I can’t actually say what I know about the program. I can’t put my knowledge about the expected behavior into the code. I don’t think that that’s a good thing.

But… (you knew there was a but coming, didn’t you?)

This is my preference. I believe that it’s right, but I also believe that reasonable people can disagree. Just because you don’t think the same way that I do doesn’t mean that you’re an idiot. It’s entirely possible for someone to know as much as I do about programming languages and have a different opinion. We’re talking about preferences.

Sadly, that kind of attitude is something that is entirely too uncommon. I seriously wonder somethings if I’m crazy, because it seems like everywhere I look, on every topic, no matter how trivial, most people absolutely reject the idea that it’s possible for an intelligent, knowledgeable person to disagree with them. It doesn’t matter what the subject is: politics, religion, music, or programming languages.

What brought this little rant on is that someone sent me a link to a comic, called “Cartesian Closed Comic”. It’s a programming language geek comic. But what bugs me is this comic. Which seems to be utterly typical of the kind of attitude that I’m griping about.

See, this is a pseudo-humorous way of saying “Everyone who disagrees with me is an idiot”. It’s not that reasonable people can disagree. It’s that people who disagree with me only disagree because they’re ignorant. If you like static typing, you probably know type theory. If you don’t like static typing, that there’s almost no chance that you know anything about type theory. So the reason that those stupid dynamic typing people don’t agree with people like me is because they just don’t know as much as I do. And the reason that arguments with them don’t go anywhere isn’t because we have different subjective preferences: it’s because they’re just too ignorant to understand why I’m right and they’re wrong.

Most programmers – whether they prefer static typing or not – don’t know type theory. Most of the arguments about whether to use static or dynamic typing aren’t based on type theory. It’s just the same old snobbery, the “you can’t disagree with me unless you’re an idiot”.

Among intelligent skilled engineers, the static versus dynamic typing thing really comes down to a simple, subjective argument:

Static typing proponents believe that expressing intentions in a static checkable form is worth the additional effort of making all of the code type-correct.

Dynamic typing proponents believe that it’s not: that strong typing creates an additional hoop that the programmer needs to jump through in order to get a working system.

Who’s right? In fact, I don’t think that either side is universally right. Building a real working system is a complex thing. There’s a complex interplay of design, implementation, and testing. What static typing really does is take some amount of stuff that could be checked with testing, and allows the compiler the check it in an abstract way, instead of with specific tests.

Is it easier to write code with type declarations, or with additional tests? Depends on the engineers and the system that they’re building.

# Friday Random Ten

I haven’t done an FRT in a while. I’m really trying to get back to posting more regularly, and I’ve always enjoyed talking a bit about the crazy stuff I love to listen to.

1. Marillion, “If my Heart were a Ball”: Very appropriate that in my first FRT in a while, Marillion would come up first. There’s just something about these guys, I can’t get enough of them. This is the live version of this song, from the “Live at High Voltage 2010” recording. I love this song on the original recording, but the version they played here is even better.
2. Gordian Knot, “Shaman’s Whisper”: Wow, but I haven’t listened to this in a while. Gordian Knot is what instrumental prog rock should be. It’s got complexity and skill, but it’s also got soul. This is music, not wankery.
3. Spock’s Beard, “The 39th Street Blues”: not a bad track, but this album always leaves a bad taste in my mouth. It’s heavy-handed preachy stuff, written by Neil Morse right after his turn to born again christianity.
4. Parallel or 90 Degrees, “Threesome” : Interesting track from Andy Tillison and Po90. I really love the album that this came from, but this is my least favorite track. It’s got a very heavy electric fuzz effect which always gives me a headache.
5. Squackett, “Stormchaser”: Here’s something really super. Steve Hackett (of old Genesis) and Chris Squire (of Yes) got together and recorded an album. Amazingly, it really does sound like a blend of Hackett-era Genesis mixed with early Yes. I really like this.
6. Sonic Youth, “Stil” : I’ve adored Sonic Youth from the first time I heard them. I’ve never been a big fan of much of anything punk-related, except for these guys. I particularly like it when they’re screwing around experimenting with free improvs. This piece comes from one of their collections of experiments, called “SYR2”. Not something that mainstream rock fans will enjoy, but I think it’s brilliant.
7. Takenobu, “Dark in the City” : This is something that a friend of mine from work turned me on to. I don’t really know how to describe it. Sort-of mellow alt-rock with layered string accompaniment? I don’t know, but it’s so beautiful.
8. Steven Wilson, “Significant Other” : I learned about Steven Wilson because of Porcupine Tree. But I’ve learned to basically buy everything that he does. From prog-rock to psychedelia to experimental jazz, from original music to remixes of classic old albums, everything that he does is pure gold. This is from Insurgentes, an album that I just can’t stop listening to. You can definitely hear that it’s the same guy who led Porcupine Tree, but it’s got a really unique dark sound to it.
9. vonFrickle, “Cranium Controller” : Remember what I said about Gordian Knot? vonFrickle is an instrumental prog band that frequently strays into the wankery area. A lot of their stuff is fantastic, but it’s often complex showoff because they want to show off how they can do this awesomely complex stuff, not because the awesomely complex stuff actually sounds good. This track is one of those that feels like complexity for the sake of showing off.
10. Steve Hogarth and Steven Wilson, “A Cat with Seven Souls” : What do you get when you take the lead singer of my favorite band, and blend his style with one of my favorite all-around musicians? It’s what you’d expect, which is pretty amazing.