A failed attempt to prove P == NP

I wasn’t originally going to write about this, but people keep sending it to me asking for comments.

In computer science, we have one really gigantic open question about complexity. In the lingo, we ask “Does P == NP?”. (I’ll explain what that means below.) On March 9th, a guy named Michael LaPlante posted a paper to arXiv that purports to prove, once and for all, that P == NP. If this were the case, if Mr. LaPlante (I’m assuming Mr.; if someone knows differently, i.e. that it should be Doctor, or Miss, please let me know!) had in fact proved that P == NP, it would be one of the most amazing events in computer science history. And it wouldn’t only be a theoretical triumph – it would have real, significant practical results! I can’t think of any mathematical proof that would be more exciting to me: I really, really wish that this would happen. But Mr. LaPlante’s proof is, sadly, wrong. Trivially wrong, in fact.

In order to understand what all of this means, why it matters, and where he went wrong, we need to take a step back, and briefly look at computational complexity, what P and NP mean, and what the implications of P == NP are. (Some parts of the discussion that follows are re-edited versions of sections of a very old post from 2007.)

Before we can get to the meat of this, which is talking about P versus NP, we need to talk about computational complexity. P and NP are complexity classes of problems – that is, groups of problems that have similar bounds on their performance.

When we look at a computation, one of the things we want to know is: “How long will this take?”. A specific concrete answer to that depends on all sorts of factors – the speed of your computer, the particular programming language you use to run the program, etc. But independent of those, there’s a basic factor that describes something important about how long a computation will take – the algorithm itself fundamentally requires some minimum number of operations. Computational complexity is an abstract method of describing how many operations a computation will take, expressed in terms of the size or magnitude of the input.

For example: let’s take a look at insertion sort. Here’s some pseudocode for insertion sort.

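def insertion_sort(lst):
    result = []
    for i in lst:
        inserted = False
        for pos, j in enumerate(result):
            if i < j:
                # i belongs in front of the first larger element
                result.insert(pos, i)
                inserted = True
                break
        if not inserted:
            # i is at least as large as everything already in result
            result.append(i)
    return result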

This is, perhaps, the simplest sorting algorithm to understand - most of us figured it out on our own in school, when we had an assignment to alphabetize a list of words. You take the elements of the list to be sorted one at a time; then you figure out where in the list they belong, and insert them.

In the worst possible case, how long does this take?

  1. Inserting the first element requires 0 comparisons: just stick it into the list.
  2. Inserting the second element takes exactly one comparison: it needs to be compared to the one element in the result list, to determine whether it goes before or after it.
  3. Inserting the third element could take either one or two comparisons. (If it's smaller than the first element of the result list, then it can be inserted in front without any more comparisons; otherwise, it needs to be compared against the second element of the result list.) So in the worst case, it takes 2 comparisons.
  4. In general, for the nth element of the list, it will take at most n-1 comparisons.

So, in the worst case, it's going to take 0 + 1 + 2 + ... + (n-1) comparisons to produce a sorted list of n elements. There's a nice shorthand for computing that series: \frac{n(n-1)}{2}, which expands to \frac{n^2 - n}{2}, which is O(n^2).
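
Here's a minimal sketch for checking that count empirically. It's a variant of the insertion sort above with a comparison counter bolted on; the names insertion_sort_counting and worst_case_comparisons are made up for this illustration. The sum 0 + 1 + ... + (n-1) works out to n(n-1)/2, and the counter should agree with that on worst-case input.

```python
def insertion_sort_counting(lst):
    """Insertion sort that also reports how many comparisons it made."""
    result = []
    comparisons = 0
    for item in lst:
        inserted = False
        for pos, existing in enumerate(result):
            comparisons += 1
            if item < existing:
                result.insert(pos, item)
                inserted = True
                break
        if not inserted:
            result.append(item)
    return result, comparisons


def worst_case_comparisons(n):
    """Closed form for 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        # Already-sorted input is the worst case for this front-to-back scan:
        # each new element gets compared against everything in result.
        _, count = insertion_sort_counting(list(range(n)))
        print(n, count, worst_case_comparisons(n))
```

On already-sorted input, the two numbers printed for each n should match.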

So while we can't say "sorting a list of 100 elements will take 2.3 seconds" (because that depends on a ton of factors - the specific implementation of the code, the programming language, the machine it's running on, etc.), we can say that the time it takes to run increases roughly proportionally to the square of the size of the input - which is what it means when we say that insertion sort is O(n^2).

That's the complexity of the insertion sort algorithm. When we talk about complexity, we can talk about two different kinds of complexity: the complexity of an algorithm, and the complexity of a problem. The complexity of an algorithm is a measure of how many steps the algorithm takes to execute on an input of a particular size. It's specific to the algorithm, that is, the specific method used to solve the problem. The complexity of the problem is a bound on the complexity of the best possible algorithm that can solve that problem.

For example, when you look at sort, you can say that there's a minimum number of steps that's needed to compute the correct sorted order of the list. In fact, you can prove that to sort a list of n elements, you absolutely require roughly n lg n bits of information: there's no possible way to be sure you have the list in sorted order with less information than that. If you're using an algorithm that puts things into sorted order by comparing values, that means that you absolutely must do O(n lg n) comparisons, because each comparison gives you at most one bit of information. That means that sorting is an O(n lg n) problem. We don't need to know which algorithm you're thinking about - it doesn't matter. There is no possible comparison-based sorting algorithm that takes less than O(n lg n) steps. (It's worth noting that there are some weasel-words in there: there are some theoretical algorithms that can sort in less than O(n lg n), but they do it by using techniques that aren't based on binary comparisons that yield one bit of information.)
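
Where does the n lg n come from? A list of n distinct elements can be arranged in n! different orders, and a yes/no comparison can at best cut the set of remaining possibilities in half, so you need at least lg(n!) comparisons. Here's a small sketch that computes that bound and prints it next to n lg n; the function name min_comparisons is just something I made up for the illustration.

```python
import math


def min_comparisons(n):
    """Fewest yes/no comparisons that can tell apart all n! orderings."""
    return math.ceil(math.log2(math.factorial(n)))


if __name__ == "__main__":
    for n in (4, 16, 256, 1024):
        # lg(n!) stays below n lg n, but grows at the same rate.
        print(n, min_comparisons(n), round(n * math.log2(n)))
```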

We like to describe problems by their complexity in that way when we can. But it's very difficult. We're very good at finding upper bounds: that is, we can in general come up with ways of saying "the execution time will be less than O(something)", but we are very bad at finding ways to prove that "the minimum amount of time needed to solve this problem is O(something)". That distinction, between the upper bound (maximum time needed to solve a problem), and lower bound (minimum time needed to solve a problem) is the basic root of the P == NP question.

When we're talking about the complexity of problems, we can categorize them into complexity classes. There are problems that are O(1), which means that they're constant time, independent of the size of the input. There are linear time problems, which can be solved in time proportional to the size of the input. More broadly, there are two basic categories that we care about: P and NP.

P is the collection of problems that can be solved in polynomial time. That means that in the big-O notation for the complexity, the expression inside the parens is a polynomial: the exponents are all fixed values. Speaking very roughly, the problems in P are the problems that we can at least hope to solve with a program running on a real computer.

NP is the collection of problems that can be solved in non-deterministic polynomial time. We'll just gloss over the "non-deterministic" part, and say that for the hard problems in NP, we don't know of a polynomial time algorithm for producing a solution, but given a solution, we can check if it's correct in polynomial time. For those problems, the best solutions we know of have worst-case bounds that are exponential - that is, the expression inside of the parens of the O(...) has an exponent containing the size of the problem.

NP problems are things that we can't solve perfectly with a real computer. The real solutions take an amount of time that's exponential in the size of their inputs. For an algorithm that takes 2^n steps, adding a single element to the input doubles its execution time; adding ten elements increases execution time by a factor of 1,024; and doubling the input size squares the execution time. For NP problems, we're currently stuck using heuristics - shortcuts that will quickly produce a good guess at the real solution, but which will sometimes be wrong.
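
To get a feel for the gap, here's a tiny sketch comparing a polynomial step count with an exponential one. The two cost functions (n cubed and 2 to the n) are stand-ins I picked for illustration; they aren't the costs of any particular algorithm discussed here.

```python
def polynomial_steps(n):
    """Illustrative polynomial cost: n cubed."""
    return n ** 3


def exponential_steps(n):
    """Illustrative exponential cost: 2 to the n."""
    return 2 ** n


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        # Doubling n multiplies the cubic cost by 8,
        # but squares the exponential cost.
        print(n, polynomial_steps(n), exponential_steps(n))
```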

NP problems are, sadly, very common in the real world. For one example, there's a classic problem called the travelling salesman. Suppose you've got a door-to-door vacuum cleaner salesman. His territory has 15 cities. You want to find the best route from his house to those 15 cities, and back to his house. Finding that solution isn't just important from a theoretical point of view: the time that the salesman spends driving has a real-world cost! We don't know how to quickly produce the ideal path.
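
The obvious exact method is to try every ordering of the cities, which is what the sketch below does. It assumes the cities are points on a plane with straight-line distances, and that city 0 is the salesman's house; those assumptions, and the names tour_length and best_tour, are mine. With 15 cities besides the house, there are 15! (about 1.3 trillion) orderings to check, which is why this only works for toy inputs.

```python
import math
from itertools import permutations


def tour_length(order, coords):
    """Length of the round trip visiting coords in the given order."""
    return sum(
        math.dist(coords[a], coords[b])
        for a, b in zip(order, order[1:] + order[:1])
    )


def best_tour(coords):
    """Brute force: try every ordering of the cities after city 0 (home)."""
    home, *others = range(len(coords))
    best = None
    for perm in permutations(others):
        order = (home, *perm)
        length = tour_length(order, coords)
        if best is None or length < best[0]:
            best = (length, order)
    return best


if __name__ == "__main__":
    # Four points on a unit square: the best tour walks the perimeter.
    square = [(0, 0), (0, 1), (1, 1), (1, 0)]
    print(best_tour(square))
    print(math.factorial(15))  # orderings to try for 15 cities plus home
```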

The big problem with NP is that we don't know lower bounds for anything in it. That means that while we know of slow algorithms for finding the solution to problems in NP, we don't know if those algorithms are actually the best. It's possible that there's a fast solution - a solution in polynomial time which will give the correct answer. Some people who study computational complexity believe that if you can check a solution in polynomial time, then computing a solution should also be polynomial time with a higher-order polynomial. (That is, they believe that there should be some sort of bound like "the time to find a solution is no more than the cube of the time to check a solution".) But so far, no one has been able to actually prove a relationship like that.

When you look at NP problems, some of them have a special, amazing property called NP completeness. If you could come up with a polynomial time solution for any single NP-complete problem, then you'd also discover exactly how to come up with a polynomial time solution for every other problem in NP.

In Mr. LaPlante's paper, he claims to have implemented a polynomial time solution to a problem called the maximum clique problem. Maximum clique is NP-complete - so if you could find a P-time solution to it, you'd have proven that P == NP, and that there are polynomial time solutions to all NP problems.

The problem that Mr. LaPlante looked at is the maximum clique problem:

  • Given:
    1. a set V of atomic objects called vertices;
    2. a set E of objects called edges, where each edge is an unordered pair (x, y), where x and y are vertices.
  • Find:
    • The largest set of vertices C=\{v_1, ..., v_n\} where for any v_i, there is an edge from v_i to every other vertex in C.

Less formally: given a bunch of dots, where some of the dots are connected by lines, find the largest set of dots where every dot in the set is connected to every other dot in the set.
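
Here's a minimal sketch of the brute-force approach, to show both halves of the NP story: checking whether a given set of dots is a clique is quick, but finding the largest one this way means trying an exponential number of subsets. Edges are represented as frozensets of two vertices; that representation and the function names are my own choices for the illustration.

```python
from itertools import combinations


def is_clique(vertices, edges):
    """Check that every pair of vertices is joined by an edge (polynomial time)."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def maximum_clique_bruteforce(vertices, edges):
    """Try subsets from largest to smallest; exponential in the vertex count."""
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if is_clique(candidate, edges):
                return set(candidate)
    return set()


if __name__ == "__main__":
    # A triangle a-b-c with one extra vertex d hanging off c.
    edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
    print(maximum_clique_bruteforce(["a", "b", "c", "d"], edges))
```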

The author claims to have come up with a simple P-time solution to that.

The catch? He's wrong. His solution isn't P-time. It's sloppy work.

His algorithm is pretty easy to understand. Each vertex has a finite set of edges connecting it to its neighbors. You have each node in the graph send its list of its neighbors to its neighbors. With that information, each node knows what 3-cliques it's a part of. Every clique of size larger than 3 is made up of overlapping 3-cliques - so you can have the cliques merge themselves into ever larger cliques.
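
To be clear, what follows is not Mr. LaPlante's code. It's just my own sketch of the first step as described above, finding 3-cliques by comparing neighbor lists, with made-up names (neighbor_map, three_cliques). The merging phase isn't shown at all.

```python
from collections import defaultdict


def neighbor_map(edges):
    """Build each vertex's set of neighbors from a list of (x, y) edges."""
    neighbors = defaultdict(set)
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)
    return neighbors


def three_cliques(edges):
    """Find every 3-clique by intersecting the neighbor sets of each edge."""
    neighbors = neighbor_map(edges)
    triangles = set()
    for x, y in edges:
        # Any vertex adjacent to both endpoints closes a triangle.
        for z in neighbors[x] & neighbors[y]:
            triangles.add(frozenset((x, y, z)))
    return triangles


if __name__ == "__main__":
    print(three_cliques([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]))
```

Notice that even this simple step isn't constant time per edge: each intersection costs time that depends on the size of the neighbor sets involved.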

If you look at this, it's still basically considering every possible clique. But his "analysis" of the complexity of his algorithm is so shallow and vague that it's easy to get things wrong. It's a pretty typical example of a sloppy analysis. Complexity analysis is hard, and it's very easy to get wrong. I don't want to be too hard on Mr. LaPlante, because it's an extremely easy mistake to make. Analyzing algorithmic complexity needs to be done in a careful, exacting, meticulous way - and while Mr. LaPlante didn't do that, most people who are professional programmers could easily make a similar mistake! But the ultimate sloppiness of it is that he never bothers to finish computing the complexity. He makes vague hand-wavy motions at showing the complexity of certain phases of his algorithm, but he never even bothers to combine them and come up with an estimate of the full upper-bound of his algorithm!

I'm not going to go into great detail about this. Instead, I'll refer you to a really excellent paper by Patrick Prosser, which looks at a series of algorithms that compute exact solutions to the maximum clique problem, and how they're analyzed. Compare their analysis to Mr. LaPlante's, and you'll see quite clearly how sloppy LaPlante was. I'll give you a hint about one thing LaPlante got wrong: he's taking some steps that take significant work, and treating them as if they were constant time.

But we don't even really need to look at the analysis. Mr. LaPlante provides an implementation of his supposedly P-time algorithm. He should be able to show us execution times for various randomly generated graphs, and show how that time grows as the size of the graph grows, right? I mean, if you're making claims about something like this, and you've got real code, you'll show your experimental verification as well as your theoretical analysis, right?

Nope. He doesn't. And I consider that to be a really, really serious problem. He's claiming to have reduced an NP-complete problem to a small-polynomial complexity: where are the numbers?

I'll give you a good guess about the answer: the algorithm doesn't complete in a reasonable amount of time for moderately large graphs. You could argue that even if it's polynomial time, you're looking at exponents that are no smaller than 3 (exactly what he claims the bound to be is hard to determine, since he never bothers to finish the analysis!) - a cubic algorithm on a large graph takes a very long time. But... not bothering to show any runtime data? Nothing at all? That's ridiculous. If you look at the Prosser paper above, he manages to give actual concrete measurements of the exponential time algorithms. LaPlante didn't bother to do that. And I can only conclude that he couldn't gather actual numbers to support his idea.

Big Bang Bogosity

One of my long-time mantras on this blog has been “The worst math is no math”. Today, I’m going to show you yet another example of that: a recent post on Boing-Boing called “The Big Bang is Going Down”, by a self-proclaimed genius named Rick Rosner.

First postulated in 1931, the Big Bang has been the standard theory of the origin and structure of the universe for 50 years. In my opinion, (the opinion of a TV comedy writer, stripper and bar bouncer who does physics on the side) the Big Bang is about to collapse catastrophically, and that’s a good thing.

According to Big Bang theory, the universe exploded into existence from basically nothing 13.7-something billion years ago. But we’re at the beginning of a wave of discoveries of stuff that’s older than 13.7 billion years.

We’re constantly learning more about our universe, how it works, and how it started. New information isn’t necessarily a catastrophe for our existing theories; it’s just more data. There’s constantly new data coming in – and as yet, none of it comes close to causing the big bang theory to catastrophically collapse.

The two specific examples cited in the article are:

  1. one quasar that appears to have formed earlier than we might expect – it existed just 900 million years after the current estimate of when the big bang occurred. That’s very surprising, and very exciting. But even in existing models of the big bang, it’s not impossible. (No link, because the link in the original article doesn’t work.)
  2. an ancient galaxy – a galaxy that existed only 700 million years after the big bang occurred – contains dust. Cosmic dust is made of atoms much larger than hydrogen – like carbon, silicon, and iron, which are (per current theories) the product of supernovas. Supernovas generally don’t happen to stars younger than a couple of billion years – so finding dust in a galaxy less than a billion years after the universe began is quite surprising. But again: impossible under the big bang? No.

The problem with both of these arguments against the big bang is: they’re vague. They’re both handwavy arguments based on crude statements about what “should” be possible or impossible according to the big bang theory. But neither comes close to the kind of precision that an actual scientific argument requires.

Scientists don’t use math because they like to be obscure, or because they think all of the pretty symbols look cool. Math is a tool used by scientists, because it’s useful. Real theories in physics need to be precise. They need to make predictions, and those predictions need to match reality to the limits of our ability to measure them. Without that kind of precision, we can’t test theories – we can’t check how well they model reality. And precise modelling of reality is the whole point.

The big bang is an extremely successful theory. It makes a lot of predictions, which do a good job of matching observations. It’s evolved in significant ways over time – but it remains by far the best theory we have – and by “best”, I mean “most accurate and successfully predictive”. The catch to all of this is that when we talk about the big bang theory, we don’t mean “the universe started out as a dot, and blew up like a huge bomb, and everything we see is the remnants of that giant explosion”. That’s an informal description, but it’s not the theory. That informal description is so vague that a motivated person can interpret it in ways that are consistent or inconsistent with almost any given piece of evidence. The real big bang theory isn’t a single English statement – it’s many different mathematical statements which, taken together, produce a description of an expansionary universe that looks like the one we live in. For a really, really small sample, you can take a look at a nice old post by Ethan Siegel over here.

If you really want to make an argument that it’s impossible according to the big bang theory, you need to show how it’s impossible. The argument by Mr. Rosner is that the atoms in the dust in that galaxy couldn’t exist according to the big bang, because there wasn’t time for supernovas to create it. To make that argument, he needs to show that that’s true: he needs to look at the math that describes how stars form and how they behave, and then using that math, show that the supernovas couldn’t have happened in that timeframe. He doesn’t do anything like that: he just asserts that it’s true.

In contrast, if you read the papers by the guys who discovered the dust-filled galaxy, you’ll notice that they don’t come anywhere close to saying that this is impossible, or inconsistent with the big bang. All they say is that it’s surprising, and that we may need to revise our understanding of the behavior of matter in the early stages of the universe. The reason that they say that is because there’s nothing there that fundamentally conflicts with our current understanding of the big bang.

But Mr. Rosner can get away with the argument, because he’s being vague where the scientists are being precise. A scientist isn’t going to say “Yes, we know that it’s possible according to the big bang theory”, because the scientist doesn’t have the math to show it’s possible. At the moment, we don’t have sufficiently precise math either way to come to a conclusion; we don’t know. But what we do know is that millions of other observations in different contexts, different locations, observed by different methods by different people, are all consistent with the predictions of the big bang. Given that we don’t have any evidence to support the idea that this couldn’t happen under the big bang, we continue to say that the big bang is the theory most consistent with our observations, that it makes better predictions than anything else, and so we assume (until we have evidence to the contrary) that this isn’t inconsistent. We don’t have any reason to discard the big bang theory on the basis of this!

Mr. Rosner, though, goes even further, proposing what he believes will be the replacement for the big bang.

The theory which replaces the Big Bang will treat the universe as an information processor. The universe is made of information and uses that information to define itself. Quantum mechanics and relativity pertain to the interactions of information, and the theory which finally unifies them will be information-based.

The Big Bang doesn’t describe an information-processing universe. Information processors don’t blow up after one calculation. You don’t toss your smart phone after just one text. The real universe – a non-Big Bang universe – recycles itself in a series of little bangs, lighting up old, burned-out galaxies which function as memory as needed.

In rolling cycles of universal computation, old, collapsed, neutron-rich galaxies are lit up again, being hosed down by neutrinos (which have probably been channeled along cosmic filaments), turning some of their neutrons to protons, which provides fuel for stellar fusion. Each calculation takes a few tens of billions of years as newly lit-up galaxies burn their proton fuel in stars, sharing information and forming new associations in the active center of the universe before burning out again. This is ultra-deep time, with what looks like a Big Bang universe being only a long moment in a vast string of such moments across trillions or quadrillions of giga-years.

This is not a novel idea. There are a ton of variations of the “universe as computation” that have been proposed over the years. Just off the top of my head, I can rattle off variations that I’ve read (in decreasing order of interest) by Minsky (can’t find the paper at the moment; I read it back when I was in grad school), by Fredkin, by Wolfram, and by Langan.

All of these theories assert in one form or another that our universe is either a massive computer or a massive computation, and that everything we can observe is part of a computational process. It’s a fascinating idea, and there are aspects of it that are really compelling.

For example, the Minsky model has an interesting explanation for the speed of light as an absolute limit, and for time dilation. Minsky’s model says that the universe is a giant cellular automaton. Each minimum quantum of space is a cell in the automaton. When a particle is located in a particular cell, that cell is “running” the computation that describes that particle. For a particle to move, the data describing it needs to get moved from its current location to its new location at the next time quantum. That takes some amount of computation, and the cell can only perform a finite amount of computation per quantum. The faster the particle moves, the more of its time quantum is dedicated to motion, and the less it has for anything else. The speed of light, in this theory, is the speed where the full quantum for computing a particle’s behavior is dedicated to nothing but moving it to its next location.

It’s very pretty. Intuitively, it works. That makes it an interesting idea. But the problem is, no one has come up with an actual working model. We’ve got real observations of the behavior of the physical universe that no one has been able to describe using the cellular automaton model.

That’s the problem with all of the computational hypotheses so far. They look really good in the abstract, but none of them come close to actually working in practice.

A lot of people nowadays like to mock string theory, because it’s a theory that looks really good, but has no testable predictions. String theory can describe the behavior of the universe that we see. The problem with it isn’t that there are things we observe in the universe that it can’t predict, but that it can predict just about anything. There are a ton of parameters in the theory that can be shifted, and depending on their values, almost anything that we could observe can be fit by string theory. The problem with it is twofold: we don’t have any way (yet) of figuring out what values those parameters need to have to fit our universe, and we don’t have any way (yet) of performing an experiment that tests a prediction of string theory that’s different from the predictions of other theories.

As much as we enjoy mocking string theory for its lack of predictive value, the computational hypotheses are far worse! So far, no one has been able to come up with one that can come close to explaining all of the things that we’ve already observed, much less to making predictions that are better than our current theories.

But just like he did with his “criticism” of the big bang, Mr. Rosner makes predictions, but doesn’t bother to make them precise. There’s no math to his prediction, because there’s no content to his prediction. It doesn’t mean anything. It’s empty prose, proclaiming victory for an ill-defined idea on the basis of hand-waving and hype.

Boing-Boing should be ashamed for giving this bozo a platform.