The Halting Problem

Some people expressed interest in seeing a full-out, formal presentation of the Halting problem proof. So, I’m going to walk through it. There are actually a lot of different versions of this proof; the one that I’m going to use is based on the one used by my grad-school theory professor, Dr. John Case. To be clear, I’m doing it from memory, so any errors are completely my own fault, not John’s!

To start off, we need to define what a computing device is. In formal mathematical terms, we don’t care how it works – all we care about is what it can do in abstract tems. So what we do is define something called an effective computing device. An effective computing device is any Turing equivalent computing device. An ECS is modelled as a two parameter function phi : {cal N} times {cal N} rightarrow {cal N}. The first parameter is an encoding of a program as a natural number; the second parameter is the input to the program. It’s also a natural number, which might seem limiting – but we can encode any finite data structure as an integer, so it’s really not a problem. The return value is the result of the program, if the program halts. If it doesn’t halt, then we say that the pair of program and input aren’t in the domain of phi. So if you wanted to describe running the program “f” on the input 7, you’d write that as phi(f, 7). And, finally, the way that we would write that a program p doesn’t halt for input i as phi(p, i) = bot.

So now we’ve got our basic effective computing device. There’s one thing we still need before we can formulate the halting problem. We need to be able to deal with more parameters. After all – a halting oracle is a program that takes two inputs – another program, snd the input to that program. the easiest way to do that is to use something called a pairing function. A pairing functions is a one-to-one function from an ordered pair to an integer. There are lots of possible pairing functions – for example, you can convert both numbers to binary, left-pad the smaller until they’re equal length, and then interleave the bits. Given (9,3), you convert 9 to 1001, and 3 to 11; then you pad 3 to 0011, and interleave them to give you 10001011 – 139. We’ll write our pairing function as angle brackets around the two values: langle x,yrangle.

With the help of the pairing function, we can now express the halting problem. The question is, does there exist a program O, called a halting oracle, such that:

forall p, forall i: (varphi(O,langle p,irangle) = left{    begin{array}{l}	0 mbox{ if } varphi(p, i) = bot \	1 mbox{ if } varphi(p,i) neq bot   end{array}right.

In english, does there exist a program O such that for all pairs of programs p and inputs i, the oracle returns 1 if varphi(p, i) halts, and 0 if it doesn’t?

Time, finally, for the proof. Suppose that we do have a halting oracle, O. That means that for any program p and input i, varphi(O, langle p, i rangle) = 0 iff varphi(p,i)=bot.

So, can we devise a program $p_d$ and input i where varphi(p_d, i) != bot,
but varphi(O, langle p, i rangle) = 0?

Of course we can. We need a p_d which takes two parameters: an oracle, and an input. So it should be really simple right? Well, not quite as easy as it might seem. You see, the problem is, p_d needs to be able to pass itself to the oracle. But how can it do that? A program can’t pass itself into the oracle. Why not? Because we’re working with the program as a Gödel number – and a program can’t contain its own Gödel number. If it contained it, it would be larger than itself. And that’s rather a problem.

But there’s a nice workaround. What we care about is: is there any combination of program and input such that O will incorrectly predict the halting status? So what we’ll do is just turn p_d into a parameter to itself. That is, we’ll look at a program p_d like the following:

def deceiver(input):
   (oracle, program) =  unpair(input)
   if oracle(program, input):
      while(True): continue
   else:
      halt

Then we’ll be interested in the case where the value of the program parameter is a Gödel number of the deceiver itself.

(As an aside, there are a variety of tricks to work around this. One of the more classical ones is based on the fact that for any given program, p, there are an infinite number of versions of the same program with different Gödel numbers. Using that property, you can embed a program p_{d_2} into another program p_{d}. But there are a few other tricks involved in getting it right. It’s not simple – Alan Turing screwed it up in the first published version of the proof!)

Now, when input = langle O, p_d rangle, then O will make the wrong prediction about what p_d will do. So – once again, we’re back where we were in the simpler version of the proof. A halting oracle is a program which, given any pair of program and input, will correctly determine whether that program will halt on that input. We’re able to construct a pair of program and input where the oracle doesn’t make the right prediction, and therefore, it isn’t a halting oracle.

This version is more abstract, but it’s still got that wonderfully concrete property. Even in the most abstract way of talking about a computing device, if you’ve got something that you believe is a halting oracle, this shows you how to create a program that will prove that the halting oracle is wrong. So you can’t possibly create a halting oracle.

And to be extra clear: this doesn’t rely on any ambiguities about definitions, or any distinction between values and meanings. It shows a way of producing a real, concrete counterexample for any purported halting oracle. No trickery, no fudging – if you think you have a halting oracle, you’re wrong, and this proof shows you exactly how to create a program that will demonstrate that.

29 thoughts on “The Halting Problem

  1. James Sweet

    Mark, I think you need to cover one other “Yes but…” that people have been throwing around. Namely, the objection that perhaps you can make a halting Oracle that is correct as long as the input never refers to itself. You dealt with that in a comment in the other thread, but you should state it explicitly in this post I think, so people don’t miss it.

    I’ll take a stab. As you say, the oracle is something with the property:

    forall p, forall i : (phi(O, left langle p, i  right rangle)) = left{begin{matrix}0 & if & phi(p,i) = perp )\1 & if & phi(p,i) neq perp )end{matrix}right.

    What people want to do, I think, is that instead of forall i, they want to restrict i such that it does not contain O anywhere in it. If we even attempt to write that restrictions mathematically, the problem becomes obvious. Here’s a lame attempt at it:

    forall x : i neq left langle O, x right rangle wedge  i neq left langle x, O right rangle

    First of all, as you said, there are many (an infinite number, in fact!) of pairing functions, so p_d could just use a different pairing function than we are assuming here. Moreover, I could just pair O more deeply down in the hierarchy or something.

    To state it more generally, there are an infinite number of ways of encoding O within i, and if you have some way to “test for” that — then you already have a Halting Oracle!

    Reply
    1. James Sweet

      Heh, let me try and formalize that last paragraph.

      What we need for our Oracle O is a Deceiver Oracle D, which we can represent thusly:

      forall p, forall i : (phi(C,langle  p,i  rangle )) = left{begin{matrix}0 & if & (phi(O, langle left p, i right rangle) = 0) Leftrightarrow (phi(p,i) = perp)))\1 & otherwiseend{matrix}right.

      If we had that, we could restrict langle left p, i right rangle in the original description of the Halting Oracle so that it excluded the Deceiver… but it should be clear at this point that one could easily construct a Deceiver Oracle Deceiver — and so on ad infinitum for any further recursive attempts to patch this hole.

      Reply
      1. MarkCC Post author

        Sorry. I will eventually, but I’ve been dealing with some bigger issues. Right now, the site URLs don’t work properly, so that we’ve got a plethora of broken links in articles that predate the move, and in peoples’ bookmarks. That needs to be fixed before I can think about anything else. But comment preview is in the to-do list.

        Reply
        1. James Sweet

          Now that I’m making unreasonable demands on your time, you haven’t been posting enough since the switch to Scientopia. So I expect you’ll have the URLs fixed, the preview button implemented, and three more posts written by EOB tomorrow.

          Also, you should not be interfering with John A. Davidson’s discussions on your blog. Just because it’s your blog doesn’t give you a right to post comments to it!

          Reply
  2. GrayGaffer

    Excluding inputs containing Deceivers seems at first glance to be satisfactory, but there is a problem: the proof does not say that only Deceivers will foil your Oracle, only that for every Oracle there exists at least one input for which it will give the wrong answer, or itself fail to Halt. So falsification is achieved with the one example, but nothing is known about how many other examples there are of inputs to a purported Oracle that are not Deceivers yet will cause the Oracle to fail. I suspect the set of all inputs that will cause all Oracles to fail is itself infinite, also larger than the set of all Deceivers. I’m not about to enumerate them myself – I’m pretty sure my personal Oracle says that attempt will not halt.

    Reply
  3. khms

    Actually, you don’t need to bother with definitions of O being inside some i.

    See, if you have one encoding function where – say – the dangerous encoding of the parameter is x, then you can generate a new encoding function where this particular parameter instead encodes to y, and whatever encoded to y before (if anything) encodes to x instead.

    And you can do that for every y you can think of.

    So O is inside *every single* i.

    Reply
    1. James Sweet

      Exactly. You try to plug one hole, and an infinite number of new ones open up.

      My “i != “, etc., was meant as an example of how someone might try to plug the hole, just to show how ludicrous it was.

      Unfortunately my follow-up post got lost with LateX parse problems, but you can pretty much show that if you had a way of identifying that the pair had a halting oracle in it, you’d already have a halting oracle.

      Reply
  4. Michael Ralston

    Because we’re working with the program as a Gödel number – and a program can’t contain its own Gödel number. If it contained it, it would be larger than itself. And that’s rather a problem.

    This is untrue! A program can compute its own Gödel number, even though it can’t represent it literally -after all, it can perform various types of mathematical operations to get there. That’s the basis of the Kleene Recursion Theorem.

    Reply
    1. david

      It’s actually quite easy for a program to obtain a description of itself. For instance, the following code in C (without any line breaks) prints out its own source code.

      int main() {char* s = “int main() {char* s = %c%s%c; printf(s,34,s,34); return 0;}”; printf(s,34,s,34); return 0;}

      It’s easy to see how to modify this code too. If you want it to obtain its own description and do further computation on it rather than just printing it, change it to use sprintf and make sure to add any additional desired code to the corresponding parts of s as well as to the rest of the program.

      Reply
    2. James Sweet

      A program still can’t contain its own Godel number though. That was the point Mark was making, is that it has to get at it indirectly.

      preceded by itself in quotation marks is a valid sentence.

      Reply
  5. sriram srinivasan

    Apropos to this discussion is Byron Cook’s research project at Microsoft Research, Cambridge: “TERMINATOR”, an attempt to prove termination programs in practice (which tend not to have a deceiver!)

    Reply
  6. Sean

    Can you explain more about the effective computing device? In particular, what exactly does “halting” mean for an effective computing device? It doesn’t seem sufficient just to consider an effective computing device to be some function from N x N to N union perp (sorry, don’t know how to tex). There must be more to it.

    Also, how do you know that your deceiver program can be implemented with the given computing device?

    Finally, you just start using the term Godel number without explaining it. What does that mean?

    Reply
    1. Michael Ralston

      Halting means “ceases computation with an output” – the details depend on the formal definition of the ECS in question, but this is a definition of the properties an ECS must have to be an ECS – it’s such a function, AND it is capable of representing some other ECS via an encoding. That’s all it needs.

      The deciever program can be implemented because it’s simply a combination of the oracle (that we assume, for the purpose of the proof, exists) and basic primitives that we can implement in any ECS – we know this, since we can implement it in one of them. Anything that can be implemented in one ECS can be implemented in all of them, since we can translate between all ECS’es.

      A Godel number is simply an encoding from programs or proofs or whatever, to natural numbers that is 1-1 and (I believe?) onto. … and of course, is itself computable. The details of the encoding don’t matter, since of course there’s an infinitude of them.

      Reply
      1. James Sweet

        If it helps you visualize it, I ever-so-roughly envision the Godel number being like if you took the machine code for a program and concatenated all the bits together into one gigantic number. That’s not quite right, but it’s close enough, and easy to visualize for us programmers 🙂

        Reply
      2. Sean

        “… it’s such a function, AND it is capable of representing some other ECS via an encoding. That’s all it needs.”

        I don’t get this. It seems like you’re trying to define what an ECS is in terms of an ECS. Can you give a more formal definition of an ECS? Also, you say

        ” … basic primitives that we can implement in any ECS – we know this, since we can implement it in one of them.”

        Which one? Is there some explicit example of an ECS on which all others are built?

        Reply
        1. Michael Ralston

          The way we formally define an ECS is to first formally define one ECS (it doesn’t matter which we pick – Dr Case likes Script-L for his classes, the lambda calculus works, turing machines work, it doesn’t matter.) then say that “anything capable of computing all programs in this ECS is itself an ECS.”

          The actual formal definition of any specific ECS can get long, though.

          There’s no actual “base” ECS, since they’re all equivalently powerful; I think the most common one used in reduction proofs is the Turing Machine – it’s relatively simple, and thus relatively easy to prove that a potential ECS can encode all turing machines.

          As for “in one”, the answer is: the one MarkCC used. We simply prove that something can be done in any ECS and we’ve then proven it can be done in all ECS’s – so we can just pick whatever ECS we find easiest to program in for the sake of our proof.

          Reply
      3. Tobias

        I don’t really get why the number of programs is countable.
        I mean the number of possible states of an Infinite tape is clearly uncountable, so there should be uncountably many programs.

        Can someone tell me what I have overlooked?

        Reply
        1. MarkCC Post author

          The tape of a turing machine is unbounded, not infinite.

          One way of thinking of it is that when you’re using a Turing machine, any time that you try to move the head past anything that was read or written before, the machine adds a new blank cell. If you keep moving the head, you’ll keep adding cells. You’ll never run out of cells, because every time you move past the end, it’ll add a new cell to the tape. But at any time, the current length of the tape is finite. Since any possible state of the tape is finite, the size of the set of possible tape states is countably infinite.

          It’s similar to representations of the natural numbers. Given a particular representation, there’s no maximum length to the representation of a natural number. You can always make a bigger number by adding another digit. But obviously, the set of natural numbers is countable – and if the representation has one representation per natural number, then even though the length is unbounded, the representations are countable.

          Reply
  7. Carlos Licea

    I’ve always loved this beautiful proof. However, not being a mathematician, I’ve always wondered, doesn’t this only prove that you can’t build an oracle machine for every program? I mean, why isn’t it possible to get an oracle machine that can predict the halting of all the machines except for itself?

    Reply
      1. Carlos Licea

        I’m afraid you’ll have to break it down for me. Doesn’t that essentially does what I said you can’t?

        Reply
        1. Michael Ralston

          Carlos: It does it indirectly. The thing is, determining if two machines are equivalent is actually harder than determining if they halt – so no oracle that tries to just plug the loophole can work, because we can always make a machine that will exploit it more and more indirectly and the oracle can’t know.

          (For instance, we can write a machine that emulates some other computing system, then runs, in that other computing system, a machine that emulates this computing system and runs the oracle. etc etc.)

          Reply
  8. Jonathan Vos Post

    Clear for me. Carlos Licea, with all due respect, is an exemplar on why one needs to slow down and break every step into two steps, for those not used to this level of thought. That was my meta-lesson when I stopped teaching Math in University and started teaching it in urban middle schools and high schools, to disheartened youngsters who’d failed the remedial math classes two or three times before. The good news is, patient and careful teaching, which debugs their false assumptions, and fits their sensory modality, and builds from their strengths, almost always works.

    Reply

Leave a Reply to Michael Ralston Cancel reply