# Paxos, a really beautiful protocol for distributed consensus

The work that I do in real life is all focused on tools for other developers. In todays environment, that means that I’ve spent a lot of time working on tools that, in one way or another, help other developers deal with distributed systems. In that work, I’ve noticed that there are some really key things that straddle the line between pure math and pure engineering. That’s really interesting to someone like me!

A good example of that is something called paxos. My first exposure to paxos was very interesting. I’d just been hired by Google, and was working on their build tool. At the time, engineers in Google had a problem. Google’s codebase was contained in one massive version control repository. Doing things that way has a ton of really neat advantages – most importantly, the fact that it makes it really convenient to reuse code written by anyone else at the company. The problem was that code reuse can become very confusing. Project A reuses a bit of code written by people from project B. B’s code reused some stuff from C, and C from D, and D from E. So now project A is using code from project E, and they don’t know why!

In this case, I had someone from a storage project coming to me trying to figure out just why his system had a dependency on a plan9 database system called paxos. I had to built a tool that would allow people to ask questions like “Why does A depend on E?”.

As it turned out, paxos was a really important thing, and it was widely reused through the Google codebase. Once I learned about it, I started seeing it everywhere. Since I left Google nearly four years ago, I didn’t stop seeing it. It’s ubiquitous in distributed systems. Outside of Google, we weren’t using that friendly old plan9 paxos implementation – but the paxos model has been reimplemented dozens of times, because it’s so darned useful!

paxos is a system for managing consensus.

In distributed systems, there a collection of hard problems that you constantly need to deal with.

1. Things fail. You can never count on anything being reliable. Even if you have perfectly bug-free software, and hardware that never breaks, you’ve still got to deal with the fact that network connections can break, or messages within a network can get lost, or that some bozo might sever your network connection with a bulldozer. (That really happened while I was at Google!)
2. Given (1), you can never rely on one copy of anything, because that copy might become unavailable due to a failure. So you need to keep multiple copies, and those copies need to be consistent – meaning that at any time, all of the copies agree about their contents.
3. There’s no way to maintain a single completely consistent view of time between multiple computers. Due to inconsistencies in individual machine performance, and variable network delays, variable storage latency, and several other factors, there’s no canonical way of saying that for two events X and Y, “X happened before Y”. What that means is that when you try to maintain a consistent set of data, you can’t just say “Run all of the events in order”, because while one server maintaining one copy might “know” that X happened before Y, another server maintaining another copy might be just as certain that Y happened before X.

In a world where you can’t count on different agents seeing events in the same order, and where you expect things to be constantly failing, how can you make sure that any distributed system you build ends up with a consistent view of reality?

The answer is a consensus protocol. You need to create a mechanism based on communication between the copies of your system that allows them to mantain a consistent consensus of what the current state of the world is, even in the presence of failures of machines, storage systems, and communications.

paxos is a very elegant, reasonably simple consensus protocol.

Let’s get a bit more precise. Paxos is built on a model of storage. The original application of it was a consistent database, so it’s built around the idea of keeping data consistent. In paxos, the state of the storage is modelled as a sequence of transactions. Each transaction is a pair (t, v), where t is a numeric transaction identifier, and a v is a transaction value.

The state of the system being modelled is a sequence of transaction pairs, [(ti, vi), (tj, vj), ..], where the t values are increasing as you progress through the sequence. As time passes, new transaction pairs can be added to the state.

The focus of the paxos protocol is ensuring that in a collection of 2n+1 participants, all surviving participants will agree on the current value of the state, even if up to n participants fail, and even if messages can be delivered in arbitrary order.

Before I go further into the description of paxos, we need to look at the basic assumptions that underlie it. Like any formal model, it’s not defined in terms of real computers. It’s defined in terms of an abstraction that approximates reality. In this case, the approximation is quite good, but we still need to go through the basic assumptions that make up its model of the universe.

1. Processors (aka participants, servers, computers):
1. operate at any speed. No two processors necessarily operate at the same speed.
2. may fail without warning.
3. may rejoin after recovering from a failure.
4. are cooperative (in the sense that they do not attempt to cause failures).
2. Network:
1. Delivers messages between any pair of processors.
2. Transmits messages asynchronously.
3. Delivers messages at arbitrary speeds.
4. Does not guarantee that messages will be delivered in the order in which they were transmitted.
5. Does guarantee that a message, if delivered, will be delivered correctly, without any changes.
6. May fail to deliver a message.
7. May deliver multiple copies of the same message.

In short, everything can fail at any time; after failure, participants can recover and rejoin the system; any no part of the system acts in an actively adversarial way.

The protocol describes the behavior of the system in terms of a collection of roles. A participant can have more than one role in the system – in fact, in most implementations of paxos, all partipants do have multiple roles. The roles are:

Client
The client is not part of the paxos cluster. It’s an external entity whose actions trigger state changes by making requests to the paxos system. Each state update in paxos is initiated by a client request, and completed by a reply to the client.
Acceptor
An acceptor (also called a voter) is a participant in the maintanence of distributed storage. A state change in a paxos cluster does not occur until a majority (quorum) of acceptors agree upon it.
Proposer
A proposer recieves a request from the client, and attempts to get a quorum of acceptors to agree on it.
One of the proposers is special. It is the single proposer who most recently had a proposal accepted. In many paxos implementations, there is only one active proposer serving client requests: the only time the other proposers send proposals is when the current leader fails, and a new one needs to be selected.
Learner
The learner is the real service provided by the paxos cluster. Once a proposal is accepted, a learner processes the request from the client, and sends it the result.

In a typical paxos cluster, the client sends requests to a proposer. The proposer sends a proposal to update the state with the new client request, and attempts to convince a majority of the acceptors to accept it. Once a majority accepts it, the client request is processed by the learner, and a result is returned to the client.

The meat of paxos the protocol that the proposer gets a majority of acceptors to agree on a proposal, and how that protocol process ensures that the collection of acceptors maintains a consistent state.

The protocol itself is pretty simple. Each round is effectively independent, and consists of a process of attempting to reach consensus. Within each round, finding consensus is a two-phase process, where each phase consists of a message sent from a proposer to a group of acceptors, and a reply from the acceptors to the proposer.

1. Phase One: Prepare/Promise
• Proposer: A proposer attempts to start setting a new consensus by sending a Prepare(N) message to a quorum of acceptors. It can send to any group of acceptors, so long as that group forms a majority of the acceptors. The prepare message specifies a numeric identifier N for its proposal, which is larger than any proposal that’s been sent by this proposer.
• Acceptors:
Each acceptor, upon receiving the proposal, checks if the N-value from the prepare message is greater than any proposal from the current round that it has accepted. If so, it sends a reply called a Promise to the proposer, promising that it will never accept any proposal with a number less that N. If the acceptor has accepted a proposal with number less than $N$ in the current round, then it includes the pair $(v, n_v)$ consisting of the proposed consensus value $v$ and the number $n_v$ of the accepted proposal that proposed $v$.
The acceptor thus sends a message Promise(N, (v, nv)) (if it has accepted a proposal this round) or Promise(N, null) (if it has not yet accepted a proposal with number less than N).

Once it’s sent a promise message, it must not accept any request for a proposal with number less that N. Note though that this does not mean that the acceptor promises to accept the proposal: all it’s doing is promising not to accept any proposal with number less than N! If in receives a message Prepare(N+1), it’s free to promise that – but if it does, it will no longer be able to accept the proposal for N.

(If N is smaller that the number of any proposal promised or accepted by the acceptor, then in the original version of paxos, the acceptor does nothing; in some optimizations of the protocol, it replies Reject(nv).)

What this phase does is allow a proposer to determine whether or not a new proposal is even worth considering. If a quorum (majority) of acceptors send promises, then it can move on to phase 2.

2. Phase Two: Accept!/Accepted

When a proposer recieves promises from a quorum of acceptors, then it moves forward to try to actually commit the proposal. In order to do this, it needs to choose a value for the proposal. If any of the Promise messages contained a value, then the value of this proposal must be set to the value of the highest proposal number in any of the promises. If all of the promises were empty, then the proposer can choose any value that it wants for the proposal.

Once the proposer has chosen a value, then it sends a message Accept!(N, V) to a quorum of acceptors. This is typically written with the exclamation point, because it’s really a command to the acceptors: they’re being told to accept the proposal, if they can.

When an acceptor receives an Accept!(N, v) message, if it has not issued a promise for a proposal with number greater than N, then it must accept the message. It accepts the proposal by sending a message Accepted(N, V) to both the original proposer, and all of the learners.

When Accepted messages have been received from a quorum of acceptors, the new value V becomes the consensus value for the paxos cluster, and the new proposal number N is fully committed.

As with so many things, this is easier to understand when you think about an example. One use of paxos that I’ve worked with is in a cluster scheduling service. In that system:

• a client is a user attempting to run a new job on the cluster. It sends a request to the scheduler detailing the set of resources that it wants to request.
• Each duplicate of the scheduler is a proposer, an acceptor, and a learner. There’s one active instance of the scheduler, which is the leader. When a client wants to schedule a job, its request gets sent to the leading scheduler.
• In the normal non-error case, this works as follows:
1. When a scheduling request is received, the leader proposes scheduling the job, by sending a message to all of the other schedulers saying that it wants to schedule job N.
2. The other schedulers, if they haven’t seen a proposal for a job with number greater than i, make promises to accept that proposal.
3. The leading scheduler chooses resources for the job, and then sends an Accept! message to the other schedulers.
4. The other schedulers reply accepting the scheduling. The non-leader schedulers, acting as learnings, record the scheduling information, and the leader actually starts the job.
• Errors occur when there was some kind of failure. In that case, we don’t necessarily know who the leader is – so we get multiple schedulers trying to act as if they’re the leader. So they each send proposals. Whichever proposal had the largest proposal number will eventually get accepted, and its proposer becomes the new leader.

It’s a pretty simple thing – the core concept is simply that no consensus proposal is considered “committed” until it’s been accepted by a majority of the participants. And if it’s been accepted by a majority of the participants, that means that no conflicting proposal can ever reach consensus – because that would require at least one participant to accept 2 conflicting proposals.

But there’s still a bit of formality that’s working looking at. Exactly what guarantees does paxos give? What properties does paxos-style consensus have?

Even the formal properties of paxos are easy to understand. Paxos provides two key properties: validity, and agreement.

Validity
No value ever reaches consensus without first being proposed, and having its proposal accepted.
Agreement
No two distinct values ever reach consensus at the same time.

You an easily prove those two properties. In fact, the proof is completely obvious once you recognize that the paxos protocol has two invariants (and those invariants are themselves clear from the definition of the protocol!):

1. An acceptor can only accept a proposal $p$ if and only if it has not yet made a promise
for a proposal $n$ where $v$ is the consensus value of the
highest numbered proposal that has been accepted before this proposal.

Getting back to the beginning: the point of all of this is to have a system in which we can be sure that things work correctly even in the presence of failures. In paxos, as long as at some point there was a quorum of machines that come to agreement, then any failure that leaves a surviving quorum of machines must have overlapped with the previous quorum – which means that the previous consensus still remains in effect, and will be propagated to the remaining participants. If you’ve got 5 machines, then two can fail, and you won’t lose consistency among the remaining ones.

# The ABC conjecture – aka the soap opera of the math world.

Sorry for the silence of this blog for the last few months. This spring, my mother died, and I was very depressed about it. Depression is a difficult thing, and it left me without the energy or drive to do the difficult work of writing this kind of material. I’m trying to get back into the cycle of writing. I’m trying to make some progress in writing about type theory, but I’m starting with a couple of easier posts.

In the time when I was silent, I had a couple of people write to me to ask me to explain something called the ABC conjecture.

The ABC conjecture is a mathematical question about number theory that was proposed in the 1980s – so it’s relatively new as number theory problems go. It’s gotten a lot of attention recently, due to an almost soap-operatic series of events.

It’s a very hard problem, and no one had made any significant progress on it until about five years ago, when a well respected Japanese mathematician named Shinichi Mochizucki published a series of papers containing a proof of the conjecture.

Normally, when a proof of a hard problem gets published, mathematicians go nuts! Everyone starts poring over it, trying to figure it out, and see if it’s valid. That’s what happened the previous time someone thought they’d prooved it. But this time, no one has been able to make sense out of the proof!

The problem is that in order to build his proof, professor Mochizucki created a whole new mathematical theory, called inter-universal Teichmüller theory. The entire ABC conjecture proof is built in this new theory, and no one other than professor Mochizucki himself understands Teichmüller theory. Before anyone else can actually follow the proof, they need to understand the theory. Professor Mochizucki is a bit of a recluse – he has declined to travel anywhere to teach his new mathematical system. So in the five years since he first published it, no one has been able to understand it well enough to determine whether or not the proof is correct. One error in it was found, but corrected, and the whole proof remains in question.

Exactly why the proof remains unchecked after five years is a point of contention. Lots of mathematicians are angry at Professor Mochizucki for not being willing to explain or teach his theory. A common statement among critics is that if you create a new mathematical theory, you need to be willing to actually explain it to people: work with a group of mathematicians to teach it to them, so that they’ll be able to use it to verify the proof. But Professor Mochizuchki’s response has been that he has explained it: he’s published a series of papers describing the theory. He doesn’t want to travel and take time away from his work for people who haven’t been willing to take the time to read what’s he’s written. He’s angry that after five years, no one has bothered to actually figure out his proof.

I’m obviously not going to attempt to weigh in on whether or not Professor Mochizuki’s proof is correct or not. That’s so far beyond the ability of my puny little brain that I’d need to be a hundred times smarter before it would even be laughable! Nor am I going to take sides about whether or not the Professor should be travelling to teach other mathematicians his theory. But what I can do is explain a little bit about what the ABC conjecture is, and why people care so much about it.

It’s a conjecture in number theory. Number theorists tend to be obsessed with prime numbers, because the structure of the prime numbers is a huge and fundamental part of the structure and behavior of numbers as a whole. The ABC conjecture tries to describe one property of the structure of the set of prime numbers within the system of the natural numbers. Mathematicians would love to have a proof for it, because of what it would tell them about the prime numbers.

Before I can explain the problem, there’s a bit of background that we need to go through.

1. Any non-prime number N is the product of some set of prime numbers. Those numbers are called the prime factors of N. For example, 8 is 2×2×2 – so the set of prime factors of 8 is {2}. 28 is 2×2×7, so the prime factors of 28 are {2, 7}. 360 = 8 × 45 = 2×2×2×(9×5) = 2×2×2×3×3×5, so the prime factors of 360 are {2, 3, 5}.
3. Given two positive integers N and M, N and M are coprime if they have no common prime factors. A tiny bit more formally, if pf(N) is the set of prime factors of N, and M and N are coprime if and only if pf(N) ∩ pu(M) = ∅. (Also, if M and N are coprime, then rad(M×N) = ram(M)×rad(N).)

The simplest way of saying the ABC conjecture is that for the vast majority of integers A, B, and C, where A + B = C and A and B are coprime, C must be smaller than rad(A*B).

Of course, that’s hopelessly imprecise for mathematicians! What does “the vast majority” mean?

The usual method at times like these is to find some way of characterizing the size of the relative sizes of the set where the statement is true and where the statement is false. For most mathematicians, the sizes of sets that are interesting are basically 0, 1, finite, countably infinite, and uncountably infinite. For the statement of the ABC conjecture, they claim that the set of values for which the statement is true is infinite, but that the set of values for which it is false are finite. Specifically, they want to be able to show that the set of numbers for which rad(A*B)>C is finite.

To do that, they pull out a standard trick. Sadly, I don’t recall the proper formal term, but I’ll call it epsilon bounding. The idea is that you’ve got a statement S about a number (or region of numbers) N. You can’t prove your statement about N specifically – so you prove it about regions around N.

As usual, it’s clearest with an example. We want to say that C > rad(A*B) for most values of A and B. The way we can show that is by saying that for any value ε, the set of values (A, B, C) where A and B are coprime, and A + B = C, and rad(A*B) > C + ε is finite.

What this formulation does is give us a formal idea of how rare this is. It’s possible that there are some values for A and B where rad(A*B) is bigger that 1,000,000,000,000,000,000 + C. But the number of places where that’s true is finite. Since the full system of numbers is infinite, that means that in the overwhelming majority of cases, rad(A*B) < C. The size of the set of numbers where that's not true is so small that it might at well be 0 in comparison to the size of the set of numbers where it is true.

Ultimately, it seems almost trivial once you understand what the conjecture is. It's nothing more that the hypothesis that that if A + B = C, then most of the time, pf(A)*pf(B) < C. Once you've got that down, the question is, what's the big deal? Professor Mochuzuki developed five hundred pages of theory for this? People have spent more than five years trying to work through his proof just to see if it’s correct for a statement like this? Why does anybody care so much?

One answer is: mathematicians are crazy people!

The better answer is that simple statements like this end up telling us very profound things about the deep structure of numbers. The statements reduce to something remarkably simple, but the meaning underneath it is far more complex than it appears.

Just to give you one example of what this means: If the conjecture is true, then there’s a three-line proof of Fermat’s last theorem. (The current proof of Fermat’s last theorem, by Andrew Wiles, is over 150 pages of dense mathematics.) There’s quite a number of things that number theoreticians care about that would fall out of a successful proof.

# Intuitionistic Logic for Type Theory

I’ve written about intuitionistic logic before. In fact, there’s a whole section about it in my book. But now that I’m reading a lot about type theory, I’m starting to look at it diferently.

When you study classical axiomatic set theory, you’re necessarily also studying classical first order predicate logic. You have to be doing that, because classical axiomatic set theory is deeply and intimately intertwined with FOPL. Similarly, the semantics of FOPL as it’s used in modern math are inextricably tangled with set theory. Sets are specified by predicates; predicates get their meaning from the sets of objects that they satisfy.

You can view type theory – or at least Martin-Loff’s intuitionistic type theory – as having nearly the same relationship to intuitionistic logic. We’ll see that in detail in later posts, but for now, intuitionistic type theory is a fundamental mathematical framework which is built on intuitionistic logic. So you can’t talk about this kind of type theory unless you understand the basics of the logic.

In this post, I’m going to try to explain what intuitionistic logic is, and how it differs from FOPL. (We’ll see all of this in more detail later.)

Intuitionistic logic is a modal predicate logic, which is built around a constructivist idea of truth. The intuitionistic idea of truth ends up being much stronger than what most of us are used to from standard FOPL: it means that nothing exists unless there is a concrete way of constructing it.

For a concrete example of what that means: in standard FOPL with the ZFC axioms, you can prove the Banach-Tarski paradox. Banach-Tarski (which I wrote about HERE) says that it’s possible to take a sphere the size of an orange, cut it into pieces, and then re-assemble those pieces into two spheres the same size as the original orange. Or, alternatively, that you can take those pieces that you sliced an orange-sized sphere into, and re-assemble them into a sphere the size of the sun.

Many people would say that this is, clearly, ridiculous. Others would point out a variety of rationalizations: that a sphere the size of an orange and a sphere the size of the sun contain the same number of points; or that the slicing process transitioned from a metric topology to a collection of non-metric topologies, or several other possible explanations.

But what no one can dispute is that there is one very important property of this proof. Those slices are unconstructable. That is, they exist based on a proof using the axiom of choice, but the sets of points in those topologies can’t be constructed by any process. They exist as a necessary implication of the axiom of choice, but we can’t construct them, and even given a pair of sets, one of which is one of those slices, and one of which isn’t, we can’t identify which one is.

According to intuitionism, this is ridiculous. Saying that something exists, but that it is forever beyond our reach is foolishness. If we can’t construct it, if we can’t describe how to identify it, what does it mean to say that it must exist?

When you’re working in intuitionistic logic, every proof that a type of thing exists consists of either a concrete example of the thing, or a process for constructing an example of the thing. A proof of a negative is a concrete counterexample, or a process for creating one. In computer-sciency-terms, the process doesn’t need to terminate. You don’t have to be able to construct something in finite time. But you need to have a process that describes how to contsruct it. So you can, for example, still do Cantor’s diagonalization in intuitionistic logic: if someone gives you an alleged complete 1:1 mapping between the real numbers and the integers, the proof tells you how to create a counterexample. But you can’t do the proof of Banach-Tarski, because it relies on an axiom-of-choice existence proof of something non-constructable.

The way that intuitionistic logic creates that constructivist requirement is not what you might expect. When I first heard about it, I assumed that it was based on a statement of principle: a proof has to create a concrete example. But that approach has an obvious problem: how do you mathematically define it? Logic is supposed to be purely symbolic. How can you take an abstract statement about what a proof should be, and make it work in logic?

Logic is built on inference rules. You have a collection of statements, and a collection of rules about how to use those statements to produce proofs. It turns out that by making a couple of simple changes to the rules of inference that you can get exactly the constructivist requirements that we’d want. It’s based on two real changes compared to standard FOPL.

Intuitionistic logic is modal. In FOPL, any given statement is either true or false. If it’s not true, then it’s false. If it’s true, it’s always true, and always was true. There’s no other choice. In intuitionistic logic, that’s not really the case: intuitionistic logic has three states: true, false, and unknown. If you know nothing about it, then it’s formally unknown, and it will stay unknown until there’s a proof about it; once you find a proof, it’s truth value changes from unknown to either true or false. All of the inference rules of intuitionistic logic only allow inference from proven statements. You can’t reason about an unknown – you need to have a proof that moves it from unknown to either true or false first.

The semantics of this are quite simple: it’s a tiny change in the definition of truth. In FOPL, a statement is true if there exists a proof of that statement, and it’s false if there’s a proof of the negation of that statement. In intuitionistic logic, a statement is true if you have a proof of that statement; and it’s false if you can prove that there is no proof of the statement If you haven’t proven $A$, then $A$ is unknown. If $A$ is unknown, then $\lnot A$ is also unknown. $A \lor \lnot A$ is, similarly, not true until you have a proof of either $A$ or $\lnot A$: it means that either “There is a proof of A or there is a proof of $\lnot A$“. But if we don’t know if there’s a proof of either one, then it’s unknown! You could argue that this is true in FOPL as well – but in FOPL, you can rely on the fact that $A \lor \lnot A$, and you can use that in a proof, and explore both options. In intuitionistic logic, you can’t: you can’t do anything with $A \lor \lnot A$ until you’ve got a proof.

It’s amazing how small the change to FOPL is to produce something that is so strongly constructionist. The easiest way to appreciate it is to just look at the rules, and how they change. To do that, I’m going to quickly walk through the inference rules of intuitionistic logic, and then show you what you’d need to change to get classical FOPL. Most of the time, when I’ve written about logics, I used sequents to write the inference rules; for ease of typesetting (and for the fun of doing something just a bit different), this time, I’m going to use Hilbert calculus (the same method that Gödel used in his incompleteness proof.) In HC, you define axioms and inference rules. For intuitionistic logic, we need to define three inference rules:

1. Modus Ponens: Given $\alpha$ and $\alpha \rightarrow \beta$, you can infer $\alpha$.
2. Universal Generation: Given $\alpha \rightarrow \beta$, you can infer $\alpha \rightarrow (\forall x: \beta)$ if $x$ is not free in $\beta$.
3. Existential Generation: Given $\alpha \rightarrow \beta$, you can infer $(\exists x: \alpha) \rightarrow \beta$, if $x$ is not free in $\beta$.

With the inference rules out of the way, there’s a collection of axioms. Each axiom is actually a schema: you can substitute any valid statement for any of the variables in the axioms.

1. Then-1: $\alpha \rightarrow (\beta \rightarrow \alpha)$.
2. Then-2: $(\alpha \rightarrow (\beta \rightarrow \gamma)) \rightarrow ((\alpha \rightarrow \beta) \rightarrow (\alpha \rightarrow \gamma))$
3. And-1: $\alpha \land \beta \rightarrow \alpha$
4. And-2: $\alpha \land \beta \rightarrow \beta$
5. And-3: $\alpha \rightarrow (\beta \rightarrow (\alpha \land \beta))$
6. Or-1: $\alpha \rightarrow \alpha \lor \beta$
7. Or-2: $\beta \rightarrow \alpha \lor \beta$
8. Or-3: $(\alpha \rightarrow \beta) \rightarrow ((\gamma \rightarrow \beta) \rightarrow (\alpha \lor \gamma) \rightarrow \beta)$
9. False: $\text{False} \rightarrow \alpha$. (For a bit of explanation, this rule means that we don’t need to have $\lnot$ rules – $\lnot \alpha$ can be treated as $\text{False} \rightarrow \alpha$.)
10. Universal: $(\forall x: \alpha(x)) \rightarrow \alpha(t)$, if $t$ is not bound by instantiating $\alpha(t)$.
11. Existential: $\alpha(x) \rightarrow (\exists x \alpha(x)$ if $t$ is not bound by instantiating $\alpha(t)$.

That’s intuitionistic logic. What’s the difference between that and FOPL? What kinds of powerful reasoning features did you need to give up from FOPL to get this strongly constructivist logic?

Just one simple axiom: the law of the excluded middle, $\alpha \lor \lnot \alpha$.

That’s it. Get rid of the excluded middle, and you’ve got the beautiful constructivist intuitionistic logic. All we had to give up is one of the most intuitionnally obvious rules in all of logic.

# GÖDEL PART 4: The Payoff

After a bit of a technical delay, it’s time to finish the repost of incompleteness! Finally, we’re at the end of our walkthrough of Gödel great incompleteness proof. As a refresher, the basic proof sketch is:

1. Take a simple logic. We’ve been using a variant of the Principia Mathematica’s logic, because that’s what Gödel used.
2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering.
3. Show that you can express meta-mathematical properties of logical statements in terms of arithemetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.

What came before:

1. Gödel numbering: The logic of the Principia, and how to encode it as numbers. This was step 1 in the sketch.
2. Arithmetic Properties: what it means to say that a property can be expressed arithemetically. This set the groundwork for step 2 in the proof sketch.
3. Encoding meta-math arithmetically: how to take meta-mathematical properties of logical statements, and define them as arithmetic properties of the Gödel numberings of the statements. This was step 2 proper.

So now we can move on to step three, where we actually see why mathematical logic is necessarily incomplete.

What I did in the last post was walk through a very laborious process that showed how we could express meta-mathematical properties of logical statements as primitive recursive functions and relations. Using that, we were able to express a non-primitive-recursive predicate provable, which is true for a particular number if and only if that number is the Gödel number representation of a statement which is provable.

pred provable(x) =
some y {
proofFor(y, x)
}
}


The reason for going through all of that was that we really needed to show how we could capture all of the necessary properties of logical statements in terms of arithmetic properties of their Gödel numbers.

Now we can get to the target of Gödel’s effort. What Gödel was trying to do was show how to defeat the careful stratification of the Principia’s logic. In the principia, Russell and Whitehead had tried to avoid problems with self-reference by creating a very strict type-theoretic stratification, where each variable or predicate had a numeric level, and could only reason about objects from lower levels. So if natural numbers were the primitive objects in the domain being reasoned about, then level-1 objects would be things like specific natural numbers, and level-1 predicates could reason about specific natural numbers, but not about sets of natural numbers or predicates over the natural numbers. Level-2 objects would be sets of natural numbers, and level-2 predicates could reason about natural numbers and sets of natural numbers, but not about predicates over sets of natural numbers, or sets of sets of natural numbers. Level-3 objects would be sets of sets of natural numbers… and so on.

The point of this stratification was to make self-reference impossible. You couldn’t make a statement of the form “This predicate is true”: the predicate would be a level-N predicate, and only a level N+1 predicate could reason about a level-N predicate.

What Gödel did in the arithmetic process we went through in the last post is embed a model of logical statements in the natural numbers. That’s the real trick: the logic of the principia is designed to work with a collection of objects that are a model of the natural numbers. By embedding a model of logical statements in the natural numbers, he made it possible for a level-1 predicate (a predicate about a specific natural number) to reason about any logical statement or object. A level-1 predicate can now reason about a level-7 object! A level-1 predicate can reason about the set defined by a level-1 predicate: a level-1 predicate can reason about itself!. A level-1 predicate can, now, reason about any logical statement at all – itself, a level-2 predicate, or a level-27 predicate. Gödel found a way to break the stratification.

Now, we can finally start getting to the point of all of this: incompleteness! We’re going to use our newfound ability to nest logical statements into numbers to construct an unprovable true statement.

In the last post, one of the meta-mathematical properties that we defined for the Gödel-numbered logic was immConseq, which defines when some statement x is an immediate consequence of a set of statements S. As a reminder, that means that x can be inferred from statements in S in one inferrence step.

We can use that property to define what it means to be a consequence of a set of statements: it’s the closure of immediate consequence. We can define it in pseudo-code as:

def conseq(κ) = {
K = κ + axioms
do {
for all c in immConseq(K) {
if c not in K {
}
}
return K
}


In other words, Conseq(κ) is the complete set of everything that can possibly be inferred from the statements in κ and the axioms of the system. We can say that there’s a proof for a statement x in κ if and only if x ∈ Conseq(κ).

We can take the idea of Conseq use that to define a strong version of what it means for a logical system with a set of facts to be consistent. A system is ω-consistent if and only if there is not a statement a such that: a ∈ Conseq(κ) ∧ not(forall(v, a)) ∈ Conseq(κ).

In other words, the system is ω-consistent as long as it’s never true that both a universal statement and it. But for our purposes, we can treat it as being pretty much the same thing. (Yes, that’s a bit hand-wavy, but I’m not trying to write an entire book about Gödel here!)

(Gödel’s version of the definition of ω-consistency is harder to read than this, because he’s very explicit about the fact that Conseq is a property of the numbers. I’m willing to fuzz that, because we’ve shown that the statements and the numbers are interchangable.)

Using the definition of ω-consistency, we can finally get to the actual statement of the incompleteness theorem!

Gödel’s First Incompleteness Theorem: For every ω-consistent primitive recursive set κ of formulae, there is a primitive-recursive predicate r(x) such that neither forall(v, r) nor not(forall(v, r)) is provable.

To prove that, we’ll construct the predicate r.

First, we need to define a version of our earlier isProofFigure that’s specific to the set of statements κ:

pred isProofFigureWithKappa(x, kappa) = {
all n in 1 to length(x) {
isAxiom(item(n, x)) or
item(n, x) in kappa or
some p in 0 to n {
some q in 0 to n {
immedConseq(item(n, x), item(p, x), item(q, x))
}
}
} and length(x) > 0
}


This is the same as the earlier definition – just specialized so that it ensures that every statement in the proof figure is either an axiom, or a member of κ.

We can do the same thing to specialize the predicate proofFor and provable:

pred proofForStatementWithKappa(x, y, kappa) = {
isProofFigureWithKappa(x, kappa) and
item(length(x), x) = y
}

pred provableWithKappa(x, kappa) = {
some y {
proofForStatementWithKappa(y, x, kappa)
}
}


If κ is the set of basic truths that we can work with, then provable in κ is equivalent to provable.

Now, we can define a predicate UnprovableInKappa:

pred NotAProofWithKappa(x, y, kappa) = {
not (proofForKappa(x, subst(y, 19, number(y))))
}


Based on everything that we’ve done so far, NotAProofWithKappa is primitive recursive.

This is tricky, but it’s really important. We’re getting very close to the goal, and it’s subtle, so let’s take the time to understand this.

• Remember that in a Gödel numbering, each prime number is a variable. So 19 here is just the name of a free variable in y.
• Using the Principia’s logic, the fact that variable 19 is free means that the statement is parametric in variable 19. For the moment, it’s an incomplete statement, because it’s got an unbound parameter.
• What we’re doing in NotAProofWithKappa is substituting the numeric coding of y for the value of y‘s parameter. When that’s done, y is no longer incomplete: it’s unbound variable has been replaced by a binding.
• With that substitution, NotAProofWithKappa(x, y, kappa) is true when x does not prove that y(y) is true.

What NotAProofWithKappa does is give us a way to check whether a specific sequence of statements x is not a proof of y.

We want to expand NotAProofWithKappa to something universal. Instead of just saying that a specific sequence of statements x isn’t a proof for y, we want to be able to say that no possible sequence of statements is a proof for y. That’s easy to do in logic: you just wrap the statement in a “∀ x ( )”. In Gödel numbering, we defined a function that does exactly that. So the universal form of provability is: ∀ a (NotAProofWithKappa(a, y, kappa)).

In terms of the Gödel numbering, if we assume that the Gödel number for the variable a is 17, and the variable y is numbered as 19, we’re talking about the statement p = forall(17, ProvableInKappa(17, 19, kappa).

p is the statement that for some logical statement (the value of variable 19, or y in our definition), there is no possible value for variable 17 (a) where a proves y in κ.

All we need to do now is show that we can make p become self-referential. No problem: we can just put number(p) in as the value of y in UnprovableInKappa. If we let q be the numeric value of the statement UnprovableInKappa(a, y), then:

r = subst(q, 19, p)

i = subst(p, 19, r)

i says that there is no possible value x that proves p(p). In other words, p(p) is unprovable: there exists no possible proof that there is no possible proof of p!

This is what we’ve been trying to get at all this time: self-reference! We’ve got a predicate y which is able to express a property of itself. Worse, it’s able to express a negative property of itself!

Now we’re faced with two possible choices. Either i is provable – in which case, κ is inconsistent! Or else i is unprovable – in which case κ is incomplete, because we’ve identified a true statement that can’t be proven!

That’s it: we’ve shown that in the principia’s logic, using nothing but arithmetic, we can create a true statement that cannot be proven. If, somehow, it were to be proven, the entire logic would be inconsistent. So the principia’s logic is incomplete: there are true statements that cannot be proven true.

We can go a bit further: the process that we used to produce this result about the Principia’s logic is actually applicable to other logics. There’s no magic here: if your logic is powerful enough to do Peano arithmetic, you can use the same trick that we demonstrated here, and show that the logic must be either incomplete or inconsistent. (Gödel proved this formally, but we’ll just handwave it.)

Looking at this with modern eyes, it doesn’t seem quite as profound as it did back in Gödel’s day.

When we look at it through the lens of today, what we see is that in the Principia’s logic, proof is a mechanical process: a computation. If every true statement was provable, then you could take any statement S, and write a program to search for a proof of either S or ¬ S, and eventually, that program would find one or the other, and stop.

In short, you’d be able to solve the halting problem. The proof of the halting problem is really an amazingly profound thing: on a very deep level, it’s the same thing as incompleteness, only it’s easier to understand.

But at the time that Gödel was working, Turing hadn’t written his paper about the halting problem. Incompletess was published in 1931; Turing’s halting paper was published in 1936. This was a totally unprecedented idea when it was published. Gödel produced one of the most profound and surprising results in the entire history of mathematics, showing that the efforts of the best mathematicians in the world to produce the perfection of mathematics were completely futile.

# Gödel part 3: Meta-Logic with Arithmetic

On to the next part of Gödel’s proof of incompleteness. To refresh your memory, here’s a sketch of the proof:

1. Take a simple logic. We’ve been using a variant of the Principia Mathematica’s logic, because that’s what Gödel used.
2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering.
3. Show that you can express meta-mathematical properties of logical statements in terms of arithemetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.

What we’ve done so far is the first two steps, and part of the third. In this post, we saw the form of the Principia’s logic that we’re using, and how to numerically encode it as a Gödel numbering. We’ve start started on the third point in this post, by figuring out just what it means to say that things are encoded arithmetically. Now we can get to the part where we see how to encode meta-mathematical properties in terms of arithmetic properties of the Gödel numbering. In this post, we’re going to build up everything we need to express syntactic correctness, logical validity, and provability in terms of arithmetical properties of Gödel numbers. (And, as a reminder, I’ve been using this translation on Gödel’s original paper on incompleteness.)

This is the most complex part of the incompleteness proof. The basic concept of what we’re doing is simple, but the mechanics are very difficult. What we want to do is define a set of predicates about logical statements, but we want those predicates to be expressed as arithmetic properties of the numerical representations of the logical statements.

The point of this is that we’re showing that done in the right way, arithmetic is logic – that doing arithmetic on the Gödel numbers is doing logical inference. So what we need to do is build up a toolkit that shows us how to understand and manipulate logic-as-numbers using arithmetic. As we saw in the last post, primitive recursion is equivalent to arithmetic – so if we can show how all of the properties/predicates that we define are primitive recursive, then they’re arithmetic.

This process involves a lot of steps, each of which is building the platform for the steps that follow it. I struggled quite a bit figuring out how to present these things in a comprehensible way. What I ended up with is writing them out as code in a pseudo-computer language. Before inventing this language, I tried writing actual executable code, first in Python and then in Haskell, but I wasn’t happy with the clarity of either one.

Doing it in an unimplemented language isn’t as big a problem as you might think. Even if this was all executable, you’re not going to be able to actually run any of it on anything real – at least not before you hair turns good and gray. The way that this stuff is put together is not what any sane person would call efficient. But the point isn’t to be efficient: it’s to show that this is possible. This code is really all about searching; if we wanted to be efficient, this could all be done in a different representation, with a different search method that was a lot faster – but that wolud be harder to understand.

So, in the end, I threw together a simple language that’s easy to read. This language, if it were implemented, wouldn’t really even be Turing complete – it’s a primitive recursive language.

### Basics

We’ll start off with simple numeric properties that have no obvious connection to the kinds of meta-mathematical statements that we want to talk about, but we’ll use those to define progressively more and more complex and profound properties, until we finally get to our goal.

# divides n x == True if n divides x without remainder.
pred divides(n, x) = x mod n == 0

pred isPrime(0) = False
pred isPrime(1) = False
pred isPrime(2) = True
pred isPrime(n) = {
all i in 2 to n {
not divides(i, n)
}
}

fun fact(0) = 1
fun fact(n) = n * fact(n - 1)


Almost everything we’re going to do here is built on a common idiom. For anything we want to do arithmetically, we’re going to find a bound – a maximum numeric value for it. Then we’re going to iterate over all of the values smaller than that bound, searching for our target.

For example, what’s the nth prime factor of x? Obviously, it’s got to be smaller than x, so we’ll use x as our bound. (A better bound would be the square root of x, but it doesn’t matter. We don’t care about efficiency!) To find the nth prime factor, we’ll iterate over all of the numbers smaller than our bound x, and search for the smallest number which is prime, which divides x, and which is larger than the n-1th prime factor of x. We’ll translate that into pseudo-code:

fun prFactor(0, x) = 0
fun prFactor(n, x) = {
first y in 1 to x {
isPrime(y) and divides(y, x) and prFactor(n - 1, x) < y
}
}



Similarly, for extracting values from strings, we need to be able to ask, in general, what's the nth prime number? This is nearly identical to prFactor above. The only difference is that we need a different bound. Fortunately, we know that the nth prime number can't be larger than the factorial of the previous prime plus 1.

fun nthPrime(0) = 0
fun nthPrime(n) = {
first y in 1 to fact(nthPrime(n - 1)) + 1  {
isPrime(y) and y > nthPrime(n - 1))
}
}



In composing strings of Gödel numbers, we use exponentiation. Given integers x and n, xn, we can obviously compute them via primitive recursion. I'll define them below, but in the rest of this post, I'll write them as an operator in the language:

fun pow(n, 0) = 1
fun pow(n, i) = n * pow(n, i - 1)



### String Composition and Decomposition

With those preliminaries out of the way, we can get to the point of defining something that's actually about one of the strings encoded in these Gödel numbers. Given a number n encoding a string, item(n, x) is the value of the nth character of x. (This is slow. This is really slow! We're getting to the limit of what a very powerful computer can do in a reasonable amount of time. But this doesn't matter. The point isn't that this is a good way of doing these things: it's that these things are possible. To give you an idea of just how slow this is, I started off writing the stuff in this post in Haskell. Compiled with GHC, which is a very good compiler, using item to extract the 6th character of an 8 character string took around 10 minutes on a 2.4Ghz laptop. In the stuff that follows, we'll be using this to extract characters from strings that could be hundreds of characters long!)

fun item(n, x) = {
first y in 1 to x {
divides(prFactor(n, x) ** y, y) and
not divides(prFactor(n, x)**(y+1), x)
}
}



Given a string, we want to be able to ask how long it is; and given two strings, we want to be able to concatenate them.

fun length(x) = {
first y in 1 to x {
prFactor(y, x) > 0 and prFactor(y + 1, x) == 0
}
}

fun concat(x, y) = {
val lx = length(x)
val ly = length(y)

first z in 1 to nthprime(lx + ly)**(x + y) {
(all n in 1 to lx {
item(n, z) == item(n, x)
}) and (all n in 1 to ly {
item(n + lx, z) == item(n, y)
})
}
}

fun concatl([]) = 0
fun concatl(xs) = {
}

fun seq(x) = 2**x


We want to be able to build statements represented as numbers from other statements represented as numbers. We'll define a set of functions that either compose new strings from other strings, and to check if a particular string is a particular kind of syntactic element.

# x is a variable of type n.
pred vtype(n, x) = {
some z in 17 to x {
isPrime(z) and x == n**z
}
}

# x is a variable
pred isVar(x) = {
some n in 1 to x {
vtype(n, x)
}
}

fun paren(x) =
concatl([gseq(11), x, gseq(13)])

# given the Gödel number for a statement x, find
# the Gödel number for not x.
fun gnot(x) =
concat(gseq(5), paren(x))

# Create the number for x or y.
fun gor(x, y) =
concatl([paren(x), seq(7), paren(y)])

# Create the number for 'forall x(y)'.
fun gforall(x, y) =
concatl([seq(9), seq(x), paren(y)])

# Create the number for x with n invocations of the primitive
# successor function.
fun succn(0, x) = x
fun succn(n, x) = concat(seq(3), succn(n - 1, x))

# Create the number n using successor and 0.
fun gnumber(n) = succn(n, seq(1))

# Check if a statement is type-1.
pred stype_one(x) = {
some m in 1 to x {
m == 1 or (vtype(1, m) and x == succn(n, seq(m))
}
}

# Check if a statement is type n.
pred fstype(1, x) = stype_one(x)
pred fstype(n, x) =
some v in 1 to x {
vtype(n, v) and R(v)
}
}



That last function contains an error: the translation of Gödel that I'm using says R(v) without defining R. Either I'm missing something, or the translator made an error.

### Formulae

Using what we've defined so far, we're now ready to start defining formulae in the basic Principia logic. Forumlae are strings, but they're strings with a constrained syntax.

pred elFm(x) = {
some y in 1 to x {
some z in 1 to x {
some n in 1 to x {
stype(n, y) and stype(n+1, z) and x == concat(z, paren(y))
}
}
}
}



All this is doing is expressing the grammar rule in arithmetic form: an elementary formula is a predicate: P(x), where x is a variable on level n, and P is a variable of level x + 1.

The next grammar rule that we encode this way says how we can combine elementary formulae using operators. There are three operators: negation, conjunction, and universal quantification.

pred op(x, y, z) = {
x == gnot(y) or
x == gor(y, z) or
(some v in 1 to x { isVar(v) and x == gforall(v, y) })
}



And now we can start getting complex. We're going to define the idea of a valid sequence of formulae. x is a valid sequence of formulae when it's formed from a collection of formulae, each of which is either an elementary formula, or is produced from the formulae which occured before it in the sequence using either negation, logical-or, or universal quantification.

In terms of a more modern way of talking about it, the syntax of the logic is a grammar. A formula sequence, in this system, is another way of writing the parse-tree of a statement: the sequence is the parse-tree of the last statement in the sequence.

pred fmSeq(x) = {
all p in 0 to length(x) {
elFm(item(n, x)) or
some p in 0 to (n - 1) {
some q in 0 to (n - 1) {
op(item(n,x), item(p, x), item(q, x))
}
}
}
}



The next one bugs me, because it seems wrong, but it isn't really! It's a way of encoding the fact that a formula is the result of a well-defined sequence of formulae. In order to ensure that we're doing primitive recursive formulae, we're always thinking about sequences of formulae, where the later formulae are produced from the earlier ones. The goal of the sequence of formula is to produce the last formula in the sequence. What this predicate is really saying is that a formula is a valid formula if there is some sequence of formulae where this is the last one in the sequence.

Rephrasing that in grammatical terms, a string is a formula if there is valid parse tree for the grammar that produces the string.

pred isFm(x) = {
some n in 1 to nthPrime(length(x)**2)**(x*length(x)**2) {
fmSeq(n)
}
}



So, now, can we say that a statement is valid because it's parsed according to the grammar? Not quite. It's actually a familiar problem for people who write compilers. When you parse a program in some language, the grammar doesn't usually specify variables must be declared before they're used. It's too hard to get that into the grammar. In this logic, we've got almost the same problem: the grammar hasn't restricted us to only use bound variables. So we need to have ways to check whether a variable is bound in a Gödel-encoded formula, and then use that to check the validity of the formula.

# The variable v is bound in formula x at position n.
pred bound(v, n, x) = {
isVar(v) and isFm(x) and
(some a in 1 to x {
some b in 1 to x {
some c in 1 to x {
x == concatl([a, gforall(v, b), c]) and
isFm(b) and
length(a) + 1 ≤ n ≤ length(a) + length(forall(v, b))
}
}
})
}

# The variable v in free in formula x at position n
pred free(v, n, x) = {
isVar(v) and isFm(x) and
(some a in 1 to x {
some b in 1 to x {
some c in 1 to x {
v == item(n, x) and n ≤ length(x) and not bound(v, n, x)
}
}
})
}

pred free(v, x) = {
some n in 1 to length(x) {
free(v, n, x)
}
}



To do logical inference, we need to be able to do things like replace a variable with a specific infered value. We'll define how to do that:

# replace the item at position n in x with y.
fun insert(x, n, y) = {
first z in 1 to nthPrime(length(x) + length(y))**(x+y) {
some u in 1 to x {
some v in 1 to x {
x == concatl([u, seq(item(n, x)), v]) and
z == concatl([u, y, v]) and
n == length(u) + 1
}
}
}
}



There are inference operations and validity checks that we can only do if we know whether a particular variable is free at a particular position.

# freePlace(k, v, k) is the k+1st place in x (counting from the end)
# where v is free.
fun freePlace(0, v, x) = {
first n in 1 to length(x) {
free(v, n, x) and
not some p in n to length(x) {
free(v, p, x)
}
}
}

fun freePlace(k, v, x) = {
first n in 1 to freePlace(n, k - 1, v) {
free(v, n, x) and
not some p in n to freePlace(n, k - 1, v) {
free(v, p, x)
}
}
}

# number of places where v is free in x
fun nFreePlaces(v, x) = {
first n in 1 to length(x) {
freeplace(n, v, x) == 0
}
}



In the original logic, some inference rules are defined in terms of a primitive substitution operator, which we wrote as subst[v/c](a) to mean substitute the value c for the variable c in the statement a. We'll build that up on a couple of steps, using the freePlaces function that we just defined.

# Subst1 replaces a single instance of v with y.
fun subst'(0, x, v, y) = x
fun subst1(0k, x, v, y) =
insert(subst1(k, x, v, y), freePlace(k, v, x), y)

# subst replaces all instances of v with y
fun subst(x, v, y) = subst'(nFreePlaces(v, x), x, v, y)


The next thing we're going to do isn't, strictly speaking, absolutely necessary. Some of the harder stuff we want to do will be easier to write using things like implication, which aren't built in primitive of the Principia logic. To write those as clearly as possible, we'll define the full suite of usual logical operators in terms of the primitives.

# implication
fun gimp(x, y) = gor(gnot(x), y)

# logical and
fun gand(x, y) = gnot(gor(gnot(x), gnot(y)))

# if/f
fun gequiv(x, y) = gand(gimp(x, y), gimp(y, x))

# existential quantification
fun gexists(v, y) = not(gforall(v, not(y)))


### Axioms

The Peano axioms are valid logical statements, so they have Gödel numbers in this system. We could compute their value, but why bother? We know that they exist, so we'll just give them names, and define a predicate to check if a value matches them.

The form of the Peano axioms used in incompleteness are:

1. Zero: ¬(succ(x1) = 0)
2. Uniqueness: succ(x1) = succ(y1) Rightarrow x = y
3. Induction: x2(0) ∧ ∀x1(x2(x1)⇒ x2(succ(x1))) ⇒ ∀x1(x2(x1))
const pa1 = ...
const pa2 = ...
const pa3 = ...

pred peanoAxiom(x) =
(x == pa1) or (x == pa2) or (x == pa3)


Similarly, we know that the propositional axioms must have numbers. The propositional
axioms are:

1. $p \lor p \Rightarrow p$
2. $p \Rightarrow p \lor q$
3. $p \lor q \Rightarrow p \lor q$
4. $(p \Rightarrow q) \Rightarrow (r \lor p \Rightarrow r \lor q)$

I'll show the translation of the first - the rest follow the same pattern.

# Check if x is a statement that is a form of propositional
# axiom 1: y or y => y
pred prop1Axiom(x) =
some y in 1 to x {
isFm(x) and x == imp(or(y, y), y)
}
}

pred prop2Axiom(x) = ...
pred prop3Axiom(x) = ...
pred prop4Axiom(x) = ...
pred propAxiom(x) = prop2Axiom(x) or prop2Axiom(x) or
prop3Axiom(x) or prop4Axiom(x)


Similarly, all of the other axioms are written out in the same way, and we add a predicate isAxiom to check if something is an axiom. Next is quantifier axioms, which are complicated, so I'll only write out one of them - the other follows the same basic scheme.

The two quantifier axioms are:

1. $\forall v(a) \Rightarrow \text{subst}[v/c](a)$
2. $\forall v(b \lor a) \Rightarrow (b \lor \forall v(a))$
quantifier_axiom1_condition(z, y, v) = {
not some n in 1 to length(y) {
some m in 1 to length(z) {
some w in 1 to z {
w == item(m, z) and bound(w, n, y) and free(v, n, y)
}
}
}
}

pred quantifier1Axiom(x) = {
some v in 1 to x {
some y in 1 to x {
some z in 1 to x {
some n in 1 to x {
vtype(n, v) and stype(n, z) and
isFm(y) and
quantifier_axiom1_condition(z, y, v) and
x = gimp(gforall(v, y), subst(y, v, z))
}
}
}
}
}

quanitifier_axiom2 = ...
isQuantifierAxiom = quantifier1Axiom(x) or quantifier2Axiom(x)


We need to define a predicate for the reducibility axiom (basically, the Principia's version of the ZFC axiom of comprehension). The reducibility axiom is a schema: for any predicate $a$, $\exists u (\forall v (u(v) \Leftrightarrow a$. In our primitive recursive system, we can check if something is an instance of the reducibility axiom schema with:

pred reduAxiom(x) =
some u in 1 to x {
some v in 1 to x {
some y in 1 to x {
some n in 1 to x {
vtype(n, v) and
vtype(n+1, u) and
not free(u, y) and
isFm(y) and
x = gexists(u, gforall(v, gequiv(concat(seq(u), paren(seq(v))), y)))
}
}
}
}
}



Now, the set axiom. In the logic we're using, this is the axiom that defines set equality. It's written as $\forall x_1 (x_2(x_1) \Leftrightarrow y_2(y_1) \Rightarrow x_2 = x_1)$. Set equality is defined for all types of sets, so we need to have one version of axiom for each level. We do that using type-lifting: we say that the axiom is true for type-1 sets, and that any type-lift of the level-1 set axiom is also a version of the set axiom.

fun typeLift(n, x) = {
first y in 1 to x**(x**n) {
all k in 1 to length(x) {
item(k, x) ≤ 13 and item(k, y) == item(k, v) or
item(k, x) > 13 and item(k, y) = item(k, x) * prFactor(1, item(k, x))**n
}
}
}



We haven't defined the type-1 set axiom. But we just saw the axiom above, and it's obviously a simple logical statement. That mean that it's got a Gödel number. Instead of computing it, we'll just say that that number is called sa1. Now we can define a predicate to check if something is a set axiom:

val sa1 = ...
pred setAxiom(x) =
some n in 1 to x {
x = typeLift(n, sa)
}
}



We've now defined all of the axioms of the logic, so we can now create a general predicate to see if a statement fits into any of the axiom categories:

pred isAxiom(x) =
peanoAxiom(x) or propAxiom(x) or quantifierAxom(x) or
reduAxiom(x) or setAxiom(x)


### Proofs and Provability!

With all of the axioms expressible in primitive recursive terms, we can start on what it means for something to be provable. First, we'll define what it means for some statement x to be an immediate consequence of some statements y and z. (Back when we talked about the Principia's logic, we said that x is an immediate consequence of y and z if either: y is the formula z ⇒ x, or if c is the formula ∀v.x).

pred immConseq(x, y, z) = {
y = imp(z, x) or
some v in 1 to x {
isVar(v) and x = forall(v, y)
}
}



Now, we can use our definition of an immediate consequence to specify when a sequence of formula is a proof figure. A proof figure is a sequence of statements where each statement in it is either an axiom, or an immediate consequence of two of the statements that preceeded it.

pred isProofFigure(x) = {
(all n in 0 to length(x) {
isAxiom(item(n, x)) or
some p in 0 to n {
some q in 0 to n {
immConseq(item(n, x), item(p, x), item(q, x))
}
}
}) and
length(x) > 0
}



We can say that x is a proof of y if x is proof figure, and the last statement in x is y.

pred proofFor(x, y) =
isProofFigure(x) and
item(length(x), x) == y


Finally, we can get to the most important thing! We can define what it means for something to be provable! It's provable if there's a proof for it!

pre provable(x) =
some y {
proofFor(y, x)
}
}



Note that this last one is not primitive recursive! There's no way that we can create a bound for this: a proof can be any length.

At last, we're done with these definition. What we've done here is really amazing: now, every logical statement can be encoded as a number. Every proof in the logic can be encoded as a sequence of numbers: if something is provable in the Principia logic, we can encode that proof as a string of numbers, and check the proof for correctness using nothing but (a whole heck of a lot of) arithmetic!

Next post, we'll finally get to the most important part of what Gödel did. We've been able to define what it means for a statement to be provable - we'll use that to show that there's a way of creating a number encoding the statement that something is not provable. And we'll show how that means that there is a true statement in the Principia's logic which isn't provable using the Principia's logic, which means that the logic isn't complete.

In fact, the proof that we'll do shows a bit more than that. It doesn't just show that the Principia's logic is incomplete. It shows that any consistent formal system like the Principia, any system which is powerful enough to encode Peano arithmetic, must be incomplete.

# Godel part 2: Arithmetic and Logic

In the last post, we saw how to take statements written in the logic of the Principia Mathematica, and convert them into numerical form using Gödel numbering. For the next step in Gödel’s proof, we need to go meta-mathematical.

Ultimately, we want to write first-order statements that can reason about first order statements. But the entire structure of the principia and its logic is designed to make
that impossible. First order statements can only reason about numbers and their properties.

But now, we’ve got the ability to represent statements – first order, second order, third order, any order. What we still need is a way of describing the properties of those numerical statements in terms of operations that can be expressed using nothing but first order statements.

The basic trick to incompleteness is that we’re going to use the numerical encoding of statements to say that a predicate or relation is represented by a number. Then we’re going to write predicates about predicates by defining predicates on the numerical representations of the first-order predicates. That’s going to let us create a true statement in the logic that can’t be proven with the logic.

To do that, we need to figure out how to take our statements and relations represented as numbers, and express properties of those statements and relations in terms of arithmetic. To do that, we need to define just what it means to express something arithmetically. Gödel did that by defining “arithmetically” in terms of a concept called primitive recursion.

I learned about primitive recursion when I studied computational complexity. Nowadays, it’s seen as part of theoretical computer science. The idea, as we express it in modern terms, is that there are many different classes of computable functions. Primitive recursion is one of the basic complexity classes. You don’t need a Turing machine to compute primitive recursive functions – they’re a simpler class.

The easiest way to understand primitive recursion is that it’s what you get in a programming language with integer arithmetic, and simple for-loops. The only way you can iterate is by repeating things a bounded number of times. Primitive recursion has a lot of interesting properties: the two key ones for our purposes here are: number theoretic proofs are primitive recursive, and every computation of a primitive recursive function is guaranteed to complete within a bounded amount of time.

The formal definition of primitive recursion, the way that Gödel wrote it, is quite a bit more complex than that. But it means the same thing.

We start with what it means to define a formula via primitive recursion. (Note the language that I used there: I’m not explaining what it means for a function to be primitive recursive; I’m explaining what it means to be defined via primitive recursion.) And I’m defining formulae, not functions. In Gödel’s proof, we’re always focused on numerical reasoning, so we’re not going to talk about programs or algorithms, we’re going to about the definition of formulae.

A formula $phi(x_1, x_2, ..., x_n)$ is defined via primitive recursion if, for some other formulae $\psi$ and $\mu$:

• Base: $\phi(0, x_2, ..., x_n) = \psi(x_2, ..., x_n)$
• Recursive: $\phi(i+1, x_2, ..., x_n) = \mu(i, \phi(i, x_2, ..., x_n), x_2, ..., x_n)$.

So, basically, the first parameter is a bound on the number of times that $phi$ can invoked recursively. When it’s 0, you can’t invoke $\phi$ any more.

A formula is primitive recursive if it defined from a collection of formulae $\phi_1, ..., \phi_n$ where any formula $\phi_i$ is defined via primitive recursion from $\phi_1, ..., \phi_{i-1}$, or the primitive succ function from Peano arithmetic.

For any formula $phi_i$ in that sequence, the degree of the formula is the number of other primitive recursive formulae used in its definition.

Now, we can define a primitive recursive property: $R(x_1, ..., x_n)$ is primitive recursive if and only if there exists a primitive recursive function $\phi$ such that $\phi(x_1, ..., x_n) = 0$.

With primitive recursive formulae and relations defined, there’s a bunch of theorems about how you can compose primitive recursive formulae and relations:

1. Every function or relation that you get by substituting a primitive recursive function for a variable in a primitive recursive function/relation is primitive recursive.
2. If R and S are primitive relations, then ¬R, R∧S, R∨S are all primitive recursive.
3. If $\phi(x_1, ..., x_n)$ and $\psi(x_1, ..., x_n)$ are primitive recursive functions, then the relation $R(x_1, ..., x_n) \Leftrightarrow (\phi(x_1, ..., x_n) = \psi(x_1, ..., x_n)$ is also primitive recursive.
4. Let $xv$ and $zv$ be finite-length tuples of variables. If the function $\phi(xv)$ and the relation $R(y, zv)$ are primitive recursive, then so are the relations:
• $S(xv, zv) \Leftrightarrow (\exists y \le \phi(xv). R(y, zv))$
• $T(xv, zv) \Leftrightarrow (\forall y \le A(xv). R(y, zv))$
5. Let $xv$ and $zv$ be finite-length tuples of variables. And let $text{argmin}[y le f(x).R(x)]$ be the smallest value of $x$ for which $y le f(x)$ and $R(x)$ is true, or 0 if there is no such value. Then if the function $phi(xv)$ and the relation $R(y, zv)$ are primitive recursive, then so is the function $P(xv, zv) = (\text{argmin}[y \le A(xv). R(y, zv))]$.

By these definitions, addition, subtraction, multiplication, and integer division are all primitive recursive.

Ok. So, now we’ve got all of that out of the way. It’s painful, but it’s important. What we’ve done is come up with a good formal description of what it means for something to be an arithmetic property: if we can write it as a primitive recursive relation or formula, it’s arithmetic.

# Godel Numbering

The first step in Gödel’s incompleteness proof was finding a way of taking logical statements and encoding them numerically. Looking at this today, it seems sort-of obvious. I mean, I’m writing this stuff down in a text file – that text file is a stream of numbers, and it’s trivial to convert that stream of numbers into a single number. But when Gödel was doing it, it wasn’t so obvious. So he created a really clever mechanism for numerical encoding. The advantage of Gödel’s encoding is that it makes it much easier to express properties of the encoded statements arithmetically. (Arithmetically means something very specific here; we’ll see what in a later post.

Before we can look at how Gödel encoded his logic into numbers, we need to look at the logic that he used. Gödel worked with the specific logic variant used by the Principia Mathematica. The Principia logic is minimal and a bit cryptic, but it was built for a specific purpose: to have a minimal syntax, and a complete but minimal set of axioms.

The whole idea of the Principia logic is to be purely syntactic. The logic is expected to have a valid model, but you shouldn’t need to know anything about the model to use the logic. Russell and Whitehead were deliberately building a pure logic where you didn’t need to know what anything meant to use it. I’d really like to use Gödel’s exact syntax – I think it’s an interestingly different way of writing logic – but I’m working from a translation, and the translator updated the syntax. I’m afraid that switching between the older Gödel syntax, and the more modern syntax from the translation would just lead to errors and confusion. So I’m going to stick with the translation’s modernization of the syntax.

The basic building blocks of the logic are variables. Already this is a bit different from what you’re probably used to in a logic. When we think of logic, we usually consider predicates to be a fundamental thing. In this logic, they’re not. A predicate is just a shorthand for a set, and a set is represented by a variable.

Variables are stratified. Again, it helps to remember where Russell and Whitehead were coming from when they were writing the Principia. One of their basic motivations was avoiding self-referential statements like Russell’s paradox. In order to prevent that, they thought that they could create a stratified logic, where on each level, you could only reason about objects from the level below. A first-order predicate would be a second-level object could only reason about first level objects. A second-order predicate would be a third-level object which could reason about second-level objects. No predicate could ever reason about itself or anything on its on level. This leveling property is a fundamental property built into their logic. The way the levels work is:

• Type one variables, which range over simple atomic values, like specific single natural numbers. Type-1 variables are written as $a_1$, $b_1$.
• Type two variables, which range over sets of atomic values, like sets of natural numbers. A predicate, like IsOdd, about specific natural numbers would be represented as a type-2 variable. Type-2 variables are written $a_2$, $b_2$, …
• Type three variables range over sets of sets of atomic values. The mappings of a function could be represented as type-3 variables: in set theoretic terms, a function is set of ordered pairs. Ordered pairs, in turn, can be represented as sets of sets – for example, the ordered pair (1, 4) would be represented by the set { {1}, {1, 4} }. A function, in turn, would be represented by a type-4 variable – a set of ordered pairs, which is a set of sets of sets of values.

Using variables, we can form simple atomic expressions, which in Gödel’s terminology are called signs. As with variables, the signs are divided into stratified levels:

• Type-1 signs are variables, and successor expressions. Successor expressions are just Peano numbers written with “succ”: 0, succ(0), succ(succ(0)), succ(a1), etc.
• Signs of any type greater than 1 are just variables of that type/level.

Once you have signs, you can assemble the basic signs into formulae. Gödel explained how to build formulae in a classic logicians form, which I think is hard to follow, so I’ve converted it into a grammar:

 elementary_formula → signn+1(signn)
formula → ¬(elementary_formula)
formula → (elementary_formula) or (elementary_formula)
formula → ∀ signn (elementary_formula)


That’s the entire logic! It’s tiny, but it’s enough. Everything else from predicate logic can be defined in terms of combinations of these basic formulae. For example, you can define logical “and” in terms of negation and logical “or”: (a ∧ b) ⇔ ¬ (¬ a ∨ ¬ b).

With the syntax of the system set, the next thing we need is the basic axioms of logical inference in the system. In terms of logic the way I think of it, these axioms include both “true” axioms, and the inference rules defining how the logic works. There are five families of axioms.

• First, there’s the Peano axioms, which define the natural numbers.
1. $\lnot \text{succ}(x_1) = 0)$: 0 is a natural number, and it’s not the successor of anything.
2. $\text{succ}(x_1) = \text{succ}(y_1) \Rightarrow x_1 = y_1$: Successors are unique.
3. $(x_2(0) \land \forall x_1 (x_2(x_1) \Rightarrow x_2(succ(x_1)) \Leftrightarrow \forall x_1 (x_2(x_1))$: induction works on the natural numbers.
• Next, we’ve got a set of basic inference rules about simple propositions. These are defined as axiom schemata, which can be instantiated for any set of formalae $p$, $q$, and $r$.
1. $p \lor p \Rightarrow p$
2. $p \Rightarrow p \lor q$
3. $p \lor q \Rightarrow q \lor p$
4. $(p \Rightarrow q) \rightarrow (p \lor r) \Rightarrow q \lor r$
• Axioms that define inference over quantification. $v$ is a variable, $a$ is any formula, $b$ is any formula where $v$ is not a free variable, and $c$ is a sign of the same level as $v$, and which doesn’t have any free variables that would be bound if it were inserted as a replacement for $v$.
1. $\forall v(a) \Rightarrow \text{subst}[v/c](a)$: if formula $a$ is true for all values of $v$, then you can substitute any specific value $c$ for $v$ in $a$, and $a$ must still be true.
2. $(\forall v (b \lor a)) \Rightarrow (b \lor \forall v(a))$
• The Principia’s version of the set theory axiom of comprehension:
$\exists u (\forall v (u(v) \Rightarrow a))$
• And last but not least, an axiom defining set equivalence:
$\forall x_i (x_{i+1}(x_i) \Rightarrow y_{i+1}(y_i)) \Rightarrow x_{i+1} = y_{i+1}$

So, now, finally, we can get to the numbering. This is quite clever. We’re going to use the simplest encoding: for every possible string of symbols in the logic, we’re going to define a representation as a number. So in this representation, we are not going to get the property that every natural number is a valid formula: lots of natural numbers won’t be. They’ll be strings of nonsense symbols. (If we wanted to, we could make every number be a valid formula, by using a parse-tree based numbering, but it’s much easier to just let the numbers be strings of symbols, and then define a predicate over the numbers to identify the ones that are valid formulae.)

We start off by assigning numbers to the constant symbols:

Symbols Numeric Representation
0 1
succ 3
¬ 5
7
9
( 11
) 13

Variables will be represented by powers of prime numbers, for prime numbers greater that 13. For a prime number p, p will represent a type one variable, p2 will represent a type two variable, p3 will represent a type-3 variable, etc.

Using those symbol encodings, we can take a formula written as symbols x1x2x3…xn, and encode it numerically as the product 2x13x25x2…pnxn, where pn is the nth prime number.

For example, suppose I wanted to encode the formula: ∀ x1 (y2(x1)) ∨ x2(x1).

First, I’d need to encode each symbol:

1. “∀” would be 9.
2. “x1“” = 17
3. “(” = 11
4. “y2” = 192 = 361
5. “(” = 11
6. “x1” = 17
7. “)” = 13
8. “∨” = 7
9. “x2” = 172 = 289
10. “(” = 11
11. “x1” = 17
12. “)” = 13
13. “)” = 13

The formula would thus be turned into the sequence: [9, 17, 11, 361, 11, 17, 13, 7, 289, 11, 17, 13, 13]. That sequence would then get turned into a single number 29 317 511 7361 1111 1317 1713 197 23289 2911 3117 3713 4113, which according to Hugs is the number (warning: you need to scroll to see it. a lot!):

1,821,987,637,902,623,701,225,904,240,019,813,969,080,617,900,348,538,321,073,935,587,788,506,071,830,879,280,904,480,021,357,988,798,547,730,537,019,170,876,649,747,729,076,171,560,080,529,593,160,658,600,674,198,729,426,618,685,737,248,773,404,008,081,519,218,775,692,945,684,706,455,129,294,628,924,575,925,909,585,830,321,614,047,772,585,327,805,405,377,809,182,961,310,697,978,238,941,231,290,173,227,390,750,547,696,657,645,077,277,386,815,869,624,389,931,352,799,230,949,892,054,634,638,136,137,995,254,523,486,502,753,268,687,845,320,296,600,343,871,556,546,425,114,643,587,820,633,133,232,999,109,544,763,520,253,057,252,248,982,267,078,202,089,525,667,161,691,850,572,084,153,306,622,226,987,931,223,193,578,450,852,038,578,983,945,920,534,096,055,419,823,281,786,399,855,459,394,948,921,598,228,615,703,317,657,117,593,084,977,371,635,801,831,244,944,526,230,994,115,900,720,026,901,352,169,637,434,441,791,307,175,579,916,097,240,141,893,510,281,613,711,253,660,054,258,685,889,469,896,461,087,297,563,191,813,037,946,176,250,108,137,458,848,099,487,488,503,799,293,003,562,875,320,575,790,915,778,093,569,309,011,025,000,000,000.

Next, we’re going to look at how you can express interesting mathematical properties in terms of numbers. Gödel used a property called primitive recursion as an example, so we’ll walk through a definition of primitive recursion, and show how Gödel expressed primitive recursion numerically.