Category Archives: Type Theory

Inference with Sets in Type Theory

It’s been quite a while since I did any meaningful writing on type theory. I spent a lot of time introducing the basic concepts, and trying to explain the intuition behind them. I also spent some time describing what I think of as the platform: stuff like arity theory, which we need for talking about type construction. But once we move beyond the basic concepts, I ran into a bit of a barrier – no so much in understanding type theory, but in finding ways of presenting it that will be approacha ble.

I’ve been struggling to figure out how to move forward in my exploration of type theory. The logical next step is working through the basics of intuitionistic logic with type theory semantics. The problem is, that’s pretty dry material. I’ve tried to put together a couple of approaches that skip over this, but it’s really necessary.

For someone like me, coming from a programming language background, type theory really hits its stride when we look at type systems and particularly type inference. But you can’t understand type inference without understanding the basic logic. In fact, you can’t describe the algorithm for type inference without referencing the basic inference rules of the underlying logic. Type inference is nothing but building type theoretic proofs based on a program.

So here we are: we need to spend some time looking at the basic logic of type theory. We’ve looked at the basic concepts that underlie the syntax and semantics, so what we need to do next is learn the basic rules that we use to build logical statements in the realm of type theory. (If you’re interested in more detail, this is material from chapter 5 of “Programming in Martin-Löof’s Type Theory”, which is the text I’m using to learn this material.)

Martin Löoff’s type theory is a standard intuitionistic predicate logic, so we’ll go through the rules using standard sequent notation. Each rule is a sequence which looks sort-of like a long fraction. The “numerator” section is a collection of things which we already know are true; the “denominator” is something that we can infer given those truths. Each statement has the form A[\Gamma], where A is a statement, and B is a set of assumptions. For example, F(a) [a \in A, \Delta] means that F(a) is true, provided we’re in a context that includes a \in A.

Personally, I find that this stuff feels very abstract until you take the logical statements, and interpret them in terms of programming. So throughout this post, I’ll do that for each of the rules.

With that introduction out of the way, let’s dive in to the first set of rules.

Simple Introductions

We’ll start off with a couple of really easy rules, which allow us to introduce a variable given a type, or a type given a variable.

Introducing Elements

\frac{A \,\text{type}}{a \in A \,[a \in A]}

This is an easy one. It says that if we know that A is a type, then we can introduce the statement that a \in A, and add that as an assumption in our context. What this means is also simple: since our definition of type says that it’s only a type if it has an element, then if we know that A is a type, we know that there must be an element of A, and so we can write statements using it.

If you think of this in programming terms, the statement A \text{type} is saying that A is a type. To be a valid type, there must be at least one value that belongs to the type. So you’re allowed to introduce a variable that can be assigned a value of the type.

Introducing Propositions as Types

 \frac{a \in A \, []}{A \, \text{true}}

This is almost the mirror image of the previous. A type and a true proposition are the same thing in our type theory: a proposition is just a type, which is a set with at least one member. So if we know that there’s a member of the set A, then A is both a type and a true proposition.

Equality Rules

We start with the three basic rules of equality: equality is reflexive, symmetric, and transitive.

Reflexivity

 \frac{a \in A}{a = a \in A}

 \frac{A \, \text{type}}{A = A}

If a is an element of a type A, then a is equal to itself in type A; and if A is a type, then A is equal to itself.

The only confusing thing about this is just that when we talk about an object in a type, we make reference to the type that it’s a part of. This makes sense if you think in terms of programming: you need to declare the type of your variables. “3: Int” doesn’t necessarily mean the same object as “3: Real”; you need the type to disambiguate the statement. So within type theory, we always talk about values with reference to the type that they’re a part of.

Symmetry

\frac{a = b \in A}{b = a \in A}

\frac{A = B}{B = A}

No surprises here – standard symmetry.

Transitivity

\frac{a = b \in A \quad b = c \in A}{a = c \in A}

\frac{A = B \quad B = C}{A = C}

Type Equality

\frac{a \in A \quad A = B}{a \in B}

\frac{a = b \in A \quad A = B}{a = b \in B}

These are pretty simple, and follow from the basic equality rules. If we know that a is a member of the type A, and we know that the type A equals the type B, then obviously a is also a member of B. Along the same lines, if we know that a=b in type A, and A equals B, then a=b in the type B.

Substitution Rules

We’ve got some basic rules about how to formulate some simple meaningful statements in the logic of our type theory. We still can’t do any interesting reasoning; we haven’t built up enough inference rules. In particular, we’ve only been looking at simple, atomic statements using parameterless predicates.

We can use those basic rules to start building upwards, to get to parametric statements, by using substitution rules that allow us to take a parametric statement and reason with it using the non-parametric rules we talked about above.

For example, a parametric statement can be something like C(x) \, \text{type} [x \in A], which says that applying C to a value x which is a member of type A produces a value which is a type. We can use that to produce new inference rules like the ones below.

 \frac{C(x) \, \text{type} [x \in A] \quad a \in A}{C(a) \, \text{type}}

This says that if we know that given a of type A, C will produce a type; and we know that the value a is of type A, then C(a) will be a type. In logical terms, it’s pretty straightforward; in programming terms it’s even clearer: if C is a function on type A, and we pass it a value of type A, it will produce a result. In other words, C(a) is defined for all values of type A.

 \frac{C(x) \, \text{type}[x \in A] \quad a = b \in A}{C(a) = C(b)}

This is even simpler: if C is a function on type A, then given two values that are equal in type A, C will produce the same result for those values.

Of course, I’m lying a bit. In this stuff, C isn’t really a function. It’s a logical statement; C(x) isn’t quite a function. It’s a logical stamement which includes the symbol x; when we say C(a), what we mean is the logical statement C, with the object a substituted for the symbol x. But I think the programming metaphors help clarify what it means.

Using those two, we can generate more:

 \frac{c(x) \in C(x) [x \in A] \quad a \in A}{c(a) \in C(A)}

This one becomes interesting. C(x) is a proposition which is parametric in x. Then c(x) is a proof-element: it’s an instance of C(x) which proves that C(X) is a type, and we can see c as a computation which, given an element of a produces a instance of C. Then what this judgement says is that given an instance a of type A, we know that c(a) is an instance of type C(a). This will become very important later on, when we really get in to type inference and quantification and parametric types.

\frac{c(x) \in C(x) [x \in A] \quad a = b}{c(a)=c(b)\in C(a)}

This is just a straightforward application of equality to proof objects.

There’s more of these kinds of rules, but I’m going to stop here. My goal isn’t to get you to know every single judgement in the intuitionistic logic of type theory, but to give you a sense of what they mean.

That brings us to the end of the basic inference rules. The next things we’ll need to cover are ways of constructing new types or types from existing ones. The two main tools for that are enumeration types (basically, types consisting of a group of ordered values), and cartesian products of multiple types. With those, we’ll be able to find ways of constructing most of the types we’ll want to use in programming languages.

A Review of Type Theory (so far)

I’m trying to get back to writing about type theory. Since it’s been quite a while since the last type theory post, we’ll start with a bit of review.

What is this type theory stuff about?

The basic motivation behind type theory is that set theory isn’t the best foundation for mathematics. It seems great at first, but when you dig in deep, you start to see cracks.

If you start with naive set theory, the initial view is amazing: it’s so simple! But it falls apart: it’s not consistent. When you patch it, creating axiomatic set theory, you get something that isn’t logically inconsistent – but it’s a whole lot more complicated. And while it does fix the inconsistency, it still gives you some results which seem wrong.

Type theory covers a range of approaches that try to construct a foundational theory of mathematics that has the intuitive appeal of axiomatic set theory, but without some of its problems.

The particular form of type theory that we’ve been looking at is called Martin-Löf type theory. M-L type theory is a constructive theory of mathematics in which computation plays a central role. The theory rebuilds mathematics in a very concrete form: every proof must explicitly construct the objects it talks about. Every existence proof doesn’t just prove that something exists in the abstract – it provides a set of instructions (a program!) to construct an example of the thing that exists. Every proof that something is false provides a set of instructions (also a program!) for how to construct a counterexample that demonstrates its falsehood.

This is, necessarily, a weaker foundation for math than traditional axiomatic set theory. There are useful things that are provable in axiomatic set theory, but which aren’t provable in a mathematics based on M-L type theory. That’s the price you pay for the constructive foundations. But in exchange, you get something that is, in many ways, clearer and more reasonable than axiomatic set theory. Like so many things, it’s a tradeoff.

The constructivist nature of M-L type theory is particularly interesting to wierdos like me, because it means that programming becomes the foundation of mathematics. It creates a beautiful cyclic relationship: mathematics is the foundation of programming, and programming is the foundation of mathematics. The two are, in essence, one and the same thing.

The traditional set theoretic basis of mathematics uses set theory with first order predicate logic. FOPL and set theory are so tightly entangled in the structure of mathematics that they’re almost inseparable. The basic definitions of type theory require logical predicates that look pretty much like FOPL; and FOPL requires a model that looks pretty much like set theory.

For our type theory, we can’t use FOPL – it’s part of the problem. Instead, Martin-Lof used intuitionistic logic. Intuitionistic logic plays the same role in type theory that FOPL plays in set theory: it’s deeply entwined into the entire system of types.

The most basic thing to understand in type theory is what a logical proposition means. A proposition is a complete logical statement no unbound variables and no quantifiers. For example, “Mark has blue eyes” is a proposition. A simple proposition is a statement of fact about a specific object. In type theory, a proof of a proposition is a program that demonstrates that the statement is true. A proof that “Mark has blue eyes” is a program that does something like “Look at a picture of Mark, screen out everything but the eyes, measure the color C of his eyes, and then check that C is within the range of frequencies that we call “blue”. We can only say that a proposition is true if we can write that program.

Simple propositions are important as a starting point, but you can’t do anything terribly interesting with them. Reasoning with simple propositions is like writing a program where you can only use literal values, but no variables. To be able to do interesting things, you really need variables.

In Martin-Lof type theory, variables come along with predicates. A predicate is a statement describing a property or fact about an object (or about a collection of objects) – but instead of defining it in terms of a single fixed value like a proposition, it takes a parameter. “Mark has blue eyes” is a proposition; “Has blue eyes” is a predicate. In M-L type theory, a predicate is only meaningful if you can write a program that, given an object (or group of objects) as a parameter, can determine whether or no the predicate is true for that object.

That’s roughly where we got to in type theory before the blog went on hiatus.

Truth in Type Theory

Now, we’re getting to the heart of type theory: judgements. Judgements are the basic logical statements that we want to be able to prove about computations in type theory. There’s a basic set of judgements that we want to be able to make.

I keep harping on this, but it’s the heart of type theory: type theory is all about understanding logical statements as specifications of computations. Or, more precisely, in computer science terms, they’re about understanding true logical statements as halting computations.

In this post, we’ll see the ultimate definition of truth in type theory: every logical proposition is a set, and the proposition is true if the set has any members. A non-halting computation is a false statement – because you can never get it to resolve an expression to a canonical value.

So remember as we go through this: judgements are based on the idea of logical statements as specifications of computations. So when we talk about a predicate P, we’re using its interpretation as a specification of a computation. When we look at an expression 3+5, we understand it not as a piece of notation that describes the number 8, but as a description of a computation that adds 3 to 5. “3+5” not the same computation as “2*4” or “2+2+2+2”, but as we’ll see, they’re equal because they evaluate to the same thing – that is, they each perform a computation that results in the same canonical value – the number 8.

In this post, we’re going to focus on a collection of canonical atomic judgement types:

A \text{set}
This judgement says that A is a set.
A = B
A and Bare equal sets.
a \in A
a is an element of the set A
a_1 == a_2 \in A
a_1 and a_2 are equal members of the set A.
A is a proposition.
The proposition A is true.

The definition of the meanings of judgements is, necessarily, mutually recursive, so we’ll have to go through all of them before any of them is complete.

An object A is a Set
When I say that A is a set in type theory, that means that:

  • I know the rules for how to form the canonical expressions for the set;
  • I’ve got an equivalence relation that says when two canonical members of the set are equal.
Two Sets are Equal

When I say that A and B are equal sets, that means:

  • A and B are both sets.
  • If a is a canonical member of A, then a is also a canonical member of B, and vice versa.
  • If a and b are canonical members of A, and they’re also equal in A, then a and b are also equal canonical members of B (and vice versa).

The only tricky thing about the definition of this judgement is the fact that it defines equality is a property of a set, not of the elements of the set. By this definition, tt’s possible for two expressions to be members of two different sets, and to be equal in one, but not equal in the other. (We’ll see in a bit that this possibility gets eliminated, but this stage of the definition leaves it open!)

An object is a member of a set
When I say a \in A, that means that if I evaluate a, the result will be a canonical member of A.
Two members of a set are equal
If a \in A and b \in A and a = b, that means when if you evaluate a, you’ll get a canonical expression a; and when you evaluate b, you’ll get a canonical expression b. For a == b to be true, then a, that is, the canonical expressions resulting from their evaluation must also be equal.

This nails down the problem back in set equivalence: since membership equivalence is defined in terms of evaluation as canonical values, and every expression evaluates to exactly one canonical expression (that’s the definition of canonical!), then if two objects are equal in a set, they’re equal in all sets that they’re members of.

An object A is a proposition
Here’s where type theory really starts to depart from the kind of math that we’re used to. In type theory, a proposition is a set. That’s it: to be a proposition, A has to be a set.
The proposition A is true
And the real meat of everything so far: if we have a proposition A, and A is true, what that means is that a has at least one element. If a proposition is a non-empty set, then it’s true. If it’s empty, it’s false.

Truth in type theory really comes down to membership in a set. This is, subtly, different from the predicate logic that we’re familiar with. In predicate logic, a quantified proposition can, ultimately, be reduced to a set of values, but it’s entirely reasonable for that set to be empty. I can, for example, write a logical statement that “All dogs with IQs greater that 180 will turn themselves into cats.” A set-based intepretation of that is the collection of objects for which it’s true. There aren’t any, because there aren’t any dogs with IQs greater than 180. It’s empty. But logically, in terms of predicate logic, I can still say that it’s “true”. In type theory, I can’t: to be true, the set has to have at least one value, and it doesn’t.

In the next post, we’ll take these atomic judgements, and start to expand them into the idea of hypothetical judgements. In the type-theory sense, that means statements require some other set of prior judgements before they can be judged. In perhaps more familiar terms, we’ll be looking at type theory’s equivalent of the contexts in classical sequent calculus – those funny little \Gammas that show up in all of the sequent rules.

Canonical Expressions in Type Theory

Sorry for the gap in posts. I’ve been trying to post more regularly, and was just hitting a rhythm, when my son brought home a particularly vicious bug, and I got sick. I’ve spent the last couple of weeks being really, really sick, and then trying to get caught up at work. I’m mostly recovered, except for some lingering asthma, so I’m trying to get back to that twice-per-week posting schedule.

In the last couple of posts, we looked at Martin-Löf’s theory of expressions. The theory of expressions is purely syntactic: it’s a way of understanding the notation of expressions. Everything that we do in type theory will be written with expressions that follow the syntax, the arity, and the definitional equivalency rules of expression theory.

The next step is to start to understand the semantics of expressions. In type theory, when it comes to semantics, we’re interested in two things: evaluation and judgements. Evaluation is the process by which an expression is reduced to its simplest form. It’s something that we care about, but it’s not really a focus of type theory: type theory largely waves its hands in the air and says “we know how to do this”, and opts for normal-order evaluation. Judgements are where things get interesting.

Judgements are provable statements about expressions and the values that they represent. As software people, when we think about types and type theory, we’re usually thinking about type declarations: type declarations are judgements about the expressions that they apply to. When you write a type declaration in a programming language, what you’re doing is asserting the type theory judgement. When the compiler “type-checks” your program, what it’s doing in type theory terms is checking that your judgements are proven by your program.

For example, we’d like to be able to make the judgement A set – that is, that A is a set. In order to make the judgement that A is a set in type theory, we need to know two things:

  1. How are canonical instances of the set A formed?
  2. Given two canonical instances of A, how can we determine if they’re equal?

To understand those two properties, we need to take a step back. What is a canonical instance of a set?

If we think about how we use predicate logic, we’re always given some basic set of facts as a starting point. In type theory, the corresponding concept is a primitive constant. The primitive constants include base values and primitive functions. For example, if we’re working with lispish expressions, then cons(1, cons(2, nil)) is an expression, and cons, nil, 1 and 2 are primitive constants; cons is the head of the expression, and 1 and cons(2, nil) are the arguments.

A canonical expression is a saturated, closed expression whose head is a primitive constant.

The implications of this can be pretty surprising, because it means that a canonical expression can contain unevaluated arguments! The expression has to be saturated and closed – so its arguments can’t have unbound variables, or be missing parameters. But it can contain unevaluated subexpressions. For example, if we were working with Peano arithmetic in type theory, succ(2+3) is canonical, even though “2+3” hasn’t be evaluated.

In general, in type theory, the way that we evaluate an expression is called normal order evaluation – what programming language people call lazy evaluation: that’s evaluating from the outside in. Given a non-canonical expression, we evaluate from the outside in until we get to a canonical expression, and then we stop. A canonical expression is considered the result of a computation – so we can see succ(2+3) as a result!

A canonical expression is the evaluated form of an expression, but not the fully evaluated form. The fully evaluated form is when the expression and all of its saturated parts are fully evaluated. So in our previous example, the saturated part 2+3 wasn’t evaluated, so it’s not fully evaluated. To get it to be fully evaluated, we’d need to evaluate 2+3, giving us succ(5); then, since succ(5) is saturated, it’s evaluated to 6, which is the fully evaluated form of the expression.

Next post (coming monday!), we’ll use this new understanding of canonical expressions, and start looking at judgements, and what they mean. That’s when type theory starts getting really fun and interesting.

Expressions and Arity (Part 2): Equivalence

Continuing where I left off: we were talking about arity in Martin-Löf’s theory of expressions. There are two basic problems that arity solves: it makes certain kinds of impossible-to-evaluate expressions be invalid in the theory; and it helps enable some way of comparing expressions for equality. Arity solves both of those problems by imposing a simple type system over expressions.

At the end of the last post, I started giving a sketch of what arities look like. Now we’re going to dive in, and take a look at how to determine the arity of an expression. It’s a fairly simple system of rules.

Before diving in, I want to stress the most important thing about the way that these rules work is that the expressions are totally syntactic and static. This is something that confused me the first time I tried to read about expression theory. When you see an expression, you think about how it’s evaluated. But expression theory is a purely syntactic theory: it’s about analyzing expressions as syntactic entities. There are, deliberately, no evaluation rules. It’s about understanding what the notations mean, and how to determine when two expressions are equivalent.

If, under the rules of Martin-Löf’s expression theory, two expressions are equivalent, then if you were to chose a valid set of evaluation rules, the two expressions will evaluate to the same value. But expression equivalence is stronger: expressions are equivalent only if you can prove their equivalence from their syntax.

That clarified, let’s start by looking at the rules of arity in expressions.

Variables and Constants
Every variable and every primitive constant has a pre-defined arity; if x is a variable or primitive constant with arity \alpha, then the expression x has arity \alpha.
Definitions
In a definition D := e, the arity of the defined name D is the same as the arity of the expression e.
Applications
If a is an expression of arity \alpha \twoheadrightarrow \beta, and b is a expression of arity \alpha, then a(b) is an expression of arity \beta.
Abstractions
If e is an expression of arity \beta and x is a variable of arity \alpha, then (x)e is an expression of arity \alpha \twoheadrightarrow \beta.
Combinations
If e_1 is an expression af arity \alpha_1, e_2 is an expression of arity \alpha_2, …, and e_n is an expression of arity \alpha_n, then a combination expression e_1, e_2, ..., e_n is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n.
Selections
If e is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n where n \ge 2, then (e).i is an expression af arity e_i.

Let’s try working through an example: x^2 + 3x + 7.

  1. As we saw in this post, this is equivalent to the simple AST-form: (x)+(+(*(x,x), *(3, x)),7).
  2. x” is a variable of arity 0; “3” and “7” are constants of arity 0; “+” and “*” are constants of arity (0 \otimes 0) \twoheadrightarrow 0.
  3. From the combination rule, since x and 3 both have arity 0, (x, x) and (3, x) each have arity 0 \otimes 0.
  4. Since (x, x) has arity 0 \otimes 0, and * has arity (0 \otimes 0) \twoheadrightarrow 0, *(x, x) has arity 0. The same thing works for *(3, x).
  5. Since the arities of the *(x, x) and *(3, x) are both 0, the combination of the pair (the arguments to the inner “+”) are 0 \otimes 0, and the arity of the inner sum expression is thus 0.
  6. Since 7 has arity 0, the combination of it with the inner sum is 0 \otimes 0, and the arity of the outer sum is 0.
  7. Since x is a variable of arity 0, and outer sum expression has arity 0, the abstract has arity 0 \twoheadrightarrow 0.

If you’re familiar with type inference in the simply typed lambda calculus, this should look pretty familiar; the only real difference is that the only thing that arity really tracks is applicability and parameter counting.

Just from this much, we can see how this prevents problems. If you try to compute the arity of 3.1 (that is, the selection of the first element from 3), you find that you can’t: there is no arity rule that would allow you to do that. The selection rule only works on a product-arity, and 3 has arity 0.

The other reason we wanted arity was to allow us to compare expressions. Intuitively, it should be obvious that the expression e and the expression (x)e(x) are in some sense equal. But we need some way of being able to actually precisely define that equality.

The kind of equality that we’re trying to get at here is called definitional equality. We’re not trying to define equality where expressions a and b evaluate to equal values – that would be easy. Instead, we’re trying to get at something more subtle: we want to capture the idea that the expressions are different ways of writing the same thing.

We need arity for this, for a simple reason. Let’s go back to that first example expression: (x)+(+(*(x,x), *(3, x)),7). Is that equivalent to (y)+(+(*(y,y), *(3, y)),7)? Or to 8x+1? If we apply them to the value 3, and then evaluate them using standard arithmetic, then all three expressions evaluate to 25. So are they all equivalent? We want to be able to say that the first two are equivalent expressions, but the last one isn’t. And we’d really like to be able to say that structurally – that is, instead of saying something evaluation-based like “forall values of x, eval(f(x)) == eval(g(x)), therefore f == g”, we want to be able to do something that says f \equiv g because they have the same structure.

Using arity, we can work out a structural definition of equivalence for expressions.

In everything below, we’l write a: \alpha to mean that a has arity \alpha, and a \equiv b : \alpha to mean that a and b are equivalent expressions of arity \alpha. We’ll define equivalence in a classic inductive form by structure:

Variables and Constants
If x is a variable or constant of arity \alpha, then x \equiv x : alpha. This is the simplest identity rule: variables and constants are equivalent to themselves.
Definitions
If a := b is a definition, and b: \alpha, then a \equiv b : \alpha. This is a slightly more complex form of an identity rule: if there’s a definition of b as the value of a, then a and b are equivalent.
Application Rules
  1. If a \equiv a and b \equiv b, then a(b) \equiv a. If an applyable expression a is equivalent to another applyable expression a, then applying a to an expression b is equivalent to applying a to an expression b if the argument b is equivalent to the argument b. That’s a mouthful, but it’s simple: if you have two function application expressions, they’re equivalent if both the function expressions and the argument expressions are equivalent.
  2. If x is a variable of arity \alpha, and a is an expression of arity \alpha and b is an expression of arity \beta, then ((x)b)(a)  b[x := a]: \beta. This is arity’s version of the classic beta rule of lambda calculus: applying an abstraction to an argument means substituting the argument for all references to the abstracted parameter in the body of the abstraction.
Abstraction Rules
  1. If x is a variable of arity \alpha, and b \equiv b: \beta, then (x)b \equiv (x)b: \alpha \twoheadrightarrow \beta. If two expressions are equivalent, then two abstractions using the same variable over the same expression is equivalent.
  2. If x and y are both variables of arity \alpha, and b is an expression of arity \beta, then (x)b \equiv (y)(b[x := y]): \alpha \twoheadrightarrow \beta, provided y is not free in b.
    Basically, the renaming variables in abstractions don’t matter, as long as you aren’t using the variable in the body of the abstraction. So (x)(3+4y) is equivalent to (z)(3+4y), but it’s not equivalent to (y)(3+4y), because y is a free variable in 3+4y, and the abstraction would create a binding for y.

  3. This is arities version of the eta-rule from lambda calculus: If x is a variable of arity \alpha, and b is an expression of arity \alpha \twoheadrightarrow \beta, then (x)(b(x)) \equiv b: \alpha \twoheadrightarrow \beta. This is a fancy version of an identity rule: abstraction and application cancel.

Combination Rules
  1. If a_1 \equiv a_1, a_2 \equiv a_2, …, a_n \equiv a_n, then a_1, a_2, ..., a_n \equiv a_1. This one is simple: if you have two combination expressions with the same arity, then they’re equivalent if their elements are equivalent.
  2. If e: \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n, then e.1, e.2, ..., e.n \equiv: e : \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n. Another easy one: if you take a combination expression, and you decompose it using selections, and then recombine those selection expressions into a combination, it’s equivalent to the original expression.
Selection Rules
  1. If a \equiv a, then a.i \equiv a. This is the reverse of combinations rule one: if you have two equivalent tuples, then their elements are equivalent.
  2. If a_1: \alpha_1, a_2: \alpha_2, ..., a_n: \alpha_n, then (a_1, a_2, ..., a_n).i \equiv a_i. An element of a combination is equivalent to itself outside of the combination.
Symmetry
If a: \alpha, then a \equiv a: \alpha.
Symmetry
If a \equiv b: \alpha, then b \equiv a: \alpha.
Transitivity
If a \equiv b: \alpha, and b \equiv c: \alpha, then a \equiv c: \alpha.

Jumping back to our example: Is x^2 + 3x + 7 equivalent to x^2 + 3x + 7? If we convert them both into their canonical AST forms, then yes. They’re identical, except for one thing: the variable name in their abstraction. By abstraction rule 2, then, they’re equivalent.

Expressions and Arity (part 1)

In the last post, we started looking at expressions. In this post, we’re going to continue doing that, and start looking at a simple form of expression typing called arity.

Before we can do that, we need to introduce a couple of new formalisms to complete our understanding of the elements that form expressions. The reason we need another formalism is because so far, we’ve defined the meaning of expressions in terms of function calls. But we’ve still got some ambiguity about our function calls – specifically, how do parameters work?

Suppose I’ve got a function, f. Can I call it as f(x)? Or f(x,y)? Both? Neither? It depends on exactly what function calls mean, and what it means to be a valid parameter to a function.

There are several approaches that you can take:

  1. You can say that a function application expression takes an arbitrary number of arguments. This is, roughly, what we do in dynamically typed programming languages like Python. In Python, you can write f(x) + f(x,y) + f(x, y, z, a, b, c), and the language parser will accept it. (It’ll complain when you try to execute it, but at the level of expressions, it’s a perfectly valid expression.)
  2. You can say that every function takes exactly one argument, and that a multi-argument function like f(x, y, z) is really a shorthand for a curried call sequence f(x)(y)(z) – and thus, the “application” operation really takes exactly 2 parameters – a function, and a single value to apply that function to. This is the approach that we usually take in lambda calculus.
  3. You can say that application takes two arguments, but the second one can be a combination of multiple objects – effectively a tuple. That’s what we do in a lot of versions of recursive function theory, and in programming languages like SML.

In the theory of expressions, Martin-Löf chose the third approach. A function takes a single parameter, which is a combination of a collection of other objects. If a, b, c, and d are expressions, then a, b, c, d is an expression called the combination of its four elements. This is very closely related to the idea of cartesian products in set theory, but it’s important to realize that it’s not the same thing. We’re not defining elements of a set here: we’re defining a syntactic construct of a pseudo-programming language, where one possible interpretation of it is cartesian products.

In addition to just multi-parameter functions, we’ll use combinations for other things. In type theory, we want to be able to talk about certain mathematical constructs as first-class objects. For example, we’d like to be able to talk about orderings, where an ordering is a collection of objects A combined with an operation \le. Using combinations, we can write that very naturally as A,\le.

For combinations to be useful, we need to be able to extract the component values from them. In set theory, we’d do that by having projection functions associated with the cross-product. In our theory of expressions, we have selection expressions. If x=(a, b, c, d) is a combination with four elements, then x.1 is a selection expression which extracts the first element from x.

In programming language terms, combinations give us a way of writing tuple values. Guess what’s next? Record types! Or rather, combinations with named elements. We can write a combination with names: x = (a: 1, b:2), and then write selection expressions using the names, like x.a.

Now we can start getting to the meat of things.

In combinatory logic, we’d just start with a collection of primitive constant values, and then build whatever we wanted with them using abstractions, applications, combinations, and selections. Combinatory logic is the parent of computation theory: why can’t we just stick with that foundation?

There are two answers to that. The first is familiar to programming language people: if you don’t put any restrictions on things, then you lose type safety. You’ll be able to write “valid” expressions that don’t mean anything – things like 1.x, even though “1” isn’t a combination, and so calling a selector on it makes no sense. Or you’d be able to call a function like factorial(27, 13), even though the function only takes one parameter.

The other answer is equality. One thing that we’d really like to be able to do in our theory of expressions is determine when two expressions are the same. For example, we’d really like to be able to do things like say that ((x)e)(x) == e. But without some form of control, we can’t really do that: the problem of determining whether or not two expressions are equal can become non-decidable. (The canonical example of this problem involves y combinator: ((x)x(x))((x)x(x)). If we wanted to try to work out whether an expression involving this was equivilant to another expression, we could wind up in an infinite loop of applications.)

The way that Martin-Löf worked around this is to associate an arity with an expression. Each arity of expressions is distinct, and there are rules about what operations can be applied to an expression depending on its arity. You can’t call .2 on “1+3”, because “1+3” is a single expression, and selectors can only be applied to combined expressions.

To most of us, arity sounds like it should be a numeric value. When we we talk about the arity of a function in a program, what we mean is how many parameters it takes. In Martin-Löf expressions, arity is more expressive than that: it’s almost the same thing as types in the simply typed lambda calculus.

There are two dimensions to arity: single/combined and saturated/unsaturated.

Single expressions are atomic values, where you can’t extract other values from them by selection; multiple expressions are combinations of multiple other expressions.

Saturated expressions are expressions that have no holes in them that can be filled by other expressions – that is, expressions with no free variables. Unsaturated expressions have open holes – which means that they can be applied to other expressions.

Saturated single expressions have arity 0. An expression of arity 0 can’t be applied, and you can’t be the target of selection expressions.

An unsaturated expression has an arity (\alpha \twoheadrightarrow \beta), where both \alpha and \beta are arities. For example, the integer addition function has arity (0 \otimes 0 \twoheadrightarrow 0). (Erik Eidt pointed out that I made an error here. I originally wrote addition as 0 \twoheadrightarrow 0, where it should have been 0 \otimes 0 \twoheadrightarrow 0.)

A combined expression (e_0, e_1, ..., e_n) has arity (\alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n), where each of the \alpha_is are the arities of the expression e_i.

Sadly, I’m out of time to work on this post, so we’ll have to stop here. Next time, we’ll look at the formal rules for arities, and how to use them to define equality of expressions.

Understanding Expressions

Administrivia

I’m going to be trying something a bit different with this blog.

What I’ve tried to do here on GM/BM is make each post as self-contained as possible. Obviously, many things take more than one post to explain, but I’ve done my best to take things, and divide them into pieces where there’s a basic concept or process that’s the focus of each post.

I’m finding that for this type theory stuff, I’m having a hard time doing that. Or rather, given my work schedule right now when I’m trying to write about type theory, I’m finding it hard to find enough time to do that, and still be posting regularly. (That sounds like a complaint, but it’s not meant to be. I started a new job at Dropbox about three months ago, and I absolutely love it. I’m spending a lot of time working because I’m having so much fun, not because some big mean boss guy is leaning over me and threatening.)

Anyway, the point of this whole diversion is that I really want to get this blog back onto a regular posting schedule. But to do that, I’m going to have to change my posting style a bit, and make the individual posts shorter, and less self-contained. I’m definitely interested in what you, my readers, think of this – so please, as I roll into this, let me know if you think it’s working or not. Thanks!


In this post, we’re going to start looking at expressions. This might seem like it’s a diversion from the stuff I’ve been writing about type theory, but it really isn’t! Per Martin-Löf developed a theory of expressions which is used by type theorists and many others, and we’re going to be looking at that.

We’ve all seen arithmetic expressions written out since we were in first grade. We think we understand what they mean. But actually, most of us have never really stopped and thought precisely about what an expression actually means. Most of the time, that’s OK: we’ve got an intuitive sense of it that’s good enough. But for type theory, it’s not sufficient. In fact, even if we did have a really good, formal notion of expressions, it wouldn’t be right for type theory: in type theory, we’re rebuilding mathematics from a foundation of computability, and that’s not the basis of any theory of expressions that’s used in other mathematical fields.

Why is this a problem?

Let’s start by looking at a nice, simple expression:

x^2 + 3x + 7

What does that mean? Roughly speaking, it’s a function with one parameter: f(x) = x^2 + 3x + 7. But that doesn’t really tell us much: all we’ve really done is add a bit of notation. We still don’t know what it means.

Let’s take it a step further. It’s actually describing a computation that adds three elements: +(x^2, 3x, 7). But that’s not quite right either, because we know addition is binary. That means that we need to decide how to divide that addition into two parts. From the commutative property, we know that it doesn’t matter which way we divide it – but from a computational perspective, it might: doing it one way versus the other might take much longer!

We’ll pick left-associative, and say that it’s really +(+(x^2, 3x), 7). We also need to expand other parts of this into this functional idea. If we follow it all out, we wind up with: +(+(*(x,x), *(3, x)),7).

We’ve converted the expression into a collection of applications of primitive functions. Or in terms that are more familiar to geeks like me, we’ve taken the expression, and turned it into an abstract syntax tree (AST) that describes it as a computation.

We’re still being pretty vague here. We haven’t really defined our notion of function or computation. But even with that vagueness, we’ve already started making the notion of what an expression is much more concrete. We’ve taken the abstract notion of expressions, and translated it to a much less abstract notion of a computation expressed as a sequence of computable functions.

This is a really useful thing to do. It’s useful enough that we don’t want to limit it to just “pure” expressions. In the type theoretic view of computation, everything is an expression. That’s important for multiple reasons – but to make it concrete, we’re going to eventually get around to showing how types work in expressions, what it means for an expression to be well-typed, how to infer types for expressions, and so on. We want all of that to work for any program – not just for something that looks like a formula.

Fortunately, this works. We can also take an “expression” like for i in 1 .. 10 do f(i), and analyze it as a function: for(i, 1, 10, f(i)).

So, we’ve got a way of understanding expressions as functions. But even if we want to keep the computational side of things abstract and hand-wavy, that’s still not really enough. We’re closer to understanding expressions, but we’ve still got some huge, gaping holes.

Let’s jump back to that example expression: x^2 + 3x + 7. What does it mean? What we’ve seen so far is that we can both understand it, as a series of function calls: +(+(*(x, x), *(3, x)), 7). But we’d like to be able to evaluate it – to execute the computation that it describes. But we can’t do that: it’s got a gaping hole named x. What do we do with that?

We’re missing a really important notion: funcional abstraction. Our expression isn’t just an expression: what it really is is a function. We alluded to that before, but now we’re going to deal with it head-on. That expression doesn’t really define a computation: it defines a computational object that computes the function. When an expression has free variables – that is, variables that aren’t assigned a meaning within the expression – our expression represents something that’s been abstracted a level: instead of being a computation that produces a value, it’s an object that takes a parameter, and performs a computation on its parameter(s) in order to produce a value.

In our expression x^2 + 3x + 7, we’re looking at an expression in one free variable – which makes it a single-parameter function. In the notation of type theory, we’d write it as (x)(+(+(*(x, x), *(3, x)), 7) – that is,
the parameter variable in parens ahead of the expression that it parameterizes. (Yes, this notation stinks; but I’m sticking with the notations from the texts I’m using, and this is it.)

This notion, of parameters and function abstraction turns out to be more complex than you might expect. I’m going to stop this post here – but around wednesday, I’ll post the next part, which will look at how to understand the arity of an expression.

Logical Statements as Tasks

In the next step of our exploration of type theory, we’re going to take a step away from the set-based stuff. There are set-based intepretations of every statement in set theory. But what we really want to focus on is the interpretation of statements in computational terms.

What that means is that we’re going to take logical statements, and view them as computation tasks – that is, as formal logical specifications of a computation that we’d like to do. Under that interpretation, a proof of a statement S is an implementation of the task specified by S.

This interpretation is much, much easier for computer science types like me than the set-based interpretations. We can walk through the interpretations of all of the statements of our intuitionistic logic in just a few minutes.

We’ll start with the simple statements.

Conjunction
A \land B is a specification for a program that produces a pair (a, b) where a is a solution for A, and b is a solution for B.
Disjunction
A \lor B is a specification for a program that produces either a solution to A or a solution to B, along with a way of identifying which of A and B it solved. We do that using a version of the classical projection functions: A \lor B produce either \text{inl}(A) (that is, the left projection), or \text{inr}(B) (the right projection).
Implication
A \supset B is a specification for a program that produces a solution to B given a solution to A; in lambda calculus terms, it’s a form like \lambda x: b(x).

Now, we can move on to quantified statements. They get a bit more complicated, but if you read the quantifier right, it’s not bad.

Universal
(\forall x \in A) B(x) is a program which, when executed, yields a program of the form \lambda x.b(x), where b(x) is an implementation of B, and x is an implementation of A. In other words, a universal statement is a program factory, which produces a program that turns one program into another program.

To me, the easiest way to understand this is to expand the quantifier. A quantified statement \forall x \in A: B(x) can be read as \forall x: x \in A \Rightarrow B(x). If you read it that way, and just follow the computational interpretation of implication, you get precisely the definition above.

Existential
Existential quantification is easy. An existential statement \exists x \in A: B(x) is a two part problem: it needs a value for a (that is, a value of x for which a proof exists that x \in A), and a proof that for that specific value of x, x \in B. A solution, then, has two parts: it’s a pair (a, b), where a is a value in A, and b is a program that computes the problem B(a).

This is the perspective from which most of Martin-Loff’s type theory pursues things. There’s a reason why ML’s type theory is so well-loved by computer scientists: because what we’re really doing here is taking a foundational theory of mathematics, and turning it into a specification language for describing computing systems.

That’s the fundamental beauty of what Martin-Loff did: he found a way of formulating all of constructive mathematics so that it’s one and the same thing as the theory of computation.

And that’s why this kind of type theory is so useful as a component of programming languages: because it’s allowing you to describe the semantics of your program in terms that are completely natural to the program. The type system is a description of the problem; and the program is the proof.

With full-blown Martin-Loff type system, the types really are a full specification of the computation described by the program. We don’t actually use the full expressiveness of type theory in real languages – among other things, it’s not checkable! But we do use a constrained, limited logic with Martin-Loff’s semantics. That’s what types really are in programming languages: they’re logical statements! As we get deeper into type theory, well see exactly how that works.

Propositions as Proofsets: Unwinding the confusion

My type theory post about the different interpretations of a proposition caused a furor in the comments. Understand what’s going on that caused all of the confusion is going to be important as we continue to move forward into type theory.

The root problem is really interesting, once you see what’s going on. We’re taking a statement that, on the face of it, isn’t about sets. Then we’re appyling a set-based interpretation of it, and looking at the subset relation. That’s all good. The problem is that when we start looking at a set-based interpretation, we’re doing what we would do in classical set theory – but that’s a different thing from what we’re doing here. In effect, we’re changing the statement.

For almost all of us, math is something that we learned from the perspective of axiomatic set theory and first order predicate logic. So that’s the default interpretation that we put on anything mathematical. When you talk about a a proposition as a set, we’re programmed to think of it in that classical way: for any set S, there’s a logical predicate P_s such that by definition, \forall x: x \in S \Leftrightarrow P_s(x). When you see P \Rightarrow Q in a set-theory context, what you think is something like \forall x: x \in P \Rightarrow x \in Q. Under that intepretation, the idea that P \supset Q is equivalent to P \rightarrow Q is absolutely ridiculous. If you follow the logic, implication must be the reverse of the subset relation!

The catch, though, is that we’re not talking about set theory, and the statement P \Rightarrow Q that we’re looking at is emphatically not \forall x : P(x) \Rightarrow Q(x). And that, right there, is the root of the problem.

P \rightarrow Q always means P \rightarrow Q – it doesn’t matter whether we’re doing set theory or type theory or whatever else. But in set theory, when we talk about the intepretation of P as a set, right now, in the world of type theory, we’re talking about a different set.

Super set doesn’t suddenly mean subset. Implication doesn’t start working backwards! and yet, I’m still trying to tell you that i really meant it when i said that superset meant implication! how can that possibly make sense?

In type theory, we´re trying to take a very different look at math. In particular, we’re building everything up on a constructive/computational framework. So we’re necessarily going to look at some different interpretations of things – we’re going to look at things in ways that just don’t make sense in the world of classical set theory/FOPL. We’re not going to contradict set theory – but we’re going to look at things very differently.

For example, the kind of statement we’re talking here about is a complete, closed, logical proposition, not a predicate, nor a set. The proposition P is a statement like “‘hello’ has five letters”.

When we look at a logical proposition P, one of the type theoretic interpretations of it is as a set of facts: P can be viewed as the set of all facts that can be proven true using P. In type theory land, this makes perfect sense: if I’ve got a proof of P, then I’ve got a proof of everything that P can prove. P isn’t a statement about the items in Ps proof-set. P is a logical statement about something, and the elements of the proof-set of P are the things that the statement P can prove.

With that in mind, what does P \Rightarrow Q mean in type theory? It means that everything provable using Q is provable using nothing but P.

(It’s really important to note here that there are no quantifiers in that statement. Again, we are not saying \forall p: P(x) \Rightarrow Q(x). P and Q are atomic propositions – not open quantified statements.)

If you are following the interpretation that says that P is the set of facts that are provable using the proposition P, then if P \Rightarrow Q, that means that everything that’s in Q must also be in P. In fact, it means pretty much exactly the same thing as classical superset. Q is a set of facts provable by the statement Q. The statement Q is provable using the statement P – which means that everything in the provable set of Q must, by definition! be in the provable set of P.

The converse doesn’t hold. There can be things provable by P (and thus in the proof-set of P) which are not provable using Q. So taken as sets of facts provable by logical propositions, P \supset Q!

Again, that seems like it’s the opposite of what we’d expect. But the trick is to recognize the meaning of the statements we’re working with, and that despite a surface resemblance, they’re not the same thing that we’re used to. Type theory isn’t saying that the set theoretic statements are wrong; nor is set theory saying that type theory is wrong.

The catch is simple: we’re trying to inject a kind of quantification into the statement P \Rightarrow Q which isn’t there; and then we’re using our interpretation of that quantified statement to say something different.

But there’s an interpretation of statements in type theory which is entirely valid, but which trips over our intuition: our training has taught us to take it, and expand it into an entirely different statement. We create blanks that aren’t there, fill them in, and by doing so, convert it into something that it isn’t, and confuse ourselves.

The Program is the Proof: Propositions in Type Theory

As usual, I’m going in several different directions. I’m going to continue doing data structure posts, but at the same time I also want to get back to the type theory stuff that I was writing about before I put the blog on hiatus.

So let’s get back to a bit of Martin-Loff type theory! (The stuff I’m writing about today corresponds, roughly, to chapter 2 of the Nordstrom/Petersson/Smith text.)

One of the key ideas of Martin-Loff’s type theory is that a logical statement is exactly the same thing as a specification of a computation. When you define a predicate like “Even”, the definition specifies both the set of numbers that satisfy the predicate, and the computation that tests a number for membership in the set of even numbers. If you haven’t provided enough information to fully specify the computation, then in Martin-Loff type theory, you haven’t defined a predicate or a set.

When you say “2 is even”, what you’re really saying in terms of the type theory is that “The computation for ‘even(2)’ will succeed”. The computation and the logical statement are the same thing.

In functional programming, we like to say that the program is the proof. Martin-Loff type theory is where that came from – and today we’re going to take a first look in detail at exactly what it means. In the world of type theory, the program is the proof, and the proof doesn’t exist without the program.

This creates an interesting set of equivalent interpretations. When you see a statement like “x : T” (or x \in T), that could be interpreted in the following ways, all of which are really equivalent in type theory.

  1. Set theoretic: x is a member of the set T.
  2. Intuitionistic: x is a proof object for the proposition T.
  3. Computational: x is a program that satisfies the specification T.
  4. Abstract: x is a solution for the problem T.

In the rest of this post, I’m going to focus on those four interpretations, and explain how each of them makes sense in this version of type theory.

The set theoretic interpretation is obvious – as the name suggests, it’s nothing but what we all learned from basic set theory. An object is a member of a set – which means, from set theory, that the object satisfies some predicate in first order predicate logic – because that’s what it means to be a member of a set.

The intuitionistic interpretation is almost the same as the set theoretic, but rewritten for intuitionistic logic. In intuitionistic logic, the predicate over the set is written as a proposition T, and if we know that x is a member of the set T, then that means that we have a proof that x demonstrates that T is true.

The computational interpretation takes the intuitionistic one, and rephrases it in computational terms. A logical proposition, rendered into a computational setting, is just a specification of a program; and a proof of the proposition is a program that satisfies the specification.

Finally, the abstract interpretation just rephrases the computational one into terms that aren’t tied to a computing device. A predicate is a problem that needs to be solved; anything that provides a solution to the problem is demonstrating a member of the set.

The key takeaway though is the basic idea here of what a type is. What we’re talking about as a type here is something that goes far beyond any programming language’s idea of what a type is. In intuitionistic type theory, a type is a specification of a computation. If we had this in a real language, that would mean that any program that typechecked would be guaranteed to work: asserting that x has type T means, precisely, that x is a computation that matches the specification!

(Of course, that’s not the panacea that you might think the first time you hear that. The catch is simple: the type is a specification of the computation. That means that just writing a type is a form of programming! And that means that your type descriptions are going to have bugs. But we’ll come back to that in a much later post.)

What type theory is doing is taking something like set theory, and re-rendering it entirely in a computational world. It still has a mapping from the computations to the abstract concepts that we use when we’re thinking, but if we can talk about those abstract concepts in type theory, we’ll always do it by mapping them into some kind of computation.

In type theory, we’re not dealing in a world of pure mathematical objects that exist if we can describe them; instead, we’re building a world where everything is at least theoretically computable. That might seem constraining, but every proof already corresponds to a computation of some sort; the only additional constraint here is that we can’t play tricks like the axiom of choice, where we can “prove” the existence of some unattainable, intangible, nonsensical object.

To make that work, we’re going to take all of the construct that we’re used to seeing in intuitionistic logic, and give them a meaning in terms of computations.

For example, in set theory, we can have a statement A \supset B – meaning that A is a superset of B, that every element of B is also necessarity an element of A. In type theory, since A and B are specifications of computations, that means that a member (or proof) of A \supset B is a computation that given a proof of A, generates a proof of B – in short, that A implies B.

Now, suppose that we want to prove A \supset A. How could we do that? We need a program that given a proof of A generates a proof of A. That is, we need an implementation of the identity function: \lambda a . a.

In fact, using the computation interpretation of things, we can interpret A \supset B as being the type of a function that takes an instance of a, and generates an instance of b – that is, that if f : (A \supset B), then f is a function from an instance of A to an instance of B!

The only trick to that is understanding that in type theory, saying that a is an element of A means that a is a proof of A. Using the same interpretation, that means that f: A \supset B means that f is a proof of A \supset B – which means the same thing as saying that given an example of an element of A (a proof of A), f will produce an element of B (a proof of B). The statement A \supset B is exactly the same thing as the logical implication A \rightarrow B, which is exactly the same thing as the type of a function from A to B.


Notes on sources: I’m working from two main references in this series of posts, both of which are available in PDF form online.

  1. “Programming in Martin-Lof’s Type Theory”, by Nordstrom, Petersson, and Smith, which you can download here.
  2. “Type Theory and Functional Programming” by Simon Thompson (available here).

In addition, I first learned a lot of this from reading some papers by Phil Wadler and Simon Peyton Jones. The exactly references but the exact references to those are long-lost in the shadows of my memory. But any of their papers are well-worth reading, so just read them all!)