Category Archives: Type Theory

A Review of Type Theory (so far)

I’m trying to get back to writing about type theory. Since it’s been quite a while since the last type theory post, we’ll start with a bit of review.

What is this type theory stuff about?

The basic motivation behind type theory is that set theory isn’t the best foundation for mathematics. It seems great at first, but when you dig in deep, you start to see cracks.

If you start with naive set theory, the initial view is amazing: it’s so simple! But it falls apart: it’s not consistent. When you patch it, creating axiomatic set theory, you get something that isn’t logically inconsistent – but it’s a whole lot more complicated. And while it does fix the inconsistency, it still gives you some results which seem wrong.

Type theory covers a range of approaches that try to construct a foundational theory of mathematics that has the intuitive appeal of axiomatic set theory, but without some of its problems.

The particular form of type theory that we’ve been looking at is called Martin-Löf type theory. M-L type theory is a constructive theory of mathematics in which computation plays a central role. The theory rebuilds mathematics in a very concrete form: every proof must explicitly construct the objects it talks about. Every existence proof doesn’t just prove that something exists in the abstract – it provides a set of instructions (a program!) to construct an example of the thing that exists. Every proof that something is false provides a set of instructions (also a program!) for how to construct a counterexample that demonstrates its falsehood.

This is, necessarily, a weaker foundation for math than traditional axiomatic set theory. There are useful things that are provable in axiomatic set theory, but which aren’t provable in a mathematics based on M-L type theory. That’s the price you pay for the constructive foundations. But in exchange, you get something that is, in many ways, clearer and more reasonable than axiomatic set theory. Like so many things, it’s a tradeoff.

The constructivist nature of M-L type theory is particularly interesting to wierdos like me, because it means that programming becomes the foundation of mathematics. It creates a beautiful cyclic relationship: mathematics is the foundation of programming, and programming is the foundation of mathematics. The two are, in essence, one and the same thing.

The traditional set theoretic basis of mathematics uses set theory with first order predicate logic. FOPL and set theory are so tightly entangled in the structure of mathematics that they’re almost inseparable. The basic definitions of type theory require logical predicates that look pretty much like FOPL; and FOPL requires a model that looks pretty much like set theory.

For our type theory, we can’t use FOPL – it’s part of the problem. Instead, Martin-Lof used intuitionistic logic. Intuitionistic logic plays the same role in type theory that FOPL plays in set theory: it’s deeply entwined into the entire system of types.

The most basic thing to understand in type theory is what a logical proposition means. A proposition is a complete logical statement no unbound variables and no quantifiers. For example, “Mark has blue eyes” is a proposition. A simple proposition is a statement of fact about a specific object. In type theory, a proof of a proposition is a program that demonstrates that the statement is true. A proof that “Mark has blue eyes” is a program that does something like “Look at a picture of Mark, screen out everything but the eyes, measure the color C of his eyes, and then check that C is within the range of frequencies that we call “blue”. We can only say that a proposition is true if we can write that program.

Simple propositions are important as a starting point, but you can’t do anything terribly interesting with them. Reasoning with simple propositions is like writing a program where you can only use literal values, but no variables. To be able to do interesting things, you really need variables.

In Martin-Lof type theory, variables come along with predicates. A predicate is a statement describing a property or fact about an object (or about a collection of objects) – but instead of defining it in terms of a single fixed value like a proposition, it takes a parameter. “Mark has blue eyes” is a proposition; “Has blue eyes” is a predicate. In M-L type theory, a predicate is only meaningful if you can write a program that, given an object (or group of objects) as a parameter, can determine whether or no the predicate is true for that object.

That’s roughly where we got to in type theory before the blog went on hiatus.

Truth in Type Theory

Now, we’re getting to the heart of type theory: judgements. Judgements are the basic logical statements that we want to be able to prove about computations in type theory. There’s a basic set of judgements that we want to be able to make.

I keep harping on this, but it’s the heart of type theory: type theory is all about understanding logical statements as specifications of computations. Or, more precisely, in computer science terms, they’re about understanding true logical statements as halting computations.

In this post, we’ll see the ultimate definition of truth in type theory: every logical proposition is a set, and the proposition is true if the set has any members. A non-halting computation is a false statement – because you can never get it to resolve an expression to a canonical value.

So remember as we go through this: judgements are based on the idea of logical statements as specifications of computations. So when we talk about a predicate P, we’re using its interpretation as a specification of a computation. When we look at an expression 3+5, we understand it not as a piece of notation that describes the number 8, but as a description of a computation that adds 3 to 5. “3+5” not the same computation as “2*4” or “2+2+2+2”, but as we’ll see, they’re equal because they evaluate to the same thing – that is, they each perform a computation that results in the same canonical value – the number 8.

In this post, we’re going to focus on a collection of canonical atomic judgement types:

A \text{set}
This judgement says that A is a set.
A = B
A and Bare equal sets.
a \in A
a is an element of the set A
a_1 == a_2 \in A
a_1 and a_2 are equal members of the set A.
A is a proposition.
The proposition A is true.

The definition of the meanings of judgements is, necessarily, mutually recursive, so we’ll have to go through all of them before any of them is complete.

An object A is a Set
When I say that A is a set in type theory, that means that:

  • I know the rules for how to form the canonical expressions for the set;
  • I’ve got an equivalence relation that says when two canonical members of the set are equal.
Two Sets are Equal

When I say that A and B are equal sets, that means:

  • A and B are both sets.
  • If a is a canonical member of A, then a is also a canonical member of B, and vice versa.
  • If a and b are canonical members of A, and they’re also equal in A, then a and b are also equal canonical members of B (and vice versa).

The only tricky thing about the definition of this judgement is the fact that it defines equality is a property of a set, not of the elements of the set. By this definition, tt’s possible for two expressions to be members of two different sets, and to be equal in one, but not equal in the other. (We’ll see in a bit that this possibility gets eliminated, but this stage of the definition leaves it open!)

An object is a member of a set
When I say a \in A, that means that if I evaluate a, the result will be a canonical member of A.
Two members of a set are equal
If a \in A and b \in A and a = b, that means when if you evaluate a, you’ll get a canonical expression a; and when you evaluate b, you’ll get a canonical expression b. For a == b to be true, then a, that is, the canonical expressions resulting from their evaluation must also be equal.

This nails down the problem back in set equivalence: since membership equivalence is defined in terms of evaluation as canonical values, and every expression evaluates to exactly one canonical expression (that’s the definition of canonical!), then if two objects are equal in a set, they’re equal in all sets that they’re members of.

An object A is a proposition
Here’s where type theory really starts to depart from the kind of math that we’re used to. In type theory, a proposition is a set. That’s it: to be a proposition, A has to be a set.
The proposition A is true
And the real meat of everything so far: if we have a proposition A, and A is true, what that means is that a has at least one element. If a proposition is a non-empty set, then it’s true. If it’s empty, it’s false.

Truth in type theory really comes down to membership in a set. This is, subtly, different from the predicate logic that we’re familiar with. In predicate logic, a quantified proposition can, ultimately, be reduced to a set of values, but it’s entirely reasonable for that set to be empty. I can, for example, write a logical statement that “All dogs with IQs greater that 180 will turn themselves into cats.” A set-based intepretation of that is the collection of objects for which it’s true. There aren’t any, because there aren’t any dogs with IQs greater than 180. It’s empty. But logically, in terms of predicate logic, I can still say that it’s “true”. In type theory, I can’t: to be true, the set has to have at least one value, and it doesn’t.

In the next post, we’ll take these atomic judgements, and start to expand them into the idea of hypothetical judgements. In the type-theory sense, that means statements require some other set of prior judgements before they can be judged. In perhaps more familiar terms, we’ll be looking at type theory’s equivalent of the contexts in classical sequent calculus – those funny little \Gammas that show up in all of the sequent rules.

Canonical Expressions in Type Theory

Sorry for the gap in posts. I’ve been trying to post more regularly, and was just hitting a rhythm, when my son brought home a particularly vicious bug, and I got sick. I’ve spent the last couple of weeks being really, really sick, and then trying to get caught up at work. I’m mostly recovered, except for some lingering asthma, so I’m trying to get back to that twice-per-week posting schedule.

In the last couple of posts, we looked at Martin-Löf’s theory of expressions. The theory of expressions is purely syntactic: it’s a way of understanding the notation of expressions. Everything that we do in type theory will be written with expressions that follow the syntax, the arity, and the definitional equivalency rules of expression theory.

The next step is to start to understand the semantics of expressions. In type theory, when it comes to semantics, we’re interested in two things: evaluation and judgements. Evaluation is the process by which an expression is reduced to its simplest form. It’s something that we care about, but it’s not really a focus of type theory: type theory largely waves its hands in the air and says “we know how to do this”, and opts for normal-order evaluation. Judgements are where things get interesting.

Judgements are provable statements about expressions and the values that they represent. As software people, when we think about types and type theory, we’re usually thinking about type declarations: type declarations are judgements about the expressions that they apply to. When you write a type declaration in a programming language, what you’re doing is asserting the type theory judgement. When the compiler “type-checks” your program, what it’s doing in type theory terms is checking that your judgements are proven by your program.

For example, we’d like to be able to make the judgement A set – that is, that A is a set. In order to make the judgement that A is a set in type theory, we need to know two things:

  1. How are canonical instances of the set A formed?
  2. Given two canonical instances of A, how can we determine if they’re equal?

To understand those two properties, we need to take a step back. What is a canonical instance of a set?

If we think about how we use predicate logic, we’re always given some basic set of facts as a starting point. In type theory, the corresponding concept is a primitive constant. The primitive constants include base values and primitive functions. For example, if we’re working with lispish expressions, then cons(1, cons(2, nil)) is an expression, and cons, nil, 1 and 2 are primitive constants; cons is the head of the expression, and 1 and cons(2, nil) are the arguments.

A canonical expression is a saturated, closed expression whose head is a primitive constant.

The implications of this can be pretty surprising, because it means that a canonical expression can contain unevaluated arguments! The expression has to be saturated and closed – so its arguments can’t have unbound variables, or be missing parameters. But it can contain unevaluated subexpressions. For example, if we were working with Peano arithmetic in type theory, succ(2+3) is canonical, even though “2+3” hasn’t be evaluated.

In general, in type theory, the way that we evaluate an expression is called normal order evaluation – what programming language people call lazy evaluation: that’s evaluating from the outside in. Given a non-canonical expression, we evaluate from the outside in until we get to a canonical expression, and then we stop. A canonical expression is considered the result of a computation – so we can see succ(2+3) as a result!

A canonical expression is the evaluated form of an expression, but not the fully evaluated form. The fully evaluated form is when the expression and all of its saturated parts are fully evaluated. So in our previous example, the saturated part 2+3 wasn’t evaluated, so it’s not fully evaluated. To get it to be fully evaluated, we’d need to evaluate 2+3, giving us succ(5); then, since succ(5) is saturated, it’s evaluated to 6, which is the fully evaluated form of the expression.

Next post (coming monday!), we’ll use this new understanding of canonical expressions, and start looking at judgements, and what they mean. That’s when type theory starts getting really fun and interesting.

Expressions and Arity (Part 2): Equivalence

Continuing where I left off: we were talking about arity in Martin-Löf’s theory of expressions. There are two basic problems that arity solves: it makes certain kinds of impossible-to-evaluate expressions be invalid in the theory; and it helps enable some way of comparing expressions for equality. Arity solves both of those problems by imposing a simple type system over expressions.

At the end of the last post, I started giving a sketch of what arities look like. Now we’re going to dive in, and take a look at how to determine the arity of an expression. It’s a fairly simple system of rules.

Before diving in, I want to stress the most important thing about the way that these rules work is that the expressions are totally syntactic and static. This is something that confused me the first time I tried to read about expression theory. When you see an expression, you think about how it’s evaluated. But expression theory is a purely syntactic theory: it’s about analyzing expressions as syntactic entities. There are, deliberately, no evaluation rules. It’s about understanding what the notations mean, and how to determine when two expressions are equivalent.

If, under the rules of Martin-Löf’s expression theory, two expressions are equivalent, then if you were to chose a valid set of evaluation rules, the two expressions will evaluate to the same value. But expression equivalence is stronger: expressions are equivalent only if you can prove their equivalence from their syntax.

That clarified, let’s start by looking at the rules of arity in expressions.

Variables and Constants
Every variable and every primitive constant has a pre-defined arity; if x is a variable or primitive constant with arity \alpha, then the expression x has arity \alpha.
Definitions
In a definition D := e, the arity of the defined name D is the same as the arity of the expression e.
Applications
If a is an expression of arity \alpha \twoheadrightarrow \beta, and b is a expression of arity \alpha, then a(b) is an expression of arity \beta.
Abstractions
If e is an expression of arity \beta and x is a variable of arity \alpha, then (x)e is an expression of arity \alpha \twoheadrightarrow \beta.
Combinations
If e_1 is an expression af arity \alpha_1, e_2 is an expression of arity \alpha_2, …, and e_n is an expression of arity \alpha_n, then a combination expression e_1, e_2, ..., e_n is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n.
Selections
If e is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n where n \ge 2, then (e).i is an expression af arity e_i.

Let’s try working through an example: x^2 + 3x + 7.

  1. As we saw in this post, this is equivalent to the simple AST-form: (x)+(+(*(x,x), *(3, x)),7).
  2. x” is a variable of arity 0; “3” and “7” are constants of arity 0; “+” and “*” are constants of arity (0 \otimes 0) \twoheadrightarrow 0.
  3. From the combination rule, since x and 3 both have arity 0, (x, x) and (3, x) each have arity 0 \otimes 0.
  4. Since (x, x) has arity 0 \otimes 0, and * has arity (0 \otimes 0) \twoheadrightarrow 0, *(x, x) has arity 0. The same thing works for *(3, x).
  5. Since the arities of the *(x, x) and *(3, x) are both 0, the combination of the pair (the arguments to the inner “+”) are 0 \otimes 0, and the arity of the inner sum expression is thus 0.
  6. Since 7 has arity 0, the combination of it with the inner sum is 0 \otimes 0, and the arity of the outer sum is 0.
  7. Since x is a variable of arity 0, and outer sum expression has arity 0, the abstract has arity 0 \twoheadrightarrow 0.

If you’re familiar with type inference in the simply typed lambda calculus, this should look pretty familiar; the only real difference is that the only thing that arity really tracks is applicability and parameter counting.

Just from this much, we can see how this prevents problems. If you try to compute the arity of 3.1 (that is, the selection of the first element from 3), you find that you can’t: there is no arity rule that would allow you to do that. The selection rule only works on a product-arity, and 3 has arity 0.

The other reason we wanted arity was to allow us to compare expressions. Intuitively, it should be obvious that the expression e and the expression (x)e(x) are in some sense equal. But we need some way of being able to actually precisely define that equality.

The kind of equality that we’re trying to get at here is called definitional equality. We’re not trying to define equality where expressions a and b evaluate to equal values – that would be easy. Instead, we’re trying to get at something more subtle: we want to capture the idea that the expressions are different ways of writing the same thing.

We need arity for this, for a simple reason. Let’s go back to that first example expression: (x)+(+(*(x,x), *(3, x)),7). Is that equivalent to (y)+(+(*(y,y), *(3, y)),7)? Or to 8x+1? If we apply them to the value 3, and then evaluate them using standard arithmetic, then all three expressions evaluate to 25. So are they all equivalent? We want to be able to say that the first two are equivalent expressions, but the last one isn’t. And we’d really like to be able to say that structurally – that is, instead of saying something evaluation-based like “forall values of x, eval(f(x)) == eval(g(x)), therefore f == g”, we want to be able to do something that says f \equiv g because they have the same structure.

Using arity, we can work out a structural definition of equivalence for expressions.

In everything below, we’l write a: \alpha to mean that a has arity \alpha, and a \equiv b : \alpha to mean that a and b are equivalent expressions of arity \alpha. We’ll define equivalence in a classic inductive form by structure:

Variables and Constants
If x is a variable or constant of arity \alpha, then x \equiv x : alpha. This is the simplest identity rule: variables and constants are equivalent to themselves.
Definitions
If a := b is a definition, and b: \alpha, then a \equiv b : \alpha. This is a slightly more complex form of an identity rule: if there’s a definition of b as the value of a, then a and b are equivalent.
Application Rules
  1. If a \equiv a and b \equiv b, then a(b) \equiv a. If an applyable expression a is equivalent to another applyable expression a, then applying a to an expression b is equivalent to applying a to an expression b if the argument b is equivalent to the argument b. That’s a mouthful, but it’s simple: if you have two function application expressions, they’re equivalent if both the function expressions and the argument expressions are equivalent.
  2. If x is a variable of arity \alpha, and a is an expression of arity \alpha and b is an expression of arity \beta, then ((x)b)(a)  b[x := a]: \beta. This is arity’s version of the classic beta rule of lambda calculus: applying an abstraction to an argument means substituting the argument for all references to the abstracted parameter in the body of the abstraction.
Abstraction Rules
  1. If x is a variable of arity \alpha, and b \equiv b: \beta, then (x)b \equiv (x)b: \alpha \twoheadrightarrow \beta. If two expressions are equivalent, then two abstractions using the same variable over the same expression is equivalent.
  2. If x and y are both variables of arity \alpha, and b is an expression of arity \beta, then (x)b \equiv (y)(b[x := y]): \alpha \twoheadrightarrow \beta, provided y is not free in b.
    Basically, the renaming variables in abstractions don’t matter, as long as you aren’t using the variable in the body of the abstraction. So (x)(3+4y) is equivalent to (z)(3+4y), but it’s not equivalent to (y)(3+4y), because y is a free variable in 3+4y, and the abstraction would create a binding for y.

  3. This is arities version of the eta-rule from lambda calculus: If x is a variable of arity \alpha, and b is an expression of arity \alpha \twoheadrightarrow \beta, then (x)(b(x)) \equiv b: \alpha \twoheadrightarrow \beta. This is a fancy version of an identity rule: abstraction and application cancel.

Combination Rules
  1. If a_1 \equiv a_1, a_2 \equiv a_2, …, a_n \equiv a_n, then a_1, a_2, ..., a_n \equiv a_1. This one is simple: if you have two combination expressions with the same arity, then they’re equivalent if their elements are equivalent.
  2. If e: \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n, then e.1, e.2, ..., e.n \equiv: e : \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n. Another easy one: if you take a combination expression, and you decompose it using selections, and then recombine those selection expressions into a combination, it’s equivalent to the original expression.
Selection Rules
  1. If a \equiv a, then a.i \equiv a. This is the reverse of combinations rule one: if you have two equivalent tuples, then their elements are equivalent.
  2. If a_1: \alpha_1, a_2: \alpha_2, ..., a_n: \alpha_n, then (a_1, a_2, ..., a_n).i \equiv a_i. An element of a combination is equivalent to itself outside of the combination.
Symmetry
If a: \alpha, then a \equiv a: \alpha.
Symmetry
If a \equiv b: \alpha, then b \equiv a: \alpha.
Transitivity
If a \equiv b: \alpha, and b \equiv c: \alpha, then a \equiv c: \alpha.

Jumping back to our example: Is x^2 + 3x + 7 equivalent to x^2 + 3x + 7? If we convert them both into their canonical AST forms, then yes. They’re identical, except for one thing: the variable name in their abstraction. By abstraction rule 2, then, they’re equivalent.

Expressions and Arity (part 1)

In the last post, we started looking at expressions. In this post, we’re going to continue doing that, and start looking at a simple form of expression typing called arity.

Before we can do that, we need to introduce a couple of new formalisms to complete our understanding of the elements that form expressions. The reason we need another formalism is because so far, we’ve defined the meaning of expressions in terms of function calls. But we’ve still got some ambiguity about our function calls – specifically, how do parameters work?

Suppose I’ve got a function, f. Can I call it as f(x)? Or f(x,y)? Both? Neither? It depends on exactly what function calls mean, and what it means to be a valid parameter to a function.

There are several approaches that you can take:

  1. You can say that a function application expression takes an arbitrary number of arguments. This is, roughly, what we do in dynamically typed programming languages like Python. In Python, you can write f(x) + f(x,y) + f(x, y, z, a, b, c), and the language parser will accept it. (It’ll complain when you try to execute it, but at the level of expressions, it’s a perfectly valid expression.)
  2. You can say that every function takes exactly one argument, and that a multi-argument function like f(x, y, z) is really a shorthand for a curried call sequence f(x)(y)(z) – and thus, the “application” operation really takes exactly 2 parameters – a function, and a single value to apply that function to. This is the approach that we usually take in lambda calculus.
  3. You can say that application takes two arguments, but the second one can be a combination of multiple objects – effectively a tuple. That’s what we do in a lot of versions of recursive function theory, and in programming languages like SML.

In the theory of expressions, Martin-Löf chose the third approach. A function takes a single parameter, which is a combination of a collection of other objects. If a, b, c, and d are expressions, then a, b, c, d is an expression called the combination of its four elements. This is very closely related to the idea of cartesian products in set theory, but it’s important to realize that it’s not the same thing. We’re not defining elements of a set here: we’re defining a syntactic construct of a pseudo-programming language, where one possible interpretation of it is cartesian products.

In addition to just multi-parameter functions, we’ll use combinations for other things. In type theory, we want to be able to talk about certain mathematical constructs as first-class objects. For example, we’d like to be able to talk about orderings, where an ordering is a collection of objects A combined with an operation \le. Using combinations, we can write that very naturally as A,\le.

For combinations to be useful, we need to be able to extract the component values from them. In set theory, we’d do that by having projection functions associated with the cross-product. In our theory of expressions, we have selection expressions. If x=(a, b, c, d) is a combination with four elements, then x.1 is a selection expression which extracts the first element from x.

In programming language terms, combinations give us a way of writing tuple values. Guess what’s next? Record types! Or rather, combinations with named elements. We can write a combination with names: x = (a: 1, b:2), and then write selection expressions using the names, like x.a.

Now we can start getting to the meat of things.

In combinatory logic, we’d just start with a collection of primitive constant values, and then build whatever we wanted with them using abstractions, applications, combinations, and selections. Combinatory logic is the parent of computation theory: why can’t we just stick with that foundation?

There are two answers to that. The first is familiar to programming language people: if you don’t put any restrictions on things, then you lose type safety. You’ll be able to write “valid” expressions that don’t mean anything – things like 1.x, even though “1” isn’t a combination, and so calling a selector on it makes no sense. Or you’d be able to call a function like factorial(27, 13), even though the function only takes one parameter.

The other answer is equality. One thing that we’d really like to be able to do in our theory of expressions is determine when two expressions are the same. For example, we’d really like to be able to do things like say that ((x)e)(x) == e. But without some form of control, we can’t really do that: the problem of determining whether or not two expressions are equal can become non-decidable. (The canonical example of this problem involves y combinator: ((x)x(x))((x)x(x)). If we wanted to try to work out whether an expression involving this was equivilant to another expression, we could wind up in an infinite loop of applications.)

The way that Martin-Löf worked around this is to associate an arity with an expression. Each arity of expressions is distinct, and there are rules about what operations can be applied to an expression depending on its arity. You can’t call .2 on “1+3”, because “1+3” is a single expression, and selectors can only be applied to combined expressions.

To most of us, arity sounds like it should be a numeric value. When we we talk about the arity of a function in a program, what we mean is how many parameters it takes. In Martin-Löf expressions, arity is more expressive than that: it’s almost the same thing as types in the simply typed lambda calculus.

There are two dimensions to arity: single/combined and saturated/unsaturated.

Single expressions are atomic values, where you can’t extract other values from them by selection; multiple expressions are combinations of multiple other expressions.

Saturated expressions are expressions that have no holes in them that can be filled by other expressions – that is, expressions with no free variables. Unsaturated expressions have open holes – which means that they can be applied to other expressions.

Saturated single expressions have arity 0. An expression of arity 0 can’t be applied, and you can’t be the target of selection expressions.

An unsaturated expression has an arity (\alpha \twoheadrightarrow \beta), where both \alpha and \beta are arities. For example, the integer addition function has arity (0 \otimes 0 \twoheadrightarrow 0). (Erik Eidt pointed out that I made an error here. I originally wrote addition as 0 \twoheadrightarrow 0, where it should have been 0 \otimes 0 \twoheadrightarrow 0.)

A combined expression (e_0, e_1, ..., e_n) has arity (\alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n), where each of the \alpha_is are the arities of the expression e_i.

Sadly, I’m out of time to work on this post, so we’ll have to stop here. Next time, we’ll look at the formal rules for arities, and how to use them to define equality of expressions.

Understanding Expressions

Administrivia

I’m going to be trying something a bit different with this blog.

What I’ve tried to do here on GM/BM is make each post as self-contained as possible. Obviously, many things take more than one post to explain, but I’ve done my best to take things, and divide them into pieces where there’s a basic concept or process that’s the focus of each post.

I’m finding that for this type theory stuff, I’m having a hard time doing that. Or rather, given my work schedule right now when I’m trying to write about type theory, I’m finding it hard to find enough time to do that, and still be posting regularly. (That sounds like a complaint, but it’s not meant to be. I started a new job at Dropbox about three months ago, and I absolutely love it. I’m spending a lot of time working because I’m having so much fun, not because some big mean boss guy is leaning over me and threatening.)

Anyway, the point of this whole diversion is that I really want to get this blog back onto a regular posting schedule. But to do that, I’m going to have to change my posting style a bit, and make the individual posts shorter, and less self-contained. I’m definitely interested in what you, my readers, think of this – so please, as I roll into this, let me know if you think it’s working or not. Thanks!


In this post, we’re going to start looking at expressions. This might seem like it’s a diversion from the stuff I’ve been writing about type theory, but it really isn’t! Per Martin-Löf developed a theory of expressions which is used by type theorists and many others, and we’re going to be looking at that.

We’ve all seen arithmetic expressions written out since we were in first grade. We think we understand what they mean. But actually, most of us have never really stopped and thought precisely about what an expression actually means. Most of the time, that’s OK: we’ve got an intuitive sense of it that’s good enough. But for type theory, it’s not sufficient. In fact, even if we did have a really good, formal notion of expressions, it wouldn’t be right for type theory: in type theory, we’re rebuilding mathematics from a foundation of computability, and that’s not the basis of any theory of expressions that’s used in other mathematical fields.

Why is this a problem?

Let’s start by looking at a nice, simple expression:

x^2 + 3x + 7

What does that mean? Roughly speaking, it’s a function with one parameter: f(x) = x^2 + 3x + 7. But that doesn’t really tell us much: all we’ve really done is add a bit of notation. We still don’t know what it means.

Let’s take it a step further. It’s actually describing a computation that adds three elements: +(x^2, 3x, 7). But that’s not quite right either, because we know addition is binary. That means that we need to decide how to divide that addition into two parts. From the commutative property, we know that it doesn’t matter which way we divide it – but from a computational perspective, it might: doing it one way versus the other might take much longer!

We’ll pick left-associative, and say that it’s really +(+(x^2, 3x), 7). We also need to expand other parts of this into this functional idea. If we follow it all out, we wind up with: +(+(*(x,x), *(3, x)),7).

We’ve converted the expression into a collection of applications of primitive functions. Or in terms that are more familiar to geeks like me, we’ve taken the expression, and turned it into an abstract syntax tree (AST) that describes it as a computation.

We’re still being pretty vague here. We haven’t really defined our notion of function or computation. But even with that vagueness, we’ve already started making the notion of what an expression is much more concrete. We’ve taken the abstract notion of expressions, and translated it to a much less abstract notion of a computation expressed as a sequence of computable functions.

This is a really useful thing to do. It’s useful enough that we don’t want to limit it to just “pure” expressions. In the type theoretic view of computation, everything is an expression. That’s important for multiple reasons – but to make it concrete, we’re going to eventually get around to showing how types work in expressions, what it means for an expression to be well-typed, how to infer types for expressions, and so on. We want all of that to work for any program – not just for something that looks like a formula.

Fortunately, this works. We can also take an “expression” like for i in 1 .. 10 do f(i), and analyze it as a function: for(i, 1, 10, f(i)).

So, we’ve got a way of understanding expressions as functions. But even if we want to keep the computational side of things abstract and hand-wavy, that’s still not really enough. We’re closer to understanding expressions, but we’ve still got some huge, gaping holes.

Let’s jump back to that example expression: x^2 + 3x + 7. What does it mean? What we’ve seen so far is that we can both understand it, as a series of function calls: +(+(*(x, x), *(3, x)), 7). But we’d like to be able to evaluate it – to execute the computation that it describes. But we can’t do that: it’s got a gaping hole named x. What do we do with that?

We’re missing a really important notion: funcional abstraction. Our expression isn’t just an expression: what it really is is a function. We alluded to that before, but now we’re going to deal with it head-on. That expression doesn’t really define a computation: it defines a computational object that computes the function. When an expression has free variables – that is, variables that aren’t assigned a meaning within the expression – our expression represents something that’s been abstracted a level: instead of being a computation that produces a value, it’s an object that takes a parameter, and performs a computation on its parameter(s) in order to produce a value.

In our expression x^2 + 3x + 7, we’re looking at an expression in one free variable – which makes it a single-parameter function. In the notation of type theory, we’d write it as (x)(+(+(*(x, x), *(3, x)), 7) – that is,
the parameter variable in parens ahead of the expression that it parameterizes. (Yes, this notation stinks; but I’m sticking with the notations from the texts I’m using, and this is it.)

This notion, of parameters and function abstraction turns out to be more complex than you might expect. I’m going to stop this post here – but around wednesday, I’ll post the next part, which will look at how to understand the arity of an expression.

Logical Statements as Tasks

In the next step of our exploration of type theory, we’re going to take a step away from the set-based stuff. There are set-based intepretations of every statement in set theory. But what we really want to focus on is the interpretation of statements in computational terms.

What that means is that we’re going to take logical statements, and view them as computation tasks – that is, as formal logical specifications of a computation that we’d like to do. Under that interpretation, a proof of a statement S is an implementation of the task specified by S.

This interpretation is much, much easier for computer science types like me than the set-based interpretations. We can walk through the interpretations of all of the statements of our intuitionistic logic in just a few minutes.

We’ll start with the simple statements.

Conjunction
A \land B is a specification for a program that produces a pair (a, b) where a is a solution for A, and b is a solution for B.
Disjunction
A \lor B is a specification for a program that produces either a solution to A or a solution to B, along with a way of identifying which of A and B it solved. We do that using a version of the classical projection functions: A \lor B produce either \text{inl}(A) (that is, the left projection), or \text{inr}(B) (the right projection).
Implication
A \supset B is a specification for a program that produces a solution to B given a solution to A; in lambda calculus terms, it’s a form like \lambda x: b(x).

Now, we can move on to quantified statements. They get a bit more complicated, but if you read the quantifier right, it’s not bad.

Universal
(\forall x \in A) B(x) is a program which, when executed, yields a program of the form \lambda x.b(x), where b(x) is an implementation of B, and x is an implementation of A. In other words, a universal statement is a program factory, which produces a program that turns one program into another program.

To me, the easiest way to understand this is to expand the quantifier. A quantified statement \forall x \in A: B(x) can be read as \forall x: x \in A \Rightarrow B(x). If you read it that way, and just follow the computational interpretation of implication, you get precisely the definition above.

Existential
Existential quantification is easy. An existential statement \exists x \in A: B(x) is a two part problem: it needs a value for a (that is, a value of x for which a proof exists that x \in A), and a proof that for that specific value of x, x \in B. A solution, then, has two parts: it’s a pair (a, b), where a is a value in A, and b is a program that computes the problem B(a).

This is the perspective from which most of Martin-Loff’s type theory pursues things. There’s a reason why ML’s type theory is so well-loved by computer scientists: because what we’re really doing here is taking a foundational theory of mathematics, and turning it into a specification language for describing computing systems.

That’s the fundamental beauty of what Martin-Loff did: he found a way of formulating all of constructive mathematics so that it’s one and the same thing as the theory of computation.

And that’s why this kind of type theory is so useful as a component of programming languages: because it’s allowing you to describe the semantics of your program in terms that are completely natural to the program. The type system is a description of the problem; and the program is the proof.

With full-blown Martin-Loff type system, the types really are a full specification of the computation described by the program. We don’t actually use the full expressiveness of type theory in real languages – among other things, it’s not checkable! But we do use a constrained, limited logic with Martin-Loff’s semantics. That’s what types really are in programming languages: they’re logical statements! As we get deeper into type theory, well see exactly how that works.