Why oh why Y?

So in the last few posts, I’ve been building up the bits and pieces that turn lambda calculus into a useful system. We’ve got numbers, booleans, and choice operators. The only thing we’re lacking is some kind of repetition or iteration.
In lambda calculus, all iteration is done by recursion. In fact, recursion is a pretty natural way of expressing iteration. It takes a bit of getting used to, but anyone who’s programmed in a language like Scheme, ML, or Haskell for a while gets very used to idea, and feels frustrated coming back to a language like Java, where you need to write loops.
It can be a bit difficult if you’re not used to thinking recursively. I wrote [an explanation of recursion][recursion] which you can go read if you’re not used to what recursion is or how it works.
be found here). But since functions in lambda calculus don’t have names, that means that we resort to something tricky. It’s called the Y combinator, aka the lambda fixed point operator.
Let’s start by looking at a simple recursive function outside of the lambda calculus. The factorial function, n!, is the standard example:

factorial(n) = 1 if n = 0,
or factorial(n) = n*factorial(n-1) if n > 0

If we want to start trying to write that in lambda calculus, we’d need a couple of tools… We need a test for equality to zero, and we need a way of multiplying numbers; and we need a way of subtracting one.
For testing equality to zero, we’ll use a function named *IsZero*, which takes three parameters: a number, and two values. If the number is 0, it returns the first value; if it’s not 0, then it returns the second value.
For multiplication – multiplication is an iterative algorithm, so we can’t write multiplication until we work out recursion. But we’ll just handwave that for now, and have a function *Mult x y*.
And finally, for subtracting one, we’ll use *Pred x* for the predecessor of x – that is, x – 1.
So – a first stab at factorial, written with the recursive call left with a blank in it, would be:

*λ n . IsZero n 1 (Mult n (**something** (Pred n)))*

Now, the question is, what kind of “something” can we plug in to there? What we’re really like to do is plug in a copy of the function itself:

*Fact ≡ λ n . IsZero n 1 (Mult n (Fact (Pred n)))*

How can we do that? Well, the usual way of plugging something in to a lambda calculus function is through a parameter:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Of course, we can’t plug in a copy of the function as its own parameter that way: the name *Fact* doesn’t exist in the expression in which we’re trying to use it. You can’t use an undefined name – and in lambda calculus, the *only* way to bind a name is by passing it as a parameter to a λ expression. So what can we do?
The answer is to use something called a *combinator*. A combinator is a special kind of *higher order function* which can be defined without reference to anything but function applications. (A higher order function is a function which takes functions as parameters and returns functions as results). The Y combinator is the special, almost magical function that makes recursion possible. Here’s what it looks like:

Y ≡ λ y . (λ x . y (x x)) (λ x . y (x x))

If you look at it, the reason for calling it Y is because it is *shaped* like a Y. To show you that more clearly, sometimes we write lambda calculus using trees. Here’s the tree for the Y combinator:

Why is the Y combinator an answer to our problem in defining the factorial function? The Y combinator is something called a *fixed point* combinator. What makes it special is the fact that for any function *f*, *Y f* evaluates to *f Y f*; which evaluates to *f (f Y f)*; which evaluates to *f (f (f Y f))*. See why it’s called Y?
Let’s try walking through “*Y f*”:
1. Expand Y: “*(λ y . (λ x . y (x x)) (λ x . y (x x))) f*”
2. β: “*(λ x . f (x x)) (λ x . f (x x))*
3. β again: “*f (λ x . f (x x)) (λ x . f (x x))*”
4. Since “*Y f = (λ x . f (x x)) (λ x . f (x x))*”, what we just got in step three is “f Y f”.
See, there’s the magic of “Y”. No matter what you do, you can’t make it consume itself. Evaluating “*Y f*” will produce another copy of *f*, and leave the “Y f” part untouched.
So how do we use this crazy thing?
Remember our last attempt at defining the factorial function? Let’s look at it again:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Let’s rewrite it just a tiny bit to make it easier to talk about:

*Metafact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n))))*
*Fact ≡ Metafact Fact*

Now – the trick is, “Fact” is not an identifier defined inside of “Fact”. How do we let “Fact” reference “Fact”? Well, we did a lambda abstraction to let us pass the “Fact” function as a parameter; so what we needed to do is to find a way to write “Fact” that lets us pass it to itself as a parameter.
What does “Y f” do? It expands into a call to “f” with “Y f” as its first parameter. And “Y f” will expand into “f Y f” – that is, a call to “f” with “Y f” as its parameter. In other words, Y f turns “f” into a recursive function with *itself* as its first parameter.
So the factorial function is:

*Fact ≡ Y Metafact*

*(Y metafact)* is the parameter value of “f” in the metafact lambda; when we do β on the function, if n is zero, then it just returns 1. If it’s not zero, then we get the call to *f (Pred n)*. *f* betas to *Y metafact*. Which does that funky magic copying thing, giving us *metafact (Y metafact) (Pred n)*.
Voila, recursion. s
I learned about the Y combinator back in my undergrad days, which would place it around 1989 – and I still find it rather mystifying. I do understand it now, but I can’t imagine how on earth anyone ever figured it out!
If you’re interested in this, then I highly recommend getting a copy of the book [The Little Schemer][schemer]. It’s a wonderful little book – set up like a childrens’ book, where the front of each page is a question; and the back of each page is the answer. It’s written in a delightfully playful style, it’s very fun and engaging, and it will not only teach you to program in Scheme.
As an important side-note there are actually a couple of different versions of the Y combinator. There are different ways of evaluating lambda calculus: given an expression like *(λ x y . x * y) 3 ((λ z. z * z) 4)*”
we can do it in two different orders: we can first do the beta on “*(λ x y . x * y)*”,which would give us: “*3 * ((λ z . z * z) 4)*”.
Or, we could beta “*((λ z . z * z) 4)*” first: “*(λ x y . x * y) 3 (4 * 4)*”. Nn this case, the two orders end up with the same result; but that’s not always the case. Sometimes the order of evaluation matters – and the way that the Y combinator works is one of those times. One order will result in expanding the Y infinitely; the other will result in a clean recursive function.
The first order is what we call *lazy evaluation*: don’t evaluate the parameters to a function until they’re needed. (This is also pretty much the same thing as what we sometime call *by name* parameter passing.) The second is called *eager evaluation* : always evaluate parameters *before* the functions that they’re passed to. (In real programming languages, Lisp, Scheme, and ML are lambda-calculus based languages that use eager evaluation; Haskell and Miranda are lambda calculus based languages that use lazy evaluation.) The Y combinator I described above is the Y for *lazy* evaluation. If we used eager evaluation, then Y combinator above wouldn’t work – in fact, it would copy Ys forever. There is another version of Y which works for eager evaluation – in fact, it’s the one described in “The Little Schemer”, and they explain it so much better than I do that I’ll just recommend again that you head for whatever bookstore you prefer, and buy yourself a copy.
[recursion]: http://goodmath.blogspot.com/2006/03/clarifying-recursion.html
[schemer]: http://www.amazon.com/gp/redirect.html?link_code=ur2&tag=goodmathbadma-20&camp=1789&creative=9325&location=/gp/search%3F%26index=books%26keywords=little%20schemer%26_encoding=UTF8

5 thoughts on “Why oh why Y?

  1. Daryl McCullough

    I’m not sure if this makes Y less mysterious, but the intuitive leap is self-application.
    You have the equation

    Fact = Metafact Fact

    If we make the assumption that Fact is of the form QQ (some function Q applied to itself) then we have:

    QQ = Metafact QQ

    We still don’t know what Q is, but we know that Q operating on Q produces some expression involving Q. Let’s just generalize to assume that Q operating on any function f produces the same expression, but with Q replaced by f:

    Qf = Metafact (ff)

    Now, the right side no longer involves Q! So we can write:

    Q = λ f . Metafact (f f)

    Remembering that Fact = QQ, we can write:

    Fact = (λ f . Metafact (f f)) (λ f . Metafact (f f)) = Y Metafact

  2. Nagu

    Thank you for such beautiful essays.
    Could you please advice me on how to draw such graphs/diagrams on a blog?
    Thank you,

  3. Per Vognsen

    Let’s just generalize to assume that Q operating on any function f produces the same expression, but with Q replaced by f:

    For those that haven’t seen the reasoning before I suspect the above part is probably the only confusing step. The point is that the only way that Q can appear in the beta-reduction of QQ is by having e reference x, assuming Q = lambda x.e. Why? Well, if this wasn’t the case then Q would have to appear literally in e (its own body). But this would lead to an infinite regress as Mark already demonstrated in the article.

  4. Pingback: Types and Lambda Calculus | Good Math/Bad Math

Leave a Reply