Why oh why Y?

So in the last few posts, I’ve been building up the bits and pieces that turn lambda calculus into a useful system. We’ve got numbers, booleans, and choice operators. The only thing we’re lacking is some kind of repetition or iteration.
In lambda calculus, all iteration is done by recursion. In fact, recursion is a pretty natural way of expressing iteration. It takes a bit of getting used to, but anyone who’s programmed in a language like Scheme, ML, or Haskell for a while gets very used to idea, and feels frustrated coming back to a language like Java, where you need to write loops.
It can be a bit difficult if you’re not used to thinking recursively. I wrote [an explanation of recursion][recursion] which you can go read if you’re not used to what recursion is or how it works.
be found here). But since functions in lambda calculus don’t have names, that means that we resort to something tricky. It’s called the Y combinator, aka the lambda fixed point operator.
Let’s start by looking at a simple recursive function outside of the lambda calculus. The factorial function, n!, is the standard example:

factorial(n) = 1 if n = 0,
or factorial(n) = n*factorial(n-1) if n > 0

If we want to start trying to write that in lambda calculus, we’d need a couple of tools… We need a test for equality to zero, and we need a way of multiplying numbers; and we need a way of subtracting one.
For testing equality to zero, we’ll use a function named *IsZero*, which takes three parameters: a number, and two values. If the number is 0, it returns the first value; if it’s not 0, then it returns the second value.
For multiplication – multiplication is an iterative algorithm, so we can’t write multiplication until we work out recursion. But we’ll just handwave that for now, and have a function *Mult x y*.
And finally, for subtracting one, we’ll use *Pred x* for the predecessor of x – that is, x – 1.
So – a first stab at factorial, written with the recursive call left with a blank in it, would be:

*λ n . IsZero n 1 (Mult n (**something** (Pred n)))*

Now, the question is, what kind of “something” can we plug in to there? What we’re really like to do is plug in a copy of the function itself:

*Fact ≡ λ n . IsZero n 1 (Mult n (Fact (Pred n)))*

How can we do that? Well, the usual way of plugging something in to a lambda calculus function is through a parameter:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Of course, we can’t plug in a copy of the function as its own parameter that way: the name *Fact* doesn’t exist in the expression in which we’re trying to use it. You can’t use an undefined name – and in lambda calculus, the *only* way to bind a name is by passing it as a parameter to a λ expression. So what can we do?
The answer is to use something called a *combinator*. A combinator is a special kind of *higher order function* which can be defined without reference to anything but function applications. (A higher order function is a function which takes functions as parameters and returns functions as results). The Y combinator is the special, almost magical function that makes recursion possible. Here’s what it looks like:

Y ≡ λ y . (λ x . y (x x)) (λ x . y (x x))

If you look at it, the reason for calling it Y is because it is *shaped* like a Y. To show you that more clearly, sometimes we write lambda calculus using trees. Here’s the tree for the Y combinator:

Why is the Y combinator an answer to our problem in defining the factorial function? The Y combinator is something called a *fixed point* combinator. What makes it special is the fact that for any function *f*, *Y f* evaluates to *f Y f*; which evaluates to *f (f Y f)*; which evaluates to *f (f (f Y f))*. See why it’s called Y?
Let’s try walking through “*Y f*”:
1. Expand Y: “*(λ y . (λ x . y (x x)) (λ x . y (x x))) f*”
2. β: “*(λ x . f (x x)) (λ x . f (x x))*
3. β again: “*f (λ x . f (x x)) (λ x . f (x x))*”
4. Since “*Y f = (λ x . f (x x)) (λ x . f (x x))*”, what we just got in step three is “f Y f”.
See, there’s the magic of “Y”. No matter what you do, you can’t make it consume itself. Evaluating “*Y f*” will produce another copy of *f*, and leave the “Y f” part untouched.
So how do we use this crazy thing?
Remember our last attempt at defining the factorial function? Let’s look at it again:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Let’s rewrite it just a tiny bit to make it easier to talk about:

*Metafact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n))))*
*Fact ≡ Metafact Fact*

Now – the trick is, “Fact” is not an identifier defined inside of “Fact”. How do we let “Fact” reference “Fact”? Well, we did a lambda abstraction to let us pass the “Fact” function as a parameter; so what we needed to do is to find a way to write “Fact” that lets us pass it to itself as a parameter.
What does “Y f” do? It expands into a call to “f” with “Y f” as its first parameter. And “Y f” will expand into “f Y f” – that is, a call to “f” with “Y f” as its parameter. In other words, Y f turns “f” into a recursive function with *itself* as its first parameter.
So the factorial function is:

*Fact ≡ Y Metafact*

*(Y metafact)* is the parameter value of “f” in the metafact lambda; when we do β on the function, if n is zero, then it just returns 1. If it’s not zero, then we get the call to *f (Pred n)*. *f* betas to *Y metafact*. Which does that funky magic copying thing, giving us *metafact (Y metafact) (Pred n)*.
Voila, recursion. s
I learned about the Y combinator back in my undergrad days, which would place it around 1989 – and I still find it rather mystifying. I do understand it now, but I can’t imagine how on earth anyone ever figured it out!
If you’re interested in this, then I highly recommend getting a copy of the book [The Little Schemer][schemer]. It’s a wonderful little book – set up like a childrens’ book, where the front of each page is a question; and the back of each page is the answer. It’s written in a delightfully playful style, it’s very fun and engaging, and it will not only teach you to program in Scheme.
As an important side-note there are actually a couple of different versions of the Y combinator. There are different ways of evaluating lambda calculus: given an expression like *(λ x y . x * y) 3 ((λ z. z * z) 4)*”
we can do it in two different orders: we can first do the beta on “*(λ x y . x * y)*”,which would give us: “*3 * ((λ z . z * z) 4)*”.
Or, we could beta “*((λ z . z * z) 4)*” first: “*(λ x y . x * y) 3 (4 * 4)*”. Nn this case, the two orders end up with the same result; but that’s not always the case. Sometimes the order of evaluation matters – and the way that the Y combinator works is one of those times. One order will result in expanding the Y infinitely; the other will result in a clean recursive function.
The first order is what we call *lazy evaluation*: don’t evaluate the parameters to a function until they’re needed. (This is also pretty much the same thing as what we sometime call *by name* parameter passing.) The second is called *eager evaluation* : always evaluate parameters *before* the functions that they’re passed to. (In real programming languages, Lisp, Scheme, and ML are lambda-calculus based languages that use eager evaluation; Haskell and Miranda are lambda calculus based languages that use lazy evaluation.) The Y combinator I described above is the Y for *lazy* evaluation. If we used eager evaluation, then Y combinator above wouldn’t work – in fact, it would copy Ys forever. There is another version of Y which works for eager evaluation – in fact, it’s the one described in “The Little Schemer”, and they explain it so much better than I do that I’ll just recommend again that you head for whatever bookstore you prefer, and buy yourself a copy.
[recursion]: http://goodmath.blogspot.com/2006/03/clarifying-recursion.html
[schemer]: http://www.amazon.com/gp/redirect.html?link_code=ur2&tag=goodmathbadma-20&camp=1789&creative=9325&location=/gp/search%3F%26index=books%26keywords=little%20schemer%26_encoding=UTF8

The Genius of Alonzo Church (rerun)

I’m on vacation this week, so I’m posting reruns of some of the better articles from when Goodmath/Badmath was on Blogger. Todays is a combination of two short posts on numbers and control booleans in λ calculus.
So, now, time to move on to doing interesting stuff with lambda calculus. To
start with, for convenience, I’ll introduce a bit of syntactic sugar to let us
name functions. This will make things easier to read as we get to complicated
stuff.
To introduce a *global* function (that is a function that we’ll use throughout our lambda calculus introduction without including its declaration in every expression), we’ll use a definition like the following:
*square ≡ λ x . x × x*
This declares a function named “square”, whose definition is “*λ x . x×x*”. If we had an expression “square 4”, the definition above means that it would effectively be treated as if the expression were: “*(λ square . square 4)(λ x . x×x)*”.
Numbers in Lambda Calculus
——————————
In some of the examples, I used numbers and arithmetic operations. But numbers don’t really exist in lambda calculus; all we really have are functions! So we need to invent some way of creating numbers using functions. Fortunately, Alonzo Church, the genius who invented the lambda calculus worked out how to do that. His version of numbers-as-functions are called Church Numerals.
In Church numerals, all numbers are functions with two parameters:
* Zero ≡ *λ s z . z*
* One ≡ *λ s z . s z*
* Two ≡ *λ s z . s (s z)*
* Three ≡ *λ s z . s (s (s z))*
* Any natural number “n”, is represented by a Church numeral which is a function which applies its first parameter to its second parameter “n” times.
A good way of understanding this is to think of “z” as being a a name for a zero-value, and “s” as a name for a successor function. So zero is a function which just returns the “0” value; one is a function which applies the successor function once to zero; two is a function which applies successor to the successor of zero, etc. It’s just the Peano arithmetic definition of numbers transformed into lambda calculus.
But the really cool thing is what comes next. If we want to do addition, x + y, we need to write a function with four parameters; the two numbers to add; and the “s” and “z” values we want in the resulting number:
add ≡ *λ s z x y . x s (y s z)*
Let’s curry that, to separate the two things that are going on. First, it’s taking two parameters which are the two values we need to add; second, it needs to normalize things so that the two values being added end up sharing the same binding of the zero and successor values.
add_curry ≡ λ x y. (λ s z . (x s (y s z)))
Look at that for a moment; what that says is, to add x and y: create the church numeral “y” using the parameters “s” and “z”. Then **apply x** to that new church numeral y. That is: a number is a function which adds itself to another number.
Let’s look a tad closer, and run through the evaluation of 2 + 3:
add_curry (λ s z . s (s z)) (λ s z . s (s (s z)))
To make things easier, let’s alpha 2 and 3, so that “2” uses “s2” and “z2”, and 3 uses “s3” and “z3”;
add_curry (λ s2 z2 . s2 (s2 z2)) (λ s3 z3 . s3 (s3 (s3 z3)))
Now, let’s do replace “add_curry” with its definition:

(λ x y .(λ s z. (x s y s z))) (λ s2 z2 . s2 (s2 z2)) (λ s3 z3 . s3 (s3 (s3 z3)))
Now, let’s do a beta on add:

λ s z . (λ s2 z2 . s2 (s2 z2)) s (λ s3 z3 . s3 (s3 (s3 z3)) s z)
And now let’s beta the church numeral for three. This basically just “normalizes” three: it replaces the successor and zero function in the definition of three with the successor and zero functions from the parameters to add.
λ s z . (λ s2 z2 . s2 (s2 z2)) s (s (s (s z)))
Now.. Here comes the really neat part. Beta again, this time on the lambda for two. Look at what we’re going to be doing here: two is a function which takes two parameters: a successor function, and zero function. To add two and three, we’re using the successor function from add function; and we’re using the result of evaluating three *as the value of the zero!* for two:
λ s z . s (s (s (s (s z))))
And we have our result: the church numeral for five!
Choice in Lambda Calculus
—————————
Now that we have numbers in our Lambda calculus, there are only two things missing before we can express arbitrary computations: a way of expressing choice, and a way of expressing repetition. So now I’ll talk about booleans and choice; and then next post I’ll explain repetition and recursion.
We’d like to be able to write choices as if/then/else expressions, like we have in most programming languages. Following the basic pattern of the church numerals, where a number is expressed as a function that adds itself to another number, we’ll express true and false values as functions that perform an if-then-else operation on their parameters. These are sometimes called *Church booleans*. (Of course, also invented by Alonzo Church.)
* TRUE ≡ λ t f . t
* FALSE ≡ λ t f . f
So, now we can write an “if” function, whose first parameter is a condition expression, second parameter is the expression to evaluate if the condition was true, and third parameter is the expression to evaluate if the condition is false.
* IfThenElse ≡ *λ cond t f . cond t f*
For the boolean values to be useful, we also need to be able to do the usual logical operations:
* BoolAnd ≡ *λ x y .x y FALSE*
* BoolOr ≡ *λ x y. x TRUE y*
* BoolNot ≡ *λ x . x FALSE TRUE*

Now, let’s just walk through those a bit. Let’s first take a look at BoolAnd. We’ll try evaluating “*BoolAnd TRUE FALSE*”:
* Expand the TRUE and FALSE definitions: “*BoolAnd (λ t f . t) (λ t f . f)*”
* Alpha the true and false: “*BoolAnd (λ tt tf . tt) (λ ft ff . ff)*”
* Now, expand BoolAnd: *(λ t f. t f FALSE) (λ tt tf . tt) (λ ft ff . ff)*
* And beta: *(λ tt tf.tt) (λ ft ff. ff) FALSE*
* Beta again: *(λ xf yf . yf)*
And we have the result: *BoolAnd TRUE FALSE = FALSE*. Now let’s look at “*BoolAnd FALSE TRUE*”:
* *BoolAnd (λ t f . f) (λ t f .t)*
* Alpha: *BoolAnd (λ ft ff . ff) (λ tt tf . tt)*
* Expand BoolAnd: *(λ x y .x y FALSE) (lambda ft ff . ff) (lambda tt tf . tt)*
* Beta: *(λ ft ff . ff) (lambda tt tf . tt) FALSE
* Beta again: FALSE
So *BoolAnd FALSE TRUE = FALSE*. Finally, *BoolAnd TRUE TRUE*:
* Expand the two trues: *BoolAnd (λ t f . t) (λ t f . t)*
* Alpha: *BoolAnd (λ xt xf . xt) (λ yt yf . yt)*
* Expand BoolAnd: *(λ x y . x y FALSE) (λ xt xf . xt) (λ yt yf . yt)*
* Beta: *(λ xt xf . xt) (λ yt yf . yt) FALSE*
* Beta: (λ yt yf . yt)
So *BoolAnd TRUE TRUE = TRUE*.
The other booleans work in much the same way.
So by the genius of Alonzo church, we have *almost* everything we need in order to write programs in λ calculus.

Berlinski responds: A Digested Debate

I thought that for a followup to yesterday’s repost of my takedown of Berlinksi, that today I’d show you a digested version of the debate that ensued when Berlinksi showed up to defend himself. You can see the original post and the subsequent discussion here.
It’s interesting, because it demonstrates the kinds of debating tactics that people like Berlinski use to avoid actually confronting the genuine issue of their dishonesty. The key thing to me about this is that Berlinski is a reasonably competent mathematician – but watch how he sidesteps to avoid discussing any of the actual mathematical issues.
Berlinksi first emailed his response to me rather than posting it in a comment. So I posted it, along with my response. As I said above, this is a digest of the discussion; if you want to see the original debate in all its gory detail, you can follow the link above.
——-
The way the whole thing started was when Berlinski emailed me after my original post criticizing his sloppy math. He claimed that it was “impossible for him to post comments to my site”; although he never had a problem after I posted this. His email contained several points, written in Berlinski’s usual arrogant and incredibly verbose manner:
1. He didn’t make up the numbers; he’s quoting established literature: “As I have indicated on any number of occasions, the improbabilities that I cited are simply those that are cited in the literature”
2. His probability calculations were fine: “The combinatorial calculations I
made were both elementary and correct.”
3. Independence is a valid assumption in his calculations: “Given the chemical
structure of RNA, in which nucleotides are bound to a sugar phosphate backbone
but not to one another, independence with respect to template formation is not
only reasonable as an assumption, but inevitable.”
4. He never actually said there could be only one replicator: “There may be many sequences in the pre-biotic environment capable of carrying out various chemical activities.”
Of course, he was considerably more verbose than that.
This initial response is somewhat better at addressing arguments than the later ones. What makes that a really sad statement is that even this initial response doesn’t *really* address anything:
* He didn’t address the specific criticisms of his probability calculations, other than merely asserting that they were correct.
* He doesn’t address questions about the required sizes of replicating molecules, other than asserting that his minimum length is correct, and attributing the number to someone else. (While neglecting to mention that there are *numerous* predictions of minimum length, and the one he cited is the longest.)
* He doesn’t explain why, even though he doesn’t deny that there may have been many potential replicators, his probability calculation is based on the assumption that there is *exactly one*. As I said in my original response to him: In your space of 1060 alleged possibilities, there may be 1 replicator; there may be 1040. By not addressing that, you make your probability calculation utterly worthless.
* He doesn’t address his nonsensical requirement for a “template” for a replicator. The idea of the “template” is basically that for a replicator to copy itself, it needs to have a unique molecule called a “template” that it can copy itself onto. It can’t replicate onto anything *but* a template, and it can’t create the template itself. The “template” is a totally independent chemical, but there is only one possible template that a replicator can copy itself onto. He doesn’t address that point *at all* in his response.
* He doesn’t address the validity of his assumption that all nucleotide chains of the same length are equally likely.
Berlinksi responded, in absolutely classic style. On of the really noteworthy things about Berlinksi’s writing style is its incredible pomposity. This was no disappointment on that count; however, it was quite sad with respect to content. I have to quote the first several lines verbatim, to give you a sense of what I’m talking about when I say he’s pompous:
>Paris
>6 April, 2006
>
>I have corrected a few trivial spelling errors in your original posting, and I
>have taken the liberty of numbering comments:
>
>I discuss these points seriatim:
You see, we *really* needed to know that he was in Paris. Crucially important to his arguments to make sure that we realize that! And I’m a bad speller, which is a very important issue in the discussed. And look, he can use latin words for absolutely no good reason!
He spends fifty lines of prose on the issue of whether or no 100 bases is the correct minimum possible length for a replicator. Those 50 lines come down to “I have one source that says that length, so nyah!” No acknowledgment that there are other sources; no reason for *why* this particular source must be correct. Just lots and lots of wordy prose.
He response to the question about the number of replicators by sidestepping. No math, no science; just evading the question:
>2a) On the contrary. Following Arrhenius, I entertain the possibility that
>sequence specificity may not, after all, be a necessary condition for
>demonstrable ligase activity — or any other biological function, for that
>matter. I observed — correctly, of course — that all out evidence is against
>it. All evidence – meaning laboratory evidence; all evidence – meaning our
>common experience with sequence specificity in linguistics or in any other
>field in which an alphabet of words gives rise to a very large sample space in
>which meaningful sequences are strongly non-generic – the space of all
>proteins, for example.
The template stuff? More sidestepping. He rambles a bit, cites several different sources, and asserts that he’s correct. The basic idea of his response is: the RNA-world hypothesis assumes Watson-Crick base pairing replication, which needs a template. And the reason that it needs to be Watson-Crick is because anything else is too slow and too error prone. But why does speed matter, if there’s no competition? And *of course* the very first replicator would be error prone! Error correction is not something that we would suppose would happen spontaneously and immediately as the first molecule started self-replicating. Error correction is something that would be *selected for* by evolution *after* replication and competition had been established.
Then he sidesteps some more, by playing linguistic games. I referred to the chemicals from which an initial self-replicator developed as “the precursors of a replicator”; he criticizes that phrasing. That’s his entire response.
And finally, we get to independence. It’s worth quoting him again, to show his tactics:
>There remains the issue of independence. Independence is, of course, the de
>facto hypothesis in probability calculations; and in the case of pre-biotic
>chemistry, strongly supported by the chemical facts. You are not apt to
>dismiss, I suppose, the hypothesis that if two coins are flipped the odds in
>favor if seeing two heads is one in four on the grounds that, who knows?, the
>coins might be biased. Who knows? They might be. But the burden of
>demonstrating this falls on you.
One other quote, to give you more of the flavor of a debate with Berlinski:
>5a) There are two issues here: The first is the provenance of my argument; the
>second, my endorsement of its validity. You have carelessly assumed that
>arguments I drew from the literature were my own invention. This is untrue. I
>expect you to correct this misunderstanding as a matter of scholarly probity.
>
>As for the second point, it goes without saying that I endorsed the arguments
>that I cited. Why on earth would I have cited them otherwise?
I really love that quote. Such delightful fake indignance; how *dare* I accuse him of fabricating arguments! Even though he *did* fabricate them. The fake anger allows him to avoid actually *discussing* his arguments.
After that, it descends into seeming endless repetition. It’s just more of the “nyah nyah I’m right” stuff, without actually addressing the criticism. There’s always a way to sidestep the real issue by either using excess wordiness to distract people, or fake indignance that anyone would dare to question anything so obvious!
My response to that is short enough that I’ll just quote it, rather than redigesting it:
>As I’ve said before, I think that there are a few kinds of fundamental errors
>that you make repeatedly; and I don’t think your comments really address them
>in a meaningful way. I’m going to keep this as short as I can; I don’t like
>wasting time rehashing the same points over and over again.
>
>With regard to the basic numbers that you use in your probability calculations:
>no probability calculation is any better than the quality of the numbers that
>get put into it. As you admit, no one knows the correct length of a minimum
>replicator. And you admit that no one has any idea how many replicators of
>minimum or close to minimum length there are – you make a non-mathematical
>argument that there can’t be many. But there’s no particular reason to believe
>that the actual number is anywhere close to one. A small number of the possible
>patterns of minimum length? No problem. *One*? No way, sorry. You need to make
>a better argument to support eliminating 10^60 – 1 values. (Pulling out my old
>favorite, recursive function theory: the set of valid turing machine programs
>is a space very similar to the set of valid RNA sequences; there are numerous
>equally valid and correct universal turing machine programs at or close to the
>minimum length. The majority of randomly generated programs – the *vast*
>majority of randomly generated programs – are invalid. But the number of valid
>ones is still quite large.)
>
>Your template argument is, to be blunt, silly. No, independence is not the de
>facto hypothesis, at least not in the sense that you’re claiming. You do not
>get to go into a probability calculation, say “I don’t know the details of how
>this works, and therefore I can assume these events are independent.” You need
>to eliminate dependence. In the case of some kind of “pool” of pre-biotic
>polymers and fragments (which is what I meant by precursors), the chemical
>reactions occuring are not occuring in isolation. There are numerous kinds of
>interactions going on in a chemically active environment. You don’t get to just
>assume that those chemical interactions have no effect. It’s entirely
>reasonable to believe that there is a relationship between the chains that form
>in such an environment; if there’s a chance of dependence, you cannot just
>assume independence. But again – you just cook the numbers and use the
>assumptions that suit the argument you want to make.
———-
The rest of the debate was more repetition. Some selected bits:
>No matter how many times I offer a clear and well-supported answers to certain
>criticisms of my essays, those very same criticisms tend to reappear in this
>discussion, strong and vigorous as an octopus.
>
>1 No one knows the minimum ribozyme length for demonstrable replicator
>activity. The figure of the 100 base pairs required for what Arrhenius calls
>”demonstrable ligase activity,” is known. No conceivable purpose is gained from
>blurring this distinction.
>
>Does it follow, given a sample space containing 1060 polynucleotides of 100
>NT’s in length, that the odds in favor of finding any specific polynucleotide
>is one in 1060?
>Of course it does. It follows as simple mathematical fact, just as it follows
>as simple mathematical fact that the odds in favor of pulling any particular
>card from a deck of cards is one in fifty two.
>Is it possible that within a random ensemble of pre-biotic polynucleotides
>there may be more than one replicator?
>Of course it is possible. Whoever suggested the contrary?
This is a great example of Berlinksi’s argument style. Very arrogant argument by assertion, trying to throw as much text as possible at things in order to confuse them.
The issues we were allegedly “discussing” here was whether or not the space of nucleotide chains of a given length could be assumed to be perfectly uniform; and whether or not it made sense to assert that there was only *one* replicator in that space of 1060 possible chains.
As you can see, his response to the issue of distribution is basically shouting: “**Of course** it’s uniform, any moron can see that!”.
Except that it *isn’t* uniform. In fact, quite a number of chains of length 100 *are impossible*. It’s a matter of geometry: the different chains take different shapes depending on their constituents. Many of the possible chains are geometrically impossible in three dimensions. How many? No one is sure: protein folding is still a big problem: given our current level of knowledge, figuring out the shape of a protein that we *know* exists is still very difficult for us.
And his response to his claim that there is exactly one replicator in that space? To sidestep it by claiming that he never said that. Of course, he calculated his probability using that as an assumption, but he never explicitly *said* it.
His next response opened with a really wonderful example of his style: pure pedantry that avoids actually *discussing* the criticisms of his points.
>>The point is that we’re talking about some kind of pool of active chemicals
>>reacting with one another and forming chains ….
>
>What you are talking about is difficult to say. What molecular biologists are
>talking about is a) a random pool of beta D-nucleotides; and b) a random
>ensemble of polynucleotides. The polynucleotides form a random ensemble because
>chain polymerization is not sequence-specific.
>
>>The set of very long chains that form is probably not uniform ….
>
>Sets are neither uniform nor non-uniform. It is probability distributions that
>are uniform. Given a) and b) above, one has a classical sampling with
>replacement model in the theory of probability, and thus a uniform and discrete
>probability measure.
Ok. Anyone out there who could read this argument, and *not* know what I was talking about when I said “some kind of pool of active chemicals reacting with one another and forming chains”?
How about anyone who thinks that my use of the word “set” in the quote above is the least bit unclear? Anyone who thinks that “the set of very long chains that form is probably not uniform” is the *least* bit ambiguous?
No, I thought not. The point, as usual, is to avoid actually *addressing* difficult arguments. So when confronted with something hard to answer, you look for a good distraction, like picking on grammar or word choice, so that you can pretend that the reason you can’t answer an argument is because the argument didn’t make sense. So he focuses on the fact that I didn’t use the word “set” in its strict mathematical sense, and then re-asserts his argument.
Another example. At one point in the argument, disputing his assertion of independent probabilities for his “template” and his “replicator”, I said the following:
>I’m *not* saying that in general, you can’t make assumptions of independence.
>What I’m saying is what *any* decent mathematician would say: to paraphrase my
>first semester probability book: “independence between two events is a valid
>assumption *if and only if* there is no known interaction between the events.”
>That is the *definition* of independence…
Berlinksi’s response? Again, pure distractive pedantry:
>If you are disposed to offer advice about mathematics, use the language, and
>employ the discipline, common to mathematics itself. What you have offered is
>an informal remark, and not a definition. The correct definition is as follows:
>Two events A and B are independent if P(AB) = P(A)P(B). As a methodological
>stricture, the remark you have offered is, moreover, absurd inasmuch as some
>interaction between events can never be ruled out a priori, at least in the
>physical sciences.
Does this address my criticism? No.
The Bayesian rules for combining probabilities say “If A and B are independent, then the probability of AB is the probability of A times the probability of B”. You *can* invert that definition, and use it to show that two events are independent, by showing that the probability of their occurring together is
the product of their individual probabilities. What he’s doing up there is a pedantic repetition of a textbook definition in the midst of some arrogant posturing. But since he’s claiming to be talking mathematically, let’s look at what he says *mathematically*. I’m asserting that you need to show that events are independent if you want to treat them as independent in a probability calculation. He responds by saying I’m not being mathematical; and spits out the textbook definition. So let’s put the two together, to see what Berlinksi is arguing mathematically:
>We can assume that the probability of two events, A and B are independent and
>can be computed using P(AB) = P(A)×P(B) if and only if the probability
>P(AB) = P(A)×P(B).
Not a very useful definition, eh?
And his response to my criticism of that?
>Paris
>David Berlinski
>
>I am quite sure that I have outstayed my welcome. I’m more than happy to let
>you have the last words. Thank you for allowing me to post my own comments.
>
>DB
And that was the end of the debate.
Sad, isn’t it?

A Lambda Calculus rerun

I’m on vacation this week, so I’m recycling some posts that I thought were particularly interesting to give you something to read.

In computer science, especially in the field of programming languages, we tend to use one particular calculus a lot: the Lambda calculus. Lambda calculus is also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. Lambda calculus is great for a lot of reasons, among them:

  1. It’s very simple.
  2. It’s Turing complete.
  3. It’s easy to read and write.
  4. It’s semantics are strong enough that we can do reasoning from it.
  5. It’s got a good solid model.
  6. It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing lambda calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the lambda calculus: Lisp, ML, and Haskell are very strongly lambda calculus based.

The lambda calculus is based on the concept of *functions*>. In the pure lambda calculus, *everything* is a function; there are no values *at all* except for functions. But we can build up anything we need using functions. Remember back in the early days of this blog, I talked a bit about how to build mathematics? We can build the entire structure of mathematics from nothing but lambda calculus.

So, enough lead-in. Let’s dive in a look at LC. Remember that for a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

Lambda Calculus Syntax

The lambda calculus has exactly three kinds of expressions:

  1. Function definition: a function in lambda calculus is an expression, written: “λ param . body” which defines a function with one parameter.
  2. Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
  3. Function application: applying a function is written by putting the function value in front of its parameter, as in “x y” to apply the function x to the value y.

Currying

There’s a trick that we play in lambda calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that *functions are values*. So instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function – in the end, it’s effectively the same thing. It’s called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.

Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter:

λ x . (λ y . x + y)

Now that we know that adding multiple parameter functions doesn’t *really* add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.

Free vs Bound Identifiers

One important syntactic issue that I haven’t mentioned yet is *closure* or *complete binding*. For a lambda calculus expression to be evaluated, it cannot reference any identifiers that are not *bound*. An identifier is bound if it a parameter in an enclosing lambda expression; if an identifier is *not* bound in any enclosing context, then it is called a *free* variable.

  1. *λ x . plus x y*: in this expression, “y” and “plus” are free, because they’re not the parameter of any enclosing lambda expression; x is bound because it’s a parameter of the function definition enclosing the expression “plus x y” where it’s referenced.
  2. *λ x y.y x*: in this expression both x and y are bound, because they are parameters of the function definition.
  3. *λ y . (λ x . plus x y*: In the inner lambda, “*λ x . plus x y*”, y and plus are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner lambda, and y is bound by the other lambda. “plus” is still free.

We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.

A lambda calculus expression is completely valid only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.

Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in lambda calculus; they’re called α and beta. α is also called “conversion”, and beta is also called “reduction”.

α Conversion

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in lambda calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So – for instance, if we had an expression like:

λ x . if (= x 0) then 1 else x^2

We can do α to replace X with Y (written “α[x/y]” and get):

λ y . if (= y 0) then 1 else y^2

Doing α does *not* change the meaning of the expression in any way. But as we’ll see later, it’s important because it gives us a way of doing things like recursion.

β Reduction

β reduction is where things get interesting: this single rule is all that’s needed to make the lambda calculus capable of performing *any* computation that can be done by a machine.

Beta basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.

Suppose we have the application expression: “*(λ x . x + 1) 3*“. What beta says is that we can replace the application by taking the body of the function (which is “*x + 1*”); and replacing references to the parameter “*x*” by the value “*3*”; so the result of the beta reduction is “*3 + 1*”.

A slightly more complicated example is the expression:

λ y . (λ x . x + y)) q

It’s an interesting expression, because it’s a lambda expression that when applied, results in another lambda expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter “y” with the identifier “q”; so, the result is “*λ x. x+q*”.

One more example, just for the sake of being annoying:

(λ x y. x y) (λ z . z × z) 3“.

That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter “*x*” in the body of the first function with “*λ z . z × z*”; and we replace the parameter “*y*” with “3”, getting: “*(λ z . z × z) 3*”. And we can perform beta on that, getting “*3 × 3*”.

Written formally, beta says:

λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e]

That condition on the end, “if free(e) subset free(B[x := e]” is why we need α: we can only do beta reduction *if* doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.

As usual, an example will make that clearer: Suppose we have a expression defining a function, “*λ z . (λ x . x+z)*”. Now, suppose we want to apply it:

(λ z . (λ x . x + z)) (x + 2)

In the parameter “*(x + 2)*”, x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “*λ x . x + x + 2*”.

The variable that was *free* in “x + 2” is now bound. Now suppose we apply that function: “*(λ x . x + x + 2) 3*”. By beta, we’d get “3 + 3 + 2”, or 8.

What if we did α the way we were supposed to?

  • By α[x/y], we would get “*λ z . (λ y . y + z) (x+2)*”.
  • by α[x/y]: lambda z . (lambda y . y+z)) (x + 2). Then by β, we’d get “*λ y . y + x + 2*”. If we apply this function the way we did above, then by β, we’d get “*3+x+2*”.

“*3+x+2*” and “*3+3+2*” are very different results!

And that’s pretty much it. There’s another *optional* rule you can add called η-conversion. η is a rule that adds *extensionality*, which provides a way of expressing equality between functions.

η says that in any lambda expression, I can replace the value “f” with the value “g” if/f for all possible parameter values x, *f x = g x*.

What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.

We also haven’t defined a model for lambda-calculus yet. (I discussed models here and here.) That’s actually quite an important thing! LC was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although LC looked correct, the early attempts to define a model for it were failures. After all, remember that if there isn’t a valid model, that means that the results of the system are meaningless!

Bad Math from David Berlinksi

I’m away on vacation this week, so this is a repost of one of early GM/BM entries when it was on Blogger. As usual, I’ve revised it slightly. Berlinksi actually showed up and responded; a digest of the discussion back and forth is scheduled to appear here later this week.
——————————-
In my never-ending quest for bad math to mock, I was taking a look at the Discovery Institute’s website, where I found an essay, On the Origin of Life, by David Berlinksi. Bad math? Oh, yeah. Bad, sloppy, crappy math. Some of which is just duplication of things I’ve criticized before, but there’s a few different tricks in this mess.
Before I jump in to look at a bit of it, I’d like to point out a general technique that’s used in this article. It’s *very* wordy. It rambles, it wanders off on tangents, it mixes quotes from various people into its argument in superfluous ways. The point of this seems to be to keep you, the reader, somewhat off balance; it’s harder to analyze an argument when the argument is so scattered around, and it’s easier to miss errors when the steps of the argument are separated by large quantities of cutesy writing. Because of this, the section I’m going to quote is fairly long; it’s the shortest I could find that actually contained enough of the argument I want to talk about to be coherent.
>The historical task assigned to this era is a double one: forming chains of
>nucleic acids from nucleotides, and discovering among them those capable of
>reproducing themselves. Without the first, there is no RNA; and without the
>second, there is no life.
>
>In living systems, polymerization or chain-formation proceeds by means of the
>cell’s invaluable enzymes. But in the grim inhospitable pre-biotic, no enzymes
>were available. And so chemists have assigned their task to various inorganic
>catalysts. J.P. Ferris and G. Ertem, for instance, have reported that activated
>nucleotides bond covalently when embedded on the surface of montmorillonite, a
>kind of clay. This example, combining technical complexity with general
>inconclusiveness, may stand for many others.
>
>In any event, polymerization having been concluded–by whatever means–the result
>was (in the words of Gerald Joyce and Leslie Orgel) “a random ensemble of
>polynucleotide sequences”: long molecules emerging from short ones, like fronds
>on the surface of a pond. Among these fronds, nature is said to have discovered
>a self-replicating molecule. But how?
>
>Darwinian evolution is plainly unavailing in this exercise or that era, since
>Darwinian evolution begins with self-replication, and self-replication is
>precisely what needs to be explained. But if Darwinian evolution is unavailing,
>so, too, is chemistry. The fronds comprise “a random ensemble of polynucleotide
>sequences” (emphasis added); but no principle of organic chemistry suggests
>that aimless encounters among nucleic acids must lead to a chain capable of
>self-replication.
>
>If chemistry is unavailing and Darwin indisposed, what is left as a mechanism?
>The evolutionary biologist’s finest friend: sheer dumb luck.
>
>Was nature lucky? It depends on the payoff and the odds. The payoff is clear:
>an ancestral form of RNA capable of replication. Without that payoff, there is
>no life, and obviously, at some point, the payoff paid off. The question is the
>odds.
>
>For the moment, no one knows how precisely to compute those odds, if only
>because within the laboratory, no one has conducted an experiment leading to a
>self-replicating ribozyme. But the minimum length or “sequence” that is needed
>for a contemporary ribozyme to undertake what the distinguished geochemist
>Gustaf Arrhenius calls “demonstrated ligase activity” is known. It is roughly
>100 nucleotides.
>
>Whereupon, just as one might expect, things blow up very quickly. As Arrhenius
>notes, there are 4100 or roughly 1060 nucleotide sequences that are 100
>nucleotides in length. This is an unfathomably large number. It exceeds the
>number of atoms contained in the universe, as well as the age of the universe
>in seconds. If the odds in favor of self-replication are 1 in 1060, no betting
>man would take them, no matter how attractive the payoff, and neither
>presumably would nature.
>
>”Solace from the tyranny of nucleotide combinatorials,” Arrhenius
>remarks in discussing this very point, “is sought in the feeling
>that strict sequence specificity may not be required through all
>the domains of a functional oligmer, thus making a large number of
>library items eligible for participation in the construction of the
>ultimate functional entity.” Allow me to translate: why assume that
>self-replicating sequences are apt to be rare just because they are long? They
>might have been quite common.
>They might well have been. And yet all experience is against it. Why
>should self-replicating RNA molecules have been common 3.6 billion
>years ago when they are impossible to discern under laboratory
>conditions today? No one, for that matter, has ever seen a ribozyme
>capable of any form of catalytic action that is not very specific in its
>sequence and thus unlike even closely related sequences. No one has ever seen a
>ribozyme able to undertake chemical action without a suite of enzymes in
>attendance. No one has ever seen anything like it.
>
>The odds, then, are daunting; and when considered realistically, they are even
>worse than this already alarming account might suggest. The discovery of a
>single molecule with the power to initiate replication would hardly be
>sufficient to establish replication. What template would it replicate against?
>We need, in other words, at least two, causing the odds of their joint
>discovery to increase from 1 in 1060 to 1 in 10120. Those two sequences would
>have been needed in roughly the same place. And at the same time. And organized
>in such a way as to favor base pairing. And somehow held in place. And buffered
>against competing reactions. And productive enough so that their duplicates
>would not at once vanish in the soundless sea.
>
>In contemplating the discovery by chance of two RNA sequences a mere 40
>nucleotides in length, Joyce and Orgel concluded that the requisite “library”
>would require 1048 possible sequences. Given the weight of RNA, they observed
>gloomily, the relevant sample space would exceed the mass of the earth. And
>this is the same Leslie Orgel, it will be remembered, who observed that “it was
>almost certain that there once was an RNA world.”
>
>To the accumulating agenda of assumptions, then, let us add two more: that
>without enzymes, nucleotides were somehow formed into chains, and that by means
>we cannot duplicate in the laboratory, a pre-biotic molecule discovered how to
>reproduce itself.
Ok. Lots of stuff there, huh? Let’s boil it down.
The basic argument is the good old *big numbers* argument. Berlinski wants to come up with some really whoppingly big numbers to make things look bad. So, he makes his first big numbers appeal by looking at polymer chains that could have self-replicated. He argues (not terribly well) that the minimum length for a self-replicating polymer is 100 nucleotides. From this, he then argues that the odds of creating a self-replicating chain is 1 in 1060.
Wow, that’s a big number. He goes to some trouble to stress just what a whopping big number it is. Yes Dave, it’s a big number. In fact, it’s not just a big number, it’s a bloody *huge* number. The shame of it is, it’s *wrong*; and what’s worse, he *knows* it’s wrong. Right after he introduces it, he quotes a biochemist who pointed out the fact that that’s stupid odds because there’s probably more than one replicator in there. In fact, we can be pretty certain that there’s more than one: we know lots of ways of modifying RNA/DNA chains that don’t affect their ability to replicate. How many of those 1060 cases self-replicate? We don’t know. Berlinski just handwaves. Let’s look again at how he works around that:
>They might well have been. And yet all experience is against it. Why should
>self-replicating RNA molecules have been common 3.6 billion years ago when they
>are impossible to discern under laboratory conditions today? No one, for that
>matter, has ever seen a ribozyme capable of any form of catalytic action that
>is not very specific in its sequence and thus unlike even closely related
>sequences. No one has ever seen a ribozyme able to undertake chemical action
>without a suite of enzymes in attendance. No one has ever seen anything like
>it.
So – first, he takes a jump away from the math, so that he can wave his hands around. Then he tries to strengthen the appeal to big numbers by pointing out that we don’t see simple self-replicators in nature today.
Remember what I said in my post about Professor Culshaw, the HIV-AIDS denialist? You can’t apply a mathematical model designed for one environment in another environment without changing the model to match the change in the environment. The fact that it’s damned unlikely that we’ll see new simple self-replicators showing up *today* is irrelevant to discussing the odds of them showing up billions of years ago. Why? Because the environment is different. In the days when a first self-replicator developed, there was *no competition for resources*. Today, any time you have the set of precursors to a replicator, they’re part of a highly active, competitive biological system.
And then, he goes back to try to re-invoke the big-numbers argument by making it look even worse; and he does it by using an absolutely splendid example of bad combinatorics:
>The odds, then, are daunting; and when considered realistically, they are even
>worse than this already alarming account might suggest. The discovery of a
>single molecule with the power to initiate replication would hardly be
>sufficient to establish replication. What template would it replicate against?
>We need, in other words, at least two, causing the odds of their joint
>discovery to increase from 1 in 1060 to 1 in 10120. Those
>two sequences would have been needed in roughly the same place. And at the same
>time. And organized in such a way as to favor base pairing. And somehow held in
>place. And buffered against competing reactions. And productive enough so that
>their duplicates would not at once vanish in the soundless sea.
The odds of one self-replicating molecule developing out of a soup of pre-biotic chemicals is, according to Berlinksi, 1 in 1060. But then, the replicator can’t replicate unless it has a “template” to replicate against; and the odds of that are also, he claims, 1 in 1060. Therefore, the probability of having both the replicator and the “template” is the product of the probabilities of either one, or 1 in 10120.
The problem? Oh, no biggie. Just totally invalid false-independence. The Bayesian product formulation probabilities only works if the two events are completely independent. They aren’t. If you’ve got a soup of nucleotides and polymers, and you get a self-replicating polymer, it’s in an environment where the “target template” is quite likely to occur. So the odds are *not* independent – and so you can’t use the product rule.
Oh, and he repeats the same error he made before: assuming that there’s exactly *one* “template” molecule that can be used for replication.
And even that is just looking at a tiny aspect of the mathematical part: the entire argument about a template is a strawman; no one argues that the earliest self-replicator could only replicate by finding another perfectly matched molecule of exactly the same size that it could reshape into a duplicate of itself.
Finally, he rehashes his invalid-model argument: because we don’t see primitive self-replicators in todays environment, that must mean that they were unlikely in a pre-biotic environment.
This is what mathematicians call “slop”. A pile of bad reasoning based on fake numbers pulled out of thin air, leading to assertions based on the presence of really big numbers. All of which is what you expect from an argument that deliberately uses wrong numbers, invalid combinatorics, and misapplication of models. It’s hard to imagine what else he could have gotten wrong.

Vacation Time

I’m leaving on vacation today. I’ll be away for a week, with intermittent internet access. And even when I have access, I doubt I’ll have much time to do blog-related stuff. I’ve scheduled a bunch of reposts of some of my favorite posts from the early days of _Goodmath, Badmath_ back when it lived on Blogger, so there’ll be some fun stuff continuing to appear here even while I’m away.
Have a nice week!

Poincare, Perelman, and Prizes

About 10 days ago, I wrote about [Grigory Perelman and his proof of the Poincare conjecture][poincare]. This is a quick followup. There’s a more detailed story over on [Seed][seed].
The Fields medal was supposed to be presented this past week, and they planned on presenting it to Perelman.
He turned it down. He refused to come to the conference where the award was presented; refused to accept the award in absentia. He wants nothing to do with it. Even a personal visit from the head of the Fields committee to his mothers apartment in St. Petersburg wasn’t enough to convince him to come out of isolation and accept the prize.
He’s also refusing the $1 million Clay award; a bounty put forward to be collected by whoever eventually either proved or disproved the Poincare conjecture.
[seed]: http://www.seedmagazine.com/news/2006/08/not_feeling_the_fields.php
[poincare]: http://scienceblogs.com/goodmath/2006/08/the_poincar_conjecture.php