
Arabic numerals have nothing to do with angle counting!

There’s an image going around that purports to explain the origin of the arabic numerals. It’s cute. It claims to show why the numerals that we use look the way that they do. Here it is:

According to this, the shapes of the numerals were derived from a notation in which each numeral contains its own number of angles. It’s a really interesting idea, and it would be really interesting if it were true. The problem is, it isn’t.

Look at the numerals in that figure. Just by looking at them, you can see quite a number of problems with them.

For a couple of obvious examples:

  • Look at the 7. The crossed 7 is a recent invention, made up to compensate for the fact that in cursive roman lettering it can be difficult to distinguish 1s from 7s; the crossbar was added to clarify. The serifed foot on the 7 is even worse: there’s absolutely no tradition of writing a serifed foot on the 7; it’s just a font decoration. The 7’s serifed foot is no more a part of the number than the serifed foot on a lowercase l is a basic feature of the letter l.
  • Worse is the curlicue on the 9: the only time that curly figures like that appear in writing is in calligraphic documents, where they’re an aesthetic flourish. That curly thing has never been a part of the number 9. But if you want to make this angle-counting claim work, you’ve got to add angles to the 9 somewhere. It’s not enough to just add a serifed foot – that won’t get you enough angles. So you need the curlicue, no matter how obviously ridiculous it is.

You don’t even need to notice stuff like that to see that this is rubbish. We actually know quite a lot about the history of arabic numeral notation. We know what the “original” arabic numerals looked like. For example, this wikipedia image shows the numerals in their Bakhshali form (properly called the Bakhshali numerals), from a manuscript dating to roughly the third century AD:

[Image: Bakhshali numerals]

It’s quite fascinating to study the origins of our numeric notation. It’s true that we – “we” meaning the scholarly tradition that grew out of Europe – learned the basic numeric notation from the Arabs. But they didn’t invent it – it predates them by a fair bit. The notation originally came from India, where Hindu scholars, who wrote in an alphabet derived from Sanskrit, used a Sanskrit-based numeric notation called the Brahmi numerals (which, in turn, were derived from an earlier notation, the Kharosthi numerals, which weren’t used quite like the modern numbers – so the Brahmi numerals are considered the earliest “true” ancestors of our numerals). That notation moved westward, and was adopted by the Persians, who spread it to the Arabs. As the Arabs adopted it, they changed the shapes to work with their calligraphic style, producing the Bakhshali form.

In the Brahmi numerals, the numbers 1 through 4 are written in counting-based forms: 1 is written as a single horizontal line; 2 as two lines; 3 as three lines. 4 is written as a pair of crossed lines, giving four quadrants. 5 through 9 are written using Sanskrit characters: their “original” forms had nothing to do with counting angles or lines.

The real history of numerical notation is really interesting. It crosses through many different cultures, and the notation was reformed each time it migrated, keeping the same essential semantics, but making dramatic changes in the written forms of individual numerals. It’s so much more interesting – and the actual numeral forms are so much more beautiful – than you’d ever suspect from the nonsense of angle-counting.

My version of Tsukune: Japanese Chicken Meatballs

A few weeks ago, my wife took me out to dinner at an Izakaya that she found. If you don’t know about Izakayas, you really need to find one and try it. An Izakaya is basically a Japanese sake bar – sort of the Japanese equivalent of a tapas bar: you order a variety of small dishes, and eat them with your sake. Izakayas are really terrific – they’re a nice informal place to have dinner, with good sake and a variety of delicious dishes.

One of the dishes we had was Tsukune, which is a standard at Izakayas. It’s basically a meatball made from ground chicken, and served drizzled with a sauce that’s a lot like a thick, rich teriyaki. These suckers were the tenderest, most succulent meatballs I’ve ever eaten. I was strongly motivated to try to reproduce them.

So I did some research, and it turned out that there are two tricks to how really good tsukune come out that way.

One part is what goes in: pretty much any part of the bird that can get finely minced. Breast, thigh, neck, legs, wings. If the cartilage is soft enough to be minced, it goes right in with the meat.

The other part is how it’s treated. The meat has the living hell beaten out of it. It’s finely minced, and then the ground meat is kneaded and pounded until it’s all soft and firmly stuck together. That process both binds the meat without adding any extra binding agents like eggs, and tenderizes it.

I really don’t like working with cartilage. I’ve tried it in some recipes in the past, and I find it really unpleasant. It’s slimy, and it tends to sometimes form sharp edges that can cut you. Getting cut by raw meat is not something I like doing.

So I decided to try to reproduce as much of the flavor as I could, using ingredients that I had on hand. And I tried a couple of tricks to get the tender texture. It’s not as tender as a true Izakaya tsukune, but it was awfully good.

I relied on three things to get the tenderness I wanted.

  1. Force: I beat the living hell out of the meat. I didn’t do it by kneading; instead, I took chicken thighs, cut the meat off the bones, and then beat the crap out of it with the flat of my meat tenderizer. By the time I was done, I wouldn’t describe the result as a particular thickness; I’d say that it had been reduced to more of a chicken paste smeared across my cutting board.
  2. Chemistry: I took that, and put it into a flat bottomed baking dish, and poured a cup of apple juice over it. There’s an enzyme in fruits from the apple family that tenderizes meat, so I let that do some work.
  3. Slow heat: Finally, I browned the meatballs quickly in a pan, then transferred them to a baking dish with the sauce, and put them into a low oven for a long time.

The result was, honestly, not very much like the tsukune from the Izakaya. But it was really, really delicious. I mean really, seriously delicious – like some of the best meatballs I’ve ever eaten. Not quite as good as the Izakaya ones, and quite different – but so, so good.

It’s not a difficult dish to make, but it is time consuming. Not really that different from other meatball recipes I’ve made: meatballs are often the kind of dish that cooks all day.

The sauce cooks for a while on its own, and then cooks more with the meatballs in it. But the chicken needs to marinate for an hour or two – so you really need to start both around the same time.

The Sauce

The sauce is basically just a slightly amped up teriyaki sauce – but homemade is so much better than the crap you get out of a bottle, and it’s really easy to make.

Ingredients
  • White part of one scallion, minced
  • 3 cloves garlic, finely minced
  • 1 inch section of fresh ginger, finely minced
  • 1/3 cup soy sauce (ideally good quality shoyu!)
  • 1/2 cup sake
  • 1/4 cup mirin
  • 1/2 cup apple juice
  • 1 cup chicken stock
  • 2 tablespoons brown sugar
Instructions
  1. Heat a bit of oil in a pot. When it’s hot, add the garlic, ginger, and scallion whites, and cook until they’re nice and fragrant.
  2. Add the sake, and then let it reduce by half.
  3. Reduce the heat to medium-low, and add the rest of the liquids and the brown sugar.
  4. Simmer for about an hour.
  5. Remove from heat, and set aside.

The Meatballs

Ingredients
  • 6 boneless chicken thighs
  • 1/4 onion, finely minced
  • 1/4 cup apple juice
  • 1/2 cup bread crumbs
  • sliced scallion greens (the leftovers from the whites used in the sauce)
  • flour
Instructions
  1. With a meat tenderizer, pound the heck out of the chicken thighs until they’re severely mangled and very thin.
  2. Put them into a dish with the apple juice, and put that into the refrigerator for an hour or two.
  3. Put the marinated thighs into the food processor with 1/4 cup of the sauce, and pulse a couple of times until they’re minced.
  4. Fold in the onion and the bread crumbs.
  5. Chill the ground meat for an hour until it firms up.
  6. Separate into about 30 meatballs – it should be around a heaped teaspoon per meatball.
  7. Roll each meatball in flour; flatten slightly, and then brown the two flat sides in a hot pan with a bit of oil. The meatballs won’t be cooked through by this – you’re just browning them, not fully cooking them.
  8. Put the meatballs into a baking dish, cover with the sauce, and bake for 1 1/2 hours.
  9. Remove the meatballs from the sauce, and reduce the sauce until there’s just a cup or so left. Thicken with a bit of cornstarch.
  10. Pour the sauce back over the meatballs, and sprinkle with scallion greens.

Serve this on a bed of simple steamed rice, with some stir-fried veggies on the side. Home cooking doesn’t get better than this.

Silly φ and π crackpottery

Over time, I’ve come to really, really hate the number φ.

φ is the so-called golden ratio. It’s the number that solves the equation (a+b)/a = a/b. The reason that’s interesting at all is that it’s got an interesting property when you draw it out: if you take a rectangle where the ratio of the lengths of the sides is 1:φ, and you remove the largest possible square from it, you’ll get another rectangle whose sides have the ratio φ:1. Take the largest square from that, and you’ll get a rectangle whose sides have the ratio 1:φ. And so on.

The numeric value of it is (1+sqrt(5))/2, or about 1.618033988749895.
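Both properties are easy to check numerically. Here’s a quick Python sketch (my own illustration, nothing more):

```python
import math

phi = (1 + math.sqrt(5)) / 2
print(phi)                    # 1.618033988749895

# The defining equation: with a/b = phi, (a+b)/a equals a/b.
a, b = phi, 1.0
print((a + b) / a, a / b)     # both are ~1.618033988749895

# The square-removal property: cutting a 1x1 square from a phi-by-1
# rectangle leaves a (phi-1)-by-1 rectangle with the same side ratio.
print(1 / (phi - 1))          # ~1.618033988749895
```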

The problem with φ is that people are convinced that it’s some kind of incredibly profound thing, and find it all over the place. The problem is, virtually all of the places where people claim to find it are total rubbish. A number that’s just a tiny bit more than 1 1/2 is really easy to find if you go looking for it, and people go looking for it all over the place.

People claim it’s in all sorts of artwork. You can certainly find a ton of things in paintings whose size ratio is about 1 1/2, and people find them and insist that they were deliberately done to make it φ. People find it in musical scales – the diatonic and pentatonic scales, and the Indian scales.

People claim it comes up all over the place in nature: in beehives, ant colonies, flowers, tree sizes, tree-limb positions, size of herds of animals, litters of young, body shapes, face shapes.

People claim it’s key to architecture.

And yet… it seems like if you actually take any of those and actually start to look at it in detail? The φ isn’t there. It’s just a number that’s kinda-sorta in the 1 1/2 range.

One example of that: there’s a common claim that human faces have proportions based on φ. You can see a bunch of that nonsense here. The thing is, the “evidence” for the claim consists of rectangles drawn around photographs of faces – and if you look closely at those rectangles, what you find is that the placement of the corners isn’t consistent. When you define, say, “the distance between the eyes”, you can measure that as the distance between inner edges, between pupils, or between outer edges. Most of these claims use outer edges. But where’s the outer edge of an eye? It’s not actually a well-defined point. You can pick a couple of different places in a photo as “the” edge. They’re all close together, so there’s not a huge amount of variation. But if you can fudge the width a little bit, and you can fudge other facial measurements just a little bit, you’ve got enough variation that if you’re looking for two measurements with a ratio close to φ, you’ll always find one.

Most of the φ nonsense is ultimately aesthetic: people claiming that the golden ratio has a fundamental beauty to it. They claim that facial features match it because it’s intrinsically beautiful, and so people whose faces have φ ratios are more beautiful, and that that led to sexual-selection which caused our faces to embody the ratio. I think that’s bunk, but it’s hard to make a mathematical argument against aesthetics.

But then, you get the real crackpots. There are people who think φ has amazing scientific properties. In the words of the crank I’m writing about today, understanding φ (and the “correct” value of π derived from it) will lead humanity to “enter into a veritable Space Age”.

I’m talking about a guy who calls himself “Jain 108”. I’m not quite sure what to call him. Mr. Jain? Mr. 108? Dr. 108? Most of the time on his website, he just refers to himself as “Jain” (or sometimes “Jain of Oz”), so I’ll go with “Jain”.

Jain believes that φ is the key to mathematics, science, art, and human enlightenment. He’s a bit hard to pin down, because most of his website is an advertisement for his books and seminars: if you want to know “the truth”, you’ve got to throw Jain some cash. I’m not willing to give money to crackpots, so I’m stuck with just looking at what he’s willing to share for free. (But I do recommend browsing around his site. It’s an impressive combination of newage scammery, pomposity, and cluelessness.)

What you can read for free is more than enough to conclude that he’s a total idiot.

I’m going to focus my mockery on one page: “Is Pi a Lie?”.

On this page, Jain claims to be able to prove that the well-known value of π (3.14159265….) is wrong. According to him, that value is mistaken, and the correct value of π is derived from φ: the “true” π is \frac{4}{\sqrt{\phi}}, or about 3.144605511029693.
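It takes a couple of lines of Python to see exactly how far off that value is (my own check, not anything Jain provides):

```python
import math

jain_pi = 4 / math.sqrt((1 + math.sqrt(5)) / 2)   # 4 / sqrt(phi)
print(jain_pi)             # 3.144605511029693...
print(jain_pi - math.pi)   # ~0.003; keep that difference in mind below
```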

For reasons that will be soon explained, traditional Pi is deficient because historically it has awkwardly used logical straight lines to measure illogical curvature. Thus, by using the highest level of mathematics known as Intuitive Maths, the True Value of Pi must be a bit more than anticipated to compensate for the mysterious “Area Under The Curve”. When this is done, the value, currently known as JainPi, = 3.144… can be derived, by knowing the precise Height of the Cheops Pyramid which is based on the Divine Phi Proportion (1.618…). Instead of setting our diameter at 1 unit or 1 square, something magical happens when we set the diameter at the diagonal length of a Double Square = 2.236… which is the Square Root of 5 (meaning 2.236… x 2.236… = 5). This is the critical part of the formula that derives Phi \frac{1+\sqrt{5}}{2}, and was used by ancient vedic seers as their starting point to construct their most important diagram or ‘Yantra’ or power-art called the Sri Yantra. With a Root 5 diameter, the translation of the Phi’s formula into a geometric construct derives the royal Maltese Cross symbol, concluding that Phi is Pi, that Phi generates Pi, and that Pi must be derived with a knowledge of the Harmonics of Phi. When this is understood and utilized, we will collectively enter into a veritable Space Age.

How did we get the wrong value? It’s based on the “fact” that the computation of π uses “logical” straight lines to measure “illogical” curvature. (From just that one sentence, we can already conclude that Jain knows nothing about logic, except what he learned from Mr. Spock on Star Trek.) More precisely, according to Jain:

In all due good respects, we must first honour Archimedes of Syracuse 2,225 years ago, who gave the world his system on how to calculate Pi, approximated to 22÷7, by cutting the circle into say 16 slices of a pizza, and measuring the 16 edge lengths of these 16 triangular polygons (fig 3), to get a good estimate for the circumference of a circle. The idea was that if we kept making the slices of pizza smaller and smaller, by subsequently cutting the circle into 32 slices, then 64, then 128 then 256 slices, we would get a better and more accurate representation for the circumference. The Fundamental Flawed Logic or Error with Archimede’s Increasing Polygon Method was that he failed to measure The Area Under The Curve. In fact, he assumed that The Area Under The Curve, just magically disappeared. Even in his time, Archimedes admitted that his value was a mere estimate!

This explanation does a beautiful job of demonstrating how utterly ignorant Jain is of math. Archimedes may have been the first person in the western tradition to work out a mechanism for computing a value of π – and his mechanism was a good one – but it’s far from the only one. Let’s ignore that for a moment, though. Jain’s supposed critique, if true, would mean that modern calculus doesn’t work. The wedge-based computation of π is a forerunner of the common methods of calculus. In reality, when we compute the value of almost any integral using calculus, our methods are based on the concept of drawing rectangles under the curve, and narrowing those rectangles until they’re infinitely small, at which point the “area under the curve” missed by the rectangles becomes zero. If the wedge computation of π is wrong because it misses area under the curve, then so is every computation using integral calculus.

Gosh, think we would have noticed that by now?
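If you doubt that, here’s a minimal sketch of exactly that kind of computation: a rectangle (Riemann) sum for the area of a quarter of a unit circle, which is π/4. (My own demonstration; Archimedes used polygons rather than rectangles, but the limiting idea is the same.)

```python
import math

# Area under y = sqrt(1 - x^2) on [0, 1] is pi/4; multiply by 4 to get pi.
# As the rectangles get narrower, the missed area under the curve vanishes.
for n in (100, 10_000, 1_000_000):
    width = 1.0 / n
    area = sum(math.sqrt(1 - (i * width) ** 2) * width for i in range(n))
    print(4 * area)   # converges toward 3.14159..., not Jain's 3.1446...
```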

Let’s skip past that for a moment, and come back to the many ways that π comes into reality. π is the ratio of the circumference of a circle to its diameter. Because circles are such a basic thing, there are many ways of deriving the value of π that come from its fundamental nature. Many of these have no relation to the wedge-method that Jain attributes to Archimedes.

For example, there is Viète’s product:

\frac{2}{\pi} = \left(\frac{\sqrt{2}}{2}\right)\left(\frac{\sqrt{2 + \sqrt{2}}}{2}\right)\left(\frac{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}{2}\right)(...)

Or there’s the Gregory-Leibniz series:

\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - ...

These have no relation to the wedge-method – they’re derived from the fundamental nature of π. And all of them produce the same value – and it’s got no connection at all to φ.
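Both are easy to evaluate numerically; here’s a quick Python sketch, just to show where the values land:

```python
import math

# Viete's product: 2/pi = (sqrt(2)/2) * (sqrt(2+sqrt(2))/2) * ...
term, product = 0.0, 1.0
for _ in range(30):
    term = math.sqrt(2 + term)
    product *= term / 2
print(2 / product)   # 3.141592653589793

# Gregory-Leibniz: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... (very slow, but correct)
partial = sum((-1) ** k / (2 * k + 1) for k in range(1_000_000))
print(4 * partial)   # 3.1415916..., creeping toward pi, nowhere near 3.1446
```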

As supporting evidence for the incorrectness of π, Jain gives two apocryphal stories about NASA and the moon landings. First, he claims that the first moon landing was off by 20 kilometers, and that the cause of this was an incorrect value of π: that the value of π used in computing trajectories was off by 0.003:

NASA admitted that when the original Mooncraft landing occurred, the targeted spot was missed by about 20km?
What could have been wrong with the Calculations?
NASA subsequently adjusted their traditional mathematical value for Pi (3.141592…) by increasing it in the 3rd decimal by .003!

Let’s take just a moment, and consider that.

It’s a bit difficult to figure out how to address this, because he doesn’t mention which part of the trajectory was messed up. Was it the earth-to-moon transit of the full Apollo system? Or was it the orbit-to-ground flight of the lunar lander? Since he doesn’t bother to tell us, we’ll look at both.

π does matter when computing the trajectory of the earth-to-moon trip – because it involves the intersection of two approximate circles – the orbit of the earth around the sun, and the orbit of the moon around the earth. (Both of these are approximations, but they’re quite useful ones; the Apollo trajectory computations did rely on a value for π.)

Let’s look at earth-to-moon. I’m going to oversimplify ridiculously – but I’m just trying to get a ballpark, order-of-magnitude guess at just how much of a difference Mr. Jain’s supposed error would cause. The distance from the earth to the moon is about 384,000 kilometers. If we assume that π is a linear factor in the computation, then a difference in the value of π of around 1 part in 1000 would cause a difference in distance computations of around 384 kilometers. Mr. Jain is alleging that the error only caused a difference of 20 kilometers. He’s off by nearly a factor of 20. We can hand-wave this away, and say that the error that caused the lander to land in the “wrong” place wasn’t in the earth-moon trajectory computation – but then we’re still talking about the Apollo unit being in the wrong place by hundreds of kilometers – and no one noticing.

What if the problem was in the computation of the trajectory the lander took from the capsule to the surface of the moon? The orbit was a nearly circular one at about 110 kilometers above the lunar surface. How much of an error would the alleged π difference cause? About 0.1 kilometer – that is, about 100 meters. Less than what Jain claims by a factor of 200.
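For the record, here’s the whole back-of-the-envelope calculation as a few lines of Python, using the same crude treat-π-as-a-linear-factor simplification as above:

```python
# Ballpark only: assume the pi error enters linearly, at ~1 part in 1000.
pi_error_fraction = 0.001

earth_moon_km = 384_000
print(earth_moon_km * pi_error_fraction)   # ~384 km, not 20 km

lunar_orbit_altitude_km = 110
print(lunar_orbit_altitude_km * pi_error_fraction)   # ~0.1 km, about 100 meters
```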

The numbers don’t work. These aren’t precise calculations by any stretch, but they’re ballpark. Without Jain providing more information about the alleged error, they’re the best we can do, and they don’t make sense.

Jain claims that in space work, scientists now use an adjusted value of π to cover the error. This piece I can refute from direct knowledge. My father was a physicist who worked on missiles, satellites, and space probes. (He was part of the Galileo team.) They used good old standard 3.14159 π. In fact, he explained how the value of π actually didn’t need to be that precise. In satellite work, you’re stuck with the measurement problems of reality. In even the highest precision satellite work, they didn’t use more than 4 significant digits of precision, because the manufacturing and measurement of components was only precise to that scale. Beyond that, it was always a matter of measure and adjust. Knowing that π was 3.14159265358979323 was irrelevant in practice, because anything beyond “about 3.1416” was smaller than the errors in measurement.

Mr. Jain’s next claim is far worse.

Also, an ex-Engineer from NASA, “Smokey” admitted (via email) that when he was making metal cylinders for this same Mooncraft, finished parts just did not fit perfectly, so an adjusted value for Pi was also implemented. At the time, he thought nothing about it, but after reading an internet article called The True Value of Pi, by Jain 108, he made contact.

This is very, very simple to refute by direct experience. This morning, I got up, shaved with an electric razor (3 metal rotors), made myself iced coffee using a moka pot (three round parts, tight fitted, with circular-spiral threading). After breakfast, I packed my backpack and got in my car to drive to the train. (4 metal cylinders with 4 precisely-fitted pistons in the engine, running on four wheels with metal rims, precisely fitted to circular tires, and brakes clamping on circular disks.) I drove to the train station, and got on an electric train (around 200 electric motors on the full train, with circular turbines, driving circular wheels).

All those circles. According to Jain, every one of those circles isn’t the size we think it is. And yet they all fit together perfectly. According to Jain, every one of those circular parts is larger than we think it should be. To focus on one thing: every car engine’s pistons – every one of the millions of pistons created every year by companies around the world – require more metal to produce than we’d expect. And somehow, in all that time, no one has ever noticed. Or if they’ve noticed, every single person who noticed has never mentioned it!

It’s ludicrous.

Jain also claims that the value of e is wrong, and comes up with a cranky new formula for computing it. Of course, the problem with e is the same as the problem with π: in Jain’s world, it’s really based on φ.

In Jain’s world, everything is based on φ. And there’s a huge, elaborate conspiracy to keep it secret. And Jain will share the secret with you, showing you how everything you think you know is wrong. You just need to buy his books ($77 for a hard copy, or $44 for an ebook). Or you could pay for him to travel to you and give you a seminar. But he doesn’t list a price for that – you need to send him mail to inquire.

Skirt Steak with Shoyu Gastrique

I made an amazing dinner tonight, out of a total ad-lib. So I want to write it down so I’ll remember it. This turned out to be the best steak I’ve ever made.

The Steak

  • 1 pound skirt steak
  • Kombu leaves
  • 1 teaspoon finely minced garlic
  • 1 teaspoon finely minced ginger
  • 1/4 cup soy sauce
  • 1/4 cup mirin
  • 1/4 cup sake
  1. Mix together the liquid ingredients in a baking dish. Dip the steak into it, to coat both sides.
  2. Press kombu leaves into both sides of the steak, and then put it in the dish, and leave it in the fridge for a couple of hours.
  3. Take the steak out, pat dry.
  4. Toss the steak on a very hot grill, and cook for 2 minutes on one side, and about one minute on the other.
  5. Let rest for a few minutes, then slice on the bias. (Save any juices that drip off while it’s resting, and add them to the sauce!)

The Sauce

  • 1/4 cup sugar
  • 1/4 cup water
  • 1/4 cup apple cider vinegar
  • 1/4 cup sake
  • 1 tablespoon shoyu
  1. Mix together the sugar and water, and cook on high heat, whisking until the sugar is dissolved.
  2. Keep cooking on high heat until the water evaporates and the sugar starts to caramelize and turns amber.
  3. Add the vinegar and sake, and reduce the heat. Then whisk vigorously until everything is dissolved. (When you add the liquids to the caramel, it will seize up; you need to whisk until it’s redissolved.)
  4. Turn off the heat, then add the shoyu and any juices that dripped off the steak.

That’s it. Best damned steak ever.

Truth in Type Theory

Now, we’re getting to the heart of type theory: judgements. Judgements are the basic logical statements that we want to be able to prove about computations in type theory. There’s a basic set of judgements that we want to be able to make.

I keep harping on this, but it’s the heart of type theory: type theory is all about understanding logical statements as specifications of computations. Or, more precisely, in computer science terms, they’re about understanding true logical statements as halting computations.

In this post, we’ll see the ultimate definition of truth in type theory: every logical proposition is a set, and the proposition is true if the set has any members. A non-halting computation is a false statement – because you can never get it to resolve an expression to a canonical value.

So remember as we go through this: judgements are based on the idea of logical statements as specifications of computations. So when we talk about a predicate P, we’re using its interpretation as a specification of a computation. When we look at an expression 3+5, we understand it not as a piece of notation that describes the number 8, but as a description of a computation that adds 3 to 5. “3+5” is not the same computation as “2*4” or “2+2+2+2”, but as we’ll see, they’re equal because they evaluate to the same thing – that is, they each perform a computation that results in the same canonical value – the number 8.

In this post, we’re going to focus on a collection of canonical atomic judgement types:

A \text{set}
This judgement says that A is a set.
A = B
A and B are equal sets.
a \in A
a is an element of the set A.
a_1 == a_2 \in A
a_1 and a_2 are equal members of the set A.
A \text{prop}
A is a proposition.
A \text{true}
The proposition A is true.

The definition of the meanings of judgements is, necessarily, mutually recursive, so we’ll have to go through all of them before any of them is complete.

An object A is a Set
When I say that A is a set in type theory, that means that:

  • I know the rules for how to form the canonical expressions for the set;
  • I’ve got an equivalence relation that says when two canonical members of the set are equal.
Two Sets are Equal

When I say that A and B are equal sets, that means:

  • A and B are both sets.
  • If a is a canonical member of A, then a is also a canonical member of B, and vice versa.
  • If a and b are canonical members of A, and they’re also equal in A, then a and b are also equal canonical members of B (and vice versa).

The only tricky thing about the definition of this judgement is the fact that it defines equality as a property of a set, not of the elements of the set. By this definition, it’s possible for two expressions to be members of two different sets, and to be equal in one, but not equal in the other. (We’ll see in a bit that this possibility gets eliminated, but this stage of the definition leaves it open!)

An object is a member of a set
When I say a \in A, that means that if I evaluate a, the result will be a canonical member of A.
Two members of a set are equal
If a \in A and b \in A, then a == b \in A means that when you evaluate a, you’ll get a canonical expression, and when you evaluate b, you’ll get a canonical expression; for a == b to be true, those two canonical expressions resulting from the evaluations must themselves be equal.

This nails down the problem left open back in set equality: since membership equivalence is defined in terms of evaluation to canonical values, and every expression evaluates to exactly one canonical expression (that’s the definition of canonical!), if two objects are equal in one set, then they’re equal in all sets that they’re members of.

An object A is a proposition
Here’s where type theory really starts to depart from the kind of math that we’re used to. In type theory, a proposition is a set. That’s it: to be a proposition, A has to be a set.
The proposition A is true
And the real meat of everything so far: if we have a proposition A, and A is true, what that means is that A has at least one element. If a proposition is a non-empty set, then it’s true. If it’s empty, it’s false.

Truth in type theory really comes down to membership in a set. This is, subtly, different from the predicate logic that we’re familiar with. In predicate logic, a quantified proposition can, ultimately, be reduced to a set of values, but it’s entirely reasonable for that set to be empty. I can, for example, write a logical statement that “All dogs with IQs greater than 180 will turn themselves into cats.” A set-based interpretation of that is the collection of objects for which it’s true. There aren’t any, because there aren’t any dogs with IQs greater than 180. It’s empty. But logically, in terms of predicate logic, I can still say that it’s “true”. In type theory, I can’t: to be true, the set has to have at least one value, and it doesn’t.
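That difference is exactly the idea that modern proof assistants build on. Here’s a tiny illustration in Lean (my own example, not from the original post; Lean uses types where this presentation uses sets, but truth-as-inhabitedness works the same way):

```lean
-- A proposition is true when we can exhibit a member of it: a proof term.
example : 1 + 1 = 2 := rfl   -- `rfl` inhabits the proposition, so it is true

-- `False` is the empty proposition: it has no members at all.
-- From a (hypothetical, impossible) member of False, anything follows:
example (h : False) : 0 = 1 := h.elim
```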

In the next post, we’ll take these atomic judgements, and start to expand them into the idea of hypothetical judgements. In the type-theory sense, that means statements require some other set of prior judgements before they can be judged. In perhaps more familiar terms, we’ll be looking at type theory’s equivalent of the contexts in classical sequent calculus – those funny little \Gammas that show up in all of the sequent rules.

Weekend Recipe: 3-cup chicken

This is a traditional chinese dish that my wife grew up eating in Taiwan. For some reason, she never told me about it, until she saw an article with a recipe in the NY Times. Of course, I can’t leave recipes alone; I always put my own spin on it. And the recipe in the article had some (in my opinion) glaring problems. For example, it called for cooking with sesame oil. Sesame oil is a seasoning, not a cooking oil. It’s got a very strong flavor, and it burns at stir-fry temperature, which makes any dish cooked in it taste absolutely awful. You cook in neutral oils with high smoke points, like peanut, canola, or soybean; and then you add a drop of sesame as part of the sauce, so that it’s moderated and doesn’t burn. Anyway, below is my version of the dish.

  • 2 pounds of chicken thighs, cut into bite-sized pieces.
  • About 8 large cloves of garlic, thickly sliced.
  • About a 1-inch section of fresh ginger, cut into disks.
  • 5 whole dried szechuan chili peppers (or more, if you like those lovely things!)
  • A good bunch of thai basil leaves, removed from the stems, but left whole. (About a cup, if it’s packed pretty tight. Don’t skimp – these are the best part of the dish!)
  • 4 scallions, thinly sliced, whites and greens separated.
  • 1/3 cup soy sauce.
  • 1/4 cup mirin
  • 1/2 cup sake
  • 1 tablespoon sugar
  • 1 teaspoon cornstarch, dissolved in water.
  • 1/4 teaspoon sesame oil (just a drop, for flavor).
  • Enough canola oil (or similarly bland, high-smoke-point cooking oil) to cook – a couple of tablespoons at most.
  1. Get your wok smoking hot. Add enough oil to coat the bottom, and swirl it around.
  2. Add in half of the chicken, and cook until it’s nicely browned, then remove it. (It won’t be cooked all the way through yet, don’t worry!)
  3. Repeat with the other half of the chicken.
  4. Make sure there’s enough oil in the bottom of the wok, then toss in the garlic, ginger, chili peppers, and scallion whites. Stir fry them until the garlic starts to get just a little bit golden.
  5. Add the chicken back in, and add the soy, mirin, sake, and sugar. Get it boiling, and keep stirring things around until the chicken is cooked through.
  6. Add the basil and scallion greens, and keep stirring until the basil wilts, and the whole thing smells of that wonderful thai basil fragrance.
  7. Add the cornstarch and sesame oil, and cook until the sauce starts to thicken.
  8. Remove it from the heat, and serve on a bed of white rice, along with some simple stir-fried vegetables. (I used a batch of beautiful sugar-snap peas, quickly stir fried with just a bit of garlic, and a bit of soy sauce.)

A couple of notes on ingredients:

  • This is a dish where the soy sauce matters. Don’t use cheap generic american soy sauce; that stuff is just saltwater with food coloring. For some things, that’s actually OK. But in this dish, it’s the main flavor of the sauce, so it’s important to use something with a good flavor. Get a good quality chinese soy (I like Pearl River Bridge brand), or a good japanese shoyu.
  • For the sugar, if you’ve got turbinado (or even better, real chinese rock sugar), use that. If not, white sugar is Ok.
  • Definitely try to get thai basil. It’s very different from italian basil – the leaves are thinner (which makes them much easier to eat whole, as you do in this dish), and they’ve got a very different flavor – almost like Italian basil mixed with a bit of anise and a bit of menthol. It’s one of my favorite herbs, and it’s actually gotten pretty easy to find.
  • Szechuan peppers can be hard to find – you pretty much need to go to an Asian grocery. They’re worth it. They’ve got a very distinctive flavor, and I don’t know of any other dried pepper that works in a sauce like them. You don’t actually eat the peppers – the way you cook them, they actually burn a bit – but they bloom their flavor into the oil that you use to cook the rest of the dish, and that totally changes the sauce.

Canonical Expressions in Type Theory

Sorry for the gap in posts. I’ve been trying to post more regularly, and was just hitting a rhythm, when my son brought home a particularly vicious bug, and I got sick. I’ve spent the last couple of weeks being really, really sick, and then trying to get caught up at work. I’m mostly recovered, except for some lingering asthma, so I’m trying to get back to that twice-per-week posting schedule.

In the last couple of posts, we looked at Martin-Löf’s theory of expressions. The theory of expressions is purely syntactic: it’s a way of understanding the notation of expressions. Everything that we do in type theory will be written with expressions that follow the syntax, the arity, and the definitional equivalency rules of expression theory.

The next step is to start to understand the semantics of expressions. In type theory, when it comes to semantics, we’re interested in two things: evaluation and judgements. Evaluation is the process by which an expression is reduced to its simplest form. It’s something that we care about, but it’s not really a focus of type theory: type theory largely waves its hands in the air and says “we know how to do this”, and opts for normal-order evaluation. Judgements are where things get interesting.

Judgements are provable statements about expressions and the values that they represent. As software people, when we think about types and type theory, we’re usually thinking about type declarations: type declarations are judgements about the expressions that they apply to. When you write a type declaration in a programming language, what you’re doing is asserting the type theory judgement. When the compiler “type-checks” your program, what it’s doing in type theory terms is checking that your judgements are proven by your program.

For example, we’d like to be able to make the judgement A set – that is, that A is a set. In order to make the judgement that A is a set in type theory, we need to know two things:

  1. How are canonical instances of the set A formed?
  2. Given two canonical instances of A, how can we determine if they’re equal?

To understand those two properties, we need to take a step back. What is a canonical instance of a set?

If we think about how we use predicate logic, we’re always given some basic set of facts as a starting point. In type theory, the corresponding concept is a primitive constant. The primitive constants include base values and primitive functions. For example, if we’re working with lispish expressions, then cons(1, cons(2, nil)) is an expression, and cons, nil, 1 and 2 are primitive constants; cons is the head of the expression, and 1 and cons(2, nil) are the arguments.

A canonical expression is a saturated, closed expression whose head is a primitive constant.

The implications of this can be pretty surprising, because it means that a canonical expression can contain unevaluated arguments! The expression has to be saturated and closed – so its arguments can’t have unbound variables, or be missing parameters. But it can contain unevaluated subexpressions. For example, if we were working with Peano arithmetic in type theory, succ(2+3) is canonical, even though “2+3” hasn’t been evaluated.

In general, in type theory, the way that we evaluate an expression is called normal order evaluation – what programming language people call lazy evaluation: that’s evaluating from the outside in. Given a non-canonical expression, we evaluate from the outside in until we get to a canonical expression, and then we stop. A canonical expression is considered the result of a computation – so we can see succ(2+3) as a result!

A canonical expression is the evaluated form of an expression, but not necessarily the fully evaluated form. The fully evaluated form is when the expression and all of its saturated parts are fully evaluated. So in our previous example, the saturated part 2+3 wasn’t evaluated, so the expression isn’t fully evaluated. To get it to be fully evaluated, we’d need to evaluate 2+3, giving us succ(5); then, since succ(5) is saturated, it’s evaluated to 6, which is the fully evaluated form of the expression.
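If it helps to see that computationally, here’s a little Python sketch of the distinction, using thunks (zero-argument functions) to stand for unevaluated subexpressions; the encoding is entirely mine, purely for illustration:

```python
class Succ:
    """A primitive constant used as an expression head; its argument may be
    an unevaluated thunk."""
    def __init__(self, arg):
        self.arg = arg

# succ(2+3): the head is a primitive constant, so the expression is
# canonical (it counts as the result of a lazy computation), even though
# the argument 2+3 is still an unevaluated thunk.
canonical = Succ(lambda: 2 + 3)

# Fully evaluating means forcing the saturated parts too:
print(1 + canonical.arg())   # succ(5) -> 6, the fully evaluated form
```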

Next post (coming monday!), we’ll use this new understanding of canonical expressions, and start looking at judgements, and what they mean. That’s when type theory starts getting really fun and interesting.

The Bad Logic of Good People Can’t be Sexists

One of my constant off-topic rants around here is about racism and sexism. This is going to be a nice little post that straddles the line. It’s one of those off-topic-ish rants about sexism in our society, but it’s built around a core of bad logic – so there is a tiny little bit of on-topicness.

We live in a culture that embodies a basic conflict. On one hand, racism and sexism are a deeply integrated part of our worldview. But on the other hand, we’ve come to believe that racism and sexism are bad. This puts us into an awkward situation. We don’t want to admit to saying or doing racist things. But there’s so much of it embedded in every facet of our society that it takes a lot of effort and awareness to even begin to avoid saying and doing racist things.

The problem there is that we can’t stop being racist/sexist until we admit that we are. We can’t stop doing sexist and racist things until we admit that we do sexist and racist things.

And here’s where we hit the logic piece. The problem is easiest to explain by looking at it in formal logical terms. We’ll look at it from the viewpoint of sexism, but the same argument applies for racism.

  1. We’ll say \text{Sexist}(x) to mean that “x” is sexist.
  2. We’ll say \text{Bad}(x) to mean that x is bad, and \text{Good}(x) to mean that x is good.
  3. We’ll have an axiom that bad and good are logical opposites: \text{Bad}(x) \Leftrightarrow \lnot \text{Good}(x).
  4. We’ll have another axiom that sexism is bad: \forall x: \text{Sexist}(x) \Rightarrow \text{Bad}(x).
  5. We’ll say \text{Does}(p, x) means that person p does an action x.

The key statement that I want to get to is: We believe that people who do bad things are bad people: \forall p, x: \text{Does}(p, x) \land \text{Bad}(x) \Rightarrow \text{Bad}(p).

That means that if you do something sexist, you are a bad person:

  • s is a sexist action: \text{Sexist}(s).
  • By rule 4 above, s is bad: \text{Bad}(s).
  • I do something sexist: \text{Does}(\textbf{markcc}, s).
  • By the key statement above, since I did something bad, I am bad: \text{Bad}(\textbf{markcc}).
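For what it’s worth, the inference itself is valid. Here it is rendered as a small machine-checked Lean proof (my own encoding of the rules above, purely for illustration):

```lean
example {Person Action : Type}
    (Sexist Bad : Action → Prop)         -- predicates on actions
    (BadPerson : Person → Prop)          -- "is a bad person"
    (Does : Person → Action → Prop)
    (rule4 : ∀ a, Sexist a → Bad a)      -- rule 4: sexism is bad
    (key : ∀ p a, Does p a → Bad a → BadPerson p)  -- doers of bad things are bad
    (markcc : Person) (s : Action)
    (hs : Sexist s) (hd : Does markcc s) :
    BadPerson markcc :=
  key markcc s hd (rule4 s hs)
```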

We know that we aren’t bad people: I’m a good person, right? So we reject that conclusion. I’m not bad; therefore, I can’t be sexist, therefore whatever I did couldn’t have been sexist.

This looks shallow and silly on the surface. Surely mature adults, mature educated adults couldn’t be quite that foolish!

Now go read this.

If his crime was to use the phrase “boys with toys”, and that is your threshold for sexism worthy of some of the abusive responses above, then ok – stop reading now.

My problem is that I have known Shri for many years, and I don’t believe that he’s even remotely sexist. But in 2015 can one defend someone who’s been labeled sexist without a social media storm?

Are people open to the possibility that actually Kulkarni might be very honourable in his dealings with women?

In an interview a week or so ago, Professor Shri Kulkarni said something stupid and sexist. The author of that piece believes that Professor Kulkarni couldn’t have said something sexist, because he knows him, and he knows that he’s not sexist, because he’s a good guy who treats women well.

The thing is, that doesn’t matter. He messed up, and said something sexist. It’s not a big deal; we all do stupid things from time to time. He’s not a bad person because he said something sexist. He just messed up. People are, correctly, pointing out that he messed up: you can’t fix a problem until you acknowledge that it exists! When you say something stupid, you should expect to get called on it, and when you do, you should accept it, apologize, and move on with your life, using that experience to improve yourself, and not make the mistake again.

The thing about racism and sexism is that we’re immersed in it, day in and day out. It’s part of the background of our lives – it’s inescapable. Living in that society means that we’ve all absorbed a lot of racism and sexism without meaning to. We don’t have to like that, but it’s true. In order to make things better, we need to first acknowledge the reality of the world that we live in, and the influence that it has on us.

In mathematical terms, the problem is that good and bad, sexist and not sexist, are absolutes. When we render them into pure two-valued logic, we’re taking shades of gray, and turning them into black and white.

There are people who are profoundly sexist or racist, and that makes them bad people. Just look at the MRAs involved in Gamergate: they’re utterly disgusting human beings, and the thing that makes them so despicably awful is the awfulness of their sexism. Look at a KKKer, and you find a terrible person, and the thing that makes them so terrible is their racism.

But most people aren’t that extreme. We’ve just absorbed a whole lot of racism and sexism from the world we’ve lived our lives in, and that influences us. We’re a little bit racist, and that makes us a little bit bad – we have room for improvement. But we’re still, mostly, good people. The two-valued logic creates an apparent conflict where none really exists.

Where do these sexist/racist attitudes come from? Picture a scientist. What do you see in your mind’s eye? It’s almost certainly a white guy. It is for me. Why is that?

  1. In school, from the time that I got into a grade where we had a dedicated science teacher, every science teacher that I had was a white guy. I can still name ’em: Mr. Fidele, Mr. Schwartz, Mr. Remoli, Mr. Laurie, Dr. Braun, Mr. Hicken, etc.
  2. On into college, in my undergrad days, where I took a ton of physics and chemistry (I started out as an EE major), every science professor that I had was a white guy.
  3. My brother and I used to watch a ton of trashy sci-fi movies. In those movies, every time there was a character who was a scientist, he was a white guy.
  4. My father was a physicist working in semiconductor manufacturing for satellites and military applications. From the time I was a little kid until the day he retired, he had exactly one coworker who wasn’t a white man. (And everyone on his team complained bitterly that the black guy wasn’t any good, that he only got and kept the job because he was black, and that if they tried to fire him, he’d sue them. I really don’t believe that my dad was a terrible racist person; I think he was a wonderful guy, the person who is a role model for so much of my life. But looking back at this? He didn’t mean to be racist, but I think that he was.)

In short, in all of my exposure to science, from kindergarten to graduate school, scientists were white men. (For some reason, I encountered a lot of women in math and comp sci, but not in the traditional sciences.) So when I picture a scientist, it’s just natural that I picture a man. There’s a similar story for most of us who’ve grown up in the current American culture.

When you consider that, it’s both an explanation of why we’ve got such a deeply embedded sexist sense about who can be a scientist, and an explanation how, despite the fact that we’re not deliberately being sexist, our subconscious sexism has a real impact.

I’ve told this story a thousand times, but during the time I worked at IBM, I ran the internship program for my department one summer. We had a departmental quota of how many interns each department could pay for. But we had a second program that paid for interns who were women or minorities – so they didn’t count against the quota. The first choice intern candidate of everyone in the department was a guy. When we ran out of slots, the guy across the hall from me ranted and raved about how unfair it was. We were discriminating against male candidates! It was reverse sexism! On and on. But the budget was what the budget was. Two days later, he showed up with a resume for a young woman, all excited – he’d found a candidate who was a woman, and she was even better than the guy he’d originally wanted to hire. We hired her, and she was brilliant, and did a great job that summer.

The question that I asked my office-neighbor afterwards was: why didn’t he find the woman the first time through the resumes? He went through the resumes of all of the candidates before picking the original guy. The woman that he eventually hired had a resume that was clearly better than the guy’s. Why’d he pass over her resume and settle on the guy? He didn’t know.

That little story demonstrates two things. One, it demonstrates the kind of subconscious bias we have. We don’t have to be mustache-twirling black-hatted villains to be sexists or racists. We just have to be human. Two, it demonstrates the way that these low-level biases actually harm people. Without our secondary budget for women/minority hires, that brilliant young woman would never have gotten an internship at IBM; without that internship, she probably wouldn’t have gotten a permanent job at IBM after graduation.

Professor Kulkarni said something silly. He knew he was saying something he shouldn’t have, but he went ahead and did it anyway, because it was normal and funny and harmless.

It’s not harmless. It reinforces that constant flood of experience that says that all scientists are men. If we want to change the culture of science to get rid of the sexism, we have to start with changing the deep attitudes that we aren’t even really aware of, but that influence our thoughts and decisions. That means that when we say or do something sexist or racist, we should expect to be called on it. And when we get called on it, we need to admit that we did something wrong, apologize, and try not to make the same mistake again.

We can’t let the black and white reasoning blind us. Good people can be sexists or racists. Good people can do bad things without meaning to. We can’t allow our belief in our essential goodness prevent us from recognizing it when we do wrong, and making the choices that will allow us to become better people.

Bad Comparisons with Statistics

When a friend asks me to write about something, I try to do it. Yesterday, a friend of mine from my Google days, Daniel Martin, sent me a link, and asked me to write about it. Daniel isn’t just a former coworker of mine; he’s a math geek with the same sort of warped sense of humor as me. He knew my blog before we worked at Google, and on my first Halloween at Google, he came to introduce himself to me. He was wearing a purple shirt with his train ticket on a cord around his neck. For those who know any abstract algebra, get ready to groan: he was purple, and he commuted. He was dressed as an Abelian grape.

Anyway, Daniel sent me a link to this article, and asked me to write about the error in it.

The real subject of the article involves a recent twitter-storm around a professor at Boston University. This professor tweeted some things about racism and history, and she did it in very blunt, not-entirely-professional terms. The details of what she did aren’t something I want to discuss here. (Briefly, I think it wasn’t a smart thing to tweet like that, but plenty of white people get away with worse every day; the only reason that she’s getting as much grief as she is is because she dared to be a black woman saying bad things about white people, and the assholes at Breitbart used that to fuel the insatiable anger and hatred of their followers.)

But I don’t want to go into the details of that here. Lots of people have written interesting things about it, from all sides. Just by posting about this, I’m probably opening myself up to yet another wave of abuse, but I’d prefer to avoid as much of that as I can. Instead, I’m just going to rip out the introduction to this article, because it makes a kind of incredibly stupid mathematical argument that requires correction. Here are the first and second paragraphs:

There aren’t too many African Americans in higher education.

In fact, black folks only make up about 4 percent of all full time tenured college faculty in America. To put that in context, only 14 out of the 321—that’s about 4 percent—of U.S. astronauts have been African American. So in America, if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor.

Statistics and probability can be a difficult field of study. But… a lot of its everyday uses are really quite easy. If you’re going to open your mouth and make public statements involving probabilities, you probably should make sure that you at least understand the first chapter of “probability for dummies”.

This author doesn’t appear to have done that.

The most basic fact of understanding how to compare pretty much anything numeric in the real world is that you can only compare quantities that have the same units. You can’t compare 4 kilograms to 5 pounds and conclude that 5 pounds is bigger than 4 kilograms just because 5 is bigger than 4.

That principle applies to probabilities and statistics: you need to make sure that you’re comparing apples to apples. If you compare an apple to a grapefruit, you’re not going to get a meaningful result.

The proportion of astronauts who are black is 14/321, or a bit over 4%. That means that out of every 100 astronauts, you’d expect to find four black ones.

The proportion of college professors who are black is also a bit over 4%. That means that out of every 100 randomly selected college professors, you’d expect 4 to be black.

So far, so good.

But from there, our intrepid author takes a leap, and says “if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor”.

Nothing in the quoted statistic in any way tells us anything about anyone’s chances to become an astronaut. Nothing at all.

This is a classic statistical error, and one which is very easy to avoid. It’s a unit error: he’s comparing two things with different units. The short version of the problem is: he’s comparing blacks-per-astronaut with astronauts-per-black.

You can’t derive anything about the probability of a black person becoming an astronaut from the ratio of black astronauts to astronauts.

Let’s pull out some numbers to demonstrate the problem. These are completely made up, to make the calculations easy – I’m not using real data here.

Suppose that:

  • the US population is 300,000,000;
  • black people are 40% of the population, which means that there are 120,000,000 black people.
  • there are 1000 universities in America, and there are 50 faculty per university, so there are 50,000 university professors.
  • there are 50 astronauts in the US.
  • If 4% of astronauts and 4% of college professors are black, that means that there are 2,000 black college professors, and 2 black astronauts.

In this scenario, as in reality, the percentage of black college professors and the percentage of black astronauts are equal. What about the probability of a given black person being a professor or an astronaut?

The probability of a given black person being a professor is 2,000/120,000,000 – or 1 in 60,000. The probability of a given black person becoming an astronaut is just 2/120,000,000 – or 1 in 60 million. Even though the probability of a random astronaut being black is the same as the probability of a random college professor being black, the probability of a given black person becoming a college professor is 1,000 times higher than the probability of a given black person becoming an astronaut.
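Here’s that arithmetic as a few lines of Python, with the same made-up numbers:

```python
black_population = 120_000_000
black_professors = 2_000
black_astronauts = 2

p_professor = black_professors / black_population   # 1 in 60,000
p_astronaut = black_astronauts / black_population   # 1 in 60,000,000
print(1 / p_professor, 1 / p_astronaut)             # 60000.0 60000000.0
print(p_professor / p_astronaut)                    # 1000.0: a factor of 1,000
```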

This kind of thing isn’t rocket science. My 11 year old son has done enough statistics in school to understand this problem! It’s simple: you need to compare like to like. If you can’t understand that, if you can’t understand your statistics enough to understand their units, you should probably try to avoid making public statements about statistics. Otherwise, you’ll wind up doing something stupid, and make yourself look like an idiot.

(In the interests of disclosure: an earlier version of this post used the comparison of apples to watermelons. But given the racial issues discussed in the post, that had unfortunate unintended connotations. When someone pointed that out to me, I changed it. To anyone who was offended: I am sorry. I did not intend to say anything associated with the racist slurs; I simply never thought of it. I should have, and I shouldn’t have needed someone to point it out to me. I’ll try to be more careful in the future.)

Expressions and Arity (Part 2): Equivalence

Continuing where I left off: we were talking about arity in Martin-Löf’s theory of expressions. There are two basic problems that arity solves: it makes certain kinds of impossible-to-evaluate expressions be invalid in the theory; and it helps enable some way of comparing expressions for equality. Arity solves both of those problems by imposing a simple type system over expressions.

At the end of the last post, I started giving a sketch of what arities look like. Now we’re going to dive in, and take a look at how to determine the arity of an expression. It’s a fairly simple system of rules.

Before diving in, I want to stress that the most important thing about the way these rules work is that expressions are totally syntactic and static. This is something that confused me the first time I tried to read about expression theory. When you see an expression, you think about how it’s evaluated. But expression theory is a purely syntactic theory: it’s about analyzing expressions as syntactic entities. There are, deliberately, no evaluation rules. It’s about understanding what the notations mean, and how to determine when two expressions are equivalent.

If, under the rules of Martin-Löf’s expression theory, two expressions are equivalent, then under any valid set of evaluation rules, the two expressions will evaluate to the same value. But expression equivalence is stronger: expressions are equivalent only if you can prove their equivalence from their syntax alone.

That clarified, let’s start by looking at the rules of arity in expressions.

Variables and Constants
Every variable and every primitive constant has a pre-defined arity; if x is a variable or primitive constant with arity \alpha, then the expression x has arity \alpha.
Definitions
In a definition D := e, the arity of the defined name D is the same as the arity of the expression e.
Applications
If a is an expression of arity \alpha \twoheadrightarrow \beta, and b is an expression of arity \alpha, then a(b) is an expression of arity \beta.
Abstractions
If e is an expression of arity \beta and x is a variable of arity \alpha, then (x)e is an expression of arity \alpha \twoheadrightarrow \beta.
Combinations
If e_1 is an expression of arity \alpha_1, e_2 is an expression of arity \alpha_2, …, and e_n is an expression of arity \alpha_n, then a combination expression e_1, e_2, ..., e_n is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n.
Selections
If e is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n where n \ge 2, then e.i is an expression of arity \alpha_i.
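These rules are mechanical enough to transcribe directly into code. Here’s a minimal sketch of an arity checker in Haskell; the encoding – the Arity and Expr types, the arityOf function, and the Env of pre-declared arities – is my own invention for illustration, not anything from Martin-Löf:

```haskell
import qualified Data.Map as M

-- Arities: 0, alpha ->> beta, and alpha_1 (x) ... (x) alpha_n.
data Arity = Zero | Arrow Arity Arity | Product [Arity]
  deriving (Eq, Show)

-- Expressions, following the AST form used in these posts.
data Expr
  = Atom String     -- a variable or primitive constant
  | App Expr Expr   -- application: a(b)
  | Abs String Expr -- abstraction: (x)e
  | Comb [Expr]     -- combination: e1, e2, ..., en
  | Sel Expr Int    -- selection: e.i (1-indexed)
  deriving (Eq, Show)

-- The pre-defined arities of variables and primitive constants.
type Env = M.Map String Arity

arityOf :: Env -> Expr -> Maybe Arity
arityOf env (Atom x)  = M.lookup x env
arityOf env (App a b) = do
  Arrow alpha beta <- arityOf env a   -- a must have arity alpha ->> beta
  alpha'           <- arityOf env b   -- b must have exactly arity alpha
  if alpha == alpha' then Just beta else Nothing
arityOf env (Abs x e) = do
  alpha <- M.lookup x env             -- the bound variable's declared arity
  beta  <- arityOf env e
  Just (Arrow alpha beta)
arityOf env (Comb es) = Product <$> mapM (arityOf env) es
arityOf env (Sel e i) = do
  Product alphas <- arityOf env e     -- selection needs a product arity
  if i >= 1 && i <= length alphas
    then Just (alphas !! (i - 1))
    else Nothing
```

Each equation of arityOf is one of the six rules above; an expression with no derivable arity simply comes back as Nothing.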

Let’s try working through an example: x^2 + 3x + 7.

  1. As we saw in this post, this is equivalent to the simple AST-form: (x)+(+(*(x,x), *(3, x)),7).
  2. “x” is a variable of arity 0; “3” and “7” are constants of arity 0; “+” and “*” are constants of arity (0 \otimes 0) \twoheadrightarrow 0.
  3. From the combination rule, since x and 3 both have arity 0, (x, x) and (3, x) each have arity 0 \otimes 0.
  4. Since (x, x) has arity 0 \otimes 0, and * has arity (0 \otimes 0) \twoheadrightarrow 0, *(x, x) has arity 0. The same thing works for *(3, x).
  5. Since the arities of *(x, x) and *(3, x) are both 0, the combination of the pair (the arguments to the inner “+”) has arity 0 \otimes 0, and the arity of the inner sum expression is thus 0.
  6. Since 7 has arity 0, the combination of it with the inner sum is 0 \otimes 0, and the arity of the outer sum is 0.
  7. Since x is a variable of arity 0, and the outer sum expression has arity 0, the abstraction has arity 0 \twoheadrightarrow 0.

If you’re familiar with type inference in the simply typed lambda calculus, this should look pretty familiar; the only real difference is that arity tracks nothing but applicability and parameter counting.

Just from this much, we can see how this prevents problems. If you try to compute the arity of 3.1 (that is, the selection of the first element from 3), you find that you can’t: there is no arity rule that would allow you to do that. The selection rule only works on a product-arity, and 3 has arity 0.
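Running both of those through the sketch above (same caveat: my own illustrative encoding) shows the two behaviors – the worked example gets arity 0 \twoheadrightarrow 0, and 3.1 simply has no arity:

```haskell
-- Assumes the Arity/Expr/arityOf definitions from the sketch above.
env :: Env
env = M.fromList
  [ ("x", Zero), ("3", Zero), ("7", Zero)
  , ("+", Arrow (Product [Zero, Zero]) Zero)
  , ("*", Arrow (Product [Zero, Zero]) Zero) ]

-- The AST form of the example: (x)+(+(*(x,x), *(3,x)), 7)
example :: Expr
example =
  Abs "x" (plus (Comb [ plus (Comb [ times (Comb [x, x])
                                   , times (Comb [three, x]) ])
                      , seven ]))
  where
    x = Atom "x"; three = Atom "3"; seven = Atom "7"
    plus  = App (Atom "+")
    times = App (Atom "*")

-- arityOf env example            ==> Just (Arrow Zero Zero)  -- i.e. 0 ->> 0
-- arityOf env (Sel (Atom "3") 1) ==> Nothing                 -- 3.1 is invalid
```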

The other reason we wanted arity was to allow us to compare expressions. Intuitively, it should be obvious that the expression e and the expression (x)(e(x)) are in some sense equal. But we need a way of actually defining that equality precisely.

The kind of equality that we’re trying to get at here is called definitional equality. We’re not trying to define equality where expressions a and b evaluate to equal values – that would be easy. Instead, we’re trying to get at something more subtle: we want to capture the idea that the expressions are different ways of writing the same thing.

We need arity for this, for a simple reason. Let’s go back to that first example expression: (x)+(+(*(x,x), *(3, x)),7). Is that equivalent to (y)+(+(*(y,y), *(3, y)),7)? Or to 8x+1? If we apply them to the value 3, and then evaluate them using standard arithmetic, then all three expressions evaluate to 25. So are they all equivalent? We want to be able to say that the first two are equivalent expressions, but the last one isn’t. And we’d really like to be able to say that structurally – that is, instead of saying something evaluation-based like “forall values of x, eval(f(x)) == eval(g(x)), therefore f == g”, we want to be able to do something that says f \equiv g because they have the same structure.

Using arity, we can work out a structural definition of equivalence for expressions.

In everything below, we’ll write a: \alpha to mean that a has arity \alpha, and a \equiv b : \alpha to mean that a and b are equivalent expressions of arity \alpha. We’ll define equivalence in classic inductive form, by structure:

Variables and Constants
If x is a variable or constant of arity \alpha, then x \equiv x : \alpha. This is the simplest identity rule: variables and constants are equivalent to themselves.
Definitions
If a := b is a definition, and b: \alpha, then a \equiv b : \alpha. This is a slightly more complex form of an identity rule: if a is defined to be the expression b, then a and b are equivalent.
Application Rules
  1. If a \equiv a': \alpha \twoheadrightarrow \beta and b \equiv b': \alpha, then a(b) \equiv a'(b'): \beta. If an applyable expression a is equivalent to another applyable expression a', then applying a to an argument b is equivalent to applying a' to an argument b', provided that b and b' are equivalent. That’s a mouthful, but it’s simple: two function application expressions are equivalent if both their function expressions and their argument expressions are equivalent.
  2. If x is a variable of arity \alpha, and a is an expression of arity \alpha, and b is an expression of arity \beta, then ((x)b)(a) \equiv b[x := a]: \beta. This is arity’s version of the classic beta rule of lambda calculus: applying an abstraction to an argument means substituting the argument for all references to the abstracted parameter in the body of the abstraction.
Abstraction Rules
  1. If x is a variable of arity \alpha, and b \equiv b': \beta, then (x)b \equiv (x)b': \alpha \twoheadrightarrow \beta. If two expressions are equivalent, then abstracting over the same variable in each of them produces equivalent abstractions.
  2. If x and y are both variables of arity \alpha, and b is an expression of arity \beta, then (x)b \equiv (y)(b[x := y]): \alpha \twoheadrightarrow \beta, provided y is not free in b.
    Basically, renaming the bound variable of an abstraction doesn’t matter, as long as the new variable isn’t already free in the body. So (x)(3+4y) is equivalent to (z)(3+4y), but it’s not equivalent to (y)(3+4y), because y is a free variable in 3+4y, and that abstraction would create a binding for y.

  3. This is arity’s version of the eta rule from lambda calculus: if x is a variable of arity \alpha, and b is an expression of arity \alpha \twoheadrightarrow \beta in which x does not occur free, then (x)(b(x)) \equiv b: \alpha \twoheadrightarrow \beta. This is a fancy version of an identity rule: abstraction and application cancel.

Combination Rules
  1. If a_1 \equiv a_1': \alpha_1, a_2 \equiv a_2': \alpha_2, …, and a_n \equiv a_n': \alpha_n, then a_1, a_2, ..., a_n \equiv a_1', a_2', ..., a_n': \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n. This one is simple: two combination expressions are equivalent if their corresponding elements are equivalent.
  2. If e: \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n, then e.1, e.2, ..., e.n \equiv e : \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n. Another easy one: if you take a combination expression, decompose it using selections, and then recombine those selection expressions into a combination, the result is equivalent to the original expression.
Selection Rules
  1. If a \equiv a': \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n, then a.i \equiv a'.i: \alpha_i. This is the reverse of combinations rule one: if you have two equivalent tuples, then their corresponding elements are equivalent.
  2. If a_1: \alpha_1, a_2: \alpha_2, ..., a_n: \alpha_n, then (a_1, a_2, ..., a_n).i \equiv a_i: \alpha_i. An element of a combination is equivalent to itself outside of the combination.
Reflexivity
If a: \alpha, then a \equiv a: \alpha.
Symmetry
If a \equiv b: \alpha, then b \equiv a: \alpha.
Transitivity
If a \equiv b: \alpha, and b \equiv c: \alpha, then a \equiv c: \alpha.

Jumping back to our example: is (x)(x^2 + 3x + 7) equivalent to (y)(y^2 + 3y + 7)? If we convert them both into their canonical AST forms, then yes. They’re identical, except for one thing: the variable name in their abstraction. By abstraction rule 2, then, they’re equivalent.
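To make that concrete in code, here’s a sketch of just the renaming part of equivalence – abstraction rule 2 plus the structural congruence rules – over the illustrative Expr type from earlier. It deliberately ignores the beta and eta rules, treats every Atom as a potential variable, and drops the arity annotations, so it’s a small illustrative slice of the full equivalence relation, not an implementation of it:

```haskell
-- Assumes the Expr type from the arity sketch above.

-- Free variables of an expression. (A real implementation would
-- distinguish variables from primitive constants; here every Atom
-- counts as potentially free.)
freeVars :: Expr -> [String]
freeVars (Atom x)  = [x]
freeVars (App a b) = freeVars a ++ freeVars b
freeVars (Abs x e) = filter (/= x) (freeVars e)
freeVars (Comb es) = concatMap freeVars es
freeVars (Sel e _) = freeVars e

-- Rename free occurrences of one variable to another. This is just
-- enough for abstraction rule 2; a full implementation would also
-- avoid capture by inner binders.
rename :: String -> String -> Expr -> Expr
rename old new (Atom x)  = Atom (if x == old then new else x)
rename old new (App a b) = App (rename old new a) (rename old new b)
rename old new (Abs x e)
  | x == old  = Abs x e   -- old is shadowed here; nothing to rename
  | otherwise = Abs x (rename old new e)
rename old new (Comb es) = Comb (map (rename old new) es)
rename old new (Sel e i) = Sel (rename old new e) i

-- Equivalence up to renaming of bound variables.
alphaEq :: Expr -> Expr -> Bool
alphaEq (Atom x)  (Atom y)  = x == y
alphaEq (App a b) (App c d) = alphaEq a c && alphaEq b d
alphaEq (Comb es) (Comb fs) =
  length es == length fs && and (zipWith alphaEq es fs)
alphaEq (Sel e i) (Sel f j) = i == j && alphaEq e f
alphaEq (Abs x e) (Abs y f)
  | x == y    = alphaEq e f
  | otherwise = x `notElem` freeVars f && alphaEq e (rename y x f)
alphaEq _ _ = False
```

With that, alphaEq accepts the two ASTs above – (x)+(+(*(x,x), *(3,x)),7) and its y-renamed twin – and rejects any structurally different expression, such as an encoding of 8x+1, which is exactly the distinction the structural definition of equivalence is after.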