Diagrams in Category Theory

One of the things that I find niftiest about category theory is category diagrams. A lot of things that normally turn into complex equations or long-winded logical statements can be expressed in diagrams: you capture the things that you’re talking about in a category, and then use category diagrams to express the idea that you want to get across.

A category diagram is a directed graph, where the nodes are objects from a category, and the edges are morphisms. Category theorists say that a diagram commutes if, for any two paths through arrows in the diagram from node A to node B, the composition of all edges from the first path is equal to the composition of all edges from the second path.

As usual, an example will make that clearer.

This diagram is a way of expressing the associativity property of morphisms: f º (g º h) = (f º g) º h. The way that the diagram illustrates this is: (g º h) is the morphism from A to C. When we compose that with f, we wind up at D. Alternatively, (f º g) is the arrow from B to D; if we compose that with h, we also wind up at D. The two paths, f º (A → C) and (B → D) º h, both run from A to D; therefore, if the diagram commutes, they must be equal.

Let’s look at one more diagram, which we’ll use to define an interesting concept, the principal morphism between two objects. The principal morphism is a single arrow from A to B, and any composition of morphisms that goes from A to B will end up being equivalent to it.

In diagram form, a morphism f is principal if (∀ x : A → A) (∀ y : A → B), the following diagram commutes.

In words, this says that f is a principal morphism if for every endomorphic arrow x, and for every arrow y from A to B, f is the result of composing x and y. There’s also something interesting about this diagram that you should notice: A appears twice in the diagram! It’s the same object; we just draw it in two places to make the commutation pattern easier to see. A single object can appear in a diagram as many times as you want, to make the pattern of commutation easy to see. When you’re looking at a diagram, you need to be a bit careful to read the labels to make sure you know what it means. (This paragraph was corrected after a commenter pointed out a really silly error; I originally said “any identity arrow”, not “any endomorphic arrow”.)

One more definition by diagram: x and y are a retraction pair, and A is a retract of B (written A < B) if the following diagram commutes:

That is, x : A → B, and y : B → A are a retraction pair if y º x = 1A.
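The retraction condition is easy to check concretely. Here’s a minimal sketch in Python, using small finite sets and plain functions as the morphisms; the particular sets and maps are made up purely for illustration.

```python
# A retraction pair, sketched with finite sets and ordinary Python functions.
# Illustrative example: A = {0, 1, 2} sits inside B = {0, ..., 5}.
A = [0, 1, 2]
B = [0, 1, 2, 3, 4, 5]

def x(a):
    # x : A -> B, the inclusion of A into B
    return a

def y(b):
    # y : B -> A, a retraction that collapses B back onto A
    return min(b, 2)

# The diagram commutes: y o x is the identity arrow 1_A, so A is a retract of B.
assert all(y(x(a)) == a for a in A)
```

Note that x º y is generally not the identity on B (here y(x(5)) would never arise, but x(y(5)) = 2 ≠ 5); the retraction condition only constrains the round trip starting from A.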

Why so many languages? Programming languages, Computation, and Math.

Back at my old digs last week, I put up a post about programming languages and types. It produced an interesting discussion, which ended up shifting topics a bit, and led to a very interesting question from one of the posters. Since the answer actually really does involve math, I’m going to pop it up to the front page here.

In the discussion, I argued that programmers should know and use many different programming languages; and that that’s not just a statement about today’s programming languages, but something that I think will always be true: there will always be good reasons for having and using more than one programming language.

One of the posters was quite surprised by this, and wanted to know why I didn’t think that it was possible, at least in principle, to design a single ideal programming language that would be good for any program that I needed to write.

It’s a good question, which I think connects very naturally into the underlying mathematics of computation. As I discussed back on the old blog, one of the very fundamental facts concerning computation is the Church-Turing thesis: that computation has a fundamental limit, and that there are many different ways of performing mechanical computation, but ultimately, they all end up with the same fundamental capabilities. Any computing system that reaches that limit is called an effective computing system (ECS). Anything that one ECS can do, any other ECS can do too. That doesn’t mean that they’re all identical. A given computation can be really easy to understand when described in terms of one ECS, and horribly difficult in another. For example, sorting a list of numbers is pretty easy to understand if it’s written in lambda calculus; but it’s a nightmare written for a Minsky machine; and it’s somewhere darned close to impossible for a human to understand written out as a cell-grid for Conway’s life.

How does this connect back to programming languages? Well, what is a programming language really? From a theoretical point of view, it’s a language for specifying a computing machine. Each program is a description of a specific computing machine. The language is a particular way of expressing the computing machine that you want to build. Underlying each programming language is an effective computing system: a fundamental model of how to perform computations. That’s the real fundamental difference between programming languages: what fundamental way of representing computation underlies the language.

Assuming that you’re looking at a good programming language, it’s good for a particular task when that task is easy to describe in terms of the fundamental ECS underlying the language; it’s bad for a task when that task is not easy to describe in terms of the language’s underlying ECS.

If a language has a single ECS underlying it, then there are tasks that it’s good for, and tasks that it’s not so good for. If you try to smoosh multiple ECSs together under the covers of one language, then you don’t really have one language: when you’re writing for a von Neumann machine, you’re really using one language, and when you’re writing for Actors, you’re using another. You’ve just forced two languages together into the same framework – but they’re still two languages, and you’ve probably compromised the quality of expression of those two languages by forcing them together.

A First Glance at Category Theory

To get started, what is category theory?

Back in grad school, I spent some time working with a thoroughly insane guy named John Case who was the new department chair. When he came to the university, he brought a couple of people with him, to take temporary positions. One of them was a category theorist whose name I have unfortunately forgotten. That was the first I’d ever heard of cat theory. So I asked John what the heck this category theory stuff was. His response was “abstract nonsense”. I was astonished; a guy as wacky and out of touch with reality as John called something abstract nonsense? It turned out to be a classic quotation, attributed to one of the founders of category theory, Norman Steenrod. It’s not an entirely bad description.

Category theory is one of those fields of mathematics that fascinates me: you take some fundamental concept, and abstract it down to its bare essentials in order to understand just what it really is, what it really means. Just as group theory takes the idea of an algebraic operation, strips it down to the bare minimum, and discovers the meaning of symmetry, category theory takes the concept of a function as a mapping from one thing to another, strips it down to the bare minimum, and sees what you can discover.

The fundamental thing in category theory is an arrow, also called a morphism. A morphism is an abstraction of the concept of homomorphism, which I talked about a bit when I was writing about group theory. We take that concept of a function mapping from one set of values to another, and strip it down: the basic concept is something that provides a map from one thing to some other thing – and that’s all we want.

In classic style, we’ll define a category C as a tuple: (O, M, º), where:

  • O, or Obj(C), is a set of objects. Objects are anything – we don’t care. Anything that we can define maps over.
  • M, or Mor(C), is a set of arrows or morphisms. Every arrow has a unique source and a unique target, both of which are objects in O. Given two objects a and b in O, we also write Mor(a,b) for the set of morphisms from a to b. To talk about a specific morphism from a to b, we write it as “name : a → b”; as in “f : int → int”.
  • º is a binary operation that is the abstraction of function composition; º is a mapping from (Mor(a,b), Mor(b,c)) to Mor(a,c). It’s got the basic properties of function composition:
    1. Associativity: (∀ f : a → b, g : b → c, h : c → d) h º (g º f) = (h º g) º f
    2. Identity: (∀ a,b ∈ O(C)) (∃ 1a, 1b ∈ Mor(C)) (∀ f : a → b) 1b º f = f = f º 1a.
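To make the axioms concrete, here’s a small sketch in Python, treating ordinary functions as the morphisms of a category and checking the two laws pointwise on a sample of inputs. The particular functions f, g, h are arbitrary illustrative choices.

```python
# A minimal sketch of these axioms in the "sets and functions" category,
# with Python functions standing in for morphisms.
def compose(g, f):
    # g o f : first apply f, then g
    return lambda v: g(f(v))

identity = lambda v: v             # the identity arrow 1_a, for any object a

f = lambda n: n + 1                # f : int -> int
g = lambda n: n * 2                # g : int -> int
h = lambda n: n - 3                # h : int -> int

sample = range(-10, 10)
# Associativity: h o (g o f) = (h o g) o f
assert all(compose(h, compose(g, f))(v) == compose(compose(h, g), f)(v)
           for v in sample)
# Identity: 1_b o f = f = f o 1_a
assert all(compose(identity, f)(v) == f(v) == compose(f, identity)(v)
           for v in sample)
```

Of course this only spot-checks the laws on a finite sample; in the mathematics they hold for all morphisms by definition.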

One neat little trick to simplify things is that we can actually throw away Obj(C), and replace it with the set of identity morphisms: since there is exactly one identity morphism per object, there’s no real need to distinguish between the identity morphism and the object. It’s a nice trick because it means that we have nothing but morphisms all the way down; but it’s really just a trick. We can talk about Obj(C); or Id(C); but we still need to be able to talk about the objects in some way, whether they’re just a subset of the morphisms, or something distinct.

Now, we get to something about category theory that I really don’t like. Category theory is front-loaded with rather a lot of definitions about different kinds of morphisms and different kinds of categories. The problem with that is that these definitions are very important, but we don’t have enough of the theory under our belts to be able to get much of a sense for why we should care about them, or what their intuitive meaning is. But that’s the way it goes sometimes; you have to start somewhere. It will make sense eventually, and you’ll see why these definitions matter.

There are a lot of special types of morphisms, defined by properties. Here’s the basic list:

  • A monic (or monomorphism) is an arrow f : a → b such that (∀ g1 , g2: x → a) f º g1 = f º g2 ⇒ g1 = g2. (That is, if two arrows become equal after composing with f on the left, they must have been the same arrow to begin with.)
  • An epic (or epimorphism) is an arrow f : a → b such that (∀ g1 , g2: b → x) g1 º f = g2 º f ⇒ g1 = g2. (This is almost the same as a monic, but it’s from the other side of the composition; instead of (f º g) in the definition, it’s (g º f).)
  • An isomorphism is an arrow f : a → b such that (∃ g : b → a) f º g = 1b ∧ g º f = 1a.
  • An endomorphism is an arrow f : a → b such that a = b.
  • An automorphism is an arrow that is both an endomorphism and an isomorphism.
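In the category of finite sets, the monics turn out to be exactly the injective functions. Here’s a brute-force Python sketch of the monic definition above, over tiny sets invented for illustration:

```python
# Brute-force check of the monic definition in the category of finite sets:
# f : a -> b is monic iff  f o g1 = f o g2  implies  g1 = g2,
# for all g1, g2 : x -> a. Functions are represented as dicts.
from itertools import product

def all_maps(src, tgt):
    """Every function src -> tgt, as a dict from inputs to outputs."""
    return [dict(zip(src, images)) for images in product(tgt, repeat=len(src))]

def is_monic(f, a, x):
    # For every pair g1, g2 : x -> a whose compositions with f agree,
    # demand that g1 and g2 are equal.
    return all(g1 == g2
               for g1, g2 in product(all_maps(x, a), repeat=2)
               if all(f[g1[v]] == f[g2[v]] for v in x))

a, x = [0, 1], [0, 1]
injective = {0: 0, 1: 1}       # monic: distinct inputs keep distinct images
collapsing = {0: 0, 1: 0}      # not monic: it merges 0 and 1

assert is_monic(injective, a, x)
assert not is_monic(collapsing, a, x)
```

The epic check is the mirror image: quantify over arrows g1, g2 : b → x and compose with f on the right instead.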

One last definition, just because it gives me a chance to point out something useful about category theory. A functor is a morphism in the category of all categories. What that means is that it’s a structure-preserving mapping between categories. It’s neat in a theoretical way, because it demonstrates that we’re already at a point where we’re seeing how category theory can make it easier to talk about something complicated: we’re using it to describe itself! But the concept of functor also has a lot of applications; in particular, the module system of my favorite programming language makes extensive use of functors.

In Ocaml, a module is something called a structure, which is a set of definitions with constrained types. One of the things you often want to be able to do is to write a piece of code in a way that allows you to make it parametric on some other structure. The way you do that is to write a functor: a “function” from a structure to a structure. For example, to implement a generic binary tree, you need a type of values that you’ll put in the tree, and an operation to compare values. The way you do that is to write a functor which takes a structure defining a type and a comparison operator, and maps it to a structure which is an implementation of a binary tree for that type and comparison.

The Ocaml functor is a category theoretic functor: category theory provides an easy way to talk about the concept of the compile-time “function” from structure to structure.
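Python has no module functors, but the flavor of the idea can be mimicked with an ordinary function that maps one “structure” (here, just a comparison function) to another (a search-tree class specialized to that comparison). This is only a loose analogy to the Ocaml feature, and all of the names here are invented for illustration:

```python
# A rough Python analogue of an Ocaml functor: a function from a
# "structure" (a comparison operation) to a structure (a tree class).
def make_tree(compare):
    """Map a comparison function to a binary-search-tree class that uses it."""
    class Tree:
        def __init__(self):
            self.value = None
            self.left = self.right = None

        def insert(self, v):
            if self.value is None:
                self.value = v
            elif compare(v, self.value) < 0:
                self.left = self.left or Tree()
                self.left.insert(v)
            else:
                self.right = self.right or Tree()
                self.right.insert(v)

        def to_list(self):
            # In-order traversal: sorted according to `compare`.
            if self.value is None:
                return []
            left = self.left.to_list() if self.left else []
            right = self.right.to_list() if self.right else []
            return left + [self.value] + right
    return Tree

IntTree = make_tree(lambda a, b: a - b)   # "apply the functor" to int ordering
t = IntTree()
for v in [3, 1, 2]:
    t.insert(v)
assert t.to_list() == [1, 2, 3]
```

Passing a different comparison yields a genuinely different tree structure; that compile-time mapping from structure to structure is what the category-theoretic functor vocabulary captures cleanly in Ocaml.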

Election Fraud? Or just bad math?

I’ve gotten an absolutely unprecedented number of requests to write about RFK Jr’s Rolling Stone article about the 2004 election.

RFK Jr’s article tries to argue that the 2004 election was stolen. It does a wretched, sloppy, irresponsible job of making the argument. The shame of it is that I happen to believe, based on the information that I’ve seen, that the 2004 presidential election was stolen. But RFK Jr’s argument is just plain bad: a classic case of how you can use bad math to support any argument you care to make. As a result, I think that the article does just about the worst thing that it could do: it utterly discredits anyone who questions the results of that disastrous election, and makes it far more difficult for anyone who wants to do a responsible job of looking into the question.

Let’s get right into it. He starts his argument by claiming that the exit polls indicated a different result than the official election results:

The first indication that something was gravely amiss on November 2nd, 2004, was the inexplicable discrepancies between exit polls and actual vote counts. Polls in thirty states weren’t just off the mark — they deviated to an extent that cannot be accounted for by their margin of error. In all but four states, the discrepancy favored President Bush.

The key sentence that indicates just how poorly RFK Jr understands the math? “they deviated to an extent that cannot be accounted for by their margin of error”. That is a statement that is, quite simply, nonsensical. The margin of error in a poll is a statistical measure based on the standard deviation. Contrary to popular opinion, a poll with a margin of error of “4%” doesn’t mean that the actual quantity being measured must be within plus or minus 4% of the poll result.

A margin of error is measured to within a level of confidence. Most of the time, the MoE that we see cited is the MoE with 95% confidence. What this means is that 95% of the time, the sampled (polled) result is within that +/- n% range. But no result is impossible: the margin of error is an expression of how confident the pollster is in the quality of their measurement, nothing more than that. Like any other measurement based on statistical sampling, the sample can deviate from the population by any quantity: a sample can be arbitrarily bad, even if you’re careful about how you select it.
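Here’s a quick simulation (in Python, with made-up numbers) of what that 95% figure means: even a perfectly honest, well-run poll lands outside its own margin of error about one time in twenty.

```python
# Simulating the meaning of a 95%-confidence margin of error.
# All numbers here are illustrative, not taken from any real poll.
import math
import random

random.seed(0)

true_share = 0.5      # the actual population proportion
n = 1000              # poll sample size
# Standard 95% MoE for a proportion: 1.96 standard errors, about +/- 3.1% here.
moe = 1.96 * math.sqrt(true_share * (1 - true_share) / n)

trials = 2000
outside = 0
for _ in range(trials):
    # Run one honest poll: sample n voters at random from the population.
    sampled = sum(random.random() < true_share for _ in range(n)) / n
    if abs(sampled - true_share) > moe:
        outside += 1

# Roughly 5% of these perfectly honest polls miss by more than their MoE.
print(f"MoE = +/-{moe:.1%}, polls outside their own MoE: {outside / trials:.1%}")
```

So a poll deviating by more than its margin of error is not evidence of anything by itself; with dozens of state polls, a handful of such deviations is exactly what honest sampling predicts.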

Elections have consistently shown a bias in the exit polls: a bias in favor of the democratic vote. For some reason, which has not been adequately studied, exit polls almost always err on the side of sampling too many democratic voters. This could be the result of any number of factors: it could be a question of time (when were the polled people asked?); it could be a question of location (where were the pollsters located relative to the polling place?); it could be a social issue (the group of people that consistently votes for the democratic party may be more willing, or have more time, to answer pollsters’ questions); it could be something else entirely.

But you can’t conclude that an election was stolen on the basis of a discrepancy between official election results and exit polls. The best you can do is conclude that you need to look at both the election and the exit polling process to try to determine the reason for the discrepancy.

According to Steven F. Freeman, a visiting scholar at the University of Pennsylvania who specializes in research methodology, the odds against all three of those shifts occurring in concert are one in 660,000. ”As much as we can say in sound science that something is impossible,” he says, ”it is impossible that the discrepancies between predicted and actual vote count in the three critical battleground states of the 2004 election could have been due to chance or random error.”

That entire quote is, to put it crudely, utter bullshit. Anyone who would make that statement should be absolutely disqualified from ever commenting on a statistical result.

Now, thanks to careful examination of Mitofsky’s own data by Freeman and a team of eight researchers, we can say conclusively that the theory is dead wrong. In fact it was Democrats, not Republicans, who were more disinclined to answer pollsters’ questions on Election Day. In Bush strongholds, Freeman and the other researchers found that fifty-six percent of voters completed the exit survey — compared to only fifty-three percent in Kerry strongholds.(38) ”The data presented to support the claim not only fails to substantiate it,” observes Freeman, ”but actually contradicts it.”

Again, nonsense. There are two distinct questions in that paragraph, which are being deliberately conflated:

  1. In each given polling place, what percentage of people who voted were willing to participate in exit polls?
  2. In each given polling place, what percentage of the people who were willing to participate in exit polls were voters for each of the major parties?

The fact that a smaller percentage of people in places that tended to vote for the democratic candidate were willing to participate in exit polls is entirely independent of whether or not in a specific polling place a larger percentage of democratic voters than republican voters were willing to participate in the exit polls. This is a deliberate attempt to mislead readers about the meanings of the results – aka, a lie.

What’s more, Freeman found, the greatest disparities between exit polls and the official vote count came in Republican strongholds. In precincts where Bush received at least eighty percent of the vote, the exit polls were off by an average of ten percent. By contrast, in precincts where Kerry dominated by eighty percent or more, the exit polls were accurate to within three tenths of one percent — a pattern that suggests Republican election officials stuffed the ballot box in Bush country.

It could indicate that. It could also indicate that democratic voters were consistently more willing to participate in exit polls than republican voters. And therefore, in polling places that were strongly democratic, the sampling was quite representative; but in polling places that were strongly republican, the sampling was lousy.

Just to give an idea of how this works. Suppose we have two polling places, each of which has 20,000 voters. Suppose in district one, there is a 60%/40% split in favor of democratic voters; and in district two, there’s the opposite; 60% republican, 40% democrat. And let’s just use similar numbers for simplicity; suppose that in both polling places, 60% of democrats were willing to participate in exit polls, and 40% of republicans were willing. What’s the result?

  1. District one will have 12000 votes for the democrat, and 8000 for the republican. The exit polls will get 7200 democratic voters, and 3200 republican voters, or 69% of the vote going to democrats according to exit poll, versus 60% actual.
  2. District two will have the opposite number of votes: 8000 for the democrat, and 12000 for the republican. The exit polls would get 4800 democrats, and 4800 votes for republicans – predicting a 50/50 split.

The democratic margin of victory in the democratic area was increased; the republican margin was decreased by slightly more.
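The arithmetic above is easy to reproduce. Here is the same calculation as a short Python sketch, using the made-up participation rates from the example:

```python
# Reproducing the two-district exit-poll example, with the same
# illustrative (made-up) participation rates: 60% of democratic voters
# and 40% of republican voters agree to answer the exit poll.
def exit_poll_dem_share(dem_votes, rep_votes, dem_rate=0.60, rep_rate=0.40):
    polled_dem = dem_votes * dem_rate
    polled_rep = rep_votes * rep_rate
    return polled_dem / (polled_dem + polled_rep)

# District one: actual 60/40 democratic; the poll overstates the dem share.
assert round(exit_poll_dem_share(12000, 8000), 2) == 0.69   # vs. 60% actual

# District two: actual 60/40 republican; the poll shows a dead heat.
assert exit_poll_dem_share(8000, 12000) == 0.50             # vs. 40% actual
```

The same uniform response bias thus produces large exit-poll errors in the republican stronghold and a smaller error in the democratic one, with no fraud anywhere in the model.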

It continues very much in this same vein: giving unreliable samples undue weight; bait-and-switch of statistics; and claims that measurement errors are impossible. But none of the mathematical arguments hold up.

Was there fraud in the election? Almost certainly. Particularly in Ohio, there are some serious flaws that we know about. But this article manages to mix the facts of partisan manipulation of the election with so much gibberish that it discredits the few real facts that it presents.

RFK Jr. should be ashamed of himself. But based on his past record, I rather doubt that he is.

Next Topic Poll Results; or, the losers win

As I mentioned here, back on the old home of goodmath, I was taking a poll of what good math topic to cover next. In that poll, graph theory and topology were far and away the most popular topics, tying for the most votes (8 each), compared to no more than 2 votes for any other subject.
So, the next topic I’m going to talk about is: category theory.
There is actually a reason for that. I’m not just ignoring what people voted for. Based on the poll, I was planning on writing about topology, so I started doing some background reading on topology. What came up in the first chapter of the book I picked up? Explanations of how to interpret category-theory diagrams, because the author of the text found cat theory to be useful for explaining some of the concepts of topology.
I also looked up some articles on graph theory – in particular, graph isomorphisms, because that’s something I’ve done some work on. And what do I find when I start to read them? Again, category theory diagrams.
And what do I find when I look at wikipedia, to see if I missed anything in my recent series of posts on the semantics of lambda calculus? Category theory.
Category theory is a weird subject, which has an amazing way of generating incredibly polarizing attitudes among mathematicians. But it’s cropping up more and more in numerous fields of mathematics, and it’s widely used in computer science. There seem to be significant doubts among many people as to whether or not category theory represents anything dramatically new, whether or not it provides new insights that couldn’t have been gained by any other fields of mathematics. But whether or not it’s a source of new knowledge, it seems to be undeniable that it is extremely useful as a tool for understanding and explaining other mathematical fields.
So I will definitely write about topology and graph theory soon. But first, it’s going to be category theory.

Site Banner Request (Update)

A lot of people have said that they’d be willing to try making a banner for the site, but that they’d like me to provide a bit more info on what I’d like.

  1. Relatively subdued colors; no hot pink. My color preferences generally run towards blues and purples, but pretty much anything that isn’t fluorescent is fine.
  2. Headers here run roughly 760 by 90 or so, so roughly that size.
  3. Easy to read text, including the name of the blog, and the subtitle that are currently there. I’d rather not have funny fonts mixed into the title.
  4. Something in the background that suggests the kind of math I do. Since my approach to math is much more focused on discrete math topics like structures and logic, I’d prefer to see something like graphs, category diagrams, topologies, or knots than equations.

I’ve already gotten one header suggestion, which I like a lot, so I’ll put it here for you to see the kind of thing I like.
As I said above, I’d prefer to see stuff in the background that’s more up my alley, but I really like the overall look of this one.
Whoever designs the banner that ends up getting picked will, of course, be credited on the “about” page of the blog; I’ll also let you pick a topic for me to write a series of posts about; and if I ever collect the stuff I write for the blog into a book, I’ll send you a free copy.

Site Banner Request

If you take a look around scienceblogs, a lot of the folks here have really beautiful banners for their blogs. Like, for example:

Unfortunately, while I’m a good math geek and a passable musician, I’m a really horrible artist. So I’m putting a request out to you guys, the readers. Can someone out there with some artistic ability try making a cool goodmath/badmath banner for me?

The beauty of math; the humor of stupidity.
