The Futility of Pi Denialism

To me, the strangest crackpots I’ve encountered through this blog are the π denialists.

When people have trouble with Cantor and differently sized infinities, I get it. It defies our intuitions. It doesn’t seem to make sense.

When you look at Gödel incompleteness theorem – it’s really hard to wrap your head around. It doesn’t seem to make sense. I get it.

When you talk about things like indescribable numbers, it’s crazy. How could it possibly be true? I get it.

But π?

It’s a pretty simple number: the ratio of the diameter of the circle and the circle’s circumference. There’s nothing all that difficult about what it means. And there are so many different ways of calculating it! We can use the nature of a circle, and derive series that compute it. We can write simple programs, do tactile demos, measure actual physical phenomena. And yet, there are people who fervently believe that it’s all a sham: that the value of π isn’t what we say it is. It’s exactly 4. Or it’s exactly 22/7. Or it’s exactly $\frac{4}{\phi^{\frac{1}{2}}}$. Or it’s not a number, it’s an acceleration.

It’s amazing. I constantly get mail – mostly from fans of either Jain (the author of the &phi;-based &pi; mentioned above), or from followers of Miles Mathis (he of “&pi; isn’t a ratio, it’s an acceleration” fame), insisting that I’m part of the great mathematical conspiracy to deny the true factual value of &pi;.

And yet… It’s so simple to demonstrate how wrong that is.

My favorite version is a simple program.

Here’s the idea, followed by the code.

• Take the unit square – the region of the graph from (0, 0) to (1, 1), and inside of it, an arc of the circle of radius 1 around (0,0).
• Pick a random point, (x, y), anywhere inside of that square.
• If the distance from the origin ($x^2 + y^2$) is less than one, then the point is inside the circle. If it isn’t, then it’s outside of the circle.
• The probability, $p$, of any given random point being inside that circle is equal to the ratio of the area of the circle to the area of the square. The area of that region of the circle is: $\pi*1^2/4$, and the area of the the square is $1^2$. So the probability is $(1/4)\pi/1$, or $\pi/4$.
• So take a ton of random points, and count how many are inside the circle.
• The ratio of points inside the circle to total random points is $\pi/4$. The more random points you do this with, the closer you get to π.

We can turn that into a simple Python program:

from random import random

def computePi(points):
inside = 0
for i in range(points):
x = random()
y = random()
if (x*x + y*y) < 1.0:
inside = inside + 1
return (inside*1.0)/points * 4.0

for i in range(30):
pi = computePi(2**i)
print(f"Pi at 2**{i} iterations = {pi}")


The exact value that you’ll get when you run this depends on the random number generator, and the initial seed value. If you don’t specify a seed, most random number libraries will use something like last 32 digits of the current system time in nanoseconds, so you’ll get slightly different results each time you run it. I just ran it, and got:

Pi at 2**0 iterations = 4.0
Pi at 2**1 iterations = 4.0
Pi at 2**2 iterations = 3.0
Pi at 2**3 iterations = 2.0
Pi at 2**4 iterations = 3.5
Pi at 2**5 iterations = 2.75
Pi at 2**6 iterations = 3.0625
Pi at 2**7 iterations = 3.125
Pi at 2**8 iterations = 3.109375
Pi at 2**9 iterations = 3.1875
Pi at 2**10 iterations = 3.171875
Pi at 2**11 iterations = 3.126953125
Pi at 2**12 iterations = 3.12109375
Pi at 2**13 iterations = 3.14013671875
Pi at 2**14 iterations = 3.169677734375
Pi at 2**15 iterations = 3.1324462890625
Pi at 2**16 iterations = 3.14453125
Pi at 2**17 iterations = 3.147247314453125
Pi at 2**18 iterations = 3.138519287109375
Pi at 2**19 iterations = 3.1364669799804688
Pi at 2**20 iterations = 3.1443214416503906
Pi at 2**21 iterations = 3.141223907470703
Pi at 2**22 iterations = 3.141301155090332
Pi at 2**23 iterations = 3.1419320106506348
Pi at 2**24 iterations = 3.1415367126464844
Pi at 2**25 iterations = 3.1421539783477783
Pi at 2**26 iterations = 3.1420511603355408
Pi at 2**27 iterations = 3.1415300369262695
Pi at 2**28 iterations = 3.141532242298126
Pi at 2**29 iterations = 3.1415965482592583


I suspect that I could do a lot better using a special number library to reduce or eliminate the floating point roundoff errors, but I don’t really think it’s worth the time. Just this much, using a really simple, obvious, intuitive method produces a better result than any of the numbers pushed by the crackpots.

To support that previous statement: the best crackpot value for π is the one based on the golden ratio. That version insists that the true value of π is 3.14460551103. But you can see – by using the simple metric of counting points inside and outside the circle – that the actual value is quite different from that.

That’s what makes this breed of denialism so stupid. π isn’t complicated: it’s a simple ratio. And it’s easy to test using simple concepts. Pi relates the diameter (or radius) of a circle to the circumference or area of that circle. So any test that works with circles can easily show you what π is. There’s nothing mysterious or counterintuitive or debatable about it. It is what it is, and you can test it yourself.

Category Theory Lesson 3: From Arrows to Lambda

Quick personal aside: I haven’t been posting a lot on here lately. I keep wanting to get back to it; but each time I post anything, I’m met by a flurry of crap: general threats, lawsuit threats, attempts to steal various of my accounts, spam to my contacts on linkedin, subscriptions to ashley madison or gay porn sites, etc. It’s pretty demotivating. I shouldn’t let the jerks drive me away from my hobby of writing for this blog!

I started this series of posts by saying that Category Theory was an extremely abstract field of mathematics which was really useful in programming languages and in particular in programming language type systems. We’re finally at one of the first places where you can really see how that’s going to work.

If you program in Scala, you might have encountered curried functions. A curried function is something that’s in-between a one-parameter function and a two parameter function. For a trivial example, we could write a function that adds two integers in its usual form:

  def addInt(x: Int, y: Int): Int = x + y


That’s just a normal two parameter function. Its curried form is slightly different. It’s written:

  def curriedAddInt(x: Int)(y: Int): Int = x +y


The curried version isn’t actually a two parameter function. It’s a shorthand for:

  def realCurrentAddInt(x: Int): (Int => Int) = (y: Int) => x + y


That is: currentAddInt is a function which takes an integer, x, and returns a function which takes one parameter, and adds x to that parameter.

Currying is the operation of taking a two parameter function, and turning it into a one-parameter function that returns another one-parameter function – that is, the general form of converting addInt to realAddInt. It might be easier to read its type: realCurrentAddInt: Int => (Int => Int): It’s a function that takes an int, and returns a new function from int to int.

So what does that have to do with category theory?

One of the ways that category theory applies to programming languages is that types and type theory turn out to be natural categories. Almost any programming language type system is a category. For example, the figure below shows a simple view of a programming language with the types Int, Bool, and Unit. Unit is the initial object, and so all of the primitive constants are defined with arrows from Unit.

For the most part, that seems pretty simple: a type T is an object in the programming language category; a function implemented in the language that takes a parameter of type A and returns a value of type B is an arrow from A to B. A multi-parameter function just uses the cartesian product: a function that takes (A, B) and returns a C is an arrow from $A \times B \rightarrow C$.

But how could we write the type of a function like our curried adder? It’s a function from a value to a function. The types in our language are objects in the category. So where’s the object that represents functions from A to B?

As we do often, we’ll start by thinking about some basic concepts from set theory, and then generalize them into categories and arrows. In set theory, we can define the set of functions from $A$ to $B$ as: $B^A={f: A \rightarrow B}$ – that is, as exponentiation of the range of the produced functions.

• There’s a product object $B^A \times A$.
• There’s an arrow from $B^A \times A \rightarrow B$, which we’ll call eval.

In terms of the category of sets, what that means is:

• You can create a pair of a function from $A \rightarrow B$ and an element of $A$.
• There is a function named eval which takes that pair, and returns an instance of $B$.

Like we saw with products, there’s a lot of potential exponential objects $C$ which have the necessary product with $A$, and arrow from that product to $B$. But which one is the ideal exponential? Again, we’re trying to get to the object with thie universal property – the terminal object in the category of pseudo-exponentials. So we use the same pattern as before. For any potential exponential, there’s an arrow from the potential exponential to the actual exponential, and the potential exponential with arrows from every other potential exponential is the exponential.

Let’s start putting that together. A potential exponential $C$ for $B^A$ is an object where the following product and arrow exist:

There’s an instance of that pattern for the real exponential:

We can create a category of these potential exponentials. In that category, there will be an arrow from every potential exponential to the real exponential. Each of the potential exponentials has the necessary property of an exponential – that product and eval arrow above – but they also have other properties.

In that category of potential exponentials of $B^A$, there’s an arrow from an object $X$ to an object $Y$ if the following conditions hold in the base category:

• There is an arrow $\text{curry}(x,y): X \rightarrow Y$ in the base category.
• There is an arrow $\text{curry}(x,y)\times id_A: X\times A \rightarrow Y\times A$
• $\text{eval}_y(\text{curry}(x,y)\times id_A=\text{eval}_y y$

It’s easiest to understand that by looking at what it means in Set:

• We’ve got sets $X$ and $Y$, which we believe are potential exponents.
• $X$ has a function $\text{eval}_x: X \times A \rightarrow B$.
• $Y$ has a function $\text{eval}_y: Y \times A \rightarrow B$.
• There’s a function $\text{curry}: X \rightarrow Y$ which converts a value of $X$ to a value of $Y$, and a corresponding function $\text{curry}(\cdot)\times\text{id}_A: X\times A \rightarrow Y\times A$, which given a pair $(x, a) \in X\times A$ transforms it into a pair $(y, a) \in Y\times A$, where evaluating $\text{eval}_x(x, a)=\text{eval}_y(\text{curry}(x, a))$. In other words, if we restrict the inputs to $Y$ to be effectively the same as the inputs to $X$, then the two eval functions do the same thing. (Why do I say restrict? Because $\text{eval}_y$ might have a larger domain than the range of $X$, but these rules won’t capture that.)

An arrow in the category of potential products is a pair of two arrows in the base category:  one from $C \rightarrow B^A$, and one from $C\times A \rightarrow B^A \times A$ . Since the two arrows are deeply related (they’re one arrow in the category of potential exponentials), we’ll call them $\text{curry}(g)$ and $\text{curry}(g)\times id_A$. (Note that we’re not really taking the product of an arrow here: we haven’t talked about anything like taking products of arrows! All we’re doing is giving the arrow a name that helps us understand it. The name makes it clear that we’re not touching the right-hand component of the product.)

Since the exponential is the terminal, which means that that pair of curry arrows must exist for every potential exponential to the true exponential. So the exponential object is the unique (up to isomorphism) object for which the following is true:

• There’s an arrow $\text{eval}: B^A \times A \rightarrow A$. Since $B^A$ is the type of functions from $A$ to $B$, $\text{eval}$ represents the application of one of those functions to a value of type $A$ to produce a result of type $B$.
• For each two-parameter function $g:C\times A\rightarrow B$, there is a unique function (arrow) $\text{curry}(g)$ that makes the following diagram commute

Now, how does all this relate to what we understand as currying?

It shows us that in category theory we can have an object that is effectively represents a function type in the same category as the object that represents the type of values it operates on, and you can capture the notion of applying values of that function type onto values of their parameter type using an arrow.

As I said before: not every category has a structure that can support exponentiation. The examples of this aren’t particularly easy to follow. The easiest one I’ve found is Top the category of topological spaces. In Top, the exponent doesn’t exist for many objects. Objects in Top are topological spaces, and arrows are continuous functions between them. For any two objects in Top, you can create the necessary objects for the exponential. But for many topological spaces, the required arrows don’t exist. The functions that they correspond to exist in Set, but they’re not continuous – and so they aren’t arrows in Top. (The strict requirement is that for an exponential $X^Y$ to exist, $Y$ must be a locally compact Hausdorff space. What that means is well beyond the scope of this!)

Cartesian Closed Categories

If you have a category $C$, and for every pair of objects $A$ and $B$ in the category $C$, there exists an exponential object $B^A \in C$, then we’ll say that $C$ has exponentiation. Similarly, if for every pair of objects $A, B \in Ob(C)$, there exists a product object $A\times B$, we say that the category has products.

There’s a special kind of category, called a cartesian closed category, which is a category  where:

1. Every pair of objects has both product and exponent objects; and
2. Which has at least one terminal object. (Remember that terminals are something like singletons, and so they work as a way of capturing the notion of being a single element of an object; so this requirement basically says that the category has at least one value that “functions” can be applied to.)

That may seem like a very arbitrary set of rules: what’s so important about having all products, exponents, and a terminal object?

It means that we have a category which can model types, functions, and function application. Lambda calculus proves that that’s all you need to model computation. Closed cartesian categories are, basically, a formal model of a computing system! Any cartesian closed category is a model for a simply typed $\lambda$-calculus; and $\lambda$-calculus is something known as the internal language of a cartesian closed category.

What “internal language” means formally is complicated, but in simple terms: you can take any computation in lambda calculus, and perform the computation by chasing arrows in a category diagram of a closed cartesian category that includes the values of that calculus. Alternatively, every computation step that you perform evaluating a $\lambda$-calculus expression corresponds to an arrow in a CCC.

References

For this post, I’ve made heavy use of:

Introduction and Motivation

One thing that I keep bumping up against as an engineer who loves functional a programming is category theory. It often seems like there are two kinds of functional programmers: people who came into functional programming via engineering, and people who came into functional programming via math. The problem is that a lot of the really interesting work in languages and libraries for functional programming are being built from the mathematical side, but for people on the engineering side, it’s impenetrable: it’s like it’s written in a whole different language, and even basic discussions about programming go off the rails, because the basic abstractions don’t make any sense if you don’t know category theory.

But how do you learn category theory? It seems impenetrable to mere humans. For example, one of the textbooks on category theory that several people told me was the most approachable starts chapter one with the line:

A group extension of an abelian group $H$ by an abelian group $G$ consists of a group $E$ together with an inclusion of $G \hookrightarrow E$ as a normal subgroup and a surjective homomorphism $E \twoheadrightarrow H$ that displays $H$ as the quotient group $E|G$.

If you’re not a professional mathematician, then that is pure gobbledigook. But that seems to be typical of how initiates of category theory talk about it. But the basic concepts, while abstract, really aren’t all that tricky. In many ways, it feels a lot like set theory: there’s a simple conceptual framework, on which you can build extremely complicated formalisms. The difference is that while many people have spent years figuring out how to make the basics of set theory accessible to lay-people, but that effort hasn’t been applied to set theory.

What’s the point?

Ok, so why should you care about category theory?

Category theory is a different way of thinking, and it’s a language for talking about abstractions. The heart of engineering is abstraction. We take problems, and turn them into abstract structures. We look at the structures we create, and recognize commonalities between those structures, and then we create new abstractions based on the commonalities. The hardest part of designing a good library is identifying the right abstractions.

Category theory is a tool for talking about structures, which is particularly well suited to thinking about software. In category theory, we think in terms of arrows, where arrows are mappings between objects. We’ll see what that means in detail later, but the gist of it is that one example of arrows mapping between objects is functions mapping between data types in a computer program.

Category theory is built on thinking with orrows, and building structures using arrows. It’s about looking at mathematical constructions built with arrows, and in those structures, figuring out what the fundamental parts are. When we abstract enough, we can start to see that things that look very different are really just different realizations of the same underlying structure. Category theory gives us a language and a set of tools for doing that kind of abstraction – and then we can take the abstract structures that we identify, and turn them into code – into very generic libraries that express deep, fundamental structure.

Monoids in Code

We’ll get started by looking at a simple mathematical structure called a monoid, and how we can implement it in code; and then, we’ll move on to take an informal look at how it works in terms of categories.

Most of the categorical abstractions in Scala are implemented using something called a typeclass, so we’ll start by looking at typeclasses. Typeclasses aren’t a category theoretical notion, but they make it much, much easier to build categorical structures. And they do give us a bit of categorical flavor: a typeclass defines a kind of metatype – that is, a type of type – and we’ll see, that kind of self-reflective abstraction is a key part of category theory.

The easiest way to think about typeclasses is that they’re a kind of metatype – literally, as the name suggests, they define classes where the elements of those classes are types. So a typeclass provides an interface that a type must provide in order to be an instance of the metatype. Just like you can implement an interface in a type by providing implementations of its methods, you can implement a typeclass by providing implementations of its operations.

In Scala, you implement the operations of a typeclasses using a language construct called an implicit parameter. The implicit parameter attaches the typeclass operations to a meta-object that can be passed around the program invisibly, providing the typeclass’s operations.

Let’s take a look at an example. An operation that comes up very frequently in any kind of data processing code is reduction: taking a collection of values of some type, and combining them into a single value. Taking the sum of a list of integers, the product of an array of floats, and the concatenation of a list of strings are all examples of reduction. Under the covers, these are all similar: they’re taking an ordered group of values, and performing an operation on them. Let’s look at a couple of examples of this:

def reduceFloats(floats: List[Float]): Float =
floats.foldRight(0)((x, y) => x + y)

def reduceStrings(strings: Seq[String]): String =
strings.foldRight("")((x, y) => x.concat(y))


When you look at the code, they look very similar. They’re both just instantiations of the same structural pattern:

def reduceX(xes: List[X]): X =
xes.foldRight(xIdentity)((a, b) => Xcombiner(a, b))


The types are different; the actual operation used to combine the values is different; the base value in the code is different. But they’re both built on the same pattern:

• There’s a type of values we want to combine: Float or String. Everything we care about in reduction is a connected with this type.
• There’s a collection of values that we want to combine, from left to right. In one case, that’s a List[Float], and in the other, it’s a Seq[String]. The type doesn’t matter, as long as we can iterate over it.
• There’s an identity value that we can use as a starting point for building the result; 0 for the floats, and "" (the empty string) for the strings.
• There’s an operation to combine two values: + for the floats, and concat for the strings.

We can capture that concept by writing an interface (a trait, in Scala terms) that captures it; that interface is called a typeclass. It happens that this concept of reducible values is called a monoid in abstract algebra, so that’s the name we’ll use.

trait Monoid[A]  {
def empty: A
def combine(x: A, y: A): A
}


We can read that as saying “A is a monoid if there are implementations of empty and combine that meet these constraints”. Given the declaration of the typeclass, we can implement it as an object which provides those operations for a particular type:

object FloatAdditionMonoid extends Monoid[Float] {
def empty: Float = 0.0
def combine(x: Float, y: Float): Float = x + y
}

object StringConcatMonoid extends Monoid[String] {
def empty: String = ""
def combine(x: String, y: String): String = x.concat(y)
}


FloatAdditionMonoid implements the typeclass Monoid for the type Float. And since we can write an implementation of Monoid for Float or String, we can say that the types Float and String are instances of the typeclass Monoid.

Using our implementation of Monoid, we can write a single, generic reduction operator now:

def reduce[A](values: Seq[A], monoid: Monoid[A]): A =
values.foldRight(monoid.empty)(monoid.combine)


We can use that to reduce a list of floats:

reduce([1.0, 3.14, 2.718, 1.414, 1.732], FloatAdditionMonoid)


And we can do a bit better than that! We can set up an implicit, so that we don’t need to pass the monoid implementation around. In Scala, an implicit is a kind of dynamically scoped value. For a given type, there can be one implicit value of that type in effect at any point in the code. If a function takes an implicit parameter of that type, then the nearest definition in the execution stack will automatically be inserted if the parameter isn’t passed explicitly.

def reduce[A](values: Seq[A])(implicit A: Monoid[A]): A =
list.foldRight(A.empty)(A.combine)


And as long as there’s a definition of the Monoid for a type A in scope, we can can use that now by just writing:

implicit object FloatAdditionMonoid extends Monoid[Float] {
def empty: Float = 0.0
def combine(x: Float, y: Float): Float = x + y
}

val floats: List[Float] = ...
val result = reduce(floats)


Now, anywhere that the FloatAdditionMonoid declaration is imported, you can call reduce on any sequence of floats, and the implicit value will automatically be inserted.

Using this idea of a monoid, we’ve captured the concept of reduction in a common abstraction. Our notion of reduction doesn’t care about whether we’re reducing strings by concatenation, integers by addition, floats by multiplication, sets by union. Those are all valid uses of the concept of a monoid, and they’re all easy to implement using the monoid typeclass. The concept of monoid isn’t a difficult one, but at the same time, it’s not necessarily something that most of us would have thought about as an abstraction.

We’ve got a typeclass for a monoid; now, we’ll try to connect it into category theory. It’s a bit tricky, so we won’t cover it all at once. We’ll look at it a little bit now, and we’ll come back to it in a later lesson, after we’ve absorbed a bit more.

From Sets to Arrows

For most of us, if we’ve heard of monoids, we’ve heard of them in terms of set theory and abstract algebra. So in that domain, what’s a monoid?

A monoid is a triple $(V, 1, *)$, where:

• $V$ is a set of values;
• $1$ is a value in $V$;
• $*$ is a total binary operator where:
• $1$ is an identity of $*$: For any value $v \in V: v*1 = 1*v = v$.
• $*$ is associative: for any values $v, w, x \in V: (v * w) * x = v * (w * x)$

That’s all just a formal way of saying that a monoid is a set with a binary associative operator and an identity value. The set of integers can form a monoid with addition as the operator, and 0 as identity. Real numbers can be a monoid with multiplication and 1. Strings can be a monoid with concatenation as the operator, and empty string as identity.

But we can look at it in a different way, too, by thinking entirely in terms of function.
Let’s forget about the numbers as individual values, and instead, let’s think about them in functional terms. Every number is a function which adds itself to its parameter. So “2” isn’t a number, it’s a function which adds two to anything.

How can we tell that 2 is a function which adds two to things?

If we compose it with 3 (the function that adds three to things), we get 5 (the function that adds five to things). And how do we know that? Because it’s the same thing that we get if we compose 3 with 1, and then compose the result of that with 1 again. 3+1+1=5, and 3+2=5. We can also tell that it’s 2, because if we just take 1, and compose it with itself, what we’ll get back is the object that we call 2.

In this scheme, all of the numbers are related not by arithmetic, not by an underlying concept of quantity or cardinality or ordinality, but only by how they compose with each other. We can’t see anything else – all we have are these functions. But we can recognize that they are the natural numbers that we’re familiar with.

Looking at it this way, we can think of the world of natural numbers as a single point, which represents the set of all natural numbers. And around that point, we’ve got lots and lots of arrows, each of which goes from that point back to itself. Each of those arrows represents one number. The way we tell them apart is by understanding which arrow we get back when we compose them. Take any arrow from that point back to that point, and compose it with the arrow 0, and what do you get? The arrow you started with. Take any arrow that you want, and compose it with 2. What do you get? You get the same thing that you’d get if you composed it with 1, and then composed it with one again.

That dot, with those arrows, is a category.

What kind of advantage do we get in going from the algebraic notion of a set with a binary operation, to the categorical notion of an object with a bunch of composable arrows? It allows to understand a monoid purely as a structure, without having the think about what the objects are, or what the operator means.

Now, let’s jump back to our monoid typeclass for a moment.

trait Monoid[A]  {
def empty: A
def combine(x: A, y: A): A
}


We can understand this as being a programmable interface for the categorical object that we just described. All we need to do is read “:” as “is an arrow in”: It says that A is a monoid if:

• It has an element called empty which is an arrow in A.
• It has an operation called combine which, given any two arrows in A, composes them into a new arrow in A.

There are, of course, other conditions – combine needs to be associative, and empty needs to behave as the identity value. But just like when we write an interface for, say, a binary search tree, the interface only defines the structure not the ordering condition, the typeclass defines the functional structure of the categorical object, not the logical conditions.

This is what categories are really all about: tearing things down to a simple core, where everything is expressed in terms of arrows. It’s almost reasoning in functions, except that it’s even more abstract than that: the arrows don’t need to be functions – they just need to be composable mappings from things to things.

Deeper Into Arrows

We can abstract a bit more, and look at the entire construction, including the identity and associativity constraints entirely in terms of arrows. To really understand this, we’ll need to spend some time diving deeper into the actual theory of categories, but as a preview, we can describe a monoid with the following pair of diagrams (copied from wikipedia):

In these diagrams, any two paths between the same start and end-nodes are equivalent (up to isomorphism). When you understand how to read this diagrams, these really do define everything that we care about for monoids.

For now, we’ll just run through and name the parts – and then later, in another lesson, we’ll come back, and we’ll look at this in more detail.

• $\mu$ is an arrow from $M\times M \rightarrow M$, which we’ll call a multiplication operator.
• $\eta$ is an arrow from $I \rightarrow M$, called unit.
• $\alpha$ is an arrow from $(M\times M)\times M \rightarrow M \times (M\times M)$ which represents the associativity property of the monoid.
• $\lambda$ is a morphism which represents the left identity property of the monoid (that is, $1*x=x$), and $\rho$ is a morphism representing the right identity property $(x*1=x)$.

This diagram, using these arrows, is a way of representing all of the key properties of a monoid via nothing but arrows and composition. It says, among other things, that:

• $(M \times M) \times M$ composes with multiplication to be $M \times M$.
That is, applying multiplication to $(M \times M) \times M$ evaluates to (M \times M).
• $(M \times M) \times M$ composed with associativity can become $M \times (M \times M)$.

So it’s a monoid – but it’s a higher level monoid. In this, $M$ isn’t just an object in a category: it’s an entire category. These arrows are arrows between categories in a category of categories.

What we’ll see when we get deeper into category theory is how powerful this kind of abstraction can get. We’ll often see a sequence of abstractions, where we start with a simple concept (like monoid), and find a way to express it in terms of arrows between objects in a category. But then, we’ll lift it up, and look at how we can see in not just as a relation between objects in a category, but as a different kind of relation between categories, by constructing the same thing using a category of categories. And then we’ll abstract even further, and construct the same thing using mappings between categories of categories.

(You can find the next lesson <a href=”http://www.goodmath.org/blog/2019/02/20/category-theory-lesson-2-basics-of-categorical-abstraction/”>here</a>.)

Understanding Global Warming Scale Issues

Aside from the endless stream of Cantor cranks, the next biggest category of emails I get is from climate “skeptics”. They all ask pretty much the same question. For example, here’s one I received today:

My personal analysis, and natural sceptisism tells me, that there are something fundamentally wrong with the entire warming theory when it comes to the CO2.

If a gas in the atmosphere increase from 0.03 to 0.04… that just cant be a significant parameter, can it?

I generally ignore it, because… let’s face it, the majority of people who ask this question aren’t looking for a real answer. But this one was much more polite and reasonable than most, so I decided to answer it. And once I went to the trouble of writing a response, I figured that I might as well turn it into a post as well.

The current figures – you can find them in a variety of places from wikipedia to the US NOAA – are that the atmosphere CO2 has changed from around 280 parts per million in 1850 to 400 parts per million today.

Why can’t that be a significant parameter?

There’s a couple of things to understand to grasp global warming: how much energy carbon dioxide can trap in the atmosphere, and hom much carbon dioxide there actually is in the atmosphere. Put those two facts together, and you realize that we’re talking about a massive quantity of carbon dioxide trapping a massive amount of energy.

The problem is scale. Humans notoriously have a really hard time wrapping our heads around scale. When numbers get big enough, we aren’t able to really grasp them intuitively and understand what they mean. The difference between two numbers like 300 and 400ppm is tiny, we can’t really grasp how in could be significant, because we aren’t good at taking that small difference, and realizing just how ridiculously large it actually is.

If you actually look at the math behind the greenhouse effect, you find that some gasses are very effective at trapping heat. The earth is only habitable because of the carbon dioxide in the atmosphere – without it, earth would be too cold for life. Small amounts of it provide enough heat-trapping effect to move us from a frozen rock to the world we have. Increasing the quantity of it increases the amount of heat it can trap.

Let’s think about what the difference between 280 and 400 parts per million actually means at the scale of earth’s atmosphere. You hear a number like 400ppm – that’s 4 one-hundreds of one percent – that seems like nothing, right? How could that have such a massive effect?!

But like so many other mathematical things, you need to put that number into the appropriate scale. The earths atmosphere masses roughly 5 times 10^21 grams. 400ppm of that scales to 2 times 10^18 grams of carbon dioxide. That’s 2 billion trillion kilograms of CO2. Compared to 100 years ago, that’s about 800 million trillion kilograms of carbon dioxide added to the atmosphere over the last hundred years. That’s a really, really massive quantity of carbon dioxide! scaled to the number of particles, that’s something around 10^40th (plus or minus a couple of powers of ten – at this scale, who cares?) additional molecules of carbon dioxide in the atmosphere. It’s a very small percentage, but it’s a huge quantity.

When you talk about trapping heat, you also have to remember that there’s scaling issues there, too. We’re not talking about adding 100 degrees to the earths temperature. It’s a massive increase in the quantity of energy in the atmosphere, but because the atmosphere is so large, it doesn’t look like much: just a couple of degrees. That can be very deceptive – 5 degrees celsius isn’t a huge temperature difference. But if you think of the quantity of extra energy that’s being absorbed by the atmosphere to produce that difference, it’s pretty damned huge. It doesn’t necessarily look like all that much when you see it stated at 2 degrees celsius – but if you think of it terms of the quantity of additional energy being trapped by the atmosphere, it’s very significant.

Calculating just how much energy a molecule of CO2 can absorb is a lot trickier than calculating the mass-change of the quantity of CO2 in the atmosphere. It’s a complicated phenomenon which involves a lot of different factors – how much infrared is absorbed by an atom, how quickly that energy gets distributed into the other molecules that it interacts with… I’m not going to go into detail on that. There’s a ton of places, like here, where you can look up a detailed explanation. But when you consider the scale issues, it should be clear that there’s a pretty damned massive increase in the capacity to absorb energy in a small percentage-wise increase in the quantity of CO2.

Godel part 2: Arithmetic and Logic

In the last post, we saw how to take statements written in the logic of the Principia Mathematica, and convert them into numerical form using Gödel numbering. For the next step in Gödel’s proof, we need to go meta-mathematical.

Ultimately, we want to write first-order statements that can reason about first order statements. But the entire structure of the principia and its logic is designed to make
that impossible. First order statements can only reason about numbers and their properties.

But now, we’ve got the ability to represent statements – first order, second order, third order, any order. What we still need is a way of describing the properties of those numerical statements in terms of operations that can be expressed using nothing but first order statements.

The basic trick to incompleteness is that we’re going to use the numerical encoding of statements to say that a predicate or relation is represented by a number. Then we’re going to write predicates about predicates by defining predicates on the numerical representations of the first-order predicates. That’s going to let us create a true statement in the logic that can’t be proven with the logic.

To do that, we need to figure out how to take our statements and relations represented as numbers, and express properties of those statements and relations in terms of arithmetic. To do that, we need to define just what it means to express something arithmetically. Gödel did that by defining “arithmetically” in terms of a concept called primitive recursion.

I learned about primitive recursion when I studied computational complexity. Nowadays, it’s seen as part of theoretical computer science. The idea, as we express it in modern terms, is that there are many different classes of computable functions. Primitive recursion is one of the basic complexity classes. You don’t need a Turing machine to compute primitive recursive functions – they’re a simpler class.

The easiest way to understand primitive recursion is that it’s what you get in a programming language with integer arithmetic, and simple for-loops. The only way you can iterate is by repeating things a bounded number of times. Primitive recursion has a lot of interesting properties: the two key ones for our purposes here are: number theoretic proofs are primitive recursive, and every computation of a primitive recursive function is guaranteed to complete within a bounded amount of time.

The formal definition of primitive recursion, the way that Gödel wrote it, is quite a bit more complex than that. But it means the same thing.

We start with what it means to define a formula via primitive recursion. (Note the language that I used there: I’m not explaining what it means for a function to be primitive recursive; I’m explaining what it means to be defined via primitive recursion.) And I’m defining formulae, not functions. In Gödel’s proof, we’re always focused on numerical reasoning, so we’re not going to talk about programs or algorithms, we’re going to about the definition of formulae.

A formula $phi(x_1, x_2, ..., x_n)$ is defined via primitive recursion if, for some other formulae $\psi$ and $\mu$:

• Base: $\phi(0, x_2, ..., x_n) = \psi(x_2, ..., x_n)$
• Recursive: $\phi(i+1, x_2, ..., x_n) = \mu(i, \phi(i, x_2, ..., x_n), x_2, ..., x_n)$.

So, basically, the first parameter is a bound on the number of times that $phi$ can invoked recursively. When it’s 0, you can’t invoke $\phi$ any more.

A formula is primitive recursive if it defined from a collection of formulae $\phi_1, ..., \phi_n$ where any formula $\phi_i$ is defined via primitive recursion from $\phi_1, ..., \phi_{i-1}$, or the primitive succ function from Peano arithmetic.

For any formula $phi_i$ in that sequence, the degree of the formula is the number of other primitive recursive formulae used in its definition.

Now, we can define a primitive recursive property: $R(x_1, ..., x_n)$ is primitive recursive if and only if there exists a primitive recursive function $\phi$ such that $\phi(x_1, ..., x_n) = 0$.

With primitive recursive formulae and relations defined, there’s a bunch of theorems about how you can compose primitive recursive formulae and relations:

1. Every function or relation that you get by substituting a primitive recursive function for a variable in a primitive recursive function/relation is primitive recursive.
2. If R and S are primitive relations, then ¬R, R∧S, R∨S are all primitive recursive.
3. If $\phi(x_1, ..., x_n)$ and $\psi(x_1, ..., x_n)$ are primitive recursive functions, then the relation $R(x_1, ..., x_n) \Leftrightarrow (\phi(x_1, ..., x_n) = \psi(x_1, ..., x_n)$ is also primitive recursive.
4. Let $xv$ and $zv$ be finite-length tuples of variables. If the function $\phi(xv)$ and the relation $R(y, zv)$ are primitive recursive, then so are the relations:
• $S(xv, zv) \Leftrightarrow (\exists y \le \phi(xv). R(y, zv))$
• $T(xv, zv) \Leftrightarrow (\forall y \le A(xv). R(y, zv))$
5. Let $xv$ and $zv$ be finite-length tuples of variables. And let $text{argmin}[y le f(x).R(x)]$ be the smallest value of $x$ for which $y le f(x)$ and $R(x)$ is true, or 0 if there is no such value. Then if the function $phi(xv)$ and the relation $R(y, zv)$ are primitive recursive, then so is the function $P(xv, zv) = (\text{argmin}[y \le A(xv). R(y, zv))]$.

By these definitions, addition, subtraction, multiplication, and integer division are all primitive recursive.

Ok. So, now we’ve got all of that out of the way. It’s painful, but it’s important. What we’ve done is come up with a good formal description of what it means for something to be an arithmetic property: if we can write it as a primitive recursive relation or formula, it’s arithmetic.

Quick note: I’m experimenting with showing ads on the blog. In my time with scientopia, I dumped a frankly ridiculous amount of money into keeping things online – it ended up coming to around \$4K/year. Obviously, this site is a lot cheaper, since I only need to support the one blog, but still, I’d like to get to the point where it’s self-supporting.

If the ads are annoying, let me know. I’m trying to strike a compromise. I’d like to get some money coming in to cover hosting costs, but I don’t want to degrade the experience of my readers. The way I’ve set it up, the ads shouldn’t have popups, and they shouldn’t have any interactive animations. If you see anything like that, please let me know as soon as possible, and I’ll do my best to get it fixed. At the moment, the visual appearance of the ads is less than idea, but I’ll be tweaking it until it’s reasonable. (For now, things are still in process with Google, so all you see is a big yellowish box as a placeholder; as soon as someone at Google gets around to approving it, that should be replaced by ads.

If ads are offensive, again, let me know. I don’t want to be giving exposure to blatantly offensive rubbish.

If ads are pseudoscience or that sort of thing, I don’t care. I believe that the people who read this blog are intelligent enough to realize what a sham that crap is, and I have no problem taking money from them – every cent they waste buying ads on my blog is a cent that they’re not using for something a lot worse!

A Recipe for Gefilte Fish

My mom died last friday night, a little bit after midnight. I’ll probably write something about her, when I’m feeling up to it, but not yet. Being a jewish dad, when I’m depressed, what do I do? I cook.

Last year, for the first time ever, I made homemade gefilte fish for Pesach. If you didn’t grow up Jewish, odds are you’re not familiar with gefilte fish. It’s a traditional ashkenazi (that is, eastern european jewish) dish. It was originally a ground fish mixture cooked inside the body cavity of a fish. It evolved into just the stuffing mixture, simmered in a fish stock. These days, most people just buy it in a jar. If you grew up with it, even out of the jar, it’s a treat; if you didn’t, and you’ve been exposed to it, it looks and smells like dog food.

Honestly, I love the stuff. In general, I’m not a big fan of most traditional Jewish foods. But there’s something about gefilte fish. But even as I enjoy it, I can see the gross side. It’s crazy overprocessed – I mean, come on – it’s fish that will keep, unrefrigerated, for years!

But made fresh, it’s a whole different kettle of fish. This stuff is really good. You’ll definitely recognize the flavor of this as gefilte fish, but it’s a much cleaner flavor. It tastes like fish, not like stale overprocessed fish guts.

So this year, I’m depressed over my mom; after the funeral, I sent my wife out to buy me a bunch of fish, and I made up a batch. This time, I kept notes on how I did it – and it turned out even better than last year.

It’s got a bit of a twist in the recipe. I’m married to a chinese woman, so when the Jewish holidays roll around, I always try to find some way of putting an asian spin on the food, to reflect the nature of our family. So when I cooked the gefilte fish, instead of cooking it in the traditional simple fish broth, I cooked it in dashi. It’s not chinese, but it’s got a lot of flavors that are homey for a chinese person.

So… here’s the recipe for Mark’s homemade salmon dashi gefilte fish!

Ingredients

• 2 whole pike, gutted and cleaned, but with skin, head, and bones
• 2 whole red snapper, gutted and cleaned, but with skin, head and bones
• 2 pounds salmon filet
• 3/4 to 1 cup matzoh meal
• 3 eggs
• salt (to taste)
• 2 sheets of konbu (japanese dried kelp)
• 2 handfulls dried shaved bonito
• 4 or 5 slices of fresh ginger, crushed
• 2 onions
• 2 large carrots

(For the fish for this, you really want the bones, the skins, and the head. If you’ve got a fish market that will fillet it for you, and then give you all of the parts, have them do that. Otherwise, do it yourself. Don’t worry about how well you can fillet it – it’s going to get ground up, so if you do a messy job, it’s not a problem.)

Instructions

1. First thing, you need to make the stock that you’ll eventually cook the gefilte fish in:
1. If the fish store didn’t fillet the fish for you, you need to remove the filets from the fish, and then remove the skin from the filets.
2. Put all of the bones, skin, and head into a stock pot.
3. Cover the fish bones with with water.
4. Add one onion, and all of the garlic and ginger to the pot.
5. Heat to a boil, and then simmer for two hours.
6. Strain out all of the bones, and put the stock back into the pot and bring to a boil.
7. Add the kombu to the stock, and let it simmer for 30 minutes.
8. Remove from the heat, and strain out the kombu.
9. Add the bonito (off the heat), and let it sit for 15 minutes.
10. Strain out the bonito and any remaining solids.
2. While the stock is simmering, you can get started on the fish:
1. Cut all of the fish into chunks, and put them through a meat grinder with a coarse blade (or grind them coarsely in batches in best food processor you can get your hands on.
r.)
2. Cut the onion and carrots into chunks, and put them through the grinder as well.
3. Beat the eggs. Fold the eggs and the salt into the ground fish mixture.
5. Refrigerate for two hours.
3. Now you’re ready to cook the gefilte fish!
1. Heat the stock up to a gentle simmer.
2. Scoop up the fish into balls containing about two tablespoons of the fish mixture, and roll them into balls.
3. Add the fish balls into the simmering stock. Don’t overcrowd the pot – add no more than can fit into the pot in a single layer.
4. Simmer for 10-15 minutes until the fish balls are cooked through.
5. Remove the balls from the simmering liquid. Repeat until all of the fish is cooked.
6. Put all the cooked fish balls back into the stock, and refrigerate.

On outing in the sciblogging community

I’m coming in to this a bit late, but since I really do care about the online science blogging community,I still have something that I want to say.

For those who don’t know, there’s a complete horses ass named Henry Gee. Henry is an editor at the science journal Nature. Poor Henry got into some fights with DrIsis (a prominent science blogger), and DrIsis was mean to him. The poor little guy was so hurt that he decided that he needed to get back at her – and so, Henry went ahead and he outed her, announcing her real name to the world.

This was a thoroughly shitty thing to do.

It’s not that I think Isis didn’t do anything wrong. We’ve got history, she and I. My experience with her led me to conclude that she’s a petty, vicious bully that takes great pleasure in inflicting pain and anguish on other people. She’s also someone who’s done a lot of good things for her friends, and if you want to find out about any of it, go read another blog – plenty of people have written about her in the last couple of days.

If she’s so awful, why do I care that someone outed her?

Because it’s not just about her.

The community that we’re a part of isn’t something which has been around for all that long. There’s still a lot of fudging around, figuring out the boundaries of our online interactions. When people play games like outing someone who’s using a pseudonym, they’re setting a precedent: they’re declaring to the community that “I know Xs real name, and here it is”. But beyond that, they’re also declaring to the community that “I believe that our community standards should say that this is an appropriate way to deal with conflict”.

I don’t want that to be something that people in my community do.

People use pseudonyms for a lot of different reasons. Some people do for bad reasons, like separating unethical online behavior from their professional identity. But some people do it to avoid professional retaliation for perfectly reasonable behaviors – there are tenure committees at many universities that would hold blogging against a junior faculty; there are companies that don’t won’t allow employees to blog under their real names; there are people who blog under a pseudonym in order to protect themselves from physical danger and violence!

Once you say “If someone makes me angry enough, it’s all right for me to reveal their real identity”, what you’re saying is that none of those reasons matter. Your hurt feelings take precedence. Bloggeroid tells you how to blog successfully so you avoid all of this. You’ve got the right to decide whether their reasons for using a pseudonym areimportant enough to protect or not.

Sorry, but no. People’s identities belong to them. I don’t care how mean someone is to you online: you don’t have the right to reveal their identity. Unless someone is doing something criminal, their identity isn’t yours to reveal. (And if they are doing something criminal, you should seriously consider reporting them to the appropriate legal authorities, rather than screwing around online!)

But to be like Mr. Gee, and just say “Oh, she hurt my feelings! I’m going to try to hurt her back”! That’s bullshit. That’s childish, kindergarten level bullshit. And frankly, for someone who’s an editor at a major scientific journal, who has access to all sorts of information about anonymous referees and authors? It’s seriously something that crosses the line of professional ethics to the point where if I were in the management at Nature, I’d probably fire him for it.

But Henry didn’t stop there: no! He also went ahead and – as an editor of Nature! – told people who criticized him for doing this that he want “adding them to the list”.

What kind of list do you think Henry is adding them to? This guy who’s showed how little he cares about ethics – what do you think he’s going to do to the people who he’s adding to his list?

I think that if Nature doesn’t fire this schmuck, there’s something even more seriously wrong over there than any of us expected.

Substuff

What’s a subset? That’s easy: if we have two sets A and B, A is a subset of B if every member of A is also a member of B.

We can take the same basic idea, and apply it to something which a tad more structure, to get subgroups. What’s a subgroup? If we have two groups A and B, and the values in group A are a subset of the values in group B, then A is a subgroup of B.

The point of category theory is to take concepts like “subset” and generalize them so that we can apply the same idea in many different domains. In category theory, we don’t ask “what’s a subset?”. We ask, for any structured THING, what does it mean to be a sub-THING? We’re being very general here, and that’s always a bit tricky. We’ll start by building a basic construction, and look at it in terms of sets and subsets, where we already understand the specific concept.

In terms of sets, the most generic way of defining subsets is using functions. Suppose we have a set, A. How can we define all of the subsets of A, in terms of functions? We can do it using injective functions, as follows. (As a reminder, a function from X to Y where every value in X is mapped to a distinct function in y.)

For the set, A, we can take the set of all injective functions to A. We’ll call that set of functions Inj(A).

Given Inj(A), we can define equivalence classes over Inj(A), so that $f: X \rightarrow A$ and $g: Y \rightarrow A$ are equivalent if there is an isomorphism between X and Y.

The domain of each function in one of the equivalence classes in Inj(A) is a function isomorphic to a subset of A. So each equivalence class of injective functions defines a subset of A.

And there we go: we’ve got a very abstract definition of subsets.

Now we can take that, and generalize that function-based definition to categories, so that it can define a sub-object of any kind of object that can be represented in a category.

Before we jump in, let me review one important definition from before; the monomorphism, or monic arrow.

A monic arrow is an arrow $f : a \rightarrow b$ such that
$\forall g_1, g_2: x \rightarrow a: f \circ g_1 \ge f \circ g_2 \Rightarrow g_1 = g_2$ (That is, if any two arrows composed with $f$ in $f \circ g$ end up at the same object only if they are the same.)

So, basically, the monic arrow is the category theoretic version of an injective function. We’ve taken the idea of what an injective function means, in terms of how functions compose, and when we generalized it, the result is the monic arrow.

Suppose we have a category $C$, and an object $a in \mbox{Obj}(C)$. If there are are two monic arrows $f : x \rightarrow a$ and $g : y \rightarrow a$, and
there is an arrow $h$ such that $g \circ h = f$, then we say $f \le g$ (read “f factors through g”). Now, we can take that “≤” relation, and use it to define an equivalence class of morphisms using $f \equiv g \Leftrightarrow f \le g \land g \le f$.

What we wind up with using that equivalence relation is a set of equivalence classes of monomorphisms pointing at A. Each of those equivalence classes of morphisms defines a subobject of A. (Within the equivalence classes are objects which have isomorphisms, so the sources of those arrows are equivalent with respect to this relation.) A subobject of A is the sources of an arrow in one of those equivalence classes.

It’s exactly the same thing as the function-based definition of sets. We’ve created a very general concept of sub-THING, which works exactly the same way as sub-sets, but can be applied to any category-theoretic structure.

LaTeX Issues Fixed, Hopefully

A quick administrative note. For people reading this blog via RSS readers, there’ve been problems with LaTeX equations for a couple of months. I’ve set up a local tex server and tweak the configurations. Hopefully, now everyone will be able to see the math stuff.

As a test, $pi = 4sum_{k=0}^{infty} frac{(-1)^k}{2k+1}$. If you still get an error box instead of an equation, please drop me an email.