# The Fedex Problem

Over the weekend on twitter, someone tweeted a link to a really fun paper looking paper on arXiv, called The Fedex Problem. It’s a really interesting paper, because it takes a real-world problem, analyzes mathematically, and shows some of the fascinating complexity that lies under the surface of something that seems simple. (I can’t find the original tweet, so I don’t know who first shared it – sorry!

The problem that they looked at involves the logistics of rapid delivery by services like Federal Expression.

When Fedex was founded, they built their overnight delivery business on a new model of how to handle deliveries, called a hub system.

What most delivery businesses traditionally did was take packages in at a local warehouse, and sort them into groups depending on the warehouse that serviced their destination. Each group was then shipped separately, from the source warehouse to the destination warehouse. To do overnight delivery, each warehouse, then, would need to have a full package-sorting system, and they would need to to send a shipment to every other warehouse every single day. For $N$ warehouses, that means that they needed $N$ full sorting systems, and $N^2$ shipments every day. The costs of the redundant shipping systems and the massive number of shipments are huge, and completely non-scalable. Imagine a system with just 100 local warehouses – that’s only 2 for each state! – you’d need to have 10,000 shipments per day. In practice, the number of fedex warehouses just in the US is considerably larger than just 100: it simply can’t work.

What Fedex chose to do instead was build a hub system. Every package that arrives at a local warehouse gets shipped to the same place, called the hub. The only sorting system is at the hub – individual local warehouses don’t sort, they just ship to the hub. The hub sorts according to destination warehouse, and then ships the packages to the appropriate warehouse for delivery.

The Fedex hub system solved both of the problems of the traditional approach. The individual local shipping points don’t need to do sorting: they just ship to the hub, which takes care of all of it. And since every package goes to the hub, instead of needing $N^2$ shipments, they only needed $2N$ shipments per day: $N$ shipments of unsorted packages from local warehouses to the hub, and $N$ shipments of sorted packages from the hub to the local warehouses for local deliveries.

How well this system works is very dependent on the proper location of the hub. To demonstrate why, consider the worst possible hub location for American shipments: Hawaii. Now, every package from anywhere in the mainland to anywhere else in the mainland needs to travel at least an extra 2500 miles to the hub, and an extra 2500 miles back! The cost of shipping it dominated by fuel costs – adding any extra distance to the average shipment costs fuel, and thus money. So the location of hub has a dramatic effect on the shipment costs.

FedEx located their hub in Memphis, Tennessee, because according to its founders, Memphis was close to the “center” of the shipping region for its original target market cities, and thus minimized the total shipping distances. (That wasn’t their sole concern; another factor was that Memphis is in an area that is rarely incapacitated by weather or other natural disasters – but the travel distance was the biggest factor.)

What this paper does is take the Fedex problem, and generalize it. What, exactly, does it mean for a given point to be an ideal hub? Given a network of points in a Euclidean space, how can you compute the ideal hub? And finally, if we look at the actual Federal Express distribution system, using great-circle routing between points, how close is Memphis to being the ideal hub?

If you look at the problem naively, it seems like it should be simple. The ideal hub should just be the “average” location. That’s basically what the FedEx founders said – they picked the geographical center of their original markets. Unfortunately, reality is rarely kind enough to make things fit our intuitions, and that’s not quite right.

To see why, let’s first look at the definition of the ideal hub. What we want to do is to minimize the total distance from local shipping points to the hub. To keep the notation simple, we’ll write the problem down as if we were looking at the one-dimensional version of the problem. Each location is a single value, and we’ll write the distance between two points $a$ and $b$ as the absolute value of $a-b$: $\left|a-b\right|$.

If the set of $i$ points in the network are called $x_1 \dots x_i$, then the total distance value for a hub at $x$ is given by the function:

$f(x) = \Sigma_i 2\left| x - x_i \right|$

Since the 2 is a constant factor that affects all x-values easily, for the purposes of simplicity, we can drop it out. So the problem of selecting an ideal hub ultimately reduces to an optimization problem: find the value $x$ where $f(x)$ is a minimum.

Again, naively, it seems as though the average is best. But we can see that it’s not with a simple example. Let’s just look at a one-dimensional case – a list of locations scattered along a line: {1, 2, 3, 5, 13, 14, 60}. The point that’s at the average is 14, with a total one-way travel distance of 14 + 12 + 11 + 9 + 0 + 1 + 46 = 93. But by doing some brute-force, we can see that the total distance for $x=5$ is just 4 + 3 + 2 + 0 + 8 + 9 + 55 = 81. So the mean isn’t the optimal hub location. In fact, you can prove that in the one-dimensional case, the optimal hub is the median location, not the mean.

In this example, the average isn’t far off – but even in a quick off-the-cuff example, you can see that average isn’t the answer: the mean can be quite different from the median, particularly when you’ve got a lot of clustering, or a couple of outliers. When you extend to more than one dimension, then things can get bad pretty quickly.

The general problem has been looked at in many forms. It’s known, variously, as the Fermat-Torricelli problem, the Weber problem, or the single facility location problem.

Unfortunately, getting an answer to the problem is a lot harder than just naming it. According to the authors of this paper, even the fact of the existence of a unique ideal hub isn’t trivial. Despite all of the people who’ve worked on the problem, they couldn’t find a single complete proof of it! They go ahead and provide one. They prove that:

For any finite set of non-collinear points in any euclidean space $R^N$,
there is an idea hub, located within the convex hull of the set of points.

The convex hull qualifier there deserves a bit of explanation. It’s an interesting idea that ends up coming up in a bunch of different mathematical contexts. The easiest way to explain it is by the easiest method of finding the convex hull in two dimensions: suppose you’ve got a bunch of points on a plane. Draw the plane on graph paper, and put a nail at the location of each point. Take a rubber band, and stretch it so that every nail on the graph is inside of the rubber band. Let go. The rubber band will end up touching some of the nails; some will be inside, but untouched. The rubber band is the convex hull of the points. It’s basically a general notion of the smallest blob that contains all of the points in the set. The authors present a short, clean proof that the ideal hub location will always be within the “blob” of the points.

The paper goes into some depth in discussing some simple cases of the ideal hub, and how they can be calculated geometrically. But for the general problem, the geometric approaches don’t work. To get the ideal hub for a large collection of points, they’re left with brute force. In the brute force method, you just take every point $x$ in the network, and compute its $f(x)$ value. Since it’s a finite network, you’ll eventually find the ideal hub. (You can, of course, apply
some nice heuristics to simplify the problem: the set of candidates that might be optimal hubs is going to be a lot smaller than the set of all ponits in the network!) But still, computing the optimal hub is quite complicated – $O(N^2)$ in the number of points in the network.

After all of that work, where is the ideal hub? How different is it from the hub that Federal Express actually uses?

As it turns out, Fedex isn’t too far off, but it’s definitely non-optimal. Using US census data, combined with great-circle distances between the census tracts, the ideal hub is in central Indiana. FedEx’s hub is 315 miles off, to the south-south-west of the ideal location.

Interestingly, when UPS (probably FedEx’s biggest rival) set up a hub system, their hub is in Indianapolis – just 85 miles away from the ideal hub.

To me, the most interesting thing about the paper comes after they showed the ideal hub by brute force. If you’ve got more than four points – even just five! – in a two-dimensional space, there’s no simple geometric solution to find the ideal hub. It seems like it should be simple, but it isn’t. The proof of that is really fascinatingly simple: it comes down to group theory and symmetry. Read the paper for the details. A geometric solution is limited by the difference in dimensionality between a particular one-dimensional rotational symmetry group, and the number of objects in the network. As a result, for 4 or less non-collinear points, there’s a way of finding a single unique geometric configuration that can be exploited to characterize the ideal hub. For more than 4 points, there isn’t – there’s no single configuration anymore, which blows away the usefulness of geometric solutions.

If you find this at all interesting, then it’s worth downloading and reading the paper. There’s a whole lot their that I haven’t covered – more than what I have. But I hope this little taste is enough to pique your curiosity.

For some reason, lately I’ve been seeing a bunch of mentions of Banach Tarski. B-T is a fascinating case of both how counter-intuitive math can be, and also how profoundly people can misunderstand things.

For those who aren’t familiar, Banach-Tarski refers to a topological/measure theory paradox. There are several variations on it, all of which are equivalent.

The simplest one is this: Suppose you have a sphere. You can take that sphere, and slice it into a finite number of pieces. Then you can take those pieces, and re-assemble them so that, without any gaps, you now have two spheres of the exact same size as the original.

Alternatively, it can be formulated so that you can take a sphere, slice it into a finite number of pieces, and then re-assemble those pieces into a bigger sphere.

This sure as heck seems wrong. It’s been cited as a reason to reject the axiom of choice, because the proof that you can do this relies on choice. It’s been cited by crackpots like EE Escultura as a reason for rejecting the theory of real numbers. And there are lots of attempts to explain why it works. For example, there’s one here that tries to explain it in terms of density. There’s a very cool visualization of it here, which tries to make sense of it by showing it in the hyperbolic plane. Personally, most of the attempts to explain it intuitively drive me crazy. One one level, intuitively, it doesn’t, and can’t make sense. But on another level, it’s actually pretty simple. No matter how hard you try, you’re never going to make the idea of turning a finite-sized object into a larger finite-sized object make sense. But on another level, once you think about infinite sets – well, it’s no problem.

The thing is, when you think about it carefully, it’s not really all that odd. It’s counterintuitive, but it’s not nearly as crazy as it sounds. What you need to remember is that we’re talking about a mathematical sphere – that is, an infinite collection of points in a space with a particular set of topological and measure relations.

Here’s an equivalent thing, which is a bit simpler to think about:

Take a line segment. How many points are in it? It’s infinite. So, from that infinite set, remove an infinite set of points. How many points are left? It’s still infinite. Now you’ve got two infinite sets of the same size. So, now you can use one of the sets to create the original line segment, and you can use the second one to create a second, identical line segment.

Still counterintuitive, but slightly easier.

How about this? Take the set of all natural numbers. Divide it into two sets: the set of even naturals, and the set of odd naturals. Now you have two infinite sets,
the set {0, 2, 4, 6, 8, …}, and the set {1, 3, 5, 7, 9, …}. The size of both of those sets is the ω – which is also the size of the original set you started with.

Now take the set of even numbers, and map it so that for any given value i, f(i) = i/2. Now you’ve got a copy of the set of natural numbers. Take the set of odd naturals, and map them with g(i) = (i-1)/2. Now you’ve got a second copy of the set of natural numbers. So you’ve created two identical copies of the set of natural numbers out of the original set of natural numbers.

The problem with Banach-Tarski is that we tend to think of it less in mathematical terms, and more in concrete terms. It’s often described as something like “You can slice up an orange, and then re-assemble it into two identical oranges“. Or “you can cut a baseball into pieces, and re-assemble it into a basketball.” Those are both obviously ridiculous. But they’re ridiculous because they violate one of our instinct that derives from the conservation of mass. You can’t turn one apple into two apples: there’s only a specific, finite amount of stuff in an apple, and you can’t turn it into two apples that are identical to the original.

But math doesn’t have to follow conservation of mass in that way. A sphere doesn’t have a mass. It’s just an uncountably infinite set of points with a particular collection of topological relationship and geometric relationships.

Going further down that path: Banach-Tarski relies deeply of the axiom of choice. The “pieces” that you cut have non-measurable volume. You’re “cutting” from the collection of points in the sphere in a way that requires you to make an uncountably infinite number of distinct “cuts” to produce each piece. It’s effectively a geometric version of “take every other real number, and put them into separate sets”. On that level, because you can’t actually do anything like that, it’s impossible and ridiculous. But you need to remember: we aren’t talking about apples or baseballs. We’re talking about sets. The “slices” in B-T aren’t something you can cut with a knife – they’re infinitely subdivided, not-contiguous pieces. Nothing in the real world has that property, and no real-world process has the ability to cut like that. But we’re not talking about the real world; we’re talking about abstractions. And on the level of abstractions, it’s no stranger than creating two copies of the set of real numbers.

# Topoi Prerequisites: an Intro to Pre-Sheafs

I’m in the process of changing jobs. As a result of that, I’ve actually got some time between leaving the old, and starting the new. So I’ve been trying to look into Topoi. Topoi are, basically, an alternative formulation of mathematical logic. In most common presentations of logic, set theory is used as the underlying mathematical basis – set theory and a mathematical logic built alongside it provide a complete foundational structure for mathematics.

Topoi is a different approach. Instead of starting with set theory and a logic with set theoretic semantics, Topoi starts with categories. (I’ve done a bunch of writing about categories before: see the archives for my category theory posts.)

Reading about Topoi is rough going. The references I’ve found so far are seriously rough going. So instead of diving right in, I’m going to take a couple of steps back, to some of the foundational material that I think helps make it easier to see where the category theory is coming from. (As a general statement, I find that category theory is fascinating, but it’s so abstract that you really need to do some work to ground it in a way that makes sense. Even then, it’s not easy to grasp, but it’s worth the effort!)

A lot of category theoretic concepts originated in algebraic topology. Topoi follows that – one of its foundational concepts is related to the topological idea of a sheaf. So we’re going to start by looking at what a sheaf is.

# Topological Spaces and Continuity

In the last topology post, I introduced the idea of a metric space, and then used it to define open and closed sets in the space.

Today I’m going to explain what a topological space is, and what continuity means in topology.

A topological space is a set $X$ and a collection $T$ of subsets of $X$, where the following conditions hold:

1. $\emptyset \in T \land X \in T$:both the empty set and the entire set $T$ are in the set of subsets, $T$. $X$ is going to be the thing that defines the structure of the topological space.
2. $\forall C \in {\bf 2}^T: \bigcup_{c \in C} \in T$: the union of collection of subsets of $T$ is also a member of $T$.
3. $\forall s,t \in T: s \cap t \in T$: the intersection of any two elements of $T$ is also a member of $T$.

The collection $T$ is called a topology on $X$. The members of $T$ are the open sets of the topology. The closed sets are the set complements of the members of $T$. Finally, the elements of the topological space $X$ are called points.

The connection to metric spaces should be pretty obvious. The way we built up open and closed sets over a metric space can be used to produce topologies. The properties we worked out for the open and closed sets are exactly the properties that are required of the open and closed sets of the topology.

The idea of the topology $X$ is that it defines the structure of X. We say collection when we talk about it, because it’s not a proper set: a topology can be (and frequently is) considerably larger than what’s allowable for a set.

What it does is define the notion of nearness for the points of a set. Take three points in the set $X$: $a$, $b$, and $c$. X contains a series of open sets around each of $a$, $b$, and $c$. At least conceptually, there’s a smallest open set containing each of them. Given the smallest open set around $a$, there is a larger open set around it, and a larger open set around it. On and on, ever larger. Closeness in a topological space gets its meaning from those open sets. Take that set of increasingly large open sets around $a$. If you get to an open set around $a$ that contains $b$ before you get to one that contains $c$, then $b$ is closer to $a$ than $c$ is.

There are many ways to build a topology other than starting with a metric space, but that’s definitely the easiest way. One of the most important ideas in topology is the notion of continuity. In some sense, it’s the fundamental abstraction of topology. Now that we know what a topological space is, we can define what continuity means.

A function from topological space $T$ to topological space $U$ is continuous if and only if for every open set $C subseteq U$, the inverse image of $f$ on $C$ is an open set.

Of course that makes no sense unless you know what the heck an inverse image is. If C is a set of points, then the image $f(C)$ is the set of points ${ f(x) | x \in C }$. The inverse image of $f$ on $C$ is the set of points ${ x | f(x) \in C}$.

Even with the definition, it’s a bit hard to visualize what that really means. But basically, if you’ve got an open set in $U$, what this says is that anything that maps to that open set must also have been an open set. You can’t get an open set in $U$ using a continuous function from $T$ unless what you started with was an open set. What that’s really capturing is that there are no gaps in the function. If there were a gap, then the open spaces would no longer be open.

Think of the metric spaces idea of open sets. Imagine an open set with a cube cut out of the middle. It’s definitely not continuous. If you took a function on that open set, and its inverse image was the set with the cube cut out, then the function is not smoothly mapping from the open set to the other topological space. It’s mapping part of the open set, leaving a big ugly gap.

If you read my old posts on category theory, here’s something nifty.

The set of of topological spaces and continuous functions form a category, with the spaces as objects and continuous functions as arrows. We call this category Top.

Aside from the interesting abstract connection, when you look at algebraic topology, it’s often easiest to talk about topological spaces using the constructs of category theory.

For example, one of the most fundamental ideas in topology is homeomorphism: a homeomorphism is a bicontinuous bijection (a bicontinuous function is a continuous function with a continuous inverse; a bijection is a bidirectional total function between sets.)

In terms of the category ${\bf Top}$, a homeomorphism between topological spaces is a homomorphism between objects in Top. That much alone is pretty nice: if you’ve gotten the basics of category theory, it’s a whole lot easier to understand that a homeomorphism is an homo-arrow in ${\bf Top}$.

But there’s more: from the perspective of topology, any two topological spaces with a homeomorphism between them are identical. And – if you go and look at the category-theoretic definition of equality? It’s exactly the same: so if you know category theory, you get to understand topological equality for free!

# Metric Spaces

One of the things that topologists like to say is that a topological set is just a set with some structure. That structure is, basically, the nearness relation – a relation that allows us to talk about what points are near other points.

So to talk about topology, you need to be able to talk about nearness. The way that we do that in topology is through a fundamental concept called an open sphere. An open sphere defines the set of all points that are close to a particular point according to some metric. That’s not the only way of defining it; there are various other ways of explaining it, but I find the idea of using a metric to be the easiest one to understand.

Of course, there’s a catch. (There’s always a catch, isn’t there?) The catch is, we need to define just what we mean by “according to some metric”.Fundamentally, we need to understand just what we mean by distance. Remember – we’re starting with a completely pure set of points. Any structure like a plane, or a sphere, or anything like that will be defined in term of our open spheres – which, in turn, will be defined by the distance metric. So we can’t use any of that to define distance.

## Defining Distance

So. Suppose we’ve got a totally arbitrary set of points, $S$, consisting of elements $s_1, s_2, s_3, s_4, ..., s_n, ...$. What’s the distance between $s_i$ and $s_j$?

Let’s start by thinking about a simple number line with the set of real numbers. What’s the distance between two numbers on the number line? It’s a measure of how far over the number line you have to go to get from one point to the other. But that’s cheating: how far you have to go is really just a re-arrangement of the words; it’s defining distance in terms of distance.

But now, suppose that you’ve got your real number line, and you’ve got a ruler. Then you can measure distances over the number line. The ruler defines what distances are. It’s something in addition to the set of pointsthat allows you to define distance.

So what we really want to do is to define an abstract ruler. In pure mathematical terms, that ruler is just a function that takes two elements, $s_i$ and $s_j$, and returns a real number. That real number is the distance between those two points.

To be a metric, a distance function $d$ needs to have four fundamental properties:

Non-Negativity
$\forall s_i, s_j \in S: d(s_i, s_j) \geq 0$: distance is never negative.
Identity
$\forall s_i, s_j \in S: d(s_i, s_j) = 0 \iff i=j$; that is,
the distance from a point to itself is 0; and no two distinct points are separated by a 0 distance.
Symmetry
$\forall s_i, s_j \in S: d(s_i, s_j) = d(s_j, s_i)$. It doesn’t matter which way you measure: the distance between two points is always the same.
Triangle Inequality
$\forall s_i, s_j, s_k \in S: d(s_i, s_j) + d(s_j, s_k) \geq d(s_i, s_k)$.

A metric space is a pair $(S, d)$ of a set, and a metric over the set.

For example:

1. The real numbers are a metric space with the ruler-metric function. You can easily verify that properties of a metric function all work with the ruler-metric. In fact, they are are all things that you can easily check with a ruler and a number-line, to see that they work. The function that you’re creating with the ruler is: $d(x,y) = |x-y|$ (the absolute value of $x - y$). So the ruler-metric distance from 1 to 3 is 2.
2. A cartesian plane is a metric space whose distance function is the euclidean distance: $d((a_x,ay_), (b_x,b_y)) = ((a_x-b_x)^2 + (a_y-b_y)^2 )^{frac{1}{2}}$. In fact, for every $n$, the euclidean n-space is a metric space using the euclidean distance.
3. A checkerboard is a metric space if you use the number of kings moves as the distance function.
4. The Manhattan street grid is a metric space where the distance function between two intersections is the sum of the number of horizontal blocks and the number of vertical blocks between them.

With that, we can define the open spheres.

## Open and Closed Sets

You can start moving from metric spaces to topological spaces by looking at open sets. Take a metric space, $(S,d)$, and a point $p \in S$. An open sphere $B(p,r)$ (a ball of radius r around point p) in $S$ is the set of points $x$ such that d(p,x) < r.

Now, think of a subset $T \subseteq S$. A point $p \in S$ is in the interior of $T$ if/f there is some point $r$ where $B(p,r) \in T$. $T$ is an open subset of $S$ if every element of $T$ is in its interior. A subset of space formed by an open ball is always an open subset. An open subset of a metric space $S$ is also called an open space in $S$.

Here’s where we can start to do some interesting things, that foreshadow what we’ll do with topological spaces. If you have two open spaces $T$ and $U$ in a metric space $S$, then $T cup U$ is an open space in $S$. So if you have open spaces, you can glue them together to form other open spaces.

In fact, in a metric space $S$, every open space is the union of a collection of open spheres in $S$.

In addition to the simple gluing, we can also prove that every intersection of two open subsets is open. In fact, the intersection of any finite set of open subsets form an open subset. So we can assemble open spaces with all sorts of bizarre shapes by gluing together collections of open balls, and then taking intersections between the shapes we’ve built.

So now, think about a subspace $T$ of a metric space $S$. We can say that a point $p$ is adherent to $T$ if $\forall r < 0; B(p, r) \cap T \neq \emptyset$. The closure of $T$, written $\overline{T}$ is the set of all points adherent to $T$.

A subset $T$ of $S$ is called a closed subset if and only if $T=\overline{T}$. Intuitively, $T$ is closed if it contains the surface that forms its boundary. So in 3-space, a solid sphere is a closed space. The contents of the sphere (think of the shape formed by the air in a spherical balloon) is not a closed space; it’s bounded by a surface, but that surface is not part of the space.

# An Introduction to Topology

When I took a poll of topics that people wanted my to write about, an awful lot of you asked me to write about topology. I did that before – right after I moved my blog to ScienceBlogs. But it’s been a while. So I’m going to go back to those old posts, do some editing and polishing, correct some errors, and repost them. Along the way, I’ll add a few new posts.

I’ve said before that the way that I view math is that it’s fundamentally about abstraction. Math is taking complex ideas, breaking down to simple concepts, and then exploring what those concepts really mean, and exactly what you can build using them.

I argue that topology, at its deepest level, is about continuity and nearness. In a continuous surface, what does in mean for things to be close to one another? What kind of structures can you build using nothing but nearness relationships? What is a structure defined solely by that notion of nearness?

# Chaos

One mathematical topic that I find fascinating, but which I’ve never had a chance to study formally is chaos. I’ve been sort of non-motivated about blog-writing lately due to so many demands on my time, which has left me feeling somewhat guilty towards those of you who follow this blog. So I decided to take this topic about which I know very little, and use the blog as an excuse to learn something about it. That gives you something interesting to read, and it gives me something to motivate me to write.

I’ll start off with a non-mathematical reason for why it interests me. Chaos is a very simple idea with very complex implications. The simplicity of the concept makes it incredibly ripe for idiots like Michael Crichton to believe that he understands it, even though he doesn’t have a clue. There’s an astonishingly huge quantity of totally bogus rubbish out there, where the authors are clueless folks who sincerely believe that their stuff is based on chaos theory – because they’ve heard the basic idea, and believed that they understood it. It’s a wonderful example of my old mantra: the worst math is no math. If you take a simple mathematical concept, and render it into informal non-mathematical words, and then try to reason from the informal stuff, what you get is garbage.

So, speaking mathematically, what is chaos?

To paraphrase something my book-editor recently mentioned: in math, the vast majority of everything is bad. Most functions are non-continuous. Most topologies are non-differentiable. Most numbers are irrational. Most irrational numbers are undescribable. And most complex systems are completely unstable.

Modern math students have, to a great degree, internalized this basic idea. We pretty much expect badness, so the implications of badness don’t surprise us. We’ve grown up mathematically knowing that there are many, many interesting things that we would really like to be able to do, but that can’t be done. That realization, that there are things that we can’t even hope to do, is a huge change from the historical attitudes of mathematicians and scientists – and it’s a very recent one. A hundred years ago, people believed that we could work out simple, predictable, closed-form solutions to all interesting mathematical problems. They expected that it might be very difficult to find a solution; and they expected that it might take a very long time, but they believed that it was always possible.

For one example that has a major influence on the study of chaos: John Von Neumann believed that he could build a nearly perfect weather prediction computer: it was just a matter of collecting enough data, and figuring out the right equations. In fact, he expected to be able to do more than that: he expected to be able to control the weather. He thought that the weather was likely to be a system where there were unstable points, and that by introducing small changes at the unstable points, that weather managers would be able to push the weather in a desired direction.

Of course, Von Neumann knew that you could never gather enough data to perfectly predict the weather. But most systems that people had studied could be approximated. If you could get measurements that were correct to within, say, 0.1%, you could use those measurements to make predictions that would be extremely close to correct – within some multiple of the precision of the basic measurements. Small measurement errors would mean small changes in the results of a prediction. So using reasonably precise but far from exact or complete measurements, you could make very accurate predictions.

For example, people studied the motion of the planets. Using the kinds of measurements that we can make using fairly crude instruments, people have been able to predict solar eclipses with great precision for hundreds of years. With better precision, measuring only the positions of the planets, we can predict all of the eclipses and alignments for the next thousand years – even though the computations will leave out the effects of everything but the 8 main planets and the sun.

Mathematicians largely assumed that most real systems would be similar: once you worked out what was involved, what equations described the system you wanted to study, you could predict that system with arbitrary precision, provided you could collect enough data.

Unfortunately, reality isn’t anywhere near that simple.

Our universe is effectively finite – so many of the places where things break seem like they shouldn’t affect us. There are no irrational numbers in real experience. Nothing that we can observe has a property whose value is an indescribable number. But even simple things break down.

Many complex systems have the property that they’re easy to describe – but where small changes have large effects. That’s the basic idea of chaos theory: that in complex dynamical systems, making a minute change to a measurement will produce huge, dramatic changes after a relatively short period of time.

For example, we compute weather predictions with the Navier-Stokes equations. N-S are a relatively simple set of equations that describe how fluids move and interact. We don’t have a closed-form solution to the N-S equations – meaning that given a particular point in a system, we can’t compute the way fluid will flow around it without also computing separately how fluid will flow around the points close to it, and we can’t compute those without computing the points around them, and so on.

So when we make weather predictions, we create a massive grid of points, and use the N-S equations to predict flows at every single point. Then we use the aggregate of that to make weather predictions. Using this, short-term predictions are generally pretty good towards the center of the grid.

But if you try to extend the predictions out in time, what you find is that they become unstable. Make a tiny, tiny change – alter the flow at one point by 1% – and suddenly, the prediction for the weather a week later is dramatically different. A difference of one percent in one out of a million cells can, over the space of a month, be responsible for the difference between a beautiful calm summer day and a cat-5 hurricane.

That basic bit is called sensitivity to initial conditions is a big part of what defines chaos – but it’s only a part. And that’s where the crackpots go wrong. Just understanding the sensitivity and divergence isn’t enough to define chaos – but to people like Crichton, understanding that piece is understanding the whole thing.

To really get the full picture, you need to dive in to topology. Chaotic systems have a property called topological mixing. Topological mixing is an idea which isn’t too complex informally, but which can take a lot of effort to describe and explain formally. The basic idea of it is that no matter where you start in a space, given enough time, you can wind up anywhere at all.

To get that notion formally, you need to look at the phase space of the system. You can define a dynamical system using a topological space called the phase space of the system. Speaking very loosely, the phase space P of a system is a topological space of points where each point p∈P corresponds to one possible state of the system, and the topological neighborhoods of p are the system states reachable on some path from p.

So – image that you have a neighborhood of points, G in a phase space. From each point in G, you traverse all possible forward paths through the phase space. At any given moment t, G will have evolved to form a new neighborhood of points, Gt. For the phase space to be chaotic, it has to have the property that for any arbitrary pair of neighborhoods G and H in the space, no matter how small they are, no matter how far apart they are, there will be a time t such that Gt and Ht will overlap.

But sensitivity to initial conditions and topological mixing together still aren’t sufficient to define chaos.

Chaotic systems must also have a property called dense periodic orbits. What that means is that Chaotic systems are approximately cyclic in a particular way. The phase space has the property that if the system passes through a point p in the neighborhood P, then after some finite period of time, the system will pass through another point in P. That’s not to say that it will repeat exactly: if it did, then you would have a repeating system, which would not be chaotic! But it will come arbitrarily close to repeating. And that almost-repetition has to have a specific property: the union of the set of all of those almost-cyclic paths must be equivalent to the entire phase-space itself. (We say, again speaking very loosely, that the set of almost-cycles is dense in the phase space.)

That’s complicated stuff. Don’t worry if you don’t understand it yet. It’ll take a lot of posts to even get close to making that comprehensible. But that’s what chaos really is: a dynamical system with all three properties: sensitivity to initial conditions, overlaps in neighborhood evolution, and dense periodic orbits.

In subsequent posts, I’ll spend some more time talking about each of the three key properties, and showing you examples of interesting chaotic systems.

# Understanding Non-Euclidean Hyperbolic Spaces – With Yarn!

One of my fellow ScienceBloggers, Andrew Bleiman from Zooilogix, sent me an amusing link. If you’ve done things like study topology, then you’ll know about non-euclidean spaces. Non-euclidean spaces are often very strange, and with the exception of a few simple cases (like the surface of a sphere), getting a handle on just what a non-euclidean space looks like can be extremely difficult.

One of the simple to define but hard to understand examples is called a hyperbolic space. The simplest definition of a hyperbolic space is a space
where if you take open spheres of increasing radius around a point, the amount of space in those open spheres increases exponentially.

If you think of a sheet of paper, if you take a point, and you draw progressively larger circles around the point, the size of the circles increases
with the square of the radius: for a circle with radius R, the amount of space inside the circle is proportional to R2. If you did it in three dimensions, the amount of space in the spheres would be proportional to R3. But it’s always a fixed exponent.

In a hyperbolic space, you’ve got a constant N, which defines the “dimensionality” of the space – and the open spheres around it enclose a
quantity of space proportional to NR. The larger the open circle around
a point, the higher the exponent.

What Andrew sent me is a link about how you can create models of hyperbolic
spaces using simple crochet.
And then you can get a sense of just how a hyperbolic space works by playing with the thing you crocheted!

It’s absolutely brilliant. Once you see it, it’s totally obvious
that this is a great model of a hyperbolic space, and just about anyone
can make it, and then experiment with it to get an actual tactile sense
of how it works!

It just happens that right near where I live, there’s a great yarn shop whose owners my wife and I have become friends with. So if you’re interested in trying this out, you should go to their shop, Flying Fingers, and buy yourself some yarn and crochet hooks, and crochet yourself some hyperbolic surfaces! And tell Elise and Kevin that I sent you!

# Geometric L-systems

As I alluded to yesterday, there’s an analogue of L-systems for things more complicated than curves. In fact, there are a variety of them. I’m going to show you one simple example, called a geometric L-system, which is useful for generating a certain kind of iterated function fractal; other variants work in a similar way.

# Wonderful Mobius Transformation Video

Via The Art of Problem-Solving, a great video on Mobius transformations. I never really got how the inversion transformation fit in with the others before seeing this!