Dried shrimp are, in my opinion, a very undervalued and underused ingredient. They’re very traditional in a lot of real Chinese cooking, and they give things a really nice taste. They’ve also got an interesting, pleasant chewy texture. So when there was a dried shrimp dish on the menu, I wanted it. (The restaurant also had dan dan noodles, which are a favorite of my wife, but she was kind, and let me indulge.)

The dish was absolutely phenomenal. So naturally I wanted to figure out how to make it at home. I finally got around to doing it tonight, and I got *really* lucky: everything worked out perfectly, and it turned out almost exactly like the restaurant. My wife picked the noodles at the chinese grocery that looked closest, and they were exactly right. I guessed at the ingredients from the flavors, and somehow managed to get them spot on on the first try.

That kind of thing almost never happens! It always takes a few tries to nail down a recipe. But this one just turned out the first try!

So what’s the dish like? It’s very Chinese, and very different from what most Americans would expect. If you’ve had a mazeman ramen before, I’d say that’s the closest thing to it. It’s a light, warm, lightly dressed noodle dish. The sauce is *very* strong if you taste it on its own, but when it’s dressed onto hot noodles, it mellows quite a bit. The dried shrimp are briney and shrimpey, but not overly fishy. All I can say is, try it!

There are two parts to the sauce: a soy mixture, and a scallion oil. The scallion oil should be made a day in advance, and allowed to stand overnight. So we’ll start with that.

- one large bunch scallions
- 1 1/2 cups canola oil
- two slices crushed ginger
- generous pinch salt

- Coarsely chop the scallions – whites and greens.
- Put the scallions, ginger, and salt into a food processor, and pulse until they’re well chopped.
- Add the oil, and let the processor run on high for about a minute. You should end up with a thick pasty pale green goo.
- Put it in the refrigerator, and let it sit overnight.
- The next day, push through a sieve, to separate the oil from the scallion pulp. Discard the scallions. You should be left with an amazing smelling translucent green oil.

Next, the noodles and sauce.

- Noodles. We used a kind of noodle called guan miao noodle. If you can’t find that,

then white/wheat soba or ramen would be a good substitute. - 1/2 cup soy sauce
- 2 tablespoons sugar
- 1 cup chicken stock
- 2 slices ginger
- one clove garlic
- 2 tablespoons dried shrimp

- Cover the dried shrimp with cold water in a bowl, and let sit for 1/2 hour.
- Put the dried shrimp, soy sauce, sugar, chicken stock, ginger, and garlic into a saucepan, and simmer on low heat for five minutes. Then remove the garlic and ginger.
- For each portion, take about 2 tablespoons of the soy, and two tablespoons of the scallion oil, and whisk together to form something like a vinaigrette.
- Cook the noodles according to the package. (For the guan miao noodles, they boiled in
*unsalted*water for 3 minutes.) - Toss with the soy/oil mixture.
- Serve the dressed noodles into bowls, and put a few of the simmered dried shrimp on top.
- Drizzle another teaspoon each of the scallion oil and soy sauce over each serving.
- Scatter a few fresh scallions on top.

And eat!

]]>I’ve written about Cantor crankery so many times. In fact, it’s one of the largest categories in this blog’s index! I’m pretty sure that I’ve covered pretty much every anti-Cantor argument out there. And yet, not a week goes by when another idiot doesn’t pester me with their “new” refutation of Cantor. The “new” argument is always, without exception, a variation on one of the same old boring ones.

But I haven’t written any Cantor stuff in quite a while, and yet another one landed in my mailbox this morning. So, what the heck. Let’s go down the rabbit-hole once again.

We’ll start with a quick refresher.

The argument that the cranks hate is called Cantor’s diagonalization. Cantor’s diagonalization as argument that according to the axioms of set theory, the cardinality (size) of the set of real numbers is strictly larger than the cardinality of the set of natural numbers.

The argument is based on the set theoretic definition of cardinality. In set theory, two sets are the same size if and only if there exists a one-to-one mapping between the two sets. If you try to create a mapping between set A and set B, and in every possible mapping, every A is mapped onto a unique B, but there are leftover Bs that no element of A maps to, then the cardinality of B is larger than the cardinality of A.

When you’re talking about finite sets, this is really easy to follow. If I is the set {1, 2, 3}, and B is the set {4, 5, 6, 7}, then it’s pretty obvious that there’s no one to one mapping from A to B: there are more elements in B than there are in A. You can easily show this by enumerating every possible mapping of elements of A onto elements of B, and then showing that in every one, there’s an element of B that isn’t mapped to by an element of A.

With infinite sets, it gets complicated. Intuitively, you’d think that the set of even natural numbers is smaller than the set of all natural numbers: after all, the set of evens is a strict subset of the naturals. But your intuition is wrong: there’s a very easy one to one mapping from the naturals to the evens: {n → 2n }. So the set of even natural numbers is the same size as the set of all natural numbers.

To show that one infinite set has a larger cardinality than another infinite set, you need to do something slightly tricky. You need to show that *no matter what mapping you choose* between the two sets, some elements of the larger one will be left out.

In the classic Cantor argument, what he does is show you how, given *any* purported mapping between the natural numbers and the real numbers, to find a real number which is not included in the mapping. So no matter what mapping you choose, Cantor will show you how to find real numbers that aren’t in the mapping. That means that every possible mapping between the naturals and the reals will omit members of the reals – which means that the set of real numbers has a larger cardinality than the set of naturals.

Cantor’s argument has stood since it was first presented in 1891, despite the best efforts of people to refute it. It is an uncomfortable argument. It violates our intuitions in a deep way. Infinity is *infinity*. There’s nothing bigger than infinity. What does it even *mean* to be bigger than infinity? That’s a non-sequiter, isn’t it?

What it means to be bigger than infinity is exactly what I described above. It means that if you have a two infinitely large sets of objects, and there’s no possible way to map from one to the other without missing elements, then one is bigger than the other.

There are legitimate ways to dispute Cantor. The simplest one is to reject set theory. The diagonalization is an implication of the basic axioms of set theory. If you reject set theory as a basis, and start from some other foundational axioms, you can construct a version of mathematics where Cantor’s proof doesn’t work. But if you do that, you lose a lot of other things.

You can also argue that “cardinality” is a bad abstraction. That is, that the definition of cardinality as size is meaningless. Again, you lose a lot of other things.

If you accept the axioms of set theory, and you don’t dispute the definition of cardinality, then you’re pretty much stuck.

Ok, background out of the way. Let’s look at today’s crackpot. (I’ve reformatted his text somewhat; he sent this to me as plain-text email, which looks awful in my wordpress display theme, so I’ve rendered it into formatted HTML. Any errors introduced are, of course, my fault, and I’ll correct them if and when they’re pointed out to me.)

We have been told that it is not possible to put the natural numbers into a one to one with the real numbers. Well, this is not true. And the argument, to show this, is so simple that I am absolutely surprised that this argument does not appear on the internet.

We accept that the set of real numbers is unlistable, so to place them into a one to one with the natural numbers we will need to make the natural numbers unlistable as well. We can do this by mirroring them to the real numbers.

Given any real number (between 0 and 1) it is possible to extract a natural number of any length that you want from that real number.

Ex: From π-3 we can extract the natural numbers 1, 14, 141, 1415, 14159 etc…

We can form a set that associates the extracted number with the real number that it was extracted from.

Ex: 1 → 0.14159265…

Then we can take another real number (in any arbitrary order) and extract a natural number from it that is not in our set.

Ex: 1 → 0.14159266… since 1 is already in our set we must extract the next natural number 14.

Since 14 is not in our set we can add the pair 14 → 0.1415926l6… to our set.

We can do the same thing with some other real number 0.14159267… since 1 and 14 is already in our set we will need to extract a 3 digit natural number, 141, and place it in our set. And so on.

So our set would look something like this…

A) 1 → 0.14159265… B) 14 → 0.14159266… C) 141 → 0.14159267… D) 1410 → 0.141 E) 14101 → 0.141013456789… F) 5 → 0.567895… G) 55 → 0.5567891… H) 555 → 0.555067891… … … Since the real numbers are infinite in length (some terminate in an infinite string of zero’s) then we can always extract a natural number that is not in the set of pairs since all the natural numbers in the set of pairs are finite in length. Even if we mutate the diagonal of the real numbers, we will get a real number not on the list of real numbers, but we can still find a natural number, that is not on the list as well, to correspond with that real number.

Therefore it is not possible for the set of real numbers to have a larger cardinality than the set of natural numbers!

This is a somewhat clever variation on a standard argument.

Over and over and over again, we see arguments based on finite prefixes of real numbers. The problem with them is that they’re based on *finite prefixes*. The set of all finite prefixes of the real numbers is that there’s an obvious correspondence between the natural numbers and the finite prefixes – but that still doesn’t mean that there are no real numbers that aren’t in the list.

In this argument, every finite *prefix* of π corresponds to a natural number. But π itself *does not*. In fact, every real number that actually *requires* an infinite number of digits has no corresponding natural number.

This piece of it is, essentially, the same thing as John Gabriel’s crankery.

But there’s a subtler and deeper problem. This “refutation” of Cantor contains the conclusion as an implicit premise. That is, it’s actually using the *assumption* that there’s a one-to-one mapping between the naturals and the reals to prove the conclusion that there’s a one-to-one mapping between the naturals and the reals.

If you look at his procedure for generating the mapping, it requires an enumeration of the real numbers. You need to take successive reals, and for each one in the sequence, you produce a mapping from a natural number to that real. If you can’t enumerate the real numbers as a list, the procedure doesn’t work.

If you can produce a sequence of the real numbers, then you don’t need this procedure: you’ve already got your mapping. 0 to the first real, 1 to the second real, 2 to the third read, 3 to the fourth real, and so on.

So, once again: sorry Charlie: your argument doesn’t work. There’s no Fields medal for you today.

One final note. Like so many other bits of bad math, this is a classic example of what happens when you try to do math with prose. There’s a reason that mathematicians have developed formal notations, formal language, detailed logical inference, and meticulous citation. It’s because in prose, it’s easy to be sloppy. You can accidentally introduce an implicit premise without meaning to. If you need to carefully state every premise, and cite evidence of its truth, it’s a whole lot harder to make this kind of mistake.

That’s the real problem with this whole argument. It’s built on the fact that the premise “you can enumerate the real numbers” is built in. If you wrote it in careful formal mathematics, you wouldn’t be able to get away with that.

]]>Barry Arrington, June 10, 2016 at 2:45 pm

daveS:

“That 2 + 3 = 5 is true by definition can be verified in a purely mechanical, absolutely certain way.”This may be counter intuitive to you dave, but your statement is false. There is no way to verify that statement. It is either accepted as self-evidently true, or not. Think about it. What more basic steps of reasoning would you employ to verify the equation? That’s right; there are none. You can say the same thing in different ways such as || + ||| = ||||| or “a set with a cardinality of two added to a set with cardinality of three results in a set with a cardinality of five.” But they all amount to the same statement.

That is another feature of a self-evident truth. It does not depend upon (indeed cannot be) “verified” (as you say) by a process of “precept upon precept” reasoning. As WJM has been trying to tell you, a self-evident truth is, by definition, a truth that is accepted because rejection would be upon pain of patent absurdity.

2+3=5 cannot be verified. It is accepted as self-evidently true because any denial would come at the price of affirming an absurdity.

It’s absolutely possible to *verify* the statement “2 + 3 = 5”. It’s also absolutely possible to *prove* that statement. In fact, both of those are more than possible: they’re downright *easy*, provided you accept the standard definitions of arithmetic. And frankly, only a total idiot who has absolutely no concept of what verification or proof *mean* would ever claim otherwise.

We’ll start with verification. What does that mean?

*Verification* is the process of testing a hypothesis to determine if it correctly predicts the outcome. Here’s how you *verify* that 2+3=5:

- Get two pennies, and put them in a pile.
- Get three pennies, and put them in a pile.
- Put the pile of 2 pennies on top of the pile of 3 pennies.
- Count the resulting pile of pennies.
- If there are 5 pennies, then you have
*verified*that 2+3=5.

Verification isn’t perfect. It’s the result of a single test that confirms what you expect. But verification is repeatable: you can repeat that experiment as many times as you want, and you’ll always get the same result: the resulting pile will always have 5 pennies.

*Proof* is something different. Proof is a process of using a formal system to demonstrate that within that formal system, a given statement necessarily follows from a set of premises. If the formal system has a valid model, and you accept the premises, then the proof shows that the conclusion must be true.

In formal terms, a proof operates within a formal system called a *logic*. The logic consists of:

- A collection of rules (called
*syntax rules*or*formation rules*) that define how to construct a valid statement are in the logical language. - A collection of rules (called
*inference rules*) that define how to use true statements to determine other true statements. - A collection of foundational true statements called
*axioms*.

)

Note that “validity”, as mentioned in the syntax rules, is a very different thing from “truth”. Validity means that the statement has the correct structural form. A statement can be valid, and yet be completely meaningless. “The moon is made of green cheese” is a valid sentence, which can easily be rendered in valid logical form, but it’s not true. The classic example of a meaningless statement is “Colorless green ideas sleep furiously”, which is syntactically valid, but utterly meaningless.

Most of the time, when we’re talking about logic and proofs, we’re using a system of logic called first order predicate logic, and a foundational system of axioms called ZFC set theory. Built on those, we define numbers using a collection of definitions called Peano arithmetic.

In Peano arithmetic, we define the natural numbers (that is, the set of non-negative integers) by defining 0 (the *cardinality* of the empty set), and then defining the other natural numbers using the *successor function*. In this system, the number zero can be written as ; one is *(the successor of )*; two is the successor of 1: . And so on.

Using Peano arithmetic, addition is defined recursively:

- For any number , .
- For any number numbers x and y: .

So, using peano arithmetic, here’s how we can prove that :

- In Peano arithemetic form, means .
- From rule 2 of addition, we can infer that is the same as . (In numerical syntax, 2+3 is the same as 1+4.)
- Using rule 2 of addition again, we can infer that (1+4=0+5); and so, by transitivity, that 2+3=0+5.
- Using rule 1 of addition, we can then infer that ; and so, by transitivity, 2+3=5.

You can get around this by pointing out that it’s certainly not a proof from first principles. But I’d argue that if you’re talking about the statement “2+3=5” in the terms of the quoted discussion, that you’re clearly already living in the world of FOPL with some axioms that support peano arithmetic: if you weren’t, then the statement “2+3=5” wouldn’t have any meaning at all. For you to be able to argue that it’s true but unprovable, you must be living in a world in which arithmetic works, and that means that the statement is both verifiable and provable.

If you want to play games and argue about axioms, then I’ll point at the Principia Mathematica. The Principia was an ultimately misguided effort to put mathematics on a perfect, sound foundation. It started with a minimal form of predicate logic and a tiny set of inarguably true axioms, and attempted to derive all of mathematics from nothing but those absolute, unquestionable first principles. It took them a ton of work, but using that foundation, you can derive all of number theory – and that’s what they did. It took them 378 pages of dense logic, but they ultimately build a rock-solid model of the natural numbers, and used that to demonstrate the validity of Peano arithmetic, and then in turn used that to prove, once and for all, that 1+1=2. Using the same proof technique, you can show from first principles, that 2+3=5.

But in a world in which we don’t play semantic games, and we accept the basic principle of Peano arithmetic as a given, it’s a simple proof. It’s a simple proof that can be found in almost any textbook on foundational mathematics or logic. But note how Arrington responds to it: by playing word-games, rephrasing the question in a couple of different ways to show off how much he knows, while completely avoiding the point.

What does it take to conclude that you can’t verify or prove something like 2+3=5? Profound, utter ignorance. Anyone who’s spent any time learning math should know better.

But over at Uncommon Descent? They don’t think they need to actually learn stuff. They don’t need to refer to ungodly things like textbook. They’ve got their God, and that’s all they need to know.

To be clear, I’m not anti-religion. I’m a religious Jew. Uncommon Descent and their rubbish don’t annoy me because I have a grudge against theism. I have a grudge against ignorance. And UD is a huge promoter of arrogant, dishonest ignorance.

]]>I’ve mentioned in the past that I’ve dealt with mental illness. But I’ve never gone in depth about it. There are a lot of reasons for that, but the biggest one is just that it’s really frightening to reveal that kind of personal information.

I’ve had trouble with two related things. I’ve written before about the fact that I’m being treated for depression. What I haven’t talked about is the fact that I’ve got very severe social anxiety.

To me, the depression isn’t such a big deal. I’m *not* saying that depression isn’t serious. I’m not even saying that in my case, my depression wasn’t/isn’t serious. But for me, depression is easily treated. I’m one of the lucky people who respond really well to medication.

Back when I first realized that something was wrong, and I realized that what I was feeling (or, more accurately, what I was *not* feeling) was depression, I went to see a doctor. On my first visit with him, he wrote me a prescription for Zoloft. Two weeks later, I started feeling better; between 5 and 6 weeks after starting to take the medication, I was pretty much recovered from that episode of depression.

I wouldn’t say that I’ve ever totally recovered. There’s always a lingering residue of depression, which I’m constantly struggling against. It doesn’t go away, but it’s manageable. As long as I’m aware of it, it doesn’t have a huge impact on my life.

On the other hand, social anxiety. For me, that’s a *really* big deal. That’s the thing that shapes (and warps) my entire life. And as hard as it is to talk about something like depression, talking about SA is much harder.

As bad as people react to depression, the reaction to social anxiety is worse. Depression is commonly viewed as more weakness than illness. But social anxiety is treated as a joke.

It’s no joke. For those of us who deal with it, it’s a huge source of pain. It’s had a huge effect on my life. But I’ve always been afraid to talk about it. The thing is, I think that things like this are important to talk about. Our society has a huge stigma against mental illness. I really believe that needs to change. And the only way that it will change is when we stop treating it as something to be ashamed of, or something that needs to stay hidden. And that means that I’ve got to be willing to talk about it.

Social anxiety is part of who I am, and I can’t escape that. But I can talk about what it is. And I can, publicly, say to kids who are in the same situation that I was in 30 years ago: Yes, being like this sucks. But despite in, you *can* live a good life. You can find friends who’ll care about you, find a partner who’ll love you, build a successful career, and thrive. Even if your SA never goes away, even if there’s always some pain because of it, it doesn’t have to rule your life. You *can* still be happy.

The first thing I need to do is to explain just what SA is. But I need to be very clear here: like anything else that involves peoples’ inner perceptions, I can only talk about what it’s like *for me*. Different people experience things differently, so what it’s like for me might be totally different from what it’s like for someone else. I don’t mean to in any way cast any shade on anyone else: their feelings and perceptions may be different from mine, but they’re just as valid. This is just *my* experience.

So. What is social anxiety?

It’s very difficult to explain. The best I can do is to say that it’s the absolute knowledge that I’m freak, combined with a terror of what will happen when anyone finds out. I know, on a deep physical level that if people figure out who/what I am, that they’ll hate me – and worse, that they’ll actively turn on me, attack me, harm me.

It’s not true. I know perfectly well that it’s not true. I can feel like this even with my closest friends – people who I know will always support me, who would never do anything to hurt me. But deep down, on a level below conscious thought, I *know* it. It doesn’t matter that intellectually I’m aware that it’s not true, because my physical reaction in social situations is based on what my subconscious knows.

So every time I walk into a room full of people, every time I walk into a store, every time I pick up the phone, every time I walk over to a coworker to ask a question, that’s what I’m feeling. That fear, that need to escape, that certainty that I’m going to mess up, and that when I do, I’m going to be ostracized or worse.

What makes it worse is the fact that the way I behave because of the social anxiety increases the odds that other people will think I’m strange – and when people see me that way, it increases the stress that I feel. When you’re putting a substantial part of your effort and concentration into squashing down the feeling of panic, you’re not paying full attention to the people you’re interacting with. At best, you come off as distant, inattentive, and rude. At worst, you’re seen reacting in odd ways, because you’ve missed some important social cue.

It’s not a small thing. Humans are social creatures. We need contact with other people. We can’t live without it. But my interactions are always colored by this fear. I have to fight against it every day, in everything I do. It colors every interaction I have with every person I encounter. It’s there, all the time.

When people talk about social anxiety, they mostly talk about it as being something like excessive shyness. I hope that this descriptions helps make it clear that that’s not what it’s really about.

Where’d this craziness come from?

For me, it’s really a kind of PTSD, or so a doctor who specializes in SA told me. I feel really guilty saying that, because to me, PTSD is something serious, and I have a hard time putting myself into the same basket as people who’ve gone through real trauma. But in medical terms, that’s what’s happening.

I’ve written about my past a little bit before. I had a rough childhood. Most of the time when you hear that, you think of family trouble, which couldn’t be farther from the truth for me. I had a really wonderful family. My parents and my siblings were/are great. But in school, I was *the* victim of abuse. I was a very small kid. I’m fairly tall now (around 5’11”) when I started high school, I wasn’t quite 5 feet tall. At the beginning of my junior year, I was still just 5’1″. So, I was short, skinny, hyperactive geeky kid. That’s pretty much the formula for getting picked on.

But I didn’t just get picked on. I got beaten up an a regular basis. I don’t say that lightly. I’m not talking about small stuff. The small abuses would have been bad enough, but that’s not what happened to me. This was serious physical abuse. To give one example, in gym class one day during my senior year, I had someone tackle me to the ground; then grab my little finger, say “I wonder what it would feel like if I broke this?”, and then snap it.

That was, pretty much, my life every day from 5th grade until I graduated high school. Everything I did became a reason to abuse me. If I answered a teachers question in class? That was a reason to beat me: I’m making them look bad. If I *didn’t* answer a question in class, *that* was a reason to beat me: I should be satisfying the teacher so that they don’t have to.

It wasn’t limited to school. My house was vandalized. The gas lines on our grill were cut. A swastika was burned into the street in front of my house. We had so many mailboxes destroyed that we literally build a detachable mount for the mailbox, and brought it in to the house every night. Then in retribution for depriving the assholes of the privilege of smashing our mailbox, they set the wooden mailbox post on fire.

Hearing this, you’d probably ask “Where was the principal/administration when all of this was going on?”. The answer? They didn’t really give a damn. The principal was an ex-nun, who believed that you shouldn’t punish children. If one children hits another, you shouldn’t tell them that hitting is wrong. You should sit them down and talk to them about “safe hands”, and what you need to do for your hands to be safe.

After the finger-breaking incident, my parents really freaked out, and went in to see the principal and assistant principal. Their reaction was to be furious at my parents. The AP literally shouted at my father, saying “What do you want, a god-damned armed guard to follow your kid around?”. (To which, I think, the response should have been “Fuck yeah. If you’re doing such a shit job protecting your students that the only way to stop them from having their bones broken for fun is to hire armed guards to follow them around, then you should damn well do that.”) Unfortunately, my parents didn’t believe in lawsuits; they wouldn’t sue the school, and they just didn’t have the money to move me to a private school. So I got to suffer.

(Even now, I would dearly love to find that principal… I’d really like to explain to her exactly what a god-damned idiot she is, and how ashamed she should be of the horrible job she did. A principal’s number one job is making sure that the school is a safe place for children to learn. She failed, horribly, at that – and, as far as I could tell, never felt the slightest bit of guilt over all of the things she allowed to happen in her school.)

So now, it’s literally 30 years since I got out of high school. But it’s very hard to get past the things that were pounded into you during your childhood. The eight years of daily abuse – from the time I was 10 years old until I turned 18 – basically rewired my personality.

The effects of that are what made me the way I am.

How does social anxiety really affect my daily life?

Socially, it’s almost crippling. I don’t have much of a social life. I’ve got a small group of very close friends who I don’t get to see nearly enough of, and I have a very hard time meeting new people. Even with people that I’ve known for a long time, I’m just not comfortable. Sometimes I really need social contact, but most of the time, I’d rather be alone in some quiet place, where I don’t need to worry about what other people think. I’d really like to be able to socialize more – in particular, there are a lot of people that I’ve met through this blog that I think of as friends, who I’d love to meet in person, but I never do. Even when I have the change, I usually manage to muck it up. (Because I always believe that people are looking for some reason to reject me, I see rejection in places where it doesn’t exist.)

Professionally, it’s been up and down. It definitely has held me back somewhat. In any job where you need to promote yourself, someone with SA is in deep trouble.

At one point, I even lost a job because of it. I didn’t get fired, but that’s only because I quit when it became obvious that that’s what was coming, and there was no point sticking around waiting for it. My manager at the time found out I was getting treated for SA. From the moment he found out, he stopped trusting anything I said about anything. To make matters worse, at the time, he was in trouble for a project that was literally 2 years overdue, and he needed a scapegoat. The “crazy” guy was the obvious target.

As an example of what I mean: one of the times he accused me of incompetence involved actors, which is a programming model that I used in my PhD dissertation. Actors are a model of concurrent computation in which everything is asynchronous. There are no visible locks – just a collection of active objects which can asynchronously send and receive messages. (I wrote a post about actors, with my own implementation of a really silly actor-based programming language here.)

We were working on a scheduling problem for our system. Our team had a meeting to discuss how to implement a particular component of that. After a lot of discussion, we agreed that we should implement it as an actor system. So I wrote a lightweight actors framework on top of our thread library, and implement the whole thing in actors. My coworkers reviewed the code, and accepted it with a lot of enthusiasm. My manager scheduled a private meeting where he accused my mental illness of impairing my judgement, because what kind of idiot would write something like that to be totally asynchronous?

So I left that company. Fortunately, skilled software engineers are in high demand in the NYC area, so finding a new job wasn’t a problem. I’ve had several different jobs since then. SA really hasn’t been a huge problem at any of them, thank goodness. It’s always a bit of a problem because my natural tendency is to try to disappear into the background, so it’s easy for people to not notice the work I’m doing. But I’ve mostly learned how to overcome that. It’s not easy, but I’ve managed.

When job-hunting, after that terrible experience, I learned to be careful to learn a bit about what the work culture of a company is like before I go to work there. I’ve tried to work something into conversations with people at the company after I have an offer, but before I accept it. It gives me a chance to see how they react to it. If I don’t like their reaction, if it seems like there’s a good chance that it’ll cause trouble, I’ll just take a different job someplace where it won’t be a problem. Like I said before, it’s a good time to be a software engineer in NYC – I can afford to turn down offers from companies that I don’t like.

So, yeah. I’m kind of crazy. Writing this is both difficult and terrifying. Posting it is going to be even worse. But I think it’s important to get stuff like this out there.

Despite all of this, I’ve wound up in a good place. I’m married to a lovely woman. I’ve got two smart, happy kids. I’ve got a great job, working with people that I really, genuinely like and enjoy working with, and they seem to like me back. It’s been a long, hard road to get here, but I’m pretty happy where I am.

This has gotten to be quite long, and I’ve been working on it on and off for a couple of months. I think that I’ve got to just let go, and post it as is. Feel free to ask questions about anything that I can clarify, and feel free to share your own stories in the comments. If you want to post something anonymously, feel free to email it to me (markcc@gmail.com), and I’ll post it for you so that theres nothing on the blog that could identify you.

Also note that I’m going to tightly moderate replies to this post. I’m *not* interested in having my blog turn into a place where jerks can abuse people sharing painful personal stories.

Unfortunately, I haven’t been able to find a verbatim quote from Musk about his argument, and I’ve seen a couple of slightly different arguments presented as being what Musk said. So I’m not really going to focus so much on Musk, but instead, just going to try to take the basic simulation argument, and talk about what’s wrong with it from a mathematical perspective.

The argument isn’t really all that new. I’ve found a couple of sources that attribute it to a paper published in 2003. That 2003 paper may have been the first academic publication, and it might have been the first to present the argument in formal terms, but I definitely remember discussing this in one of my philophy classes in college in the late 1980s.

Here’s the argument:

- Any advanced technological civilization is going to develop massive computational capabilities.
- With immense computational capabilities, they’ll run very detailed simulations of their own ancestors in order to understand where they came from.
- Once it is possible to run simulations, they will run
*many*of them to explore how different parameters will affect the simulated universe. - That means that advanced technological civilization will run
*many*simulations of universes where their ancestors evolved. - Therefore the number of simulated universes with intelligent life will be dramatically larger than the number of original non-simulated civilizations.

If you follow that reasoning, then the odds are, for any given form of intelligent life, it’s more likely that they are living in a simulation than in an actual non-simulated universe.

As an argument, it’s pretty much the kind of crap you’d expect from a bunch of half drunk college kids in a middle-of-the-night bullshit session.

Let’s look at a couple of simple problems with it.

The biggest one is a question of size and storage. The heart of this argument is the assumption that for an advanced civilization, nearly infinite computational capability will effectively become free. If you actually try to look at that assumption in detail, it’s not reasonable.

The problem is, we live in a quantum universe. That is, we live in a universe made up of discrete entities. You can take an object, and cut it in half only a finite number of times, before you get to something that can’t be cut into smaller parts. It doesn’t matter how advanced your technology gets; it’s got to be made of the basic particles – and that means that there’s a limit to how small it can get.

Again, it doesn’t matter *how* advanced your computers get; it’s going to take more than one particle in the real universe to simulate the behavior of a particle. To simulate a universe, you’d need a computer bigger than the universe you want to simulate. There’s really no way around that: you need to maintain state information about every particle in the universe. You need to store information about everything in the universe, and you need to also have some amount of hardware to actually do the simulation with the state information. So even with the most advanced technology that you can possible imagine, you can’t possible to better than one particle in the real universe containing all of the state information about a particle in the simulated universe. If you did, then you’d be guaranteeing that your simulated universe wasn’t realistic, because its particles would have less state than particles in the real universe.

This means that to simulate something in full detail, you effectively need something *bigger* than the thing you’re simulating.

That might sound silly: we do lots of things with tiny computers. I’ve got an iPad in my computer bag with a couple of hundred books on it: it’s much smaller than the books it simulates, right?

The “in full detail” is the catch. When my iPad simulates a book, it’s not capturing all the detail. It doesn’t simulate the individual pages, much less the individual molecules that make up those pages, the individual atoms that make up those molecules, etc.

But when you’re talking about perfectly simulating a system well enough to make it possible for an intelligent being to be self-aware, you need that kind of detail. We know, from our own observations of ourselves, that the way our cells operates is dependent on incredibly fine-grained sub-molecular interactions. To make our bodies work correctly, you need to simulate things on that level.

You can’t simulate the full detail of a universe bigger that the computer that simulates it. Because the computer is made of the same things as the universe that it’s simulating.

There’s a lot of handwaving you can do about what things you can omit from your model. But at the end of the day, you’re looking at an incredibly massive problem, and you’re stuck with the simple fact that you’re talking, at least, about building a computer that can simulate an entire planet and its environs. And you’re trying to do it in a universe just like the one you’re simulating.

But OK, we don’t actually *need* to simulate the whole universe, right? I mean, you’re really interested in developing a single species like yourself, so you only care about one planet.

But to make that planet behave absolutely correctly, you need to be able to correctly simulate everything observable from that planet. Its solar system, you need to simulate pretty precisely. The galaxy around it needs less precision, but it still needs a lot of work. Even getting very far away, you’ve got an awful lot of stuff to simulate, because your simulated intelligences, from their little planet, are going to be able to observe an awful lot.

To simulate a planet and its environment with enough precision to get life and intelligence and civilization, and to do it at a reasonable speed, you pretty much need to have a computer bigger than the planet. You can cheat a little bit, and maybe abstract parts of the planet; but you’ve got to do pretty good simulations of lots of stuff outside the planet.

It’s possible, but it’s not particularly useful. Because you need to run that simulation. And since it’s made up of the same particles as the things it’s simulating, it can’t move faster than the universe it simulates. To get useful results, you’d need to build it to be massively parallel. And that means that your computer needs to be even larger – something like a million times bigger.

If technology were to get good enough, you could, in theory, do that. But it’s not going to be something you do a lot of: no matter how advanced technology gets, building a computer that can simulate an entire planet and its people in full detail is going to be a truly massive undertaking. You’re not going to run large numbers of simulations.

You can certainly wave you hands and say that the “real” people live in a universe without the kind of quantum limit that we live with. But if you do, you’re throwing other assumptions out the window. You’re not talking about ancestor simulation any more. And you’re pretending that you can make predictions based on our technology about the technology of people living in a universe with dramatically different properties.

This just doesn’t make any sense. It’s really just techno-religion. It’s based on the belief that technology is going to continue to develop computational capability *without limit*. That the fundamental structure of the universe won’t limit technology and computation. Essentially, it’s saying that technology is omnipotent. Technology is God, and just as in any other religion, it’s adherents believe that you can’t place any limits on it.

Rubbish.

]]> In distributed systems, time is a problem. Each computer has a clock built in, but those clocks are independent. The clocks on different machines can vary quite a bit. If a human being is setting them, then they’re probably *at best* accurate to one second. Even using a protocol like NTP, which synchronizes clocks between different computers, you can only get the clocks accurate to within about a millisecond of each other.

That sounds pretty good. In human timescales, a millisecond is a nearly imperceptible time interval. But to a modern computer, which can execute billions of instructions per second, it’s a long time: long enough to execute a million instructions! To get a sense of how much time that is to a computer, I just measured the time it took to take all of the source code for my main project, compile it from scratch, and execute all of its tests: it took 26 milliseconds.

That’s a *lot* of work. On the scale of a machine running billions of instructions per second, a millisecond is a long time.

Why does that matter?

For a lot of things that we want to do with a collection of computers, we need to know what event happened first. This comes up in lots of different contexts. The simplest one to explain is a shared resource locked by a mutex.

A mutex is a *mutual exclusion lock*. It’s basically a control that only allows one process to access some shared resource at a time. For example, you could think of a database that a bunch of processes all talk to. To make an update, a process P needs to send a request asking for access. If no one is using it when the server receives the request, it will give a *lock* to P, and and then block anyone else from accessing it until P is done. Anyone else who asks for access to the the database will have to wait. When P is done, it *releases* the lock on the mutex, and then if there’s any processes waiting, the database will choose one, and give it the lock.

Here’s where time comes into things. How do you decide who to give the lock to? You could give it to whoever you received the request from first, using the time on the database host. But that doesn’t always work well. It could easily end up with hosts with a lower-bandwidth connection to the server getting far worse service than a a closer host.

You get better fairness by using “send time” – that is, the time that the request was sent to the server by the client. But that’s where the clock issue comes up. Different machines don’t agree perfectly on the current time. If you use their clocktime to determine gets the lock first, then a machine with a slow clock will always get access before one with a fast clock. What you need is some fair way of determining ordering by some kind of timestamp that’s fair.

There are a couple of algorithms for creating some notion of a clock or timestamp that’s fair and consistent. The simplest one, which we’ll look at in this post, is called *Lamport timestamps*. It’s impressively simple, but it works *really* well. I’ve seen it used in real-world implementations of Paxos at places like Google. So it’s simple, but it’s serious.

The idea of Lamport timestamps is to come up with a mechanism that defines a *partial order* over events in a distributed system. What it defines is a *causal ordering*: that is, for any two events, A and B, if there’s any way that A could have influenced B, then the timestamp of A will be less than the timestamp of B. It’s also possible to have two events where we can’t say which came first; when that happens, it means that they couldn’t possible have affected each other. If A and B can’t have any affect on each other, then it doesn’t matter which one “comes first”.

The way that you make this work is remarkably simple and elegant. It’s based on the simplest model of distributed system, where a distributed system is a collection of processes. The processes *only* communicate by explicitly sending messages to each other.

- Every individual process in the distributed system maintains an integer timestamp counter, .
- Every time a process performs an action, it increments . Actions that trigger increments of include message sends.
- Every time a process sends a message to another process, it includes the current value of in the message.
- When a process receives a message from a process , that message includes the value of when the message was sent. So it updates its to the (one more than the maximum of its current timestamp and the incoming message timestamp).

For any two events A and B in the system, if (that is, if A causally occurred before B – meaning that A could have done something that affected B), then we know that the timestamp of A will be smaller than the timestamp of B.

The order of that statement is important. It’s possible for timestamp(A) to be smaller than timestamp(B), but for B to have occurred *before* A by some wallclock. Lamport timestamps provide a *causal ordering*: A cannot have influenced or caused B *unless* ; but A and B can be independent.

Let’s run through an example of how that happens. I’ll write it out by describing the clock-time sequence of events, and following it by a list of the timestamp counter settings for each host. We start with all timestamps at 0: [A(0), B(0), C(0), D(0).

- [Event 1] A sends to C; sending trigger a timestamp increment. [A(1), B(0), C(0), D(0)].
- [Event 2] C receives a message from A, and sets its counter to 2. [A(1), B(0), C(2), D(0).
- [Event 3] C sends a message to A (C increments to 3, and sends.) [A(1), B(0), C(3), D(0).
- [Event 4] A recieves the message from C, and sets its clock to 4. [A(4), B(0), C(3), D(0)]
- [Event 5] B sends a message to D. [A(4), B(1), C(3), D(0)]
- [Event 6] D receives the message. [A(4), B(1), C(3), D(2)].
- [Event 7] D sends a message to C. [A(4), B(1), C(3), D(3)].
- [Event 8] C receives the message, and sets its clock to 4.

According to the Lamport timestamps, in event 5, B sent its message to D at time 1. But by wallclock time, it sent its message after C’s timestamp was already 3, and A’s timestamp was already 4. We know that in our scenario, event 5 happened before event 3 by wallclock time. But in a causal ordering, it didn’t. In causal order, event 8 happened after event 4, and event 7 happened before event 8. In causal comparison, we can’t say whether 7 happened before or after 3 – but it doesn’t matter, because which order they happened in can’t affect anything.

The Lamport timestamp is a partial ordering. It tells us something about the order that things happened in, but far from everything. In effect, if the timestamp of event A is less than the timestamp of event B, it means that *either* A happened before B *or* that there’s no causal relation between A and B.

The Lamport timestamp comparisons only become meaningful when there’s an actual causal link between events. In our example, at the time that event 5 occurs, there’s no causal connection at all between the events on host A, and the events on host B. You can choose any arbitrary ordering between causally unrelated events, and as long as you use it consistently, everything will work correctly. But when event 6 happens, *now* there’s a causal connection. Event 5 could have changed some state on host D, and that could have changed the message that D sent in event 7. Now there’s a causal relationship, timestamp comparisons between messages after 7 has to reflect that. Lamport timestamps are the simplest possible mechanism that captures that essential fact.

When we talk about network time algorithms, we say that what Lamport timestamps do is provide *weak clock consistency*: If A causally happened before B, then the timestamp of A will be less than the timestamp of B.

For the mutex problem, we’d really prefer to have *strong clock consistency*, which says that the timestamp of A is smaller than the timestamp of B if and only if A *causally occurred before* B. But Lamport timestamps don’t give us enough information to do that. (Which is why there’s a more complex mechanism called *vector clocks*, which I’ll talk about in another post.

Getting back to the issues that this kind of timestamp is meant to solve, we’ve got a partial order of events. But that isn’t quite enough. Sometimes we really need to have a total order – we need to have a single, correct ordering of events by time, with no ties. That total order doesn’t need to be *real* – by which I mean that it doesn’t need to be the actual ordering in which events occured according to a wallclock. But it needs to be consistent, and no matter which host you ask, they need to always agree on which order things happened in. Pure lamport timestamps don’t do that: they’ll frequently have causally unrelated events with identical timestamps.

The solution to that is to be arbitrary but consistent. Take some extra piece of information that uniquely identifies each host in the distributed system, and use comparisons of those IDs to break ties.

For example, in real systems, every host has a network interface controller (NIC) which has a universally unique identifier called a MAC address. The MAC address is a 48 bit number. No two NICs in the history of the universe will ever have the same MAC address. (There are 281 trillion possible MAC codes, so we really don’t worry about running out.) You could also use hostnames, IP addresses, or just random arbitrarily assigned identifiers. It doesn’t really matter – as long as it’s consistent.

This doesn’t solve all of the problems of clocks in distributed systems. For example, it doesn’t guarantee fairness in Mutex assignment – which is the problem that I used as an example at the beginning of this post. But it’s a necessary first step: algorithms that *do* guarantee fairness rely on some kind of consistent event ordering.

It’s also just a beautiful example of what good distributed solutions look like. It’s simple: easy to understand, easy to implement correctly. It’s the simplest solution to the problem that works: there is, provably, no simpler mechanism that provides weak clock consistency.

]]>It looks real to me. The technical problem is definitely solvable, but there’s a bunch of social/political stuff piled on top that’s making it hard for the technical solution to actually get implemented.

To understand what’s going on, we need a quick refresher on how bitcoin works.

The basic idea of bitcoin is pretty simple. There’s a thing called a *ledger*, which consists of a list of *transactions*. Each transaction is just a triple, (X, Y, Z), which means “X gave Y bitcoins to Z”. When you use a bitcoin to buy something, what’s really happening is that you’re adding a new entry to the ledger.

To make it all work, there’s a bunch of distributed computing going on to maintain the ledger. Every 10 minutes or so, a batch of transactions is added to the ledger by performing a very expensive computation. The set of transactions is called a *block*. The entire ledger is just a list of blocks – called the blockchain. In the current bitcoin protocol, a ledger block can only hold 1 MB of information.

That block size of 1MB is the problem. There are enough bitcoin transactions going on right now that at peak times, the amount of data needed to represent all of the transactions in a ten minute period is larger than 1MB.

That means that transactions start to back up. Imagine that there’s 1.5M of transactions occuring every 10 minutes. In the first period, you get 1M of them wrapped in a block, and the remaining 0.5MB gets delayed to the next period. The next period, you process the remaining half meg from the previous period, plus just 1/2MB from the current – leaving 1M to roll over to the next. That next period, you’re going to spend the entire block on transactions left from the previous time period – and the full 1.5MB gets deferred to later. Things have backed up to the point where *on average*, a new transaction doesn’t get added to a block for 45 minutes. There are confirmed reports of transactions taking 7 or 8 hours before they get added to the blockchain.

This is a problem on many levels. If you’re a store trying to sell things, and people want to pay with Bitcoin, this is a massive problem. Up until a transactions is confirmed by being part of a block accepted into the blockchain, the transaction can be *rescinded*. So you can’t give your customers their merchandise until you’re sure the transaction is in the blockchain. That was awkward when you had to wait 10 minutes. That’s completely unacceptable when you have *no idea* how long it might take.

Looking at this, you might think that the solution is just to say that you should create blocks more frequently. If there’s 1.5M of transactions every 10 minutes, why not just create a block every five minutes? The answer is: because it takes an average of around 10 minutes to perform the computation needed to add one block to the chain. So you *can’t* reduce the amount of time per block.

Alternatively, you could just increase the size of the block. In theory, that’s a great answer. Jump a block to 2M, and you’ve got enough space to handle the current volume. Jump it to 10M, and you’ve got enough buffer space to cover a couple of years.

But that’s where the social/political thing comes in. The work of performing the computation needed to add blocks to the chain (called mining) has become concentrated in the hands of a small group of people. And they don’t want to change the mining software that they’re running.

I don’t follow bitcoin closely, so I don’t know the details of the fights over the software. But as an outsider, it looks like a pretty typical thing: people prefer to stick with known profits today even if it kills the business tomorrow, rather than take a risk of losing todays profits. Changing the protocol might undermine their dominant mining position – so they’d rather see Bitcoin fall apart than risk losing todays profits.

To quickly address one stupid “answer” to this problem: I’ve seen lots of people say that you *can* make your transaction get added to the chain faster. There’s an option in the protocol to allow a transaction to say “I’ll pay X bitcoins to whoever adds this transaction to the chain”. Miners will grab those transactions and process them first, so all you need to do is be willing to pay.

That’s a partial solution, but it’s probably not a long term answer.

Think of it this way. There’s a range of different transactions performed with bitcoin. You can put them into buckets based on how time critical they are. At one end, you’ve got people walking into a store and buying something. The store needs to have that transaction processed while the customer waits – so it needs to be *fast*. You’ve got other transactions – like, say, paying your mortgage. If it takes 12 hours to go through, big deal! For simplicity, let’s just consider those two cases: there’s time critical transactions (fast), and non-time-critical ones (slow).

For slow transactions, you don’t need to increase the transaction fees. Just let the transaction get added to the blockchain whenever there’s room. For the fast ones, you need to pay.

The problem is, 1MB really isn’t that much space. Even if just 1/3 of the transactions are fast, you’re going to wind up with times when you can’t do all of the pending fast transactions in one block. So fast transactions need to increase their fees. But that can only go on for so long before the cost of using bitcoin starts to become a significant issue in the cost of doing business.

The ultimate problem is that bitcoin is being to successful as a medium of exchange for the current protocol. The blockchain can’t keep up with transactions. What adding transaction fees does is increase the cost of using bitcoin for fast transactions until it reaches the point where enough fast-transactors *drop out* of using bitcoin that all of the remaining fast-transactors no longer exceed the blocksize. In other words, transaction fees as a “solution” to the block-size problem only work by driving businesses away from accepting bitcoin. Which isn’t exactly in the best interest of people who want to use bitcoins.

Realistically, if you want to use bitcoin as a currency, you can’t solve its capacity problems without increasing its capacity. If there are more than 1MB of transactions happening every 10 minutes, then you need to do something to increase the number of transactions that can be part of a block. If not, then you *can’t* support the number of transactions that people want to make. If that’s the case, then you can’t rely on being able to use bitcoin to make a purchase – and that means that you don’t have a usable currency.

The other day, I got a question via email that involves significant figures. Sigfigs are really important in things that apply math to real-world measurements. But they’re poorly understood at best by most people. I’ve written about them before, but not in a while, and this question does have a somewhat different spin on it.

Here’s the email that I got:

Do you have strong credentials in math and/or science? I am looking for someone to give an expert opinion on what seems like a simple question that requires only a short answer.

Could the matter of significant figures be relevant to an estimate changing from 20 to less than 15? What if it were 20 billion and 13.7 billion?

If the context matters, in the 80s the age of the universe was given as probably 20 billion years, maybe more. After a number of changes it is now considered to be 13.7 billion years. I believe the change was due to distinct new discoveries, but I’ve been told it was simply a matter of increasing accuracy and I need to learn about significant figures. From what I know (or think I know?) of significant figures, they don’t really come into play in this case.

The subject of significant digits is near and dear to my heart. My father was a physicist who worked as an electrical engineer producing power circuitry for military and satellite applications. I’ve talked about him before: most of the math and science that I learned before college, I learned from him. One of his pet peeves was people screwing around with numbers in ways that made no sense. One of the most common ones of that involves significant digits. He used to get really angry at people who did things with calculators, and just read off all of the digits.

He used to get really upset when people did things like, say, measure a plate with a 6 inch diameter, and say that it had an are] of 28.27433375 square inches. That’s ridiculous! If you measured a plate’s diameter to within 1/16th of an inch, you can’t use that measurement to compute its area down to less than one billionth of a square inch!

Before we really look at how to answer the question that set this off, let’s start with a quick review of what significant figures are and why they matter.

When we’re doing science, a lot of what we’re doing involves working with measurements. Whether it’s cosmologists trying to measure the age of the universe, chemists trying to measure the energy produced by a reaction, or engineers trying to measure the strength of a metal rod, science involves measurements.

Measurements are limited by the accuracy of the way we take the measurement. In the real world, there’s no such thing as a perfect measurement: all measurements are approximations. Whatever method we chose for taking a measurement of something, the measurement is accurate only to within some margin.

If I measure a plate with a ruler, I’m limited by factors like how well I can align the ruler with the edge of the plate, by what units are marked on the ruler, and by how precisely the units are marked on the ruler.

Once I’ve taken a measurement and I want to use it for a calculation, the accuracy of anything I calculate is limited by the accuracy of the measurements: the accuracy of our measurements necessarily limits the accuracy of anything we can compute from those measurements.

For a trivial example: if I want to know the total mass of the water in a tank, I can start by saying that the mass of a liter of water is one kilogram. To figure out the mass of the total volume of water in the tank, I need to know its volume. Assuming that the tank edges are all perfect right angles, and that it’s uniform depth, I can measure the depth of the water, and the length and breadth of the tank, and use those to compute the volume.

Let’s say that the tank is 512 centimeters long, and 203 centimeters wide. I measure the depth – but that’s difficult, because the water moves. I come up with it being roughly 1 meter deep – so 100 centimeters.

The volume of the tank can be computed from those figures: 5.12 times 2.03 times 1.00, or 10,393.6 liters.

Can I really conclude that the volume of the tank is 10,393.6 liters? No. Because my measurement of the depth wasn’t accurate enough. It could easily have been anything from, say, 95 centimeters to 105 centimeters, so the actual volume could range between around 9900 liters and 11000 liters. From the accuracy of my measurements, claiming that I know the volume down to a milliliter is ridiculous, when my measurement of the depth was only accurate within a range of +/- 5 centimeters!

Ideally, I might want to know a strong estimate on the bounds of the accuracy of a computation based on measurements. I can compute that if I know the measurement error bounds on each error measurement, and I can track them through the computation and come up with a good estimate of the bounds – that’s basically what I did up above, to conclude that the volume of the tank was between 9,900 and 11,000 liters. The problem with that is that we often don’t really know the precise error bounds – so even our estimate of error is an imprecise figure! And even if we did know precise error bounds, the computation becomes much more difficult when you want to track error bounds through it. (And that’s not even considering the fact that our error bounds are only another measured estimate with its own error bounds!)

Significant figures are a simple statistical tool that we can use to determine a reasonable way of estimating how much accuracy we have in our measurements, and how much accuracy we can have at the end of a computation. It’s not perfect, but most of the time, it’s good enough, and it’s really easy.

The basic concept of significant figures is simple. You count how many digits of accuracy each measurement has. The result of the computation over the measurements is accurate to the *smallest* number of digits of any of the measurements used in the computation.

In the water tank example, we had three significant figures of accuracy on the length and width of the tank. But we only had one significant figure on the accuracy of the depth. So we can only have one significant figure in the accuracy of the volume. So we conclude that we can say it was around 10 liters, and we can’t really say anything more precise than that. The exact value likely falls somewhere within a bell curve centered around 10 liters.

Returning to the original question: can significant figures change an estimate of the age of the universe from 20 to 13.7?

Intuitively, it might seem like it shouldn’t: sigfigs are really an extension of the idea of rounding, and 13.7 rounded to one sigfig should round down to 10, not up to 20.

I can’t say anything about the specifics of the computations that produced the estimates of 20 and 13.7 billion years. I don’t know the specific measurements or computations that were involved in that estimate.

What I can do is just work through a simple exercise in computations with significant figures to see whether it’s possible that changing the number of significant digits in a measurement could produce a change from 20 to 13.7.

So, we’re looking at two different computations that are estimating the same quantity. The first, 20, has just one significant figure. The second, 13.7 has three significant digits. What that means is that for the original computation, one of the quantities was known only to one significant figure. We can’t say whether all of the elements of the computation were limited to one sigfig, but we know at least one of them was.

So if the change from 20 to 13.7 was caused by significant digits, it means that by increasing the precision of just one element of the computation, we could produce a large change in the computed value. Let’s make it simpler, and see if we can see what’s going on by just adding one significant digit to one measurement.

Again, to keep things simple, let’s imagine that we’re doing a really simple calculation. We’ll use just two measurements and , and the value that we want to compute is just their product, .

Initially, we’ll say that we measured the value of to be 8.2 – that’s a measurement with two significant figures. We measure to be 2 – just one significant figure. The product . Then we need to reduce that product to just one significant figure, which gives us 20.

After a few years pass, and our ability to measure gets much better: now we can measure it to two significant figures, with a new value of 1.7. Our new measurement is completely compatible with the old one – 1.7 reduced to 1 significant figure is 2.

Now we’ve got equal precision on both of the measurements – they’re now both 2 significant figures. So we can compute a new, better estimate by multiplying them together, and reducing the solution to 2 significant figures.

We multiply 8.2 by 1.7, giving us around 13.94. Reduced to 2 significant figures, that’s 14.

Adding one significant digit to just one of our measurements changed our estimate of the figure from 20 to 14.

Returning to the intuition: It seems like 14 vs 20 is a very big difference: it’s a 30 percent change from 20 to 14! Our intuition is that it’s too big a difference to be explained just by a tiny one-digit change in the precision of our measurements!

There’s two phenomena going on here that make it look so strange.

The first is that significant figures are an absolute error measurement. If I’m measuring something in inches, the difference between 15 and 20 inches is the same size error as the difference between 90 and 95 inches. If a measurement error changed a value from 90 to 84, we wouldn’t give it a second thought; but because it reduced 20 to 14, that seems worse, even though the absolute magnitude of the difference considered in the units that we’re measuring is exactly the same.

The second (and far more important one) is that a measurement of just one significant digit is a very imprecise measurement, and so any estimate that you produce from it is a very imprecise estimate. It seems like a big difference, and it is – but that’s to be expected when you try to compute a value from a very rough measurement. Off by one digit in the least significant position is usually not a big deal. But if there’s only one significant digit, then you’ve got very little precision: it’s saying that you can barely measure it. So of course adding precision is going to have a significant impact: you’re adding a lot of extra information in your increase in precision!

]]> Tech interviews get a *lot* of bad press. Only some of it is deserved.

What you frequently hear in criticism is stuff about “gotcha” questions, or “brain teasers”. Those do happen, and when they do, they deserve condemnation. For example, I have seriously had an interviewer for a software job ask me “Why are manholes round?” That’s stupid. People who like it claim that it’s a test of lateral thinking; I think it’s garbage. But these days, that kind of rubbish seems pretty rare.

Instead, what’s become very common is for interviewers to present a programming problem, and ask the candidate to solve it by writing some code. A lot of people really *hate* these kinds of interviews, and describe them as just brain-teasers. I disagree, and I’m going to try to explain why.

The underlying problem for tech job interviews is that hiring the right people is *really, really hard*.

When someone applies for a job as a software engineer with your company, you start off with just their resume. Resumes are not particularly informative. All they give you is a brief, possibly partial history of the candidates work experience. From a resume, can’t tell how much they really contributed to the projects they worked on. You can’t tell how much their work added (or subtracted) from the team they were part of. You can’t tell if they get work done in a reasonable amount of time, or if they’re slower than a snail. You can’t even tell if they can write a simple program at all.

So you start by screening resumes, doing your best to infer as much as you can from them. Next, you often get recommendations. Recommendations *can* be useful, but let’s be honest: All recommendation letters are positive. You’re not going to *ask* for a recommendation from someone who isn’t going to say great things about you. So no matter how terrible a candidate is, the recommendations are going to say nice things about them. At best, you can sometimes infer a problem by reading between the lines – but that’s a very subjective process.

So you end up interviewing someone who’s resume looks good on paper, and who got a couple of people to write letters for them. How do you determine whether or not they’re going to be a valuable addition to your team?

You need to do *something* to decide whether or not to hire a particular person. What can you do?

That’s what the interview is for. It’s a way to try to get more information. Sure, this person has a college degree. Sure, they’ve got N years of experience. But can they program? Can they communicate well with their coworkers? Do they actually know what they’re doing?

A tech interview is generally an attempt to get information about a candidate by watching them work on a problem. The interview *isn’t* about knowing the right answer. It’s not even about getting the correct solution to the problem. It’s about *watching a candidate work*.

When I ask a job candidate a technical question, there’s three main things I’m looking for.

- What’s their process for solving the problem? On this level, I’m trying to figure out: Do they think about it, or do they jump in and start programming? Do they make sure they understand the problem? Do they clearly state their assumptions?
- Can they write a simple program? Here I’m trying to see if they’ve got any ability to write

code. No one writes*great*code in an interview setting. But I want to know if they’re

able to sit down with an unfamiliar problem, and work out a solution in code. I want to see if they start coding immediately, or take time to think through their solution before they start writing. - How well can they communicate ideas about programming? Can they grasp the problem from my description? If not, can they figure out what questions they need to ask to understand it? Once they start solving the problem, how well can they explain what they’re doing? Can they describe the algorithm that they’ve chosen? Can they explain why it works?

To try to clarify this, I’m going to walk through a problem that I used to use in interviews. I haven’t used this question in about 3 years, and as far as I know, no one is using the question anymore. The problem involves something called Gray code. Gray code is an alternative representation of numbers in binary form that’s useful for a range of applications involving things like switching systems.

Here’s a quick run through one of the reasons to use gray code. Imagine a system that uses physical switches. You’ve got an array of 8 switches representing a number. It’s currently presenting the number 7 in standard binary – so the first 5 switches are off, and last 3 are on. You want to increment the number. To do that, you need to change the position of four switches *at exactly the same time*. The odds of your being able to do that without even a transient state that appeared to be a number other than 7 or 8 are vanishingly small.

Gray code solves that by changing the representation. In Gray code, the representation of every number N+1 is only different from the representation of N by exacly one bit. That’s a nice property which makes it useful, even nowadays when we’re not using physical switches for much of anything anymore.

The easiest way that you get the gray code of numbers is by writing a table. You start off by writing 0 and 1, which are the same in both gray code and standard binary:

Decimal | Standard Binary | Gray |
---|---|---|

0 | 0 | 0 |

1 | 1 | 1 |

There’s the one-bit gray codes. To get the two bit, make two copies of the rows in that table.

To the first copy, prepend a 0. To the second copy, reverse the order of the rows, prepend a 1:

Decimal | Standard Binary | Gray |
---|---|---|

0 | 00 | 00 |

1 | 01 | 01 |

2 | 10 | 11 |

3 | 11 | 10 |

To get to the three bit gray codes, you repeat the process. Copy the rows, prepend 0s to

the first copy; reverse the order of the second, and prepend 1s.

Decimal | Standard Binary | Gray |
---|---|---|

0 | 000 | 000 |

1 | 001 | 001 |

2 | 010 | 011 |

3 | 011 | 010 |

4 | 100 | 110 |

5 | 101 | 111 |

6 | 110 | 101 |

7 | 111 | 100 |

So, the gray code of 6 is 101, and the gray code of 7 is 100.

What I would ask an interview candidate to do is: implement a recursive function that given an integer , returns a string with the gray code of .

I can understand how some people look at this question, and say, “Yeah, that’s just a stupid puzzle.” On one level, yeah. It’s obvious an artifical question. In fact, in practice, *no one* ever uses a recursive algorithm for something like this. Even if you have a problem where gray code is part of a practical solution, there’s a better way of converting numbers to gray code than this silly recursive nonsense.

So I agree that it’s artificial. But interview questions *have* to be artificial. In a typical interview, you’ve got an hour with a candidate. You’re not going to be able to *explain* a real problem to them in that amount of time, much less have them solve it!

But it’s artificial in a useful way that allowed me, as an interviewer, to learn about the candidate. I wasn’t trying to see if the candidate was number-puzzle wizard who could instantly see the recursion pattern in a problem like this. Most people have never heard of gray code, and to most people (including me, the first time I saw this problem!), the recursion pattern isn’t obvious. But that’s not the point: there’s a lot more to the interview that just the initial problem statement.

I don’t present the problem, and then sit back and watch silently as they try to solve it. If I did, all I’d be learning is whether or not they’re a number-puzzle wizard. I don’t care about that. So I didn’t just leave them floundering trying to somehow come up with a solution. In the beginning, after describing the problem, I set an initial direction. I usually have them start by extending the table themselves, to make sure they understand the process. Then I take their extended table, and add a new column:

Decimal | Standard Binary | Gray | Rec |
---|---|---|---|

0 | 0000 | 0000 | |

1 | 0001 | 0001 | |

2 | 0010 | 0011 | “1” + gray(1) |

3 | 0011 | 0010 | “1” + gray(0) |

4 | 0100 | 0110 | |

5 | 0101 | 0111 | |

6 | 0110 | 0101 | |

7 | 0111 | 0100 | |

8 | 1000 | 1100 | “1” + gray(7) |

9 | 1001 | 1101 | “1” + gray(6) |

10 | 1010 | 1111 | “1” + gray(5) |

11 | 1011 | 1110 | |

12 | 1100 | 1010 | |

13 | 1101 | 1011 | |

14 | 1110 | 1001 | |

15 | 1111 | 1000 | “1” + gray(0) |

With that on the board/screen, I’d ask them to try to take what I just gave them, and rewrite it a bit. For example, in row 8, instead of “1” + gray(7), come up with an expression using the numeric value “8” of the row, which will produce 7. They should be able to come up with “15 – 8” – and to see that in every row , where and , the gray code of is “1” + gray(15 – n).

For most people, that’s enough of a clue to be able to start writing the function. If they can’t get there, it shows me that they’ve got some trouble wrapping their head around this abstraction. I’ve got a few more hints up my sleeve to help, but if without all of the help I can give, they still can’t come up with that, that’s one mark against them.

But even if they can’t come up with the relation at all, it’s not the end of the interview. I have, sometimes, ended up recommending hiring someone who had trouble this far! If they can’t come up with that basic relation, it’s just one piece of information about the candidate. I’d file that away, and move on, by giving them the recurrence relation, and then I would ask them to code it.

There’s one problem that comes up in coding, which is interesting and important. The most naive code for gray is something like:

def gray(n): print("gray(%s)" % n) if n == 0: return "0" if n == 1: return "1" num_digits = math.floor(math.log(n, 2)) + 1 return "1" + gray(int(2**num_digits - 1 - n))

That’s very close, but not right. If you call `gray(10)`

, you get the right answer.

If you call `gray(4)`

, you get “110”, which is correct. But if you call `gray(8)`

, you’d get “110”, when you *should* have gotten `1010`

.

Most candidates make this mistake. And then I ask them to trace it through on 8 as an example. They usually see what the problem is. If they don’t, then that’s another red flag.

If they’re really struggling to put together a recursive function, then I’d ask them to just write a function to convert an integer into standard binary. If they can do that, then I start making suggestions about how to convert that to do gray code.

The big indicator here is whether or not they can write a simple recursive function *at all*. The systems I was working on at the time made heavy use of recursion – if a candidate couldn’t write recursive code, there was simply no way that they’d be able to do the job. (It was depressing to see just how many qualified-looking candidates came in, but who couldn’t write a simple recursive function.)

Through this whole process, how well they were able to talk about what they were doing was as important as the solution they came up with. If they heard the question, and immediately wrote down perfect, beautiful code, but they couldn’t explain how they got it, or how it worked? They’d get a mediocre rating, which wouldn’t get them a job offer. If they made a lot of mistakes in their code, but they were crystal clear about explaining how they worked out a solution, and how it worked? They’d probably get a *better* rating than the perfect code candidate.

I hope this shines a bit of light on this kind of interview question. While it’s necessarily artificial, it’s artifical in a way that we *hope* can help us learn more about the candidate. It’s not a trick question that’s irrelevant to the job, like “Why are manholes round?”: this is an attempt to probe at an area of knowledge that any candidate for a software engineering job should have. It’s not an all or nothing problem: even if you start off with no clue of how to approach it, I’m guiding you through. If you can’t solve it without help, this problem gives me some insight into what it is that you don’t understand, and hopefully whether or not that’s going to be a problem if we hire you.

Is it a great way of doing interviews? Honestly, no. But it’s the best way we know of doing it.

As an interview system, it doesn’t do a good job of identifying the very best people to hire. There’s no correlation between outstanding interview performance and outstanding on-the-job performance. But there’s a strong correlation between poor performance on this kind of interview question and poor performance on the job. Great performers on the job show the same distribution of interview performance as average ones; but poor performers on interviews show a significantly negative-shifted job performance distribution.

We haven’t found a way of interviewing people that does a better job than this. It’s the best we have. Statistically, it works far better at selecting people than “open-ended” interviews that don’t involve any kind of practical programming exercise. So for all of its faults, it’s better that the alternatives.

I’m sure there are pedants out there who are asking “So what’s the correct implementation of gray code?” It’s totally not the point of talking about it, but here’s one sloppy but correct implementation. This isn’t the quality of code I would ever use for anything serious at work, but it’s perfectly adequate for an interview.

import math def gray(n): def required_digits(n): """Compute the number of digits required to represent a number in binary """ return int(math.floor(math.log(n, 2))) + 1 def pad_digits(gray, num_digits): if len(gray) < num_digits: return "0"*(num_digits - len(gray)) + gray return gray if n == 0: return "0" if n == 1: return "1" num_digits = int(math.floor(math.log(n, 2)) + 1) return "1" + pad_digits(gray(int(2**num_digits - 1 - n)), num_digits - 1)]]>

Take any number with at least two-digits, N. Reverse the digits of N, giving you M, and then add N+M. That gives you a new number. If you keep repeating this process, most of the time, you’ll get a palindromic number really fast.

For example, let’s start with 16:

- 16 reversed is 61.
- 16+61=77. 77 is a palindromic number.
- So one reverse+add, and we have a palindromic number.

Or 317:

- 317 reversed is 713.
- 317+713=1030.
- 1030 reversed is 301.
- 1030 + 301 = 1331, so we have a palindromic number after two steps.

You can play the same game in different number bases. For example, in base 8, we can start with 013 (11 in base-10): in one reverse+add, it becomes 44 (36 in base-10).

For most numbers, you get a palindrome in just a few steps. But for some numbers, in some number bases, you don’t. If you can’t ever get to a palindrome by doing reverse+add starting from a number, then that number is called a *Lychrel number*.

The process of exploring Lychrel numbers has picked up a lot of devotees, who’ve developed a whole language for talking about it:

The question that haunts Lychrel enthusiasts is, will you *always* eventually get a palindrome? That is, do Lychrel numbers actually exist?

In base-2, we know the answer to that: not all numbers will produce a palindrome; there are base-2 Lychrel numbers. The smallest base-22 Lychrel number is 22 – or 10110 in binary. We can look at its reverse add sequence, and see intuitively why it will never produce a palindrome:

- 10110
- 100011
- 1010100
- 1101001
- 10110100 (10, 11, 01, 00)
- 11100001
- 101101000
- 110010101
- 1011101000 (10, 111, 01, 000)
- 110010101
- 1101000101
- 10111010000
- 0b11000101101
- 0b101111010000 (10, 1111, 01, 0000)

Starting at step 5, we start seeing a pattern in the sequence, where we have recurring values where that have a pattern of 10, followed by -1s, followed by 01, followed by 0s. We’ve got a sequence that’s building larger and larger numbers, in a way that will never converge into a palindrome.

We can find similar sequences in any power-of-two base – base 4, 8, 16, etc. So in power-of-two bases, there are Lychrel numbers. But: are there Lychrel numbers in our familiar base-10?

We think so, but we’re not sure. No one has been able to prove it either way. But we’ve got some numbers, which we call *Lychrel candidates*, that we *think* are probably Lychcrel numbers. The smallest one is 196 – which is why this whole discussion is sometimes called the 196 problem, or the 196 algorithm.

People have written programs that follow the Lychrel thread from 196, trying to see if it reaches a palindrome. So far, the record for exploring the 196 Lychrel thread carries it through more than a billion iterations, producing a non-palindromic number with more than 6 million digits.

That’s pretty impressive, given that the longest Lychrel thread for any number smaller than 196 is the thread of 24 steps, starting with 89 (which produces the palindromic number 8,813,200,023,188).

From my perspective, one thing that interests me about this is its nature as a computational problem. As a problem, it’s really easy to implement. For example, here’s a complete implementation in Ratchet, a Scheme-based programming language. (I used ratchet because it features infinite-precision integers, which makes it easier to write.)

(define (reverse-number n) (string->number (list->string (reverse (string->list (number->string n)))))) (define (palindromic? n) (equal? n (reverse-number n))) (define (reverse-add n) (+ n (reverse-number n))) (define (find-palindrome seed) (define (noisy-find-palindrome n count) (if (palindromic? n) n (begin (printf "At iteration ~v, candidate=~v~n" count n) (noisy-find-palindrome (reverse-add n) (+ count 1))))) (noisy-find-palindrome seed 0))

I literally threw that together in less than five minutes. In that sense, this is a really, really easy problem to solve. But in another sense, it’s a very hard problem: there’s no way to really speed it up.

In modern computing, when we look at a problem that takes a lot of computation to solve, the way that we generally try to approach it is to throw more CPUs at it, and do it in parallel. For most problems that we come across, we can find some reasonable way to divide it into parts that can be run at the same time; then by having a bunch of computers work on those different parts, we can get a solution pretty quickly.

For example, back in the early days of this blog, I did some writing about the Mandelbrot set, and one variant of it that I find particularly interesting, called the Buddhabrot. The Buddhabrot is interesting because it’s a fractal visualization which isn’t naively zoomable. In a typical Mandelbrot set visualization, you can pick a small region of it that you want to see in more detail, and focus your computation on just that part, to get a zoomed in view on that. In the Buddhabrot, due to the chaotic nature of the computation, you *can’t*. So you just compute the Buddhabrot at a massive size, and then you *compress* it. When you want to see a region in more detail, you un-compress. To make that work, buddhabrot’s are frequently computed at resolutions like 1 million by 1 million pixels. That translates to enough complex floating point computations to compute several trillion values. That’s a big task. But in modern environments, that’s routine enough that a friend of mine at Google wrote a program, just for kicks, to compute a big buddhabrot image, and ran it on an experimental cluster.

If that kind of computational capability can be exploited just for kicks, why is it that the best effort at exploring the Lychrel thread for 196 only covers 6 million digits?

The answer is that there’s a way of computing the Buddhabrot in parallel. You can throw 10,000 CPUs at it for a couple of days, and get an amazing Buddhabrot image. But for the Lychrel thread, there’s no good way to do it in parallel.

For each additional number in the thread, you need to rearrange and add a couple of million digits. That’s a lot of work, but it’s not crazy. On any halfway decent computer, it won’t take long. To get a sense, I just whipped up a Python program that generated 1,000,000 random pairs of digits, and added them up. It took under 8 seconds – and that’s half-assed code written using a not-particularly fast interpreted language on a not-particularly-speedy laptop. A single iteration of the Lychrel thread computation even for a million-digit candidate doesn’t take that long – it’s on the order of seconds.

The catch is that the process of searching a Lychrel thread is intrinsically serial: you can’t have different CPUs computing the next thread element for different values: you don’t know the next value until you’ve finished the previous one. So even if it took just 1 second to do the reverse+add for million digit numbers, it would takes a long time to actually explore the space. If you want to explore the next million candidates, at 2 seconds per iteration, that will take you around 3 weeks!

Even if you don’t waste time by using a relatively slow interpreter like Python – even if you use carefully hand-optimized code using an efficient algorithm, it would take months at the very least to explore a space of billions of elements of the 196 thread! And to get beyond what anyone has done, you’ll probably need to end up doing years of computation, even with a very fast modern computer – because you can’t parallelize it.

If you’re interested in this kind of thing, I recommend looking at Wade Van Landingham’s p196 site. Wade is the guy who named them Lychrel numbers (based on a rough reversal of his girlfriend (now wife)’s name, Cheryl). He’s got all sorts of data, including tracking some of the longest running efforts to continue the 196 thread.

*A couple of edits were made to this, based on error pointed out by commenters.*