Technical Interviews: More than a brain teaser?

A request I’ve gotten from a couple of readers (unrelated to the recent charity thing) is to talk about engineering interviews at tech companies. I’ve been at Google, Foursquare, Twitter, and now Dropbox, so I’ve spent some time inside multiple companies that put a lot of effort into recruiting.

Tech interviews get a lot of bad press. Only some of it is deserved.

What you frequently hear in criticism is stuff about “gotcha” questions, or “brain teasers”. Those do happen, and when they do, they deserve condemnation. For example, I have seriously had an interviewer for a software job ask me “Why are manholes round?” That’s stupid. People who like it claim that it’s a test of lateral thinking; I think it’s garbage. But these days, that kind of rubbish seems pretty rare.

Instead, what’s become very common is for interviewers to present a programming problem, and ask the candidate to solve it by writing some code. A lot of people really hate these kinds of interviews, and describe them as just brain-teasers. I disagree, and I’m going to try to explain why.

The underlying problem for tech job interviews is that hiring the right people is really, really hard.

When someone applies for a job as a software engineer with your company, you start off with just their resume. Resumes are not particularly informative. All they give you is a brief, possibly partial history of the candidates work experience. From a resume, can’t tell how much they really contributed to the projects they worked on. You can’t tell how much their work added (or subtracted) from the team they were part of. You can’t tell if they get work done in a reasonable amount of time, or if they’re slower than a snail. You can’t even tell if they can write a simple program at all.

So you start by screening resumes, doing your best to infer as much as you can from them. Next, you often get recommendations. Recommendations can be useful, but let’s be honest: All recommendation letters are positive. You’re not going to ask for a recommendation from someone who isn’t going to say great things about you. So no matter how terrible a candidate is, the recommendations are going to say nice things about them. At best, you can sometimes infer a problem by reading between the lines – but that’s a very subjective process.

So you end up interviewing someone who’s resume looks good on paper, and who got a couple of people to write letters for them. How do you determine whether or not they’re going to be a valuable addition to your team?

You need to do something to decide whether or not to hire a particular person. What can you do?

That’s what the interview is for. It’s a way to try to get more information. Sure, this person has a college degree. Sure, they’ve got N years of experience. But can they program? Can they communicate well with their coworkers? Do they actually know what they’re doing?

A tech interview is generally an attempt to get information about a candidate by watching them work on a problem. The interview isn’t about knowing the right answer. It’s not even about getting the correct solution to the problem. It’s about watching a candidate work.

When I ask a job candidate a technical question, there’s three main things I’m looking for.

  1. What’s their process for solving the problem? On this level, I’m trying to figure out: Do they think about it, or do they jump in and start programming? Do they make sure they understand the problem? Do they clearly state their assumptions?
  2. Can they write a simple program? Here I’m trying to see if they’ve got any ability to write
    code. No one writes great code in an interview setting. But I want to know if they’re
    able to sit down with an unfamiliar problem, and work out a solution in code. I want to see if they start coding immediately, or take time to think through their solution before they start writing.
  3. How well can they communicate ideas about programming? Can they grasp the problem from my description? If not, can they figure out what questions they need to ask to understand it? Once they start solving the problem, how well can they explain what they’re doing? Can they describe the algorithm that they’ve chosen? Can they explain why it works?

To try to clarify this, I’m going to walk through a problem that I used to use in interviews. I haven’t used this question in about 3 years, and as far as I know, no one is using the question anymore. The problem involves something called Gray code. Gray code is an alternative representation of numbers in binary form that’s useful for a range of applications involving things like switching systems.

Here’s a quick run through one of the reasons to use gray code. Imagine a system that uses physical switches. You’ve got an array of 8 switches representing a number. It’s currently presenting the number 7 in standard binary – so the first 5 switches are off, and last 3 are on. You want to increment the number. To do that, you need to change the position of four switches at exactly the same time. The odds of your being able to do that without even a transient state that appeared to be a number other than 7 or 8 are vanishingly small.

Gray code solves that by changing the representation. In Gray code, the representation of every number N+1 is only different from the representation of N by exacly one bit. That’s a nice property which makes it useful, even nowadays when we’re not using physical switches for much of anything anymore.

The easiest way that you get the gray code of numbers is by writing a table. You start off by writing 0 and 1, which are the same in both gray code and standard binary:

Decimal Standard Binary Gray
0 0 0
1 1 1

There’s the one-bit gray codes. To get the two bit, make two copies of the rows in that table.
To the first copy, prepend a 0. To the second copy, reverse the order of the rows, prepend a 1:

Decimal Standard Binary Gray
0 00 00
1 01 01
2 10 11
3 11 10

To get to the three bit gray codes, you repeat the process. Copy the rows, prepend 0s to
the first copy; reverse the order of the second, and prepend 1s.

Decimal Standard Binary Gray
0 000 000
1 001 001
2 010 011
3 011 010
4 100 110
5 101 111
6 110 101
7 111 100

So, the gray code of 6 is 101, and the gray code of 7 is 100.

What I would ask an interview candidate to do is: implement a recursive function that given an integer N, returns a string with the gray code of N.

I can understand how some people look at this question, and say, “Yeah, that’s just a stupid puzzle.” On one level, yeah. It’s obvious an artifical question. In fact, in practice, no one ever uses a recursive algorithm for something like this. Even if you have a problem where gray code is part of a practical solution, there’s a better way of converting numbers to gray code than this silly recursive nonsense.

So I agree that it’s artificial. But interview questions have to be artificial. In a typical interview, you’ve got an hour with a candidate. You’re not going to be able to explain a real problem to them in that amount of time, much less have them solve it!

But it’s artificial in a useful way that allowed me, as an interviewer, to learn about the candidate. I wasn’t trying to see if the candidate was number-puzzle wizard who could instantly see the recursion pattern in a problem like this. Most people have never heard of gray code, and to most people (including me, the first time I saw this problem!), the recursion pattern isn’t obvious. But that’s not the point: there’s a lot more to the interview that just the initial problem statement.

I don’t present the problem, and then sit back and watch silently as they try to solve it. If I did, all I’d be learning is whether or not they’re a number-puzzle wizard. I don’t care about that. So I didn’t just leave them floundering trying to somehow come up with a solution. In the beginning, after describing the problem, I set an initial direction. I usually have them start by extending the table themselves, to make sure they understand the process. Then I take their extended table, and add a new column:

Decimal Standard Binary Gray Rec
0 0000 0000
1 0001 0001
2 0010 0011 “1” + gray(1)
3 0011 0010 “1” + gray(0)
4 0100 0110
5 0101 0111
6 0110 0101
7 0111 0100
8 1000 1100 “1” + gray(7)
9 1001 1101 “1” + gray(6)
10 1010 1111 “1” + gray(5)
11 1011 1110
12 1100 1010
13 1101 1011
14 1110 1001
15 1111 1000 “1” + gray(0)

With that on the board/screen, I’d ask them to try to take what I just gave them, and rewrite it a bit. For example, in row 8, instead of “1” + gray(7), come up with an expression using the numeric value “8” of the row, which will produce 7. They should be able to come up with “15 – 8” – and to see that in every row n , where n \ge 8 and n < 16, the gray code of n is “1” + gray(15 – n).

For most people, that’s enough of a clue to be able to start writing the function. If they can’t get there, it shows me that they’ve got some trouble wrapping their head around this abstraction. I’ve got a few more hints up my sleeve to help, but if without all of the help I can give, they still can’t come up with that, that’s one mark against them.

But even if they can’t come up with the relation at all, it’s not the end of the interview. I have, sometimes, ended up recommending hiring someone who had trouble this far! If they can’t come up with that basic relation, it’s just one piece of information about the candidate. I’d file that away, and move on, by giving them the recurrence relation, and then I would ask them to code it.

There’s one problem that comes up in coding, which is interesting and important. The most naive code for gray is something like:

def gray(n):
  print("gray(%s)" % n)
  if n == 0:
    return "0"
  if n == 1:
    return "1"
  num_digits = math.floor(math.log(n, 2)) + 1
  return "1" + gray(int(2**num_digits - 1 - n))

That’s very close, but not right. If you call gray(10), you get the right answer.
If you call gray(4), you get “110”, which is correct. But if you call gray(8), you’d get “110”, when you should have gotten 1010.

Most candidates make this mistake. And then I ask them to trace it through on 8 as an example. They usually see what the problem is. If they don’t, then that’s another red flag.

If they’re really struggling to put together a recursive function, then I’d ask them to just write a function to convert an integer into standard binary. If they can do that, then I start making suggestions about how to convert that to do gray code.

The big indicator here is whether or not they can write a simple recursive function at all. The systems I was working on at the time made heavy use of recursion – if a candidate couldn’t write recursive code, there was simply no way that they’d be able to do the job. (It was depressing to see just how many qualified-looking candidates came in, but who couldn’t write a simple recursive function.)

Through this whole process, how well they were able to talk about what they were doing was as important as the solution they came up with. If they heard the question, and immediately wrote down perfect, beautiful code, but they couldn’t explain how they got it, or how it worked? They’d get a mediocre rating, which wouldn’t get them a job offer. If they made a lot of mistakes in their code, but they were crystal clear about explaining how they worked out a solution, and how it worked? They’d probably get a better rating than the perfect code candidate.

I hope this shines a bit of light on this kind of interview question. While it’s necessarily artificial, it’s artifical in a way that we hope can help us learn more about the candidate. It’s not a trick question that’s irrelevant to the job, like “Why are manholes round?”: this is an attempt to probe at an area of knowledge that any candidate for a software engineering job should have. It’s not an all or nothing problem: even if you start off with no clue of how to approach it, I’m guiding you through. If you can’t solve it without help, this problem gives me some insight into what it is that you don’t understand, and hopefully whether or not that’s going to be a problem if we hire you.

Is it a great way of doing interviews? Honestly, no. But it’s the best way we know of doing it.

As an interview system, it doesn’t do a good job of identifying the very best people to hire. There’s no correlation between outstanding interview performance and outstanding on-the-job performance. But there’s a strong correlation between poor performance on this kind of interview question and poor performance on the job. Great performers on the job show the same distribution of interview performance as average ones; but poor performers on interviews show a significantly negative-shifted job performance distribution.

We haven’t found a way of interviewing people that does a better job than this. It’s the best we have. Statistically, it works far better at selecting people than “open-ended” interviews that don’t involve any kind of practical programming exercise. So for all of its faults, it’s better that the alternatives.

I’m sure there are pedants out there who are asking “So what’s the correct implementation of gray code?” It’s totally not the point of talking about it, but here’s one sloppy but correct implementation. This isn’t the quality of code I would ever use for anything serious at work, but it’s perfectly adequate for an interview.

import math

def gray(n):
    def required_digits(n):
        """Compute the number of digits required to 
        represent a number in binary
        return int(math.floor(math.log(n, 2))) + 1

    def pad_digits(gray, num_digits):
        if len(gray) < num_digits:
            return "0"*(num_digits - len(gray)) + gray
        return gray

    if n == 0:
        return "0"
    if n == 1:
        return "1"
    num_digits = int(math.floor(math.log(n, 2)) + 1)
    return "1" + pad_digits(gray(int(2**num_digits - 1 - n)), num_digits - 1)

Lychrel Numbers and the 196 Algorithm

The first donor to the Chesapeake Math program asked me to write about the 196 algorithm, a problem also known as the mystery of the Lychrel numbers. To be completely honest, that’s something that people have asked for in the past, but I’ve never written about it, because I didn’t see what I could add. But I made a promise, so… here goes!

Take any number with at least two-digits, N. Reverse the digits of N, giving you M, and then add N+M. That gives you a new number. If you keep repeating this process, most of the time, you’ll get a palindromic number really fast.

For example, let’s start with 16:

  1. 16 reversed is 61.
  2. 16+61=77. 77 is a palindromic number.
  3. So one reverse+add, and we have a palindromic number.

Or 317:

  1. 317 reversed is 713.
  2. 317+713=1030.
  3. 1030 reversed is 301.
  4. 1030 + 301 = 1331, so we have a palindromic number after two steps.

You can play the same game in different number bases. For example, in base 8, we can start with 013 (11 in base-10): in one reverse+add, it becomes 44 (36 in base-10).

For most numbers, you get a palindrome in just a few steps. But for some numbers, in some number bases, you don’t. If you can’t ever get to a palindrome by doing reverse+add starting from a number, then that number is called a Lychrel number.

The process of exploring Lychrel numbers has picked up a lot of devotees, who’ve developed a whole language for talking about it:

A chain is the sequence of numbers produced by reverse+add starting with some number, and possibly converging on a palindromic number.
The seed of a chain is the smallest number that can start that chain. For example, in the example above, we looked at the chain [217, 1030, 1331]. 217 is the seed – even though you could start a chain at 1030, 1030 would never be considered a seed, because it can be the product of a reverse+add on a smaller number.
Two numbers are kin numbers if they’re part of a chain that started at the same seed. If a number is a lychrel number, then (obviously) so are all of its kin.

The question that haunts Lychrel enthusiasts is, will you always eventually get a palindrome? That is, do Lychrel numbers actually exist?

In base-2, we know the answer to that: not all numbers will produce a palindrome; there are base-2 Lychrel numbers. The smallest base-22 Lychrel number is 22 – or 10110 in binary. We can look at its reverse add sequence, and see intuitively why it will never produce a palindrome:

  1. 10110
  2. 100011
  3. 1010100
  4. 1101001
  5. 10110100 (10, 11, 01, 00)
  6. 11100001
  7. 101101000
  8. 110010101
  9. 1011101000 (10, 111, 01, 000)
  10. 110010101
  11. 1101000101
  12. 10111010000
  13. 0b11000101101
  14. 0b101111010000 (10, 1111, 01, 0000)

Starting at step 5, we start seeing a pattern in the sequence, where we have recurring values where that have a pattern of 10, followed by m-1s, followed by 01, followed by m 0s. We’ve got a sequence that’s building larger and larger numbers, in a way that will never converge into a palindrome.

We can find similar sequences in any power-of-two base – base 4, 8, 16, etc. So in power-of-two bases, there are Lychrel numbers. But: are there Lychrel numbers in our familiar base-10?

We think so, but we’re not sure. No one has been able to prove it either way. But we’ve got some numbers, which we call Lychrel candidates, that we think are probably Lychcrel numbers. The smallest one is 196 – which is why this whole discussion is sometimes called the 196 problem, or the 196 algorithm.

People have written programs that follow the Lychrel thread from 196, trying to see if it reaches a palindrome. So far, the record for exploring the 196 Lychrel thread carries it through more than a billion iterations, producing a non-palindromic number with more than 6 million digits.

That’s pretty impressive, given that the longest Lychrel thread for any number smaller than 196 is the thread of 24 steps, starting with 89 (which produces the palindromic number 8,813,200,023,188).

From my perspective, one thing that interests me about this is its nature as a computational problem. As a problem, it’s really easy to implement. For example, here’s a complete implementation in Ratchet, a Scheme-based programming language. (I used ratchet because it features infinite-precision integers, which makes it easier to write.)

(define (reverse-number n)
  (string->number (list->string (reverse (string->list (number->string n))))))

(define (palindromic? n)
  (equal? n (reverse-number n)))

(define (reverse-add n)
  (+ n (reverse-number n)))

(define (find-palindrome seed)
  (define (noisy-find-palindrome n count)
    (if (palindromic? n)
        (printf "At iteration ~v, candidate=~v~n" count n)
                (noisy-find-palindrome (reverse-add n) (+ count 1)))))
  (noisy-find-palindrome seed 0))

I literally threw that together in less than five minutes. In that sense, this is a really, really easy problem to solve. But in another sense, it’s a very hard problem: there’s no way to really speed it up.

In modern computing, when we look at a problem that takes a lot of computation to solve, the way that we generally try to approach it is to throw more CPUs at it, and do it in parallel. For most problems that we come across, we can find some reasonable way to divide it into parts that can be run at the same time; then by having a bunch of computers work on those different parts, we can get a solution pretty quickly.

For example, back in the early days of this blog, I did some writing about the Mandelbrot set, and one variant of it that I find particularly interesting, called the Buddhabrot. The Buddhabrot is interesting because it’s a fractal visualization which isn’t naively zoomable. In a typical Mandelbrot set visualization, you can pick a small region of it that you want to see in more detail, and focus your computation on just that part, to get a zoomed in view on that. In the Buddhabrot, due to the chaotic nature of the computation, you can’t. So you just compute the Buddhabrot at a massive size, and then you compress it. When you want to see a region in more detail, you un-compress. To make that work, buddhabrot’s are frequently computed at resolutions like 1 million by 1 million pixels. That translates to enough complex floating point computations to compute several trillion values. That’s a big task. But in modern environments, that’s routine enough that a friend of mine at Google wrote a program, just for kicks, to compute a big buddhabrot image, and ran it on an experimental cluster.

If that kind of computational capability can be exploited just for kicks, why is it that the best effort at exploring the Lychrel thread for 196 only covers 6 million digits?

The answer is that there’s a way of computing the Buddhabrot in parallel. You can throw 10,000 CPUs at it for a couple of days, and get an amazing Buddhabrot image. But for the Lychrel thread, there’s no good way to do it in parallel.

For each additional number in the thread, you need to rearrange and add a couple of million digits. That’s a lot of work, but it’s not crazy. On any halfway decent computer, it won’t take long. To get a sense, I just whipped up a Python program that generated 1,000,000 random pairs of digits, and added them up. It took under 8 seconds – and that’s half-assed code written using a not-particularly fast interpreted language on a not-particularly-speedy laptop. A single iteration of the Lychrel thread computation even for a million-digit candidate doesn’t take that long – it’s on the order of seconds.

The catch is that the process of searching a Lychrel thread is intrinsically serial: you can’t have different CPUs computing the next thread element for different values: you don’t know the next value until you’ve finished the previous one. So even if it took just 1 second to do the reverse+add for million digit numbers, it would takes a long time to actually explore the space. If you want to explore the next million candidates, at 2 seconds per iteration, that will take you around 3 weeks!

Even if you don’t waste time by using a relatively slow interpreter like Python – even if you use carefully hand-optimized code using an efficient algorithm, it would take months at the very least to explore a space of billions of elements of the 196 thread! And to get beyond what anyone has done, you’ll probably need to end up doing years of computation, even with a very fast modern computer – because you can’t parallelize it.

If you’re interested in this kind of thing, I recommend looking at Wade Van Landingham’s p196 site. Wade is the guy who named them Lychrel numbers (based on a rough reversal of his girlfriend (now wife)’s name, Cheryl). He’s got all sorts of data, including tracking some of the longest running efforts to continue the 196 thread.

A couple of edits were made to this, based on error pointed out by commenters.

Support some students!

I’ve been terrible about updating the blog lately. Personal life interferes sometimes, but I’m trying to build up a backlog of posts so that I can get the blog moving again without too much pressure.

In the meantime, I’ve got a request for you folks.

Through this blog, I met a really great SFF writer named Catherine Asaro. Catherine both writes, and also teaches math. She’s running a GoFundMe for her students, who participate in extracurricular math activities – clubs, classes and competitions.

They need money to cover travel expenses for going to math league competitions.

These kids are us: they’re the math geeks of the future. They need help from the math geeks of today. So go, give them a couple of bucks, help them out!

An extra little bit of incentive: the first five people who donate more than $25 can pick a topic, from either math or computer science, and I’ll write a blog post about it. Send me a copy of the thank-you letter you get from GoFundMe to show that you contributed, along with the topic you’d like me to write about.

These thank-you posts for contributing will skip to the top of my priority queue, so you’ll see them soon!

Recipe: Za Jia Mien, aka “Chinese Spaghetti”

This is a traditional chinese dish. It’s affectionately known as chinese spaghetti. My wife has been telling me about it for years, and I’ve been trying to reproduce it for years. I’ve finally gotten it to the point where she’s happy with it.

It’s a fun dish: it’s a make-your-own. You prepare the noodles and sauce and a bunch of toppings, and then each guest adds sauce and toppings to their own taste. I’ve seen a couple of variations on the set of toppings you can serve with it: cucumbers, carrots, raw onions, shredded ginger, sliced garlic, and bean sprouts all work. I’ve also seen one version that served it with black chinese vinegar, which was really strange.


  • 4 chicken thighs, ground coarsely.
  • 2 tablespoons tobanjiang (chinese spicy bean sauce, available at asian groceries.)
  • 1/4 plus a couple of tablespoons cup soy sauce.
  • 1 cup chicken stock.
  • 1/2 cup sake or shaoshing wine.
  • 1 large onion, minced.
  • 2 cloves garlic, minced.
  • 1 large slice ginger, crushed and minced.
  • 1 cucumber
  • 1 carrot
  • 2 eggs
  • 2 scallions, green sliced, and whites finely minced.
  • 1 tablespoon sugar.
  • 1 tablespoon cornstarch mixed with some water.
  • 1/2 teaspoon sesame oil.
  1. Prepare the toppings
    1. Peel the cucumber, cut it in half, scrape out the seeds, and then cut it into thin sticks. Toss with salt, and set aside for 30 minutes, and then pour off the water that comes out.
    2. Peel the cucumber, and cut into thin sticks. Toss it with salt, and set aside for 30 minutes. Pour off the water that comes out.
    3. Finely slice the greens of the scallions, and set aside.
    4. Beat the two eggs with a bit of soy sauce. Put some oil into a hot non-stick pan, and pour in the eggs. Spread then into a thin layer that covers the pan. Cook until it’s mostly solid, then flip it. When it’s all cooked, move it to a cutting board, and cut into thin ribbons. Set aside.
  2. Make the sauce:
    1. Mix the ground chicken thighs with a tablespoon of soy sauce.
    2. Put 1 tablespoon of oil into a hot wok. Add the chicken, and stir fry until it starts to brown. Then remove it.
    3. Put 1 tablespoon of oil into a hot wok. Add the garlic and ginger, and stir fry until fragrant. Then add the onions and scallion whites, and stir until they soften and just barely start to brown.
    4. Scrape the onions to the sides of the pan, then add the tobanjiang to the oil that collects of the bottom of the pan. Stir it around for 30 seconds.
    5. Add the chicken back in, and then add the sake.
    6. When most of the sake has cooked off, add the chicken stock, soy sauce, sugar, and a cup of water.
    7. Reduce the heat to a simmer, and then let it cook for about 20 minutes.
    8. Thicken with some cornstarch, and add the sesame oil. Remove from heat, put into a bowl, and set aside.
  3. Cook some noodles. This will work with a variety of different types of asian noodles: fresh ramen, lo mien, shanghai noodle. Cook according to the instructions on the package. It’s best if it’s a bit undercooked. Drain them, and then toss with a bit of oil to stop them from sticking together.

To serve, you put the sauce into a bowl, and put it and all of the toppings onto the table. Then put a pile of noodles onto each persons plate. Then have each guest add as much sauce and toppings as they want, and dig in.

One plus one equals Two?

My friend Dr24hours sent me a link via twitter to a new piece of mathematical crackpottery. It’s the sort of thing that’s so trivial that I might lust ignore it – but it’s also a good example of something that someone commented on in my previous post.

This comes from, of all places, Rolling Stone magazine, in a puff-piece about an actor named Terrence Howard. When he’s not acting, Mr. Howard believes that he’s a mathematical genius who’s caught on to the greatest mathematical error of all time. According to Mr. Howard, the product of one times one is not one, it’s two.

After high school, he attended Pratt Institute in Brooklyn, studying chemical engineering, until he got into an argument with a professor about what one times one equals. “How can it equal one?” he said. “If one times one equals one that means that two is of no value because one times itself has no effect. One times one equals two because the square root of four is two, so what’s the square root of two? Should be one, but we’re told it’s two, and that cannot be.” This did not go over well, he says, and he soon left school. “I mean, you can’t conform when you know innately that something is wrong.”

I don’t want to harp on Mr. Howard too much. He’s clueless, but sadly, he’s a not too atypical student of american schools. I’ll take a couple of minutes to talk about what’s wrong with his stuff, but in context of a discussion of where I think this kind of stuff comes from.

In American schools, math is taught largely by rote. When I was a kid, set theory came into vogue, but by and large math teachers didn’t understand it – so they’d draw a few meaningless Venn diagrams, and then switch into pure procedure.

An example of this from my own life involves my older brother. My brother is not a dummy – he’s a very smart guy. He’s at least as smart as I am, but he’s interested in very different things, and math was never one of his interests.

I barely ever learned math in school. My father noticed pretty early on that I really enjoyed math, and so he did math with me for fun. He taught me stuff – not as any kind of “they’re not going to teach it right in school”, but just purely as something fun to do with a kid who was interested. So I learned a lot of math – almost everything up through calculus – from him, not from school. My brother didn’t – because he didn’t enjoy math, and so my dad did other things with him.

When we were in high school, my brother got a job at a local fast food joint. At the end of the year, he had to do his taxes, and my dad insisted that he do it himself. When he needed to figure out how much tax he owed on his income, he needed to compute a percentage. I don’t know the numbers, but for the sake of the discussion, let’s say that he made $5482 that summer, and the tax rate on that was 18%. He wrote down a pair of ratios:

\frac{18}{100} = \frac{x}{5482}

And then he cross-multiplied, getting:

 18 \times 5482 = 100 \times x

 98676 = 100 \times x

and so x = 986.76.

My dad was shocked by this – it’s such a laborious way of doing it. So he started pressing at my brother. He asked him, if you went to a store, and they told you there was a 20% off sale on a pair of jeans that cost $18, how much of a discount would you get? He didn’t know. The only way he knew to figure it out was to do the whole ratios, cross-multiply, and solve. If you told him that 20% off of $18 was $5, he would have believed you. Because percentages just didn’t mean anything to him.

Now, as I said: my brother isn’t a dummy. But none of his math teachers had every taught him what percentages meant. He had no concept of their meaning: he knew a procedure for getting the value, but it was a completely blind procedure, devoid of meaning. And that’s what everything he’d learned about math was like: meaningless procedures performed by rote, without any comprehension.

That’s where nonsense like Terence Howard’s stuff comes from: math education that never bothered to teach students what anything means. If anyone had attempted to teach any form of meaning for arithmetic, the ridiculous of Mr. Howard’s supposed mathematics would be obvious.

For understanding basic arithmetic, I like to look at a geometric model of numbers.

Put a dot on a piece of paper. Label it “0”. Draw a line starting at zero, and put tick-marks on the line separated by equal distances. Starting at the first mark after 0, label the tick-marks 1, 2, 3, 4, 5, ….

In this model, the number one is the distance from 0 (the start of the line) to 1. The number two is the distance from 0 to 2. And so on.

What does addition mean?

Addition is just stacking lines, one after the other. Suppose you wanted to add 3 + 2. You draw a line that’s 3 tick-marks long. Then, starting from the end of that line, you draw a second line that’s 2 tick-marks long. 3 + 2 is the length of the resulting line: by putting it next to the original number-line, we can see that it’s five tick-marks long, so 3 + 2 = 5.


Multiplication is a different process. In multiplication, you’re not putting lines tip-to-tail: you’re building rectangles. If you want to multiply 3 * 2, what you do is draw a rectangle who’s width is 3 tick-marks long, and whose height is 2 tick-marks long. Now divide that into squares that are 1 tick-mark by one tick-mark. How many squares can you fit into that rectangle? 6. So 3*2 = 6.


Why does 1 times 1 equal 1? Because if you draw a rectangle that’s one hash-mark wide, and one hash-mark high, it forms exactly one 1×1 square. 1 times 1 can’t be two: it forms one square, not two.

If you think about the repercussions of the idea that 1*1=2, as long as you’re clear about meanings, it’s pretty obvious that 1*1=2 has a disastrously dramatic impact on math: it turns all of math into a pile of gibberish.

What’s 1*2? 2. 1*1=2 and 1*2=2, therefore 1=2. If 1=2, then 2=3, 3=4, 4=5: all integers are equal. If that’s true, then… well, numbers are, quite literally, meaningless. Which is quite a serious problem, unless you already believe that numbers are meaningless anyway.

In my last post, someone asked why I was so upset about the error in a math textbook. This is a good example of why. The new common core math curriculum, for all its flaws, does a better job of teaching understanding of math. But when the book teaches “facts” that are wrong, what they’re doing becomes the opposite. It doesn’t make sense – if you actually try to understand it, you just get more confused.

That teaches you one of two things. Either it teaches you that understanding this stuff is futile: that all you can do is just learn to blindly reproduce the procedures that you were taught, without understanding why. Or it teaches you that no one really understands any of it, and that therefore nothing that anyone tells you can possibly be trusted.

Bad Math Books and Cantor Cardinality

A bunch of readers sent me a link to a tweet this morning from Professor Jordan Ellenberg:

The tweet links to the following image:

(And yes, this is real. You can see it in context here.)

This is absolutely infuriating.

This is a photo of a problem assignment in a math textbook published by an imprint of McGraw-Hill. And it’s absolutely, unquestionably, trivially wrong. No one who knew anything about math looked at this before it was published.

The basic concept underneath this is fundamental: it’s the cardinality of sets from Cantor’s set theory. It’s an extremely important concept. And it’s a concept that’s at the root of a huge amount of misunderstandings, confusion, and frustration among math students.

Cardinality, and the notion of cardinality relations between infinite sets, are difficult concepts, and they lead to some very un-intuitive results. Infinity isn’t one thing: there are different sizes of infinities. That’s a rough concept to grasp!

Here on this blog, I’ve spent more time dealing with people who believe that it must be wrong – a subject that I call Cantor crackpottery – than with any other bad math topic. This error teaches students something deeply wrong, and it encourages Cantor crackpottery!

Let’s review.

Cantor said that two collections of things are the same size if it’s possible to create a one-to-one mapping between the two. Imagine you’ve got a set of 3 apples and a set of 3 oranges. They’re the same size. We know that because they both have 3 elements; but we can also show it by setting aside pairs of one apple and one orange – you’ll get three pairs.

The same idea applies when you look at infinitely large sets. The set of positive integers and the set of negative integers are the same size. They’re both infinite – but we can show how you can create a one-to-one relation between them: you can take any positive integer i, and map it to exactly one negative integer, 0 - i.

That leads to some unintuitive results. For example, the set of all natural numbers and the set of all even natural numbers are the same size. That seems crazy, because the set of all even natural numbers is a strict subset of the set of natural numbers: how can they be the same size?

But they are. We can map each natural number i to exactly one even natural number 2i. That’s a perfect one-to-one map between natural numbers and even natural numbers.

Where it gets uncomfortable for a lot of people is when we start thinking about real numbers. The set of real numbers is infinite. Even the set of real numbers between 0 and 1 is infinite! But it’s also larger than the set of natural numbers, which is also infinite. How can that be?

The answer is that Cantor showed that for any possible one-to-one mapping between the natural numbers and the real numbers between 0 and 1, there’s at least one real number that the mapping omitted. No matter how you do it, all of the natural numbers are mapped to one value in the reals, but there’s at least one real number which is not in the mapping!

In Cantor set theory, that means that the size of the set of real numbers between 0 and 1 is strictly larger than the set of all natural numbers. There’s an infinity bigger than infinity.

I think that this is what the math book in question meant to say: that there’s no possible mapping between the natural numbers and the real numbers. But it’s not what they did say: what they said is that there’s no possible map between the integers and the fractions. And that is not true.

Here’s how you generate the mapping between the integers and the rational numbers (fractions) between 0 and 1, written as a pseudo-Python program:

 i = 0
 for denom in Natural:
   for num in 1 .. denom:
      if num is relatively prime with denom:
         print("%d => %d/%d" % (i, num, denom))
         i += 1

It produces a mapping (0 => 0, 1 => 1, 2 => 1/2, 3 => 1/3, 4 => 2/3, 5 => 1/4, 6 => 3/4, …). It’ll never finish running – but you can easily show that for any possible fraction, there’ll be exactly one integer that maps to it.

That means that the set of all rational numbers between 0 and 1 is the same size as the set of all natural numbers. There’s a similar way of producing a mapping between the set of all fractions and the set of natural numbers – so the set of all fractions is the same size as the set of natural numbers. But both are smaller than the set of all real numbers, because there are many, many real numbers that cannot be written as fractions. (For example, \pi. Or the square root of 2. Or e. )

This is terrible on multiple levels.

  1. It’s a math textbook written and reviewed by people who don’t understand the basic math that they’re writing about.
  2. It’s teaching children something incorrect about something that’s already likely to confuse them.
  3. It’s teaching something incorrect about a topic that doesn’t need to be covered at all in the textbook. This is an algebra-2 textbook. You don’t need to cover Cantor’s infinite cardinalities in Algebra-2. It’s not wrong to cover it – but it’s not necessary. If the authors didn’t understand cardinality, they could have just left it out.
  4. It’s obviously wrong. Plenty of bright students are going to come up with the the mapping between the fractions and the natural numbers. They’re going to come away believing that they’ve disproved Cantor.

I’m sure some people will argue with that last point. My evidence in support of it? I came up with a proof of that in high school. Fortunately, my math teacher was able to explain why it was wrong. (Thanks Mrs. Stevens!) Since I write this blog, people assume I’m a mathematician. I’m not. I’m just an engineer who really loves math. I was a good math student, but far from a great one. I’d guess that every medium-sized high school has at least one math student every year who’s better than I was.

The proof I came up with is absolutely trivial, and I’d expect tons of bright math-geek kids to come up with something like it. Here goes:

  1. The set of fractions is a strict subset of the set of ordered pairs of natural numbers.
  2. So: if there’s a one-to-one mapping between the set of ordered pairs and the naturals, then there must be a one-to-one mapping between the fractions and the naturals.
  3. On a two-d grid, put the natural numbers across, and then down.
  4. Zigzag diagonally through the grid, forming pairs of the horizontal position and the vertical position: (0,0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3).
  5. This will produce every possible ordered pair of natural numbers. For each number in the list, produce a mapping between the position in the list, and the pair. So (0, 0) is 0, (2, 0) is 3, etc.

As a proof, it’s sloppy – but it’s correct. And plenty of high school students will come up with something like it. How many of them will walk away believing that they just disproved Cantor?

Not a theory! Really! It’s not a theory!

I know I’ve been terrible about updating my blog lately. I’ve got some good excuses. (The usual: very busy with work. The less usual: new glasses that I’m having a very hard time adapting to. Getting old sucks. My eyes have deteriorated to the point where my near vision is shot, and the near-vision correction in my lenses needed to get jumped pretty significantly, which takes some serious getting used to.) And getting discussions of type theory right is a lot of work. Type theory in particular takes a lot of work, because it’s a subject that I really want to get right, because it’s so important in my profession, and because so few people have actually written about it in a way that’s accessible to non-mathematicians.

Anyway: rest assured that I’m not dropping the subject, and I hope to be getting back to writing more very soon. In the meantime, I decided to bring you some humorous bad math.

Outside the scientific community, one of the common criticisms of science is that scientific explanations are “just a theory”. You hear this all the time from ignorant religious folks trying to criticize evolution or the big bang (among numerous other things). When they say that something is just a theory, what they mean is that it’s not a fact, it’s just speculation. They don’t understand what the word “theory” really means: they think that a theory and a fact are the same class of things – that an idea starts as a theory, and becomes a fact if you can prove it.

In science, we draw a distinction between facts and theories, but it’s got nothing to do with how “true” something is. A fact is an observation of something that happens, but doesn’t say why it happens. The sun produces light. That’s a fact. The fact doesn’t say why that happens. It doesn’t have to say how it happens. But it does. That’s an observation of a fact. A theory is an explanation of a set of facts. The combined gravitational force of all of the particles in the sun compress the ones in the center until quantum tunnelling allows hydrogen atoms to combine and fuse producing energy, which eventually radiates as the heat and light that we observe. The theory of solar hydrogen fusion is much more than the words in the previous sentence: it’s an extensive collection of evidence and mathematics that explains the process in great detail. Solar hydrogen fusion – mathematical equations and all – is a theory that explains the heat and light that we observe. We’re pretty sure that it’s true – but the fact that it is true doesn’t mean that it’s not a theory.

Within the scientific community, we criticize crackpot ideas by saying that they’re not a theory. In science, a theory means a well-detailed and tested hypothesis that explains all of the known facts about something, and makes testable predictions. When we say that something isn’t a theory, we mean that it doesn’t have the supporting evidence, testability, or foundation in fact that would be needed to make something into a proper theory.

For example, intelligent design doesn’t qualify as a scientific theory. It basically says “there’s stuff in the world that couldn’t happen unless god did it”. But it never actually says how, precisely, to identify any of those things that couldn’t happen without god. Note that this doesn’t mean that it’s not true. I happen to believe that it’s not – but whether it’s true or not has nothing to do with whether, scientifically, in qualifies as a theory.

That’s a very long, almost Oracian introduction to today’s nonsense. This bit of crackpottery, known as “the Principle of Circlon Synchronicity”, written by one James Carter, has one really interesting property: I agree, 100%, with the very first thing that Mr. Carter says about his little idea.

The Principle of Circlon Synchronicity is not a Theory

They’re absolutely correct. It’s not a theory. It’s a bundle of vague assumptions, tied together by a shallow pretense at mathematics.

The “introduction” to this “principal” basically consists of the author blindly asserting that a bunch of things aren’t theories. For example, his explanation for why the principal of circlon synchronicity is not a theory begins with:

There are many different theories that have been used to explain the nature of reality. Today, the most popular of these are quantum mechanics, special relativity, string theories, general relativity and the Big Bang. Such theories all begin with unmeasured metaphysical assumptions such as fields and forces to explain the measurements of various phenomena. Circlon synchronicity is a purely mechanical system that explains local physical measurements. You only need a theory to explain physical measurements in terms of non-local fields, forces and dimensions.

This is a novel definition of “theory”. It has absolutely nothing to do with what the rest of us mean by the word “theory”. Basically, he thinks that his explanations, because they are allegedly simple, mechanical, and free of non-local effects, aren’t theories. They’re principals.

The list of things that don’t need a theory, according to Mr. Carter, is extensive.
For example:

The photon is not a theory. The photon is a mechanical measurement of mass. The photon is a conjoined matter-antimatter pair that is the basic form of mass and energy in the Living Universe. All photons move at exactly the speed of light relative to one another within the same absolute space. Photons are produced when a proton and electron are joined together to form a hydrogen atom. The emission of a photon is a mini annihilation with part of the electron and part of the proton being carried away by the photon. A photon with mass and size eliminates the need for both Planck’s constant and the Heisenberg uncertainty principle and also completely changes the meaning of the equation E=MC2. This is not a theory of a photon. It is the measurements describing the nature of the photon.

This is where we start on the bad math.

A photon is a quantum of light, or some other form of electromagnetic radiation. It doesn’t have any mass. But even if it didn’t: a photon isn’t a measurement. A photon is a particle (or a wave, depending on how you deal with it.) A measurement is a fundamentally different thing. If you want to do math that describes the physical universe, you’ve got to be damned careful about your units. If you’re measuring mass, you your units need to be mass units. If you’re describing mass, then the equations that derive your measurements of mass need to have mass units. If a photon is a measurement of mass, then what’s its unit?

Further, you can’t take an equation like e=mc^2, and rip it out of context, while asserting that it has exactly the same meaning that it did in its original context. Everyone has seen that old equation, but very few people really understand just what it means. Mr. Carter is not part of that group of people. To him, it’s just something he’s seen, which he knows is sciency, and so he grabs on to it and shouts about it in nonsensical ways.

But note, importantly, that even here, what Mr. Carter is doing isn’t science. He’s absolutely right when he says it’s not a theory. He asserts that the whole meaning of e=mc^2 changes because of his new understanding of what light is; but he doesn’t ever bother to explain just what that new understanding is, or how it differs from the old one.

He makes some hand-waves about how you don’t need the uncertainty principle. If his principles had a snowballs chance in hell of being correct, that might be true. The problem with that assertion is that the uncertainty principle isn’t just a theory. It’s a theory based on observations of facts that absolutely require explanations. There’s a great big fiery-looking ball up in the sky that couldn’t exist without uncertainty. Uncertainty isn’t just a pile of equations that someone dreamed up because it seemed like fun. It’s a pile of equations that were designed to try to explain the phenomena that we observe. There are a lot of observations that demonstrate the uncertainty principle. It doesn’t disappear just because Mr. Carter says it should. He needs to explain how his principles can account for the actual phenomena we observe – not just the phenomena that he wants to explain.

Similarly, he doesn’t like the theory of gravity.

We do not need a theory of gravity to explain exactly how it works. Gravity is a simple measurement that plainly shows exactly what gravity does. We use accelerometers to measure force and they exactly show that gravity is just an upwardly pointing force caused by the physical expansion of the Earth. The gravitational expansion of matter does not require a theory. It is just the physical measurement of gravity that shows exactly how it works in a completely mechanical way without any fields or non-local interactions. You only need a theory to explain a non-local and even infinite idea of how gravity works in such a way that it can’t be directly measured. Gravity not a theory.

Once again, we see that he really doesn’t understand what theory means. According to him, gravity can be measured, and therefore, it’s not a theory. Anything that can be measured, according to Mr. Carter, can’t be a theory: if it’s a fact, it can’t be a theory; even more, if it’s a fact, it doesn’t need to be explained at all. It’s sort-of like the fundamentalists idea of a theory, only slightly more broken.

This is where you can really see what’s wrong with his entire chain of reasoning. He asserts that gravity isn’t a theory – and then he moves in to an “explanation” of how gravity works which simply doesn’t fit.

We do not need a theory of gravity to explain exactly how it works. Gravity is a simple measurement that plainly shows exactly what gravity does. We use accelerometers to measure force and they exactly show that gravity is just an upwardly pointing force caused by the physical expansion of the Earth. The gravitational expansion of matter does not require a theory. It is just the physical measurement of gravity that shows exactly how it works in a completely mechanical way without any fields or non-local interactions. You only need a theory to explain a non-local and even infinite idea of how gravity works in such a way that it can’t be directly measured. Gravity not a theory.

The parade of redefinitions marches on! “Exactly” now means “hand-wavy”.

We’re finally getting to the meat of Mr. Carter’s principle. He’s a proponent of the same kind of expanding earth rubbish as Neal Adams. Gravity has nothing to do with non-local forces. It’s all just the earth expanding under us. Of course, this is left nice and vague: he mocks the math behind the actual theory of gravity, but he can’t actually show that his principal works. He just asserts that he’s defined exactly how it works by waving his hands really fast.

I can disprove his principle of gravity quite easily, by taking my phone out of my pocket, and opening Google maps.

In 5 seconds flat (which is longer than it should take!), Google maps shows me my exact position on the map. It does that by talking to a collection of satellites that are revolving around the earth. The positions of those satellites are known with great accuracy. They circle the earth without the use of any sort of propellant. If Mr. Carter (or Mr. Adams, who has a roughly equivalent model) were correct – if gravity was not, in fact, a force attracting mass to other masses, but instead was an artifact of an expanding earth – then the “satellites” that my phone receives data from would not be following an elliptical path around the earth. They’d be shooting off into the distance, moving in a perfectly straight line. But they don’t move in a straight line. They continue to arc around the earth, circling around and around, without any propulsion.

In any reasonable interpretation of the expanding earth? That doesn’t make sense. There’s no way for them to orbit. Satellites simply can’t work according to his theory. And yet, they do.

Of course, I’m sure that Mr. Carter has some hand-wavy explanation of just why satellites work. The problem is, whatever explanation he has isn’t a theory. He can’t actually make predictions about how things will behave, because his principles aren’t predictive.

In fact, he even admits this. His whole screed turns out to be a long-winded advertisement for a book that he’ll happily sell you. As part of the FAQ for his book, he explains why (a) he can’t do the math, and (b) it doesn’t matter anyway:

The idea that ultimate truth can be represented with simple mathematical equations is probably totally false. A simple example of this is the familiar series of circular waves that move away from the point where a pebble is dropped into a quiet pool of water. While these waves can be described in a general way with a simple set of mathematical equations, any true and precise mathematical description of this event would have to include the individual motion of each molecule within this body of water. Such an equation would require more than the world’s supply of paper to print and its complexity would make it virtually meaningless.

The idea of the circlon is easy to describe and illustrate. However, any kind of mathematical description of its complex internal dynamics is presently beyond my abilities. This deficiency does not mean that circlon theory cannot compete with the mathematically simplistic point-particle and field theories of matter. It simply means that perhaps ultimate truth is not as easily accessible to a mathematical format as was once hoped.

It’s particularly interesting to consider this “explanation” in light of some recent experiments in computational fluid dynamics. Weather prediction has become dramatically better in the last few years. When my father was a child, the only way to predict when a hurricane would reach land was to have people watching the horizon. No one could make accurate weather predictions at all, not even for something as huge as a storm system spanning hundreds of miles! When I was a child, weathermen rarely attempted to predict more than 2 days in advance. Nowadays, we’ve got 7-day forecasts that are accurate more often than the 2-day forecasts were a couple of decades ago. Why is that?

The answer is something called the Navier Stokes equations. The Navier-Stokes equations are a set of equations that describe how fluids behave. We don’t have the computational power or measurement abilities to compute N-S equations to the level of single molecules – but in principle, we absolutely could. The N-S equations – which demonstrably work remarkably well even when you’re just computing approximations – also describe exactly the phenomenon that Mr. Carter asserts can’t be represented with mathematical equations.

The problem is: he doesn’t understand how math or science work. He has no clue of how equations describe physical phenomena in actual scientific theories. The whole point of math is that it gives you a simple but precise way of describing complex phenomena. A wave in a pool of water involves the motion of an almost unimaginable number of particles, with a variety of forces and interactions between those particles. But all of them can be defined by reasonably simple equations.

Mr. Carter’s explanations are, intuitively, more attractive. If you really want to understand relativity, you’re going to need to spend years studying math and physics to get to the point where its equations make sense to you. But once you do, they don’t just explain things in a vague, hand-wavy way – they tell you exactly how things work. They make specific, powerful, precise predictions about how things will behave in a range of situations that match reality to the absolute limits of our ability to measure. Mr. Carter’s explanations don’t require years of study; they don’t require to study esoteric disciplines like group theory or tensor theory. But they also can’t tell you much of anything. Relativity can tell you exactly what adjustment you need to make to a satellite’s clock in order to make precise measurements of the location of a radio receiver on the ground. Mr. Carter’s explanations can’t even tell you how the satellite got there.

The beauty of math; the humor of stupidity.

%d bloggers like this: