I was recently fortunate enough to get a review copy of Cory Doctorow’s new book, <a href="Little Brother“>”Little Brother”. I’ve never read Doctorow before, but the book

was edited by Patrick Neilsen Hayden, who I think is the best editor in

the business, and Patrick says that this book is one of the best things

he’s ever worked on. In his words, it’s “one of the books that, should I happen to be run down by a beer truck next tuesday, I’d most like to be remembered for having helped into print”. So when Patrick posted on his blog that he had review copies available, I jumped at the chance.

# Monthly Archives: April 2008

# Implementing Compact Binary Heaps

Last post, I described the basic idea of the binary heap data

structure. But I didn’t talk about how to implement it – and there’s

a very cool way of implementing it – optimally space efficient,

very fast, and with respectable cache locality for moderate sized

datasets.

# Binary Heaps

One of the most neglected data structures in the CS repertoire is

the heap. Unfortunately, the jargon is overloaded, so “heap” can mean

two different things – but there is a connection, which I’ll explain

in a little while.

In many programming languages, when you can dynamically create

objects in memory, the space from which you allocate the memory to

create them is called the heap. That is *not* what I’m talking about.

What I am talking about is an extremely simple but powerful structure

that is used for maintaining a collection of values from which you want

to be able to quickly remove the largest object quickly. This may sound

trivial, but this turns out to be an incredibly useful structure. You can use this basic structure to implement a very fast sorting algorithm; to maintain a prioritized set of values/tasks; to manage chunks of memory; etc. (The

prioritized set is the most common case in my experience, but that’s probably because I’ve spent so much of my career working on distributed systems.)

# Simple Games, Utility Functions, and Saddle Points

Last time I wrote about Game Theory, I explained the basic idea of

zero sum games. In their simplest form, a game can be described by a payoff matrix,where each dimension of the matrix is the set of *strategies* which can be selected by one player, and each entry in the matrix describes the payoffs that result when all players select their strategy. Informally, a game is zero-sum when the total payoff is zero – that is, when any positive payout to one player is balanced by payouts from other players.

We can formalize some of the basic ideas more precisely. What we can say is that each move results in a *state*; and for each player y, there is a function, u_{p}, called a utility function, which maps from a state to a utility value. Taking the player utility functions as components, we can define the game’s utility function in terms of tuples: for players 1..n,

the utility function maps from states to tuples of utility values: u(state)=(u_{1}, …, u_{n}).

# More Bad Bayesians: No ETs!

Remember when I talked about the problems with Bayesian probability? As you’ll probably recall, one of the things that drives me crazy about Bayesianism is that you get a constant stream of crackpots abusing it. Since the basic assumption of Bayesian probability is that you can *always* use it, you’ll constantly get people abusing it.

Christopher Mims, who was one of the people running ScienceBlogs when I first signed on, sent me a classic example. A professor has published a paper in a journal called “Astrobiology”, arguing that there’s an exceedingly low probability of intelligent life elsewhere in the Universe.

# Asteroid Apophosis, Orbit Changes, and Boy (not)Geniuses

You might have heard the story that’s been going round about the asteroid

Apophis. This is an asteroid that was, briefly, considered by NASA to be a collision risk with earth. But after more observations to gather enough data to compute its orbit more precisely, the result was that it’s not a significant risk. The current NASA estimates are that it’s a collision risk of about one in 45,000.

The news around it is that some German kid claims to have figured out that

NASA got it wrong, and that the real risk is 1 in 450. What was NASA’s big

mistake, according to the kid?

He says that if the asteroid were to hit a satellite, that it would change the satellite’s trajectory enough to make it hit the earth.

This has been reported with ridiculous credulity. Anyone with the least

bit of mathematical literacy should know, pretty much without even needing to

think about it, that this is absolutely silly.

# XKCD and Friendly Numbers

I’ve been getting mail all day asking me to explain something

that appeared in today’s XKCD comic. Yes, I’ve been reduced to explaining geek comics to my readers. I suppose that there are worse fates. I just can’t

think of any. ðŸ™‚

But seriously, I’m a huge XKCD fan, and I don’t mind explaining interesting things no matter what the source. If you haven’t read today’s

comic, follow the link, and go look. It’s funny, and you’ll know what

people have been asking me about.

The comic refers to *friendly numbers*. The question,

obviously, is what are friendly numbers?

First, we define something called a divisors function over the integers, written σ(n). For any integer, there’s a set of integers that divide

into it. For example, for 4, that’s 1, 2, and 4. For 5, it’s just 1 and 5. And for 6, it’s 1, 2, 3, 6. The divisors function, σ(n) is the sum of all of the divisors of n. So

$ sigma(4)=8, sigma(5)=6, sigma(6)=12.$

For each integer, there is a *characteristic ratio*, defined

using the divisors function. For the integer n, the characteristic

is the ratio of the divisors function over the the number itself: σ(n)/n. So the characteristic ratio of 4 is 7/4; for 6, it’s

12/6=2.

For any characteristic ratio, the set of numbers that share that characteristic are *friendly* with each other. A friendly number is,

therefore, any integer that shares its characteristic ratio with at least one other integer. If an integer isn’t friendly, then it’s called a *solitary* number. 1, 2, 3, 4, and 5 are all solitary numbers. 6 is

friendly with 28 (1+2+4+7+14+28/28 = 56/28 = 2).

replaceMath( document.body );

# Ben Stein and Darwin: Truth is what matters.

Like the rest of the skeptical blogosphere, I’ve been watching the

uproar around Ben Stein’s new movie with a lot of amusement, but also with

a lot of disgust. There’s one thing that I feel compelled to comment on that

I think has, for some reason, not been addressed nearly enough.

# Probability Distributions

I’d like to start with a quick apology. Sorry that both the abstract algebra and the new game theory posts have been moving so slowly. I’ve been a bit overwhelmed lately with things that need doing *right away*, and

by the time I’m done with all that, I’m too tired to write anything that

requires a lot of care. I’ve known the probability theory stuff for so long,

and the parts I’m writing about so far are so simple that it really doesn’t take nearly as much effort.

With that out of the way, today, I’m going to write about probability distributions.

# Random Variables

The first key concept in probability is called a *random variable*.

Random variables are a key concept – but since they’re a key concept of the

frequentist school, they are alas, one of the things that bring out more of

the Bayesian wars. But the idea of the random variable, and its key position

in understanding probability and statistics predates the divide between

frequentist and Bayesian though. So please, folks, be a little bit patient,

and don’t bring the Bayesian flamewars into this post, OK? If you want to

rant about how stupid frequentist explanations are, please keep it in the comments here. I’m trying to

explain basic ideas, and you really can’t talk about probability and

statistics without talking about random variables.