Monthly Archives: November 2009

Advanced Haskell Data Structures: Red-Black Trees

unbalanced-trees.jpg

So, we’ve built up some pretty nifty binary trees – we can use the binary tree both as the basis of an implementation of a set, or as an implementation of a dictionary. But our implementation has had one major problem: it’s got absolutely no way to maintain balance.

What that means is that depending on the order in which things are inserted to the tree, we might have excellent performance, or we might be no better than a linear list. For example, look at these trees. As you can see, a tree with the same values can wind up quite different. In a good insert order, you can wind up with a nicely balanced tree: the minimum distance from root to leaf is 3; the maximum is 4. On the other hand, take the same values, and insert them in a different order and you get a rotten tree; the minimum distance from root to leaf is 1, and the maximum is 7. So depending on luck, you can get a tree that gives you good performance, or one that ends up giving you no better than a plain old list. Playing with a bit of randomization can often give you reasonably good performance on average – but if you’re using a tree, it’s probably because O(n) complexity is just too high. You want the O(lg n) complexity that you’ll get from a binary tree – and not just sometimes.

To fix that, you need to change the structure a bit, so that as you insert things, the tree stays balanced. There are several different approaches to how you can do this. The one that we’re going to look at is based on labeling nodes in ways that allow you to very easily detect when a serious imbalance is developing, and then re-arrange the tree to re-balance it. There are two major version of this, called the AVL tree, and the red-black tree. We’re going to look at the red-black. Building a red-black tree is as much a lesson in data structures as it is in Haskell, but along with learning about the structure, we’ll see a lot about how to write code in Haskell, and particularly about how to use pattern-matching for complex structures.

Continue reading

A Special Midweek Recipe: Ad-Libbed Cranberry Chutney

It’s not saturday, but I’ve got a recipe that I needed to write down before I forget it, so you’re getting an extra bonus.

I usually make a simple cranberry relish for thanksgiving. But it needs to be made a couple of days in advance. This year, I completely forgot about the cranberries until this morning. So I figured I needed to do something else. A good chutney sounded nice. I went hunting online, but couldn’t find anything that sounded good, so I went ahead and ad-libbed. And the results were amazing – this is definitely the new cranberry tradition in the Chu-Carroll household. Sweet, tart, and spicy – it’s a perfect compliment for the turkey.

Continue reading

Creating User-Defined Types in Haskell

  • (This is an edited repost of one of the posts from the earlier
    version of my Haskell tutorial.)
  • (This file is a literate haskell script. If you save it
    as a file whose name ends in “.lhs”, it’s actually loadable and
    runnable in GHCI or Hugs.)

Like any other modern programming language, Haskell has excellent support
for building user-defined data types. In fact, even though Haskell is very
much not object-oriented, most Haskell programs end up being centered
around the design and implementation of data structures, using constructions
called classes and instances.

In this post, we’re going to start looking at how you implement data types
in Haskell. What I’m going to do is start by showing you how to implement a
simple binary search tree. I’ll start with a very simple version, and then
build up from there.

Continue reading

The Balance of Screening Tests

As you’ve no doubt heard by now, there’s been a new recommendation issues
which proposes changing the breast-cancer screening protocol for women under
50, by eliminating mammograms for women who don’t have significant risk
factos. While Orac has done a terrific job of covering this here and
here, I wanted to throw
in a couple of notes and a personal perspective.

Continue reading

Shameful Innumeracy in the New York Times

I’ve been writing this blog for a long time – nearly four years. You’d think that
after all of the bad math I’ve written about, I must have reached the point where
I wouldn’t be surprised at the sheer innumeracy of most people – even most supposedly
educated people. But alas for me, I’m a hopeless idealist. I just never quite
manage to absorb how clueless the average person is.

Today in the New York Times, there’s an editorial which talks about
the difficulties faced by the children of immigrants. In the course of
their argument, they describe what they claim is the difference between
the academic performance of native-born versus immigrant children:

Whereas native-born children’s language skills follow a bell
curve, immigrants’ children were crowded in the lower ranks: More than
three-quarters of the sample scored below the 85th percentile in English
proficiency.

Scoring in the 85th percentile on a test means that you did better on that
test than 85 percent of the people who took it. So for the population as a
whole
, 85% of the people who took it scored below the 85th percentile –
by definition. So, if the immigrant population were perfectly matched
with the population as a whole, then you’d expect more than 3/4s the
score below the 85th percentile.

As they reported it, the most reasonable conclusion would be that on the
whole, immigrant children do better than native-born children! The
population of test takers consists of native-born children and immigrant
children. (There’s no third option – if you’re going to school here, either
you were born here, or you weren’t.) If 3/4s of immigrant children are scoring
85th percentile or below, then that means that more than 85% of
the non-immigrant children are scoring below 85th percentile.

I have no idea where they’re getting their data. Nor do I have any idea of
what they thought they were saying. But what they actually said is a
mind-boggling stupid thing, and I can’t imagine how anyone who had the most
cursory understanding of what it actually meant would miss the fact that
the statistic doesn’t in any way, shape, or form support the statement it’s
attached to.

The people who write the editorials for the New York Times don’t even
know what percentiles mean. It’s appalling. It’s worse that appalling – it’s
an absolute disgrace.

Types in Haskell: Types are Propositions, Programs are Proofs

(This is a revised repost of an earlier part of my Haskell tutorial.)

Haskell is a strongly typed language. In fact, the type system in Haskell
is both stricter and more expressive than any type system I’ve seen for any
non-functional language. The moment we get beyond writing trivial
integer-based functions, the type system inevitably becomes visible, so we
need to take the time now to talk about it a little bit, in order to
understand how it works.

Continue reading

Dembski Stoops Even Lower: Legal Threats to Silence a Critic

For those who have slightly better memory of recent events than an average
gerbil, you’ll surely remember that not too long ago, the Intelligent Design
folks, with the help of Ben Stein, put together a whole movie about how
evilutionists are all a bunch of evil fascists, out to silence the poor,
hard-working IDers.

You’ll also remember that Bill Dembski has been talking up the fact that
he’s got two peer reviewed papers allegedly about intelligent design. So,
you’d think that after complaining about being locked out of the debate,
now that he has some actual papers to talk about, he’d be eager to, well,
talk about them!

Yeah, right. As it turns out, debate is the last thing that Bill
wants. When someone took a good look at one of his papers, and
posted a critique, Bill’s response was the threaten to sue them for
copyright violation. Knowing how utterly trustworthy the Disco gang
is, I’ve got a screen-capture of the post with the threat below the fold, in
case they try to change history by deleting it.

Continue reading

Writing Basic Functions in Haskell (edited repost)

(This is a heavily edited repost of the first article in my original
Haskell tutorial.)

(I’ve attempted o write this as a literate haskell program. What that
means is that if you just cut-and-paste the text of this post from your
browser into a file whose name ends with “.lhs”, you should be able to run it
through a Haskell compiler: only lines that start with “>” are treated as
code. The nice thing about this is that this blog post is itself a
compilable, loadable Haskell source file – so I’ve compiled and tested
all of the code in here in exactly this context.)

Continue reading

The Go I Forgot: Concurrency and Go-Routines

A couple of people pointed out that in my wednesday post about Go, I completely left out the concurrency stuff! That’s what I get for rushing the post – I managed to leave out one of the most interesting subjects! Go provides very strong support for communicating processes.

I haven’t done a lot of hacking with the concurrency stuff yet – so my impressions of it are still very preliminary. But my early impressions are very good.

Continue reading

Google's New Language: Go

I’ve been being peppered with questions about Go, the new programming language just released as open-source by Google. Yes, I know about it. And yes, I’ve used it. And yes, I’ve got some strong opinions about it.

Go is an interesting language. I think that there are many fantastic things about it. I also think that there are some really dreadful things about it.

A warning before I go on: this post is definitely a bit of a rush job. I wanted to get something out before my mailbox explodes :-). I’ll probably try to do a couple of more polished posts about Go later. But this should give you a first taste.

Continue reading