# Regular Expressions and Derivatives

When you’re working with regular languages specified in regular expression form, there’s a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It’s called the Brzozozwksi derivative of a regular expression – or just simply the derivative of a regexp.

The basic idea of the derivative is that given a regular expression, $r$, you can derive a new regular expression called the derivative with respect to symbol $c$, $D_c(r)$. $D_c(r)$ is a regular expression describing the string matched by $r$ after it’s matched an $r$.

# Zippers: Making Functional "Updates" Efficient

In the Haskell stuff, I was planning on moving on to some monad-related
post on data structures, focusing on a structured called a
zipper.

A zipper is a remarkably clever idea. It’s not really a single data
structure, but rather a way of building data structures in functional
languages. The first mention of the structure seems to be a paper
by Gerard Huet in 1997
, but as he says in the paper, it’s likely that this was
used before his paper in functional code — but no one thought to formalize it
and write it up. (In the original version of this post, I said the name of the guy who first wrote about zippers was “Carl Huet”. I have absolutely no idea where that came from – I literally had his paper on my lap as I wrote this post, and I still managed to screwed up his name. My apologies!)

It also happens that zippers are one of the rare cases of data structures
where I think it’s not necessarily clearer to show code. The concept of
a zipper is very simple and elegant – but when you see a zippered tree
written out as a sequence of type constructors, it’s confusing, rather
than clarifying.

So, we’ve built up some pretty nifty binary trees – we can use the binary tree both as the basis of an implementation of a set, or as an implementation of a dictionary. But our implementation has had one major problem: it’s got absolutely no way to maintain balance.

What that means is that depending on the order in which things are inserted to the tree, we might have excellent performance, or we might be no better than a linear list. For example, look at these trees. As you can see, a tree with the same values can wind up quite different. In a good insert order, you can wind up with a nicely balanced tree: the minimum distance from root to leaf is 3; the maximum is 4. On the other hand, take the same values, and insert them in a different order and you get a rotten tree; the minimum distance from root to leaf is 1, and the maximum is 7. So depending on luck, you can get a tree that gives you good performance, or one that ends up giving you no better than a plain old list. Playing with a bit of randomization can often give you reasonably good performance on average – but if you’re using a tree, it’s probably because O(n) complexity is just too high. You want the O(lg n) complexity that you’ll get from a binary tree – and not just sometimes.

To fix that, you need to change the structure a bit, so that as you insert things, the tree stays balanced. There are several different approaches to how you can do this. The one that we’re going to look at is based on labeling nodes in ways that allow you to very easily detect when a serious imbalance is developing, and then re-arrange the tree to re-balance it. There are two major version of this, called the AVL tree, and the red-black tree. We’re going to look at the red-black. Building a red-black tree is as much a lesson in data structures as it is in Haskell, but along with learning about the structure, we’ll see a lot about how to write code in Haskell, and particularly about how to use pattern-matching for complex structures.

# Creating User-Defined Types in Haskell

• (This is an edited repost of one of the posts from the earlier
• (This file is a literate haskell script. If you save it
as a file whose name ends in “.lhs”, it’s actually loadable and
runnable in GHCI or Hugs.)

Like any other modern programming language, Haskell has excellent support
for building user-defined data types. In fact, even though Haskell is very
much not object-oriented, most Haskell programs end up being centered
around the design and implementation of data structures, using constructions
called classes and instances.

In this post, we’re going to start looking at how you implement data types
in Haskell. What I’m going to do is start by showing you how to implement a
simple binary search tree. I’ll start with a very simple version, and then
build up from there.

# Types in Haskell: Types are Propositions, Programs are Proofs

(This is a revised repost of an earlier part of my Haskell tutorial.)

Haskell is a strongly typed language. In fact, the type system in Haskell
is both stricter and more expressive than any type system I’ve seen for any
non-functional language. The moment we get beyond writing trivial
integer-based functions, the type system inevitably becomes visible, so we
need to take the time now to talk about it a little bit, in order to
understand how it works.

# Writing Basic Functions in Haskell (edited repost)

(This is a heavily edited repost of the first article in my original

(I’ve attempted o write this as a literate haskell program. What that
means is that if you just cut-and-paste the text of this post from your
browser into a file whose name ends with “.lhs”, you should be able to run it
all of the code in here in exactly this context.)

# Philosophizing about Programming; or "Why I'm learning to love functional programming"

Way back, about three years ago, I started writing a Haskell tutorial as a series of posts on this blog. After getting to monads, I moved on to other things. But based on some recent philosophizing, I think I’m going to come back to it. I’ll start by explaining why, and then over the next few days, I’ll re-run revised versions of old tutorial posts, and then start new material dealing with the more advanced topics that I didn’t get to before.

# Ropes: Twining Together Strings for Editors

It’s been a while since I’ve written about any data structures. But it just so happens that I’m actually really working on implementing a really interesting and broadly useful data structure now, called a Rope.

A bit of background, to lead in. I’ve got this love-hate relationship with some of the development tools that Rob Pike has built. (Rob is one of the Unix guys from Bell labs, and was one of the principal people involved in both the Plan9 and Inferno operating systems.) Rob has implemented some amazing development tools. The two that fascinate me were called Sam and Acme. The best and worst features of both are a sort of extreme elegant minimalism. There’s no bloat in Rob’s tools, no eye-candy, no redundancy. They’re built to do a job, and do it well – but not to do any more than their intended job. (This can be contrasted against Emacs, which is a text editor that’s grown into a virtual operating system.) The positive side of this is that they’re incredibly effective, and they demonstrate just how simple a programmers text editor should be. I’ve never used another tool that is more effective than Acme or Sam. In all seriousness, I can do more of my work more easily in Sam than I can in Emacs (which is my everyday editor). But on the other hand, that extreme minimalist aesthetic has the effect of strictly eliminating any overlaps: there’s one way to do things, and if you don’t like it, tough. In the case of Acme and Sam, that meant that you used the mouse for damn-near everything. You couldn’t even use the up and down arrows to move the cursor!

This is a non-starter for me. As I’ve mentioned once or twice, I’ve got terrible RSI in my wrists. I can’t use the mouse that much. I like to keep my hands on my keyboard. I don’t mind using the mouse when it’s appropriate, but moving my hand from the keyboard to the mouse every time I want to move the cursor?. No. No damned way. Just writing this much of this post, I would have had to go back and forth between the keyboard and mouse over 50 times. (I was counting, but gave up when I it 50.) A full day of that, and I’d be in serious pain.

I recently got reminded of Acme, because my new project at work involves using a programming language developed by Rob Pike. And Acme would really be incredibly useful for my new project. But I can’t use it. So I decided to bite the bullet, and use my free time to put together an Acme-like tool. (Most of the pieces that you need for a prototype of a tool like that are available as open-source components, so it’s just a matter of assembling them. Still a very non-trivial task, but a possible one.)

Which finally, leads us back to today’s data structure. The fundamental piece of a text editor is the data structure that you use to represent the text that you’re editing. For simplicity, I’m going to use Emacs terminology, and refer to the editable contents of a file as a Buffer.

How do you represent a buffer?

As usual with data structures, you start by asking: What do I need it to do? What performance characteristics are important?

In the case of a text buffer, you can get by with a fairly small set of basic operations:

• Fast concatenation: concatenating blocks of text needs to be really fast.
• Fast insert: given a point in a block of text, you need to be able to quickly insert text at that point.
• Fast delete: given two points in a block of text, you need to be able to quickly remove the text between those points.
• Reasonably fast traversal: Lots of algorithms, ranging from printing out the text to searching it are based on linear traversals of the contents. This doesn’t have to be incredibly fast – it is an intrinsically linear process, and it’s usually done in the context of something with a non-trivial cost (I/O, regular-expression scanning). But you can’t afford to make it expensive.
• Size: you need to be able to store effectively unlimited amounts of text, without significant performance degradation in the operations described above.

# Meta out the wazoo: Monads and Monoids

Since I mentioned the idea of monoids as a formal models of computations, John Armstrong made the natural leap ahead, to the connection between monoids and monads – which are a common feature in programming language semantics, and a prominent language feature in Haskell, one of my favorite programming languages.

# Using Monads for Control: Maybe it's worth a look?

So, after our last installment, describing the theory of monads, and the previous posts, which focused on representing things like state and I/O, I thought it was worth taking a moment to look at a different kind of thing that can be done with monads. We so often think of them as being state wrappers; and yet, that’s only really a part of what we can get from them. Monads are ways of tying together almost anything that involves sequences.