As long as I’m doing all of these basics posts, I thought it would be worth
explaining just what a Turing machine is. I frequently talk about things
being Turing equivalent, and about effective computing systems, and similar things, which all assume you have some clue of what a Turing machine is. And as a bonus, I’m also going to give you a nifty little piece of Haskell source code that’s a very basic Turing machine interpreter. (It’s for a future entry in the Haskell posts, and it’s not entirely finished, but it does work!)
The Turing machine is a very simple kind of theoretical computing device. In
fact, it’s almost downright trivial. But according to everything that we know and understand about computation, this trivial device is capable of any computation that can be performed by any other computing device.
The basic idea of the Turing machine is very simple. It’s a machine that runs on
top of a tape, which is made up of a long series of little cells, each of which has a single character written on it. The machine is a read/write head that moves over the tape, and which can store a little bit of information. Each step, the
machine looks at the symbol on the cell under the tape head, and based on what
it sees there, and whatever little bit of information it has stored, it decides what to do. The things that it can do are change the information it has stored, write a new symbol onto the current tape cell, and move one cell left or right.
That’s really it. People who like to make computing sound impressive often have
very complicated explanations of it – but really, that’s all there is to it. The point of it was to be simple – and simple it certainly is. And yet, it can do
anything that’s computable.
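To make that description concrete, here’s a minimal sketch of the step loop in Haskell. This is not the interpreter mentioned above; it’s just an illustration, and every name in it (`Tape`, `Rules`, `step`, `run`, `flipBits`) is my own assumption. The tape is a zipper around the head, the "little bit of information" is a state name, and the rules are a function from (state, symbol) to (new state, symbol to write, direction to move).

```haskell
-- Direction the head moves at the end of each step.
data Move = MoveLeft | MoveRight deriving (Eq, Show)

-- The tape as a zipper: cells to the left of the head (nearest first),
-- the cell under the head, and the cells to the right.
data Tape = Tape [Char] Char [Char] deriving (Eq, Show)

-- Symbol assumed for cells that have never been written.
blank :: Char
blank = '_'

-- Rules: given the stored state and the symbol under the head, either
-- halt (Nothing) or give the new state, the symbol to write, and a move.
type Rules = String -> Char -> Maybe (String, Char, Move)

-- Move the head one cell, extending the tape with blanks as needed.
move :: Move -> Tape -> Tape
move MoveLeft  (Tape [] c rs)       = Tape [] blank (c : rs)
move MoveLeft  (Tape (l : ls) c rs) = Tape ls l (c : rs)
move MoveRight (Tape ls c [])       = Tape (c : ls) blank []
move MoveRight (Tape ls c (r : rs)) = Tape (c : ls) r rs

-- One step: read the cell, consult the rules, write, then move.
step :: Rules -> (String, Tape) -> Maybe (String, Tape)
step rules (st, Tape ls c rs) = do
  (st', c', mv) <- rules st c
  return (st', move mv (Tape ls c' rs))

-- Keep stepping until no rule applies.
run :: Rules -> (String, Tape) -> (String, Tape)
run rules cfg = maybe cfg (run rules) (step rules cfg)

-- Example machine: flip every bit while moving right; halt on a blank.
flipBits :: Rules
flipBits "scan" '0' = Just ("scan", '1', MoveRight)
flipBits "scan" '1' = Just ("scan", '0', MoveRight)
flipBits _      _   = Nothing

fromString :: String -> Tape
fromString []       = Tape [] blank []
fromString (c : cs) = Tape [] c cs

toString :: Tape -> String
toString (Tape ls c rs) = reverse ls ++ [c] ++ rs
```

Running `run flipBits ("scan", fromString "1011")` leaves the head on the first blank cell past the input, with the four bits before it flipped.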
As promised, I’m finally going to get to the theory behind monads. As a quick review, the basic idea of the monad in Haskell is a hidden transition function – a monad is, basically, a state transition function.
The theory of monads comes from category theory. I’m going to assume you know a little bit about category theory – if you have trouble with it, go take a look at my introductory posts here.
Time for more monads. In this article, I’m going to show you how to implement a very simple
state monad – it’s a trivial monad which allows you to use a mutable state
consisting of a single integer. Then we’ll expand it to allow a more interesting
notion of state.
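As a rough preview of what a single-integer state monad can look like, here’s a sketch. The names (`Counter`, `getCount`, `putCount`, `tick`) are assumptions of mine, not taken from the article, and the `Functor` and `Applicative` instances are there because current GHC versions require them for any `Monad` instance.

```haskell
-- A computation that threads one Int of state and produces a result.
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }

instance Functor Counter where
  fmap f (Counter g) = Counter $ \s ->
    let (a, s') = g s
    in (f a, s')

instance Applicative Counter where
  pure a = Counter $ \s -> (a, s)
  Counter f <*> Counter g = Counter $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad Counter where
  -- Run the first computation, then feed its result and the updated
  -- state into the next one. This is the hidden state transition.
  Counter g >>= f = Counter $ \s ->
    let (a, s') = g s
    in runCounter (f a) s'

-- Read the current state.
getCount :: Counter Int
getCount = Counter $ \s -> (s, s)

-- Replace the current state.
putCount :: Int -> Counter ()
putCount n = Counter $ \_ -> ((), n)

-- Return the current value and increment the state.
tick :: Counter Int
tick = do
  n <- getCount
  putCount (n + 1)
  return n

threeTicks :: Counter [Int]
threeTicks = do
  a <- tick
  b <- tick
  c <- tick
  return [a, b, c]
```

With this sketch, `runCounter threeTicks 10` evaluates to `([10,11,12],13)`: the code in `threeTicks` never mentions the integer, but it gets passed along behind the scenes.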
The biggest nightmare for most people learning Haskell is monads. Monads are the
key to how you can implement IO, state, parallelism, and sequencing (among numerous other things) in Haskell. The trick is wrapping your head around them.
One thing that we’ve seen already in Haskell programs is type
classes. Today, we’re going to try to take our first real look
at them in detail – both how to use them, and how to define them. This still isn’t the entire picture around type-classes; we’ll come back for another look at them later. But this is a beginning – enough to
really understand how to use them in basic Haskell programs, and enough to give us the background we’ll need to attack Monads, which are the next big topic.
Type classes are Haskell’s mechanism for managing parametric polymorphism. Parametric polymorphism
is a big fancy term for code that works with type parameters. The idea of type classes is to provide a
mechanism for building constrained polymorphic functions: that is, functions whose type involves a type parameter, but which need some constraint to limit the types that can be used to instantiate it, specifying what properties the type parameter must have. In essence, it’s doing very much the same thing that parameter type declarations let us do for the code. Type declarations let us say “the value of this parameter can’t be just any value – it must be a value which is a member of this particular type”; type-class declarations let us say “the type parameter for instantiating this function can’t be just any type – it must be a type which is a member of this type-class.”
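Here’s a small sketch of both halves of that idea: defining a type class, and writing functions whose type parameter is constrained by one. The `Shape` class and everything built on it are made up for illustration; `Ord` is the standard Prelude class.

```haskell
-- A hypothetical type class: any type that wants to be a Shape must
-- provide these two operations.
class Shape a where
  area      :: a -> Double
  shapeName :: a -> String

data Circle = Circle Double        -- radius
data Rect   = Rect Double Double   -- width, height

instance Shape Circle where
  area (Circle r) = pi * r * r
  shapeName _     = "circle"

instance Shape Rect where
  area (Rect w h) = w * h
  shapeName _     = "rectangle"

-- A constrained polymorphic function: it works for any type a,
-- as long as a is a member of the Shape class.
describe :: Shape a => a -> String
describe s = shapeName s ++ " with area " ++ show (area s)

-- The same idea with a standard class: any type with an ordering will do.
largest :: Ord a => a -> a -> a
largest x y = if x >= y then x else y
```

Calling `describe` on a type with no `Shape` instance is rejected at compile time, which is exactly the "can’t be just any type" constraint at work.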
So, we’ve built up some pretty nifty binary trees – we can use the binary tree either as the
basis of an implementation of a set, or as an implementation of a dictionary. But our
implementation has had one major problem: it’s got absolutely no way to maintain balance. What
that means is that depending on the order in which things are inserted into the tree, we might
have excellent performance, or we might be no better than a linear list. For example, look at
these trees. As you can see, a tree with the same values can wind up quite different. In a good insert order, you can wind up with a nicely balanced tree: the minimum distance from root to leaf is 3; the maximum is 4. On the other hand, take the same values, and insert them in a different order and you
get a rotten tree; the minimum distance from root to leaf is 1, and the maximum is 7. So depending on
luck, you can get a tree that gives you good performance, or one that ends up giving you no better than a plain old list.
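The effect of insertion order is easy to reproduce with a bare-bones unbalanced tree. This is a sketch with names of my own choosing (`Tree`, `insert`, `fromList`, `height`), using seven values rather than the ones from the trees described above.

```haskell
-- A plain binary search tree with no balancing at all.
data Tree a = Leaf | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r
  | x > v     = Node l v (insert x r)
  | otherwise = t                      -- already present

-- Insert the values left to right, starting from an empty tree.
fromList :: Ord a => [a] -> Tree a
fromList = foldl (flip insert) Leaf

-- Number of nodes on the longest path from the root down to a leaf.
height :: Tree a -> Int
height Leaf         = 0
height (Node l _ r) = 1 + max (height l) (height r)

goodOrder, badOrder :: [Int]
goodOrder = [4, 2, 6, 1, 3, 5, 7]   -- middle values first
badOrder  = [1 .. 7]                -- already sorted
```

Here `height (fromList goodOrder)` is 3, while `height (fromList badOrder)` is 7: the sorted input degenerates into a chain of right children, which is a linked list in everything but name.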
Today we’re going to look at fixing that problem. That’s really more of a lesson in data
structures than it is in Haskell, but we’ll need to write more complicated and interesting
data structure manipulation code than we have so far, and it’ll be a lollapalooza of pattern
matching. What we’re going to do is turn our implementation into a red-black tree.
In Haskell, there are no looping constructs. Instead, there are two alternatives: there are list iteration constructs (like
foldl which we’ve seen before), and tail recursion. Let me say, up front, that in Haskell if you find yourself writing any iteration code on a list or tree-like structure, you should always look in the libraries; odds are, there’s some generic function in there that can be adapted for your use. But there are always cases where you need to write something like a loop for yourself, and tail recursion is the way to do it in Haskell.
Tail recursion is a kind of recursion where the recursive call is the very last
thing in the computation of the function. The value of tail recursion is that in a tail
recursive call, the caller does nothing except pass up the value that’s returned
by the callee; and that, in turn, means that you don’t need to return to the caller
at all! If you think of it in terms of primitive machine-level code, in a tail-recursive call, you can use a direct branch instruction instead of a branch-to-subroutine; the tail-recursive call does *not* need to create a new stack frame. It can just reuse the caller’s frame.
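To see the difference, compare two ways of summing the numbers from 1 to n. This is a sketch with names I’ve made up (`sumTo`, `sumToAcc`), and it assumes n is non-negative. One caveat specific to Haskell: because evaluation is lazy, an accumulator can quietly pile up unevaluated additions, so the tail-recursive version uses `seq` to force the running total on each pass.

```haskell
-- Not tail recursive: after the recursive call returns, the caller
-- still has to add n to the result, so every call must be remembered.
sumTo :: Integer -> Integer
sumTo 0 = 0
sumTo n = n + sumTo (n - 1)

-- Tail recursive: the recursive call to go is the last thing go does.
-- The running total travels along in the accumulator instead.
sumToAcc :: Integer -> Integer
sumToAcc n = go n 0
  where
    go 0 acc = acc
    go k acc =
      let acc' = acc + k
      in acc' `seq` go (k - 1) acc'   -- force the total, then loop
```

Both functions compute the same result; the second is the shape you want when you need to write a loop by hand.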
Last time around, I walked through the implementation of
a very simple binary search tree. The implementation that I showed
was reasonable, if not great, but it had two major limitations. First,
it uses equality testing for the search, so that the implementation is only really suitable for use as a set; and second, because it’s such
a trivial tree implementation, it’s very easy for the tree to become
highly unbalanced, resulting in poor performance.
Today, we’ll look at how to extend the implementation so that our BST
becomes useful as a key/value dictionary type. We’ll take two different
approaches to that, each of which demonstrates some common Haskell
techniques: one based on using higher order functions, and one based on
using tuples. Balancing the tree will be a task for another day.
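To give a feel for the tuple-based approach, here’s one way it could look. This is a sketch under my own naming assumptions (`Dict`, `dictInsert`, `dictLookup`, `fromPairs`), not the code from the post itself: each node holds a (key, value) pair, and only the key takes part in comparisons.

```haskell
-- A binary search tree whose nodes carry (key, value) pairs.
data Dict k v = Empty | Node (Dict k v) (k, v) (Dict k v)

-- Insert a key/value pair, replacing the value if the key is present.
dictInsert :: Ord k => k -> v -> Dict k v -> Dict k v
dictInsert k v Empty = Node Empty (k, v) Empty
dictInsert k v (Node l (k', v') r) =
  case compare k k' of
    LT -> Node (dictInsert k v l) (k', v') r
    GT -> Node l (k', v') (dictInsert k v r)
    EQ -> Node l (k, v) r

-- Search by key; Nothing means the key isn't in the dictionary.
dictLookup :: Ord k => k -> Dict k v -> Maybe v
dictLookup _ Empty = Nothing
dictLookup k (Node l (k', v) r) =
  case compare k k' of
    LT -> dictLookup k l
    GT -> dictLookup k r
    EQ -> Just v

-- Build a dictionary from a list of pairs; later pairs win on duplicates.
fromPairs :: Ord k => [(k, v)] -> Dict k v
fromPairs = foldl (\d (k, v) -> dictInsert k v d) Empty
```

For example, `dictLookup "b" (fromPairs [("a", 1), ("b", 2)])` gives `Just 2`, and looking up a missing key gives `Nothing`.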
As an aside: in a real Haskell program, you would probably never write a simple type like this. Haskell has a great library, and it’s got plenty of library types that do a much better job than this. This
is really only for tutorial purposes.
For this post, I’m doing a bit of an experiment. Haskell includes a “literate” syntax mode. In literate mode, any text that doesn’t start with a “>” sign is considered a comment. So this entire post can be copied and used as a source file; just save it with a name ending in `.lhs`. If this doesn’t work properly, please post something in the comments to let me know. It’s more work for me to write the posts this way, so if it’s not working properly, I’d rather not waste the effort. I’ve tested it in both Hugs and ghci, and everything works, but who knows what will happen after a pass through MovableType!
Like most modern programming languages, Haskell has excellent support for building user-defined data types. In fact, even though Haskell is very much not object-oriented, most Haskell programs end up being centered around the design and implementation of data structures.
So today, I’m going to start looking at how you implement data types in Haskell. What I’m
going to do is start by showing you how to implement a simple binary search tree. I’ll start with
a very simple version, and then build on that.
Haskell is a strongly typed language. In fact, the type system in Haskell is both stricter and more
expressive than any type system I’ve seen for any non-functional language. The moment we get beyond
writing trivial integer-based functions, the type system inevitably becomes visible, so we need to
take the time now to talk about it a little bit, in order to understand how it works.