Category Archives: Programming

Everyone should program, or Programming is Hard? Both!

I saw something on twitter a couple of days ago, and I promised to write this blog post about it. As usual, I’m behind on all the stuff I want to do, so it took longer to write than I’d originally planned.

My professional specialty is understanding how people write programs. Programming languages, development environment, code management tools, code collaboration tools, etc., that’s my bread and butter.

So, naturally, this ticked me off.

The article starts off by, essentially, arguing that most of the programming tutorials on the web stink. I don’t entirely agree with that, but to me, it’s not important enough to argue about. But here’s where things go off the rails:

But that’s only half the problem. Victor thinks that programming itself is broken. It’s often said that in order to code well, you have to be able to “think like a computer.” To Victor, this is absurdly backwards– and it’s the real reason why programming is seen as fundamentally “hard.” Computers are human tools: why can’t we control them on our terms, using techniques that come naturally to all of us?

And… boom! My head explodes.

For some reason, so many people have this bizzare idea that programming is this really easy thing that programmers just make difficult out of spite or elitism or clueless or something, I’m not sure what. And as long as I’ve been in the field, there’s been a constant drumbeat from people to say that it’s all easy, that programmers just want to make it difficult by forcing you to think like a machine. That what we really need to do is just humanize programming, and it will all be easy and everyone will do it and the world will turn into a perfect computing utopia.

First, the whole “think like a machine” think is a verbal shorthand that attempts to make programming as we do it sound awful. It’s not just hard to program, but those damned engineers are claiming that you need to dehumanize yourself to do it!

To be a programmer, you don’t need to think like a machine. But you need to understand how machines work. To program successfully, you do need to understand how machines work – because what you’re really doing is building a machine!

When you’re writing a program, on a theoretical level, what you’re doing is designing a machine that performs some mechanical task. That’s really what a program is: it’s a description of a machine. And what a programming language is, at heart, is a specialized notation for describing a particular kind of machine.

No one will go to an automotive engineer, and tell him that there’s something wrong with the way transmissions are designed, because they make you understand how gears work. But that’s pretty much exactly the argument that Victor is making.

How hard is it to program? That all depends on what you’re tring to do. Here’s the thing: The complexity of the machine that you need to build is what determines the complexity of the program. If you’re trying to build a really complex machine, then a program describing it is going to be really complex.

Period. There is no way around that. That is the fundamental nature of programming.

In the usual argument, one thing that I constantly see is something along the lines of “programming isn’t plumbing: everyone should be able to do it”. And my response to that is: of course so. Just like everyone should be able to do their own plumbing.

That sounds like an amazingly stupid thing to say. Especially coming from me: the one time I tried to fix my broken kitchen sink, I did over a thousand dollars worth of damage.

But: plumbing isn’t just one thing. It’s lots of related but different things:

  • There are people who design plumbing systems for managing water distribution and waste disposal for an entire city. That’s one aspect of plubing. And that’s an incredibly complicated thing to do, and I don’t care how smart you are: you’re not going to be able to do it well without learning a whole lot about how plumbing works.
  • Then there are people who design the plumbing for a single house. That’s plumbing, too. That’s still hard, and requires a lot of specialized knowledge, most of which is pretty different from the city designer.
  • There are people who don’t design plumbing, but are able to build the full plumbing system for a house from scratch using plans drawn by a designer. Once again, that’s still plumbing. But it’s yet another set of knowledge and skills.
  • There are people who can come into a house when something isn’t working, and without ever seeing the design, and figure out what’s wrong, and fix it. (There’s a guy in my basement right now, fixing a drainage problem that left my house without hot water, again! He needed to do a lot of work to learn how to do that, and there’s no way that I could do it myself.) That’s yet another set of skills and knowledge – and it’s still plumbing.
  • There are non-professional people who can fix leaky pipes, and replace damaged bits. With a bit of work, almost anyone can learn to do it. Still plumbing. But definitely: everyone really should be able to do at least some of this.

  • And there are people like me who can use a plumbing snake and a plunger when the toilet clogs. That’s still plumbing, but it requires no experience and no training, and absolutely everyone should be able to do it, without question.

All of those things involve plumbing, but they require vastly different amounts and kinds of training and experience.

Programming is exactly the same. There are different kinds of programming, which require different kinds of skills and knowledge. The tools and training methods that we use are vastly different for those different kinds of programming – so different that for many of them, people don’t even realize that they are programming. Almost everyone who uses computers does do some amount of programming:

  • When someone puts together a presentation in powerpoint, with things that move around, appear, and disappear on your command: that is programming.
  • When someone puts formula into a spreadsheet: that is programming.
  • When someone builds a website – even a simple one – and use either a set of tools, or CSS and HTML to put the site together: that is programming.
  • When someone writes a macro in Word or Excel: that is programming.
  • When someone sets up an autoresponder to answer their email while they’re on vacation: that is programming.

People like Victor completely disregard those things as programming, and then gripe about how all programming is supercomplexmagicalsymbolic gobbledygook. Most people do write programs without knowing about it, precisely because they’re doing it with tools that present the programming task as something that’s so natural to them that they don’t even recognize that they are programming.

But on the other hand, the idea that you should be able to program without understanding the machine you’re using or the machine that you’re building: that’s also pretty silly.

When you get beyond the surface, and start to get to doing more complex tasks, programming – like any skill – gets a lot harder. You can’t be a plumber without understanding how pipe connections work, what the properties of the different pipe materials are, and how things flow through them. You can’t be a programmer without understanding something about the machine. The more complicated the kind of programming task you want to do, the more you need to understand.

Someone who does Powerpoint presentations doesn’t need to know very much about the computer. Someone who wants to write spreadsheet macros needs to understand something about how the computer processes numbers, what happens to errors in calculations that use floating point, etc. Someone who wants to build an application like Word needs to know a whole lot about how a single computer works, including details like how the computer displays things to people. Someone who wants to build Google doesn’t need to know how computers render text clearly on the screen, but they do need to know how computers work, and also how networks and communications work.

To be clear, I don’t think that Victor is being dishonest. But the way that he presents things often does come off as dishonest, which makes it all the worse. To give one demonstration, he presents a comparison of how we teach programming to cooking. In it, he talks about how we’d teach people to make a soufflee. He shows a picture of raw ingredients on one side, and a fully baked soufflee on the other, and says, essentially: “This is how we teach people to program. We give them the raw ingredients, and say fool around with them until you get the soufflee.”

The thing is: that’s exactly how we really teach people to cook – taken far out of context. If we want them to be able to prepare exactly one recipe, then we give them complete, detailed, step-by-step instructions. But once they know the basics, we don’t do that anymore. We encourage them to start fooling around. “Yeah, that soufflee is great. But what would happen if I steeped some cardamom in the cream? What if I left out the vanilla? Would it turn out as good? Would that be better?” In fact, if you never do that experimentation, you’ll probably never learn to make a truly great soufflee! Because the ingredients are never exactly the same, and the way that it turns out is going to depend on the vagaries of your oven, the weather, the particular batch of eggs that you’re using, the amount of gluten in the flour, etc.

To write complicated programs is complicated. To write programs that manipulate symbolic data, you need to understand how the data symbolizes things. To write a computer that manipulates numbers, you need to understand how the numbers work, and how the computer represents them. To build a machine, you need to understand the machine that you’re building. It’s that simple.

DNS and Yesterday's Outage

As I’m sure at least some of you noticed, Scientopia – along with a huge number of other sites – was unreachable for most of yesterday. I say “unreachable” instead of “down” because the site was up. But if you wanted to get to it, and you didn’t know it’s numeric address, you couldn’t reach it. That’s because your computer uses a service called DNS to find things.

So what’s this DNS thing? And what brought down so much of the net yesterday?

DNS stands for “domain name service”. What it does is provide a bridge between the way that you think of the name of a website, and the way that your computer actually talks to the website. You think of a website by a URL, like http://scientopia.org/blogs/goodmath. That URL basically says “the thing named ”blogs/goodmath” on a server named ”scientopia.org”, which you can access using a system called ”http”.”.

The catch is that your computer can’t just send a message to something named “scientopia.org”. Scientopia is a server in a datacenter somewhere, and to send a message to it, it needs to know a numeric address. Numeric addresses are usually written as a set of four numbers, each ranging from 0 to 255. For example, scientopia.org is currently 184.106.221.182.

You don’t want to have to remember those four meaningless numbers in order to connect to scientopia. But there’s more to it than just remembering: those numbers can change. The physical computer that scientopia works on has been moved around several times. It lives in a datacenter somewhere, and as that datacenter has grown, and hardware has needed to be either expanded or replaced, the numeric address has changed. Even though I’m the administrator of the site, there’ve been a couple of times where it’s changed, and I can’t pinpoint exactly when!

The reason that all of that works is DNS. When you tell your computer to connect to another computer by name, it uses DNS to ask “What’s the numeric address for this name?”, and once it gets that numeric address, it uses that to connect to the other computer.

When I first got access to the internet in college, it was shortly before DNS first got deployed. At the time, there was one canonical computer somewhere in California. Every other computer on the (then) arpanet would contact that computer every night, and download a file named “hosts.txt”. hosts.txt contained a list of every computer on the internet, and its numeric address. That was clearly not working any more, and DNS was designed as the replacement. When they decided to replace hosts.txt, they wanted to replace it with something much more resilient, and something that didn’t need to have all of the data in one place, and which didn’t rely on any single machine in order to work.

The result was DNS. It’s a fascinating system. It’s the first real distributed system that I ever learned about, and I still think it’s really interesting. I’m not going to go into all of the details: distributed systems always wind up with tons of kinks, to deal with the complexity of decentralized information in the real world. But I can explain the basics pretty easily.

The basic way that DNS works is really simple. You take a domain name. For example, my personal machine is named “sunwukong”, and my family network is “phouka.net” – so the hostname that I’m typing this on right now is “sunwukong.phouka.net”. To look it up, what happens is you take the host name, and split it at the dots. So for the example, that’s [“sun-wukong”, “phouka”, “net”].

You start with the last element at the list, which is called the top-level-domain, and you contact a special server called the root nameserver and ask “who do I ask for addresses ending with this tld? (“Who do I ask for ‘.net’?”) It sends a response to you, giving you the address of another nameserver. Then you contact that, and ask it “who do I ask for the next element”? (“Who do I ask for phouka?”) Finally, when you get to the last one, the address that it gives you is the address of the machine you’re looking for, instead of the name of another DNS server. At each level of the heirarchy, there’s a server or set of servers that’s registered with the level above to be the canonical server for some set of names. The “.com” servers are registered with the root level servers. The next level of domains is registered with the TLD servers. And so on.

So, to look up sunwukong.phouka.net, what happens is:

  1. You send a request to the root nameserver, and ask for the server for “.net”?
  2. The root server responds, giving you the “.net” server.
  3. You send a request to the .net server address that you just got, asking “Who can tell
    me about phouka?”
  4. The .net server responds, and gives you an address for a DNS server that can tell
    you where things are in phouka.net.
  5. You ask the “phouka.net” server for the address of sunwukong
  6. It finally gives you the address of your server.

This distributes the information nicely. You can have lots of root name servers – all they need to be able to do is find DNS servers for the list of top-level domains. And there can be – and are – thousands and thousands of nameservers for addresses in those top-level domains. And then each administrator of a private network can have a top-level nameserver for the computers that they directly manage. In this mess, any nameserver at any level can disappear, and all that will be affected are the names that it manages.

The problem is, there’s no one place to get information; and more importantly, every time you need to talk to another computer, you need to go through this elaborate sequence of steps.

To get around that last issue, you’ve got something called caches, on multiple levels. A cache is a copy of the canonical data, which gets kept around for some period of time. In the DNS system, you don’t need to talk to the canonical DNS server for a domain if someone has the data they need in a cache. You can (and generally do) talk to cacheing DNS servers. With a cacheing DNS server, you tell it the whole domain name that you want to look up, and it does the rest of the lookup process, and gives you the address you’re looking for. Every time it looks something up, it remembers what it did, so that it doesn’t need to ask again. So when you look up a “.com” address, your cacheing service will remember who it can ask for “.com” addresses. So most of the time, you only talk to one server, which does the hard work, and does it in a smart way so that it doesn’t need to unnecessarily repeat requests.

So now, we can get to what happened yesterday. GoDaddy, one of the major american DNS registries, went down. Exactly why it went down is not clear – a group of jerks claimed to have done a denial-of-service attack against them; but the company claims to have just had a configuration glitch. Honestly, I’m inclined to believe that it was the hackers, but I don’t know for sure.

But – the canonical server for scientopia.org was gone. So if you needed to look up scientopia.org and it wasn’t in your cache, then your cacheing nameserver would go to the .org server, and ask for scientopia; the .org nameserver would return the address of GoDaddy’s nameserver, and that wouldn’t respond. So you’d never reach us.

That’s also why some people were able to reach the site, and others weren’t. If your cacheing server had cached the address for scientopia, then you’d get the server address, and since the server was up the whole time, you’d connect right up, and everything would work.

Nulls

So, my post on monads apparently set of a bit of a firestorm over my comments about avoiding null pointer exceptions. To show you what I mean, here’s a link to one of the more polite and reasonable posts.

Ok. So. Let me expand a bit.

The reason that I think it’s so great that I don’t get NPEs when I use an option with something like Option isn’t because it makes me a super-programmer who’s better than the lowly slime who deal with NPEs. It’s because it changes how that I write code, in a way that helps me avoid making mistakes – mistakes that I make because I’m just a fallible idiot of a human.

There are two fundamental things about an option type, like what we have in Scala, that make a huge difference.

First, it narrows the field of errors. When I’m programming in Java, any call that returns a pointer could return a null. The language makes no distinction between a function/expression that could return a null, and one that can’t. That means that when I get an NPE, the source of that null pointer could be anything in the dynamic slice leading to the error. With an option type, I’ve got two kinds of functions: functions that always return a non-null value, and functions that sometimes return a non-null value, and sometimes return a None. That’s incredibly valuable.

Second, it forces me to explicitly deal with the None case. In Java, programmers constantly build code without null checks, because they know that a function won’t return null. And then it does, and ker-splat. With an option type, I have no choice: I have to explicitly deal with the potential error case. Sure, I can forcibly code around it – in Scala, I can use Option.get, which will turn into an error analagous to an NPE. But it forces me to make that choice, and make it explicitly.

Even if I’m going the stupid, brute-force route and assuming that I know, without fail, that a function is going to return a non-null value… Consider an example:

   Java: :
  T g = f.doSomething()
  g.doSomethingElse()

   Scala:
  val g: Option[T] = f.doSomething()
  g.get.doSomethingElse()

The scala case has to explicitly deal with the fact that it’s dealing with a potentially empty value, and using a statement that asserts the non-emptiness.

But in reality, if you’re a decent programmer, you never use .get to directly access an option. (The only exception is in cases where you call the .get in a context dominated by a non-empty test; but even then, it’s best to not, to avoid errors when the surrounding code is modified.) In real code, you pretty much always explicitly de-option a value using a function like getOrElse:

val f: User = getUser("markcc").getOrElse(new User("markcc"))

As I hope it has become plain, the point of avoiding NPEs through option-like type structures isn’t that somehow it makes the entire category of unexpected result value disappear. It’s that it changes the way that you code to distinguish where those errors can occur, and to force you to deal with them.

I think that ultimately, things like this are really just manifestations of the good-old static vs dynamic type wars. Type errors in a dynamically typed language are really just unexpected value errors. Strong typing doesn’t stop you from making those errors. It just takes a bunch of common cases of those errors, and converts them from a run-time error to a compile-time error. Whether you want them to be run-time or compile-time depends on the project your working on, on your development team, and on your personal preferences.

I find in practice that I get many fewer errors by being forced to explicitly declare when a value might be null/None, and by being required to explicitly deal with the null/None case when it might occur. I’ve spent much less time debugging that kind of error in my year at foursquare than in the 15 years of professional development that I did before. That’s not because I magically became a better programmer a year ago when I joined foursquare. It’s because I’m using a better tool that helps me avoid mistakes.

Monads and Programming

Sorry things have been so slow around here. I know I keep promising that I’m going to post more frequently, but it’s hard. Life as an engineer at a startup is exhausting. There’s so much work to do! And the work is so fun – it’s easy to let it eat up all of your time.

Anyway… last good-math post ’round these parts was about monoids and programming. Today, I’m going to talk about monads and programming!

If you recall, monoids are an algebraic/categorical construct that, when implemented in a programming language, captures the abstract notion of foldability. For example, if you’ve got a list of numbers, you can fold that list down to a single number using addition. Folding collections of values is something that comes up in a lot of programming problems – capturing that concept with a programming construct allows you to write code that exploits the general concept of foldability in many different contexts.

Monads are a construct that have become incredibly important in functional programming, and they’re very, very poorly understood by most people. That’s a shame, because the real concept is actually simple: a monad is all about the concept of sequencing. A monad is, basically, a container that you can wrap something in. Once it’s wrapped, you can form a sequence of transformations on it. The result of each step is the input to the next. That’s really what it’s all about. And when you express it that way, you can begin to see why it’s such an important concept.

I think that people are confused by monads for two reasons:

  1. Monads are almost always described in very, very abstract terms. I’ll also get into the abstract details, but I’ll start by elaborating on the simple description I gave above.
  2. Monads in Haskell, which are where most people are introduced to them, are very confusing. The basic monad operations are swamped with tons of funny symbols, and the basic monad operations are named in incredibly misleading ways. (“return” does almost the exact opposite of what you expect return to do!)

In programming terms, what’s a monad?

Basically, a monadic computation consists of three pieces:

  1. A monadic type, M which is a parametric type wrapper that can wrap a value of any type.
  2. An operation which can wrap a value in M.
  3. An operation which takes a function that transforms a value wraped in M into another value (possibly with a different type) wrapped in M.

Whenever you describe something very abstractly like this, it can seem rather esoteric. But this is just a slightly more formal way of saying what I said up above: it’s a wrapper for a series of transformations on the wrapped value.

Let me give you an example. At foursquare, we do all of our server programming in Scala. In a year at foursquare, I’ve seen exactly one null pointer exception. That’s amazing – NPEs are ridiculously common in Java programming. But in Scala at foursquare, we don’t allow nulls to be used at all. If you have a value which could either be an instance of A, or no value, we use an option type. An Option[T] can be either Some(t: T) or None.

So far, this is nice, but not really that different from a null. The main difference is that it allows you to say, in the type system, whether or not a given value might be null. But: Option is a monad. In Scala, that means that I can use map on it. (map is one of the transformation functions!)

	val t: Option[Int] = ...
	val u: Option[String] = t.map( _ + 2 ).map(_.toString)

What this does is: if t is Some(x), then it adds two to it, and returns Some(x+2); then it takes the result of the first map, and converts it toa string, returning an Option[String]. If t is None, then running map on it always returns None. So I can write code which takes care of the null case, without having to write out any conditional tests of nullness – because optionality is a monad.

In a good implementation of a monad, I can do a bit more than that. If I’ve got a Monad[T], I can use a map-like operation with a function that takes a T and returns a Monad[T].

For an example, we can look at lists – because List is a monad:

val l: List[Int] = List(1, 2, 3)
l.flatMap({ e => List( (e, e), (e+1, e+2) )  })
res0: List[(Int, Int)] = List((1,1), (2,3), (2,2), (3,4), (3,3), (4,5))

The monad map operation does a flatten on the map steps. That means a lot of things. You can see one in the rather silly example above.

You can take values, and wrap them as a list. THen you can perform a series of operations on those elements of a list – sequencing over the elements of the list. Each operation, in turn, returns a list; the result of the monadic computation is a single list, concatenating, in order, the lists returned by each element. In Scala, the flatMap operation captures the monadic concept: basically, if you can flatmap something, it’s a monad.

Let’s look at it a bit more specifically.

  1. The monadic type: List[T].
  2. A function to wrap a value into the monad: the constructor function from List def apply[T](value: T): List[T]
  3. The map operation: def flatMap[T, U](op: T => List[U]): List[U].

(In the original version of this post, the I put the wrong type in flatMap in the list above. In the explanation demonstrating flatMap, the type is correct. Thanks to John Armstrong for catching that!)

You can build monads around just about any kind of type wrapper where it makes sense to map over the values that it wraps: collections, like lists, maps, and options. Various kinds of state – variable environments (where the wrapped values are, essentially, functions from identifiers to values), or IO state. And plenty of other things. Anything where you perform a sequence of operations over a wrapped value, it’s a monad.

Now that we have some understanding of what this thing we’re talking about it, what is it in mathematical terms? For that, we turn to category theory.

Fundamentally, in category theory a monad is a category with a particular kind of structure. It’s a category with one object. That category has a collection of arrows which (obviously) are from the single object to itself. That one-object category has a functor from the category to itself. (As a reminder, a functor is an arrow between categories in the category of (small) categories.)

The first trick to the monad, in terms of theory, is that it’s fundamentally about the functor: since the functor maps from a category to the same category, so you can almost ignore the category; it’s implicit in the definition of the functor. So we can almost treat the monad as if it were just the functor – that is, a kind of transition function.

The other big trick is closely related to that. For the programming language application of monads, we can think of the single object in the category as the set of all possible states. So we have a category object, which is essentially the collection of all possible states; and there are arrows between the states representing possible state transitions. So the monad’s functor is really just a mapping from arrows to different arrows – which basically represents the way that changing the state causes a change in the possible transitions to other states.

So what a monad gives us, in terms of category theory, in a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that’s translated into programming languages, that becomes a value that implicitly takes an input state, possibly updates it, and returns an output state. Sound familiar?

Let’s take a moment and get formal. As usual for category theory, first there are some preliminary definitions.

  1. Given a category, C, 1C is the identity functor from C to C.
  2. Given a category C with a functor T : CC, T2 = T º T.
  3. Given a functor T, 1T : TT is the natural transformation from T to T.

Now, with that out of the way, we can give the complete formal definition of a monad. Given a category C, a monad on C is a triple: (T:CC, η:1CT, μ:T2T), where T is a functor, and η and μ are natural transformations. The members of the triple must make the following two diagrams commute.

monad-prop1.jpg

Commutativity of composition with μ


monad-prop2.jpg

Commutativity of composition with η

What these two diagrams mean is that successive applications of the state-transition functor over C behave associatively – that any sequence of composing monadic functors will result in a functor with full monadic structure; and that the monadic structure will always preserve. Together, these mean that any sequence of operations (that is, applications of the monad functor) are themselves monad functors – so the building a sequence of monadic state transformers is guaranteed to behave as a proper monadic state transition – so what happens inside of the monadic functor is fine – to the rest of the universe, the difference between a sequence and a single simple operation is indistinguishable: the state will be consistently passed from application to application with the correct chaining behavior, and to the outside world, the entire monadic chain looks like like a single atomic monadic operation.

Now, what does this mean in terms of programming? Each element of a monadic sequence in Haskell is an instantiation of the monadic functor – that is, it’s an arrow between states – a function, not a simple value – which is the basic trick to monads. They look like a sequence of statements; in fact, each statement in a monad is actually a function from state to state. And it looks like we’re writing sequence code – when what we’re actually doing is writing function compositions – so that when we’re done writing a monadic sequence, what we’ve actually done is written a function definition in terms of a sequence of function compositions.

Understanding that, we can now clearly understand why we need the return function to use a non-monad expression inside of a monadic sequence – because each step in the sequence needs to be an instance of the monadic functor; an expression that isn’t an instance of the monadic functor couldn’t be composed with the functions in the sequence. The return function is really nothing but a function that combines a non-monadic expression with the id functor.

In light of this, let’s go back and look at the definition of Monad in the Haskell standard prelude.

class  Functor f  where
  fmap              :: (a -> b) -> f a -> f b

class  Monad m  where
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  return :: a -> m a
  fail   :: String -> m a

  -- Minimal complete definition:
  --      (>>=), return
  m >> k  =  m >>= _ -> k
  fail s  = error s

The declaration of monad is connected with the definition of functor – if you look, you can see the connection. The fundamental operation of Monad is “>>=” – the chaining operation, which is basically the haskell version of the map operation, which is type m a -> (a -> m b) -> m b is deeply connected with the fmap operation from Functor‘s fmap operation – the a in m a is generally going to be a type which can be a Functor. (Remember what I said about haskell and monads? I really prefer map and flatMap to >> and >>=).

So the value type wrapped in the monad is a functor – in fact, the functor from the category definition! And the “>>=” operation is just the functor composition operation from the monad definition.

A proper implementation of a monad needs to follow some fundamental rules – the rules are, basically, just Haskell translations of the structure-preserving rules about functors and natural transformations in the category-theoretic monad. There are two groups of laws – laws about the Functor class, which should hold for the transition function wrapped in the monad class; and laws about the monadic operations in the Monad class. One important thing to realize about the functor and monad laws is that they are not enforced – in fact, cannot be enforced! – but monad-based code using monad implementations that do not follow them may not work correctly. (A compile-time method for correctly verifying the enforcement of these rules can be shown to be equivalent to the halting problem.)

There are two simple laws for Functor, and it’s pretty obvious why they’re fundamentally just strucure-preservation requirements. The functor class only has one operation, called fmap, and the two functor laws are about how it must behave.

  1. fmap id = id
    (Mapping ID over any structured sequence results in an unmodified sequence)
  2. fmap (f . g) = (fmap f) . (fmap g)
    (“.” is the function composition operation; this just says that fmap preserves the structure to ensure that that mapping is associative with composition.)

The monad laws are a bit harder, but not much. The mainly govern how monadic operations interact with non-monadic operations in terms of the “return” and “>>=” operations of the Monad class.

  1. return x >>= f = f x
    (injecting a value into the monad is basically the same as passing it as a parameter down the chain – return is really just the identity functor passing its result on to the next step. I hate the use of “return”. In a state functor, in exactly the right context, it does sort-of look like a return statement in an imperative language. But in pretty much all real code, return is the function that wraps a value into the monad.)
  2. f >>= return = f
    (If you don’t specify a value for a return, it’s the same as just returning the result of the previous step in the sequence – again, return is just identity, so passing something into return shouldn’t affect it.)
  3. seq >>= return . f = fmap f seq
    (composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad – in other words, it’s just composition with identity.)
  4. seq >>= (x -> f x >>= g) = (seq >>= f) >>= g
    (Your implementation of “>>=” needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)

A Neat Trick with Partial Evalutors

This is something I was talking about with a coworker today, and I couldn’t quickly find a friendly writeup of it online, so I decided to write my own. (Yes, we do have a lot of people who enjoy conversations like this at foursquare, which is one of the many reasons that it’s such a great place to work. And we are hiring engineers! Feel free to get in touch with me if you’re interested, or just check our website!)

Anyway – what we were discussing was something called partial evaluation (PE). The idea of PE is almost like an eager form of currying. Basically, you have a two parameter function, f(x, y). You know that you’re going to invoke it a bunch of times with the same value of x, but different values of y. So what you want to do is create a new function, fx(y), which evaluates as much as possible of f using the known parameter.

It’s often clearer to see it in programmatic form. Since these days, I pretty much live Scala, I’ll use Scala syntax. We can abstractly represent a program as an object which can be run with a list of arguments, and which returns some value as its result.

trait Program {
  def run(inputs: List[String]): String
}

If that’s a program, then a partial evaluator would be:

object PartialEvaluator {
  def specialize(prog: Program, knownInputs: List[String]): Program = {
     ...
  }
}

What it does is take and program and a partial input, and returns a new program, which is the original program, specialized for the partial inputs supplied to the partial evaluator. So, for example, imagine that you have a program like:

object Silly extends Program {
  def run(inputs: List[String]): String = {
    val x: Int = augmentString(inputs[0]).toInt
    val y: Int = augmentString(inputs[0]).toInt
    if (x % 2 == 0) {
	  (y * y).toString
    } else {
	  ((y+1)*(y+1)).toString
    }
  }
}

This is obviously a stupid program, but it’s good for an example, because it’s so simple. If we’re going to run Silly a bunch of times with 1 as the first parameter, but with different second parameters, then we could generate a specialized version of Silly:

   val sillySpecialOne = PartialEvaluator.specialize(Silly, List("1"))

Here’s where the interesting part comes, where it’s really different from
currying. A partial evaluator evaluates everything that it
possibly can, given the inputs that it’s supplied. So the value produced by the specializer would be:

object Silly_1 extends Program {
  def run(inputs: List[String]): String = {
    val y: Int = augmentString(inputs[0]).toInt
    ((y+1)*(y+1)).toString
  }
}

This can be a really useful thing to do. It can turn in to a huge
optimization. (Which is, of course, why people are interested in it.) In compiler terms, it’s sort of like an uber-advanced form of constant propagation.

But the cool thing that we were talking about is going a bit meta. Suppose that you run a partial evaluator on a programming language interpreter?

object MyInterpreter extends Program {
  def run(inputs: List[String]): String = {
    val code = inputs[0]
    val inputsToCode = inputs.tail
    interpretProgram(code, inputsToCode)
  }

  def interpretProgram(code: String, inputs: List[String]): String = {
     ...
  }
}

We can run our partial evaluator to generate a version of the interpreter that’s specialized for the code that the interpreter is supposed to execute:

  1. We have an interpreter for language M, written in language L.
  2. We have a partial evaluator for language L.
  3. We run the partial evaluator on the interpreter, with a program
    written in M that we want to interpret:
    PartialEvaluator.specialize(M_Interpreter, M_Code).
  4. The partial evaluator produces a program written in L.

So the partial evaluator took a program written in M, and transformed it into a program written in L!

We can go a step further – and generate an M to L compiler. How? By running the partial evaluator on itself. That is, run the partial
evaluator like this: PartialEvaluator.specialize(PartialEvaluator, M_Interpreter). The result is a program which takes an M program as
input, and generates an L program: that is, it’s an M to L compiler!

We can go yet another step, and turn the partial evaluator into a
compiler generator: PartialEvaluator(PartialEvaluator,
List(source(PartialEvalutor)))
. What we get is a program which takes an interpreter written in L, and generates a compiler from it.

It’s possible to actually generate useful tools this way. People have actually implemented Lisp compilers this way! For example, you can see someone do a simple version here.

Introducing Algebraic Data Structures via Category Theory: Monoids

Since joining foursquare, I’ve been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our scala code very functionally.

This has increased my interest in category theory in an interesting way. As a programming language geek, I’m obviously fascinated by data structures. Category theory provides a really interesting handle on a way of looking at a kind of generic data structures.

Historically (as much as that word can be used for anything in computer science), we’ve thought about data structures primarily in a algorithmic and structural ways.

For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.

In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they’re built.

For example, there’s a structure called a monoid. A monoid is a very simple idea: it’s an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these objects must satisfy some rules called the monad laws:

  1. \forall s \in S: s * i = s, i * s = s
  2. \forall x, y, z \in S: (x * y) * z = x * (y * z)

There are some really obvious examples of monoids – like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.

Lists with concatenation and the empty list are a monoid: for any list,
l ++ [] == l, [] + l == l, and concatenation is associative.

Why should we care if data structures like are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different operations. Monoids provide the tools you need to build fold operations. Every kind of fold – that is, operations that collapse a sequence of other operations into a single value – can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.

And that’s where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that’s exactly what you want for talking about algebraic data structures.

The categorical construction of a monoid is, alas, pretty complicated. It’s a simple idea – but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it’s really worth it: when you see a monoidal category, it’s obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.

A monoidal category is a category with a functor, where the functor has the basic properties of a algebraic monoid. So it’s a category C, paired with a bi-functor – that is a two-argument functor ⊗:C×C→C. This is the categorical form of the tensor operation from the algebraic monoid. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They’re called its coherence conditions, and basically, they’re the properties that are needed to make the diagrams that we’re going to use commute.

So – the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.

Associativity is expressed categorically using a natural isomorphism, which we’ll name α. For any three object X, Y, and Z, α includes a component αX,Y,Z (which I’ll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the the two objects on either side of its mappings are equivalent.

The identity property is again expressed via natural isomorphism. The category must include an object I (called the unit), and two natural isomorphisms, called &lamba; and ρ. For any arrow X in C, &lamba; and ρ contain components λX and ρX such that λX maps from I⊗X→X, and ρX maps from X⊗I to X.

Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together – what kinds of properties these pieces need to have for this to – that is, for these definitions to give us a structure that looks like the algebraic notion of monoidal structures, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.

monoidal-categoy.png

The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.

Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.

Like I said, the definition is a lot more complicated. But look at the diagram: you can see folding in it, in the chains of arrows in the commutative diagram.

The Basics of Software Transactional Memory

As promised, it’s time for software transactional memory!

A bit of background, first. For most of the history of computers, the way that we’ve built software is very strongly based on the fact that a computer has a processor – a single CPU, which can do one thing at a time, in order. Until recently, that was true. But now it really isn’t anymore. There’s the internet – which effectively means that no computer is ever really operating in isolation – it’s part of a huge computational space shared with billions of other computers. And even ignoring the internet, we’re rapidly approaching the point where tiny devices, like cellphones, will have more than one CPU.

The single processor assumption makes things easy. We humans tend to think very sequentially – that is, the way that we describe how to do things is: do this, then do that. We’re not so good at thinking about how to do lots of things at the same time. Just think about human language. If I want to bake a cake, what I’ll do is: measure out the butter and sugar, put them in the mixer, mix it until they’re creamed. Then add milk. Then in a separate bowl, measure out and sift the flour and the baking powder. Then slowly pour the dry stuff into the wet, and mix it together. Etc.

I don’t need to fully specify that order. If I had multiple bakers, they could do many of steps at the same time. But how, in english, can I clearly say what needs to be done? I can say it, but it’s awkward. It’s harder to say, and harder to understand than the sequential instructions.

What I need to do are identifying families of things that can be done at the same time, and then the points at which those things will merge.

All of the measurements can be done at the same time. In any mixings step, the mixing can’t be done until all of the ingredients are ready. Ingredients being ready could mean two things. It could mean that the ingredients were measured out; or it could mean that one of the ingredients for the mix is the product of one of the previous mixing steps, and that that previous step is complete. In terms of programming, we’d say that the measurement steps are independent; and the points at which we need to wait for several things to get done (like “we can’t mix the dry ingredients in to the wet until the wet ingredients have all been mixed and the dry ingredients have been measured”), we’d call synchroniation points.

It gets really complicated, really quickly. In fact, it gets even more complicated than you might think. You see, if you were to write out the parallel instructions for this, you’d probably leave a bunch of places where you didn’t quite fully specify things – because you’d be relying on stuff like common sense on the part of your bakers. For example, you’d probably say to turn on the over to preheat, but you wouldn’t specifically say to wait until it reached the correct temperature to put stuff into it; you wouldn’t mention things like “open the over door, then put the cake into it, then close it”.

When we’re dealing with multiple processors, we get exactly those kinds of problems. We need to figure out what can be done at the same time; and we need to figure out what the synchronization points are. And we also need to figure out how to do the synchronization. When we’re talking about human bakers “don’t mix until the other stuff is ready” is fine. But in software, we need to consider things like “How do we know when the previous steps are done?”.

And it gets worse than that. When you have a set of processors doing things at the same time, you can get something called a race condition which can totally screw things up!

For example, imagine that we’re counting that all of the ingredients are measured. We could imagine the mixer process as looking at a counter, waiting until all five ingredients have been measured. Each measuring process would do its measurement, and then increment the counter.

  val measureCount = 0
  process Mixer() {
    wait until measureCount == 5
  }

  process Measurer() {
     do_measure
	 measureCount = measureCount + 1
  }

What happens if two measurer finish at almost the same time? The last statement in Measurer actually consists of three steps: retrieve the value of measureCount; add one; store the incremented value. So we could wind up with:

Time Measurer1 Measurer2 measureCount
0 1
1 Fetch measureCount(=1) 1
2 Increment(=2) Fetch measurecount(=1) 1
3 Store updated(=2) Increment(=2) 2
4 Store updated(=2) 2

Now, Mixer will never get run! Because of the way that the two Measurers overlapped, one of the increments effectively got lost, and so the count will never reach 5. And the way it’s written, there’s absolutely no way to tell whether or not that happened. Most of the time, it will probably work – because the two processes have to hit the code that increments the counter at exactly the same time in order for there to be a problem. But every once in a while, for no obvious reason, the program will fail – the mixing will never get done. It’s the worst kind of error to try to debug – one which is completely unpredictable. And if you try to run it in a debugger, because the debugger slows things down, you probably won’t be able to reproduce it!

This kind of issue always comes down to coordination or synchronization of some kind – that is, the main issue is how do the different processes interact without stomping on each other?

The simplest approach is to use something called a lock. A lock is an object which signals ownership of some resource, and which has one really important property: in concept, it points at at most one process, and updating it is atomic meaning that when you want to look at it and update it, nothing can intervene between the read and write. So if you want to use the resource managed by the lock, you can look at the lock, and see if anyone is using it; and if not, set it to point to you. That process is called acquiring the lock.

In general, we wrap locks up in libraries to make them easier to use. If “l” was a lock, you’d take a lock by using a function something like “lock(l)”, which really expanded to something like:

def take(L: Lock) {
  while (L != me)
     atomically do if L==no one then L=me
}

So the earlier code becomes:

val measureCount = 0
val l = new Lock()

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  lock(l)
  measureCount = measureCount + 1
  unlock(l)
}

In a simple example like that, locks are really easy. Unfortunately, real examples get messy. For instance, there’s a situation called deadlock. A classic demonstration is something called the dining philosophers. You’ve got four philosophers sitting at a table dinner table. Between each pair, there’s a chopstick. In order to eat, a philosopher needs two chopsticks. When they get two chopsticks, they’ll use them to take a single bite of food, and then they’ll put down the chopsticks. If Each philosopher starts by grabbing the chopstick to their right, then no one gets to each. Each has one chopstick, and there’s no way for them to get a second one.

That’s exactly what happens in a real system. You lock each resource that you want to share. For some operations, you’re going to use more than one shared resource, and so you need two locks. If you have multiple tasks that need multiple resources, it’s easy to wind up with a situation where each task has some subset of the locks that they need.

Things like deadlock mean that simple locks get hairy really quickly. Not that any of the more complex coordination strategies make deadlocks impossible; you can always find a way of creating a deadlock in any system – but it’s a lot easier to create accidental deadlocks using simple locks than, say, actors.

So there’s a ton of methods that try to make it easier to do coordination between multiple tasks. Under the covers, these ultimately rely on primitives like locks (or semaphores, another similar primitive coordination tool). But they provide a more structured way of using them. Just like structured control flow makes code cleaner, clearer, and more maintanable, structured coordination mechanism makes concurrency cleaner, clearer, and more maintainable.

Software transactional memory is one approach to this problem, which is currently very trendy. It’s still not entirely clear to me whether or not STM is really quite up to the real world – current implementations remain promising, but inefficient. But before getting in to any of that, we need to talk about just what it is.

As I see it, STM is based on two fundamental concepts:

  1. Optimism. In software terms, by optimism, we mean that we’re going to plow ahead and assume that there aren’t any errors; when we’re done, we’ll check if there was a problem, and clean up if necessary. A good example of this from my own background is source code control systems. In the older systems like RCS, you’d lock a source file before you edited it; then you’d make your changes, and check them in, and release the lock. That way, you know that you’re never going to have two people making changes to the same file at the same time. But the downside is that you end up with lots of people sitting around doing nothing, waiting to get the lock on the file they want to change. Odds are, they weren’t going to change the same part of the file as the guy who has the lock. But in order to make sure that they can’t, the locks also block a lot of real work. Eventually, the optimistic systems came along, and what they did was say: “go ahead and edit the file all you want. But before I let you check in (save) the edited file to the shared repository, I’m going to make sure that no one changed it in a way that will mess things up. If I find out that someone did, then you’re going to have to fix it.”
  2. Transactions. A transaction is a concept from (among other places) databases. In a database, you often make a collection of changes that are, conceptually, all part of the same update. By making them a transaction, they become one atomic block – and either the entire collection all succeedd, or the entire collection all fail. Transactions guarantee that you’ll never end up in a situation where half of the changes in an update got written to the database, and the other half didn’t.

What happens in STM is that you have some collection of special memory locations or variables. You’re only allowed to edit those variables in a transaction block. Whenever a program enters a transaction block, it gets a copy of the transaction variables, and just plows ahead, and does whatever it wants with its copy of the transaction variables. When it gets to the end of the block, it tries to commit the transaction – that is, it tries to update the master variables with the values of its copies. But it checks them first, to make sure that the current value of the master copies haven’t changed since the time that it made its copy. If they did, it goes back and starts over, re-running the transaction block. So if anyone else updated any of the transaction variables, the transaction would fail, and then get re-executed.

In terms of our baking example, both of the measurers would enter the transaction block at the same time; and then whichever finished first would commit its transaction, which would update the master count variable. Then when the second transaction finished, it would check the count variable, see that it changed, and go back and start over – fetching the new value of the master count variable, incrementing it, and then committing the result. In terms of code, you’d just do something like:

transactional val measureCount = 0

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  atomically {
    measureCount = measureCount + 1
  }
}

It’s really that simple. You just mark all the shared resources as transactional, and then wrap the code that modifies them in a transaction block. And it just works. It’s a very elegant solution.

Of course there’s a lot more to it, but that’s the basic idea. In the code, you identify the transactional variables, and only allow them to be updated inside of a transaction block. At runtime, when you encounter a transaction block, charge ahead, and do whatever you want. Then when you finish, make sure that there isn’t an error. If there was, try again.

So what’s it look like in a non-contrived programming language? These days, I’m doing most of my coding in Scala. There’s a decent STM implementation for Scala as a part of a package called Akka.

In Akka, the way that you define a transactional variable is by using a Ref type. A Ref is a basically a cell that wraps a value. (It’s almost like a pointer value in C.) So, for example, in our Baker example:

var count :Ref[Int] = Ref(0)

Then in code, to use it, you literally just wrap the code that modifies the Refs in “atomic”. Alas, you don’t quite get to treat the refs like normal variables – to access the value of a ref, you need to call Ref.get; to change the value, you need to use a method alter, which takes a function that computes the new value in terms of the old.

class Measurer {
  def doMeasure() {
    // do the measuring stuff
    atomic {
	  ref.alter(_ + 1)
    }
  }
}

The “(_ + 1)” probably needs a quick explanation. In Scala, you can define a single expression function using “_” to mark the slot where the parameter should go. So “(_ + 1)” is equivalent to the lambda expression { x => x + 1}.

You can see, just from this tiny example, why STM is such an attractive approach. It’s so simple! It’s very easy to read and write. It’s a very simple natural model. It’s brilliant in its simplicity. Of course, there’s more to it that what I’ve written about here – error handling, voluntary transaction abort, retry management, transaction waits – but this is the basics, and it really is this simple.

What are the problems with this approach?

  1. Impurity. If not all variables are transactional, and you can modify a non-transactional variable inside of a transaction block, then you’ve got a mess onp your hands. Values from transactionals can “leak” out of transaction blocks.
  2. Inefficiency. You’ve got to either copy all of the transactional variables at the entry to the transaction block, or you’ve got to use some kind of copy-on-write strategy. However you do it, you’ve got grief – aggressive copying, copy-on-write, memory protect – they’ve all got non-trivial costs. And re-doing the entire transaction block every time it fails can eat a lot of CPU.
  3. Fairness. This is a fancy term for “what if one guy keeps getting screwed?” You’ve got lots of processes all working with the same shared resources, which are protected behind the transactional variables. It’s possible for timing to work out so that one process keeps getting stuck doing the re-tries. This is something that comes up a lot in coordination strategies for concurrency, and the implementations can get pretty hairy trying to make sure that they don’t dump all of the burden of retries on one process.

The Wrong Way To Write Concurrent Programs: Actors in Cruise

I’ve been planning to write a few posts about some programming stuff that interests me. I’ve spent a good part of my career working on systems that need to support concurrent computation. I even did my PhD working on a system to allow a particular style of parallel programming. It’s a really hard problem – concurrency creates a host of complex issues in how systems behave, and the way that you express concurrency in a programming language has a huge impact on how hard it is to read, write, debug, and reason about systems.

So, like I’ve said, I’ve spent a lot of time thinking about these issues, and looking at various different proposed solutions, as well as proposing a couple of my own. But I really don’t know of any good writeup describing the basics of the most successful approaches for beginners. So I thought I could write one.

But that’s not really today’s post. Todays post is my version of a train-wreck. Long-time readers of the blog know that I’m fascinated with bizarre programming languages. So today, I’m going to show you a twisted, annoying, and thoroughly pointless language that I created. It’s based on one of the most successful models of concurrent programming, called Actors, which was originally proposed by Professor Gul Agha of UIUC. There’ve been some really nice languages built using ideas from Actors, but this is not one of them.

The language is called “Cruise”. Why? Because it’s a really bad Actor language. And what name comes to mind when I think of really, really bad actors with delusions of adequacy? Tom Cruise.

You can grab my implementation from github. Just so you know, the code sucks. It’s something I threw together in my spare time a couple of years ago, and haven’t really looked at since. So it’s sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

Quick overview of the actor model

Actors are a theoretical model of computation, which is designed to describe completely asynchronous parallel computation. Doing things totally asynchronously is very strange, and very counter-intuitive. But the fact of the matter is, in real distributed systems, everything *is* fundamentally asynchronous, so being able to describe distributed systems in terms of a simple, analyzable model is a good thing.

According to the actor model, a computation is described by a collection of things called, what else, actors. An actor has a mailbox, and a behavior. The mailbox is a uniquely named place where messages sent to an actor can be queued; the behavior is a definition of how the actor is going to process a message from its mailbox. The behavior gets to look at the message, and based on its contents, it can do three kinds of things:

  1. Create other actors.
  2. Send messages to other actors whose mailbox it knows.
  3. Specify a new behavior for the actor to use to process its next message.

You can do pretty much anything you need to do in computations with that basic mechanism. The catch is, as I said, it’s all asynchronous. So, for example, if you want to write an actor that adds two numbers, you can’t do it by what you’d normally think of as a function call. In a lot of ways, it looks like a method call in something like Smalltalk: one actor (object) sends a message to another actor, and in response, the receiver takes some action specified by them message.

But subroutines and methods are synchronous, and nothing in actors is synchronous. In an object-oriented language, when you send a message, you stop and wait until the receiver of the message is done with it. In Actors, it doesn’t work that way: you send a message, and it’s sent; that’s it, it’s over and done with. You don’t wait for anything; you’re done. If you want a reply, you need to send the the other actor a reference to your mailbox, and make sure that your behavior knows what to do when the reply comes in.

It ends up looking something like the continuation passing form of a functional programming language: to do a subroutine-like operation, you need to pass an extra parameter to the subroutine invocation; that extra parameter is the *intended receiver* of the result.

You’ll see some examples of this when we get to some code.

Tuples – A Really Ugly Way of Handling Data

This subtitle is a bit over-the-top. I actually think that my tuple notion is pretty cool. It’s loosely based on how you do data-types in Prolog. But the way that it’s implemented in Cruise is absolutely awful.

Cruise has a strange data model. The idea behind it is to make it easy to build actor behaviors around the idea of pattern matching. The easiest/stupidest way of doing this is to make all data consist of tagged tuples. A tagged tuple consists of a tag name (an identifier starting with an uppercase letter), and a list of values enclosed in the tuple. The values inside of a tuple can be either other tuples, or actor names (identifiers starting with lower-case letters).

So, for example, Foo(Bar(), adder) is a tuple. The tag is “Foo“. It’s contents are another tuple, “Bar()“, and an actor name, “adder“.

Since tuples and actors are the only things that exist, we need to construct all other types of values from some combination of tuples and actors. To do math, we can use tuples to build up Peano numbers. The tuple “Z()” is zero; “I(n)” is the number n+1. So, for example, 3 is “I(I(I(Z())))“.

The only way to decompose tuples is through pattern matching in messages. In an actor behavior. message handlers specify a *tuple pattern*, which is a tuple where some positions may be filled by{em unbound} variables. When a tuple is matched against a pattern, the variables in the pattern are bound to the values of the corresponding elements of the tuple.

A few examples:

  • matching I(I(I(Z()))) with I($x) will succeed with $x bound to I(I(Z)).
  • matching Cons(X(),Cons(Y(),Cons(Z,Nil()))) with Cons($x,$y) will succeed with $x bound to X(), and $y bound to Cons(Y(),Cons(Z(),Nil())).
  • matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x, Cons(Y(), Cons($y, Nil()))) will succeed with $x bound to X(), and $y bound to Z().
  • Code Examples!

    Instead of my rambling on even more, let’s take a look at some Cruise programs. We’ll start off with Hello World, sort of.


    actor !Hello {
      behavior :Main() {
        on Go() { send Hello(World()) to out }
      }
      initial :Main
    }
    
    instantiate !Hello() as hello
    send Go() to hello
    
    

    This declares an actor type “!Hello”; it’s got one behavior with no parameters. It only knows how to handle one message, “Go()”. When it receives go, it sends a hello world tuple to the actor named “out”, which is a built-in that just prints whatever is sent to it.

    Let’s be a bit more interesting, and try something using integers. Here’s some code to do a greater than comparison:


    actor !GreaterThan {
      behavior :Compare() {
        on GT(Z(),Z(), $action, $iftrue, $iffalse) {
          send $action to $iffalse
        }
        on GT(Z(), I($x), $action, $iftrue, $iffalse) {
          send $action to $iffalse
        }
        on GT(I($x), Z(), $action, $iftrue, $iffalse) {
          send $action to $iftrue
        }
        on GT(I($x), I($y), $action, $iftrue, $iffalse) {
          send GT($x,$y,$action,$iftrue,$iffalse) to $self
        }
      }
      initial :Compare
    }
    
    actor !True {
      behavior :True() {
        on Result() { send True() to out}
      }
      initial :True
    }
    
    actor !False {
      behavior :False() {
        on Result() { send False() to out}
      }
      initial :False
    }
    
    instantiate !True() as true
    instantiate !False() as false
    instantiate !GreaterThan() as greater
    send GT(I(I(Z())), I(Z()), Result(), true, false) to greater
    send GT(I(I(Z())), I(I(I(Z()))), Result(), true, false) to greater
    send GT(I(I(Z())), I(I(Z())), Result(), true, false) to greater
    
    

    This is typical of how you do “control flow” in Cruise: you set up different actors for each branch, and pass those actors names to the test; one of them will receive a message to continue the execution.

    What about multiple behaviors? Here’s a trivial example of a flip-flop:


    actor !FlipFlop {
      behavior :Flip() {
        on Ping($x) {
          send Flip($x) to out
          adopt :Flop()
        }
        on Pong($x) {
          send Flip($x) to out
        }
      }
      behavior :Flop() {
        on Ping($x) {
          send Flop($x) to out
        }
        on Pong($x) {
          send Flop($x) to out
          adopt :Flip()
        }
      }
      initial :Flip
    }
    
    instantiate !FlipFlop() as ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    
    

    If the actor is in the “:Flip” behavior, then when it gets a “Ping”, it sends “Flip” to out, and switches behavior to flop. If it gets point, it just sents “Flip” to out, and stays in “:Flip”.

    The “:Flop” behavior is pretty much the same idea, accept that it switches behaviors on “Pong”.

    An example of how behavior changing can actually be useful is implementing settable variables:


    actor !Var {
      behavior :Undefined() {
        on Set($v) { adopt :Val($v) }
        on Get($target) { send Undefined() to $target }
        on Unset() { }
      }
      behavior :Val($val) {
        on Set($v) { adopt :Val($v) }
        on Get($target) { send $val to $target }
        on Unset() { adopt :Undefined() }
      }
      initial :Undefined
    }
    instantiate !Var() as v
    send Get(out) to v
    send Set(I(I(I(Z())))) to v
    send Get(out) to v
    
    

    Two more programs, and I’ll stop torturing you. First, a simple adder:


    actor !Adder {
      behavior :Add() {
        on Plus(Z(),$x, $target) {
          send $x to $target
        }
        on Plus(I($x), $y, $target) {
          send Plus($x,I($y), $target) to $self
        }
      }
      initial :Add
    }
    
    actor !Done {
      behavior :Finished() {
        on Result($x) { send Result($x) to out }
      }
      initial :Finished
    }
    
    instantiate !Adder() as adder
    instantiate !Done() as done
    send Plus(I(I(I(Z()))),I(I(Z())), out) to adder
    
    

    Pretty straightforward – the only interesting thing about it is the way that it sends the result of invoking add to a continuation actor.

    Now, let’s use an addition actor to implement a multiplier actor. This shows off some interesting techniques, like carrying auxiliary values that will be needed by the continuation. It also shows you that I cheated, and added integers to the parser; they’re translated into the peano-tuples by the parser.


    actor !Adder {
      behavior :Add() {
        on Plus(Z(),$x, $misc, $target) {
          send Sum($x, $misc) to $target
        }
        on Plus(I($x), $y, $misc, $target) {
          send Plus($x,I($y), $misc, $target) to $self
        }
      }
      initial :Add
    }
    
    actor !Multiplier {
      behavior :Mult() {
        on Mult(I($x), $y, $sum, $misc, $target) {
          send Plus($y, $sum, MultMisc($x, $y, $misc, $target), $self) to adder
        }
        on Sum($sum, MultMisc(Z(), $y, $misc, $target)) {
          send Product($sum, $misc) to $target
        }
        on Sum($sum, MultMisc($x, $y, $misc, $target)) {
          send Mult($x, $y, $sum, $misc, $target) to $self
        }
      }
      initial :Mult
    }
    
    instantiate !Adder() as adder
    instantiate !Multiplier() as multiplier
    send Mult(32, 191, 0, Nil(), out) to multiplier
    
    

    So, is this Turing complete? You bet: it’s got peano numbers, conditionals, and recursion. If you can do those three, you can do anything.

A Taste of Specification with Alloy

In my last post (which was, alas, a stupidly long time ago!), I talked a bit about software specification, and promised to talk about my favorite dedicated specification tool, Alloy. Alloy is a very cool system, designed at MIT by Daniel Jackson and his students.

Alloy is a language for specification, along with an environment which allows you to test your specifications. In a lot of ways, it looks like a programming language – but it’s not. You can’t write programs in Alloy. What you can do is write concise, clear, and specific descriptions of how something else works.

I’m not going to try to really teach you Alloy. All that I’m going to do is give you a quick walk-though, to try to show you why it’s worth the trouble of learning. If you want to learn it, the Alloy group’s website has a really good the official Alloy tutorial. which you should walk through.

Continue reading

The Value of Tests: It's more than just testing!

Since I have some free time, I’ve been catching up on some of the stuff
I’ve been meaning to read. I’ve got a reading list of stuff that I’ve wanted
to look at that were written by other authors with my publisher. Yesterday, I started looking at Cucumber, which is an interesting behavior-driven development tool. This post isn’t really about Cucumber, but about something that Cucumber reminded me of.

When a competent programmer builds software, they write tests. That’s just
a given. But why do we do it? It seems like the answer is obvious: to make sure that our software works. But I’d argue that there’s another reason, which in the long run is as important as the functional one. It’s to describe what the software does. A well-written test doesn’t just make sure that the software does the right thing – it tells other programmers what the code is supposed to do.

A test is an executable specification. Specifications are a really good thing; executable specifications are even better.

Continue reading