Category Archives: Programming

Stuff Everyone Should Do (part 2): Coding Standards

Another thing that we did at Google that I thought was surprisingly effective and useful was strict coding standards.

Before my time at Google, I was sure that coding standards were pointless. I had absolutely no doubt that they were the kind of thing that petty bureaucrats waste time writing and then use to hassle people who are actually productive.

I was seriously wrong.

At Google, I could look at any piece of code, anywhere in Google’s codebase, and I could read it. The fact that I was allowed to do that was pretty unusual in itself. But what was surprising to me was just how much the standardization of style – indents, names, file structures, and comment conventions – made it dramatically easier to look at a piece of unfamiliar code and understand it. This is still surprising to me – because those are all trivial things. They shouldn’t have much impact – but they do. It’s absolutely shocking to realize how much of the time you spend reading code is just looking for the basic syntactic structure!

There’s a suite of common objections to this, all of which I used to believe.

It wastes time!
I’m a good coder, and I don’t want to waste time on stupidity. I’m good enough that when I write code, it’s clear and easy to understand. Why should I waste my time on some stupid standard? The answer is: because there is a value in uniformity. As I alluded to earlier – the fact that every piece of code that you look at — whether it was written by you, by one of your closest coworkers, or by someone 11 timezones away — will always demarcate structures in the same way, will always use the same naming conventions – it really, genuinely makes a big difference. You need so much less effort to read code that you haven’t looked at in a while (or at all), because you can immediately recognize the structure.
I’m an artist!
This is phrased facetiously, but it does reflect a common complaint. We programmers have a lot of pride in our personal style. The code that I write really does reflect something about me and how my mind works. It’s a reflection of my skill and my creativity. If I’m forced into some stupid standard, it seems like it’s stifling my creativity. The thing is, the important parts of your style, the important reflections of your mind and your creativity aren’t in trivial syntactic things. (If it is, then you’re a pretty crappy programmer.) The standard actually makes it easier for other people to see your creativity – because they can actually see what you’re doing, without being distracted by the unfamiliar syntactic quirks.
One size fits all actually fits none!
If you have a coding standard that wasn’t designed specifically for your project, then it’s probably non-optimal for your project. That’s fine. Again, it’s just syntax: non-optimal doesn’t mean bad. The fact that it’s not ideal for your project doesn’t mean that it’s not worth doing. Yeah, sure, it does reduce the magnitude of the benefit for your project, but at the same time, it increases the magnitude of the benefit for the larger organization. In addition, it frequently makes sense to have project-specific code styles. There’s nothing wrong with having a project-specific coding standard. In fact, in my experience, the best thing is to have a very general coding standard for the larger organization, and then project-specific extensions of that for the project-specific idioms and structures.
I’m too good for that!
This is actually the most common objection. It’s sort-of a combination of the others, but it gets at an underlying attitude in a direct way. This is the belief on the part of the complainer that they’re a better programmer than whoever wrote the standard, and lowering themselves to following the standard written by the inferior author will reduce the quality of the code. This is, to put it mildly, bullshit. It’s pure arrogance, and it’s ridiculous. The fact of the matter is that no one is so good that any change to their coding style will damage the code. If you can’t write good code to any reasonable coding standard, you’re a crappy programmer.

When you’re coding against a standard, there are inevitably going to be places where you disagree with the standard. There will be places where your personal style is better than the standard. But that doesn’t matter. There will, probably, also be places where the standard is better than your style. But that doesn’t matter easier. As long as the standard isn’t totally ridiculous, the comprehension benefits are significant enough to more than compensate for that.

But what if the coding standard is totally ridiculous?

Well, then, it’s rough to be you: you’re screwed. But that’s not really because of the ridiculous coding standard. It’s because you’re working for idiots. Screwing up a coding standard enough to really prevent good programmers from writing good code is hard. It requires a sort of dedicated, hard-headed stupidity. If you’re working for people who are such idiots that they’d impose a broken coding standard, then they’re going to do plenty of other stupid stuff, too. If you’re working for idiots, you’re pretty much screwed no matter what you do, coding standard or no. (And I don’t mean to suggest that software businesses run by idiots are rare; it’s an unfortunate fact, but there’s no shortage of idiots, and there are plenty of them that have their own businesses.)

Things Everyone Should Do: Code Review

As I alluded to in my last post (which I will be correcting shortly), I no longer work for Google. I still haven’t decided quite where I’m going to wind up – I’ve got a couple of excellent offers to choose between. But in the interim, since I’m not technically employed by anyone, I thought I’d do a bit of writing about some professional things that are interesting, but that might have caused tension with coworkers or management.

Google is a really cool company. And they’ve done some really amazing things – both outside the company, where users can see it, and inside the company. There are a couple of things about the inside that aren’t confidential, but which also haven’t been discussed all that widely on the outside. That’s what I want to talk about.

The biggest thing that makes Google’s code so good is simple: code review. That’s not specific to Google – it’s widely recognized as a good idea, and a lot of people do it. But I’ve never seen another large company where it was such a universal. At Google, no code, for any product, for any project, gets checked in until it gets a positive review.

Everyone should do this. And I don’t just mean informally: this should really be a universal rule of serious software development. Not just product code – everything. It’s not that much work, and it makes a huge difference.

What do you get out of code review?

There’s the obvious: having a second set of eyes look over code before it gets checked in catches bugs. This is the most widely cited, widely recognized benefit of code review. But in my experience, it’s the least valuable one. People do find bugs in code review. But the overwhelming majority of bugs that are caught in code review are, frankly, trivial bugs which would have taken the author a couple of minutes to find. The bugs that actually take time to find don’t get caught in review.

The biggest advantage of code review is purely social. If you’re programming and you know that your coworkers are going to look at your code, you program differently. You’ll write code that’s neater, better documented, and better organized — because you’ll know that people who’s opinions you care about will be looking at your code. Without review, you know that people will look at code eventually. But because it’s not immediate, it doesn’t have the same sense of urgency, and it doesn’t have the same feeling of personal judgement.

There’s one more big benefit. Code reviews spread knowledge. In a lot of development groups, each person has a core component that they’re responsible for, and each person is very focused on their own component. As long as their coworkers components don’t break their code, they don’t look at it. The effect of this is that for each component, only one person has any familiarity with the code. If that person takes time off or – god forbid – leaves the company, no one knows anything about it. With code review, you have at least two people who are familiar with code – the author, and the reviewer. The reviewer doesn’t know as much about the code as the author – but they’re familiar with the design and the structure of it, which is incredibly valuable.

Of course, nothing is every completely simple. From my experience, it takes some time before you get good at reviewing code. There are some pitfalls that I’ve seen that cause a lot of trouble – and since they come up particularly frequently among inexperienced reviewers, they give people trying code reviews a bad experience, and so become a major barrier to adopting code review as a practice.

The biggest rule is that the point of code review is to find problems in code before it gets committed – what you’re looking for is correctness. The most common mistake in code review – the mistake that everyone makes when they’re new to it – is judging code by whether it’s what the reviewer would have written.

Given a problem, there are usually a dozen different ways to solve it. Andgiven a solution, there’s a million ways to render it as code. As a reviewer, your job isn’t to make sure that the code is what you would have written – because it won’t be. Your job as a reviewer of a piece of code is to make sure that the code as written by its author is correct. When this rule gets broken, you end up with hard feelings and frustration all around – which isn’t a good thing.

The thing is, this is such a thoroughly natural mistake to make. If you’re a programmer, when you look at a problem, you can see a solution – and you think of what you’ve seen as the solution. But it isn’t – and to be a good reviewer, you need to get that.

The second major pitfall of review is that people feel obligated to say something. You know that the author spent a lot of time and effort working on the code – shouldn’t you say something?

No, you shouldn’t.

There is never anything wrong with just saying “Yup, looks good”. If you constantly go hunting to try to find something to criticize, then all that you accomplish is to wreck your own credibility. When you repeatedly make things to criticize just to find something to say, then the people who’s code you review will learn that when you say something, that you’re just saying it to fill the silence. Your comments won’t be taken seriously.

Third is speed. You shouldn’t rush through a code review – but also, you need to do it promptly. Your coworkers are waiting for you. If you and your coworkers aren’t willing to take the time to get reviews done, and done quickly, then people are going to get frustrated, and code review is just going to cause frustration. It may seem like it’s an interruption to drop things to do a review. It shouldn’t be. You don’t need to drop everything the moment someone asks you to do a review. But within a couple of hours, you will take a break from what you’re doing – to get a drink, to go to the bathroom, to talk a walk. When you get back from that, you can do the review and get it done. If you do, then no one will every be left hanging for a long time waiting on you.

The Perils of Premature Optimization

A reader who’s known me for a while just sent me a note asking me to say something about premature optimization. The reason that I mention that he’s known me for a while (in fact, someone who used to be a coworker) is because this is a topic that I’m prone to rant about in person, but I’ve never mentioned on the blog. So he knows that it’s something that I’ve got a strong opinion about. Basically, he’s having trouble dealing with an annoying coworker doing this, and he wants a rant-by-proxy. I’m OK with that. :-).

When you’re writing a new piece of software, particularly on a modern computer, one of the unintuitive things that frequently happens is that the performance of your system is really quite different from what you’d expect.And even when everything is as expected, most people don’t have a particularly good sense of tradeoffs – if I make this thing faster, what effect will it have on that? and if it does really improve the performance, how much will it improve it, and at what cost? If you can’t answer that – and answer it precisely, with supporting evidence – then you have no business optimizing.

So when you sit down to write some new code, what you should really do is write code that’s algorithmically efficient – that is, you pick an algorithm that has good performance in asymptotic time – and implement it in a straightforward way.

But what many people do – in fact, what pretty much all of us do at least some of the time – is try to be clever. We try to find opportunities to change the code to make it faster. Doing that before you understand where the computer actually spends its time when it’s running the program is what we call premature optimization.

Why not?

Continue reading

Regular Expressions and Derivatives

When you’re working with regular languages specified in regular expression form, there’s a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It’s called the Brzozozwksi derivative of a regular expression – or just simply the derivative of a regexp.

The basic idea of the derivative is that given a regular expression, r, you can derive a new regular expression called the derivative with respect to symbol c, D_c(r). D_c(r) is a regular expression describing the string matched by r after it’s matched an r.

Continue reading

Apex: My Editor Project

Lots of people were intrigued by my reference to my editor project. So I’m sharing the current language design with you folks. I’m calling it Apex, as a homage to the brilliant Acme, which is the editor that comes closest to what I’d like to be able to use.

So. The language for Apex is sort of a combination of TECO and Sam. Once the basic system is working, I’m planning on a UI that’s modeled on Acme. For now, I’m going to focus on the language.

Continue reading

The Glorious Horror of TECO

In my oh-so-abundant free time, I’ve been working on my own little text editor. And one of my motivations is TECO: one of the oldest, and one of the very best, ever written. It’s both a text editor and a programming language – and, in fact, that’s exactly what made it such a brilliant tool. So much of the drudgery of programming is stuff that really could be done by a program. But we’ve spent so much time learning to be fancy that we’ve lost track of that. Nowadays, you can write an emacs lisp program to do the stuff you used to do in TECO; only it’s awkward enough that you usually don’t.

The problem, though, with just re-implementing TECO with a modern UI is that it was designed in a different time. When TECO was written, every byte was critical. And so the language, the syntax, the structure, it was completely ridiculous. And, as a result, it became the world’s most useful pathological programming language. It’s a glorious, hideous, wonderful, horrific piece of computing history

TECO is one of the most influential pieces of software ever written. If, by chance, you’ve ever heard of a little editor called “emacs”; well, that was originally a set of editor macros for TECO (EMACS = Editor MACroS). As a language, it’s both wonderful and awful. On the good side, The central concept of the language is wonderful: it’s a powerful language for processing text, which works by basically repeatedly finding text that matches some kind of pattern, taking some kind of action when it finds it, and then selecting the next pattern to look for. That’s a very natural, easy to understand way of writing programs to do text processing. On the bad side, it’s got the most god-awful hideous syntax ever imagined.

Continue reading

Finger Trees Done Right (I hope)

A while ago, I wrote a couple of posts that claimed to talk about finger trees. Unfortunately, I really botched it. I’d read a bunch of data structure papers, and managed to get myself thoroughly scrambled. What I wrote about was distantly related to finger trees, and it was useful to help understand how fingertrees work – but it was not, in any way, shape, or form, actually a description of fingertrees. Since then, I’ve been meaning to write a proper post explaining finger trees – but with the work on my book, and with chaos at work, I just haven’t had the time. This time, in order to do my damnedest to make sure that I don’t screw it up again, I’m basically go to describe finger trees over a couple of posts by walking through the best finger-tree paper that I could find. The paper is “Finger Trees: a simple general-purpose data structure”, by Ralf Hinze and Ross Patterson. This might by the paper that introduced the structure, but I’m not sure.

The point of finger trees is pretty simple. It’s very similar to the point of zippers. Programming in functional languages is terrific. As I’ve described before, there are a lot of advantages to writing functional code. But there are also a lot of places where a naive implementation of an algorithm using a functional data structure is dreadfully inefficient. Functional code may be prettier, more maintainable, and more reusable – but imperative code is frequently much more efficient. When you’re doing an operation that, conceptually, modifies a piece of a complex data structure, then functional code can really suck. Finger trees give you a way around that – for many common updatabale data structures, you can build finger-tree versions that are very close to or fully as good as imperative, updating structures.

Continue reading

Code in the Cloud: My Book Beta is Available!

As I’ve mentioned before, I’ve been spending a lot of time working on a book.
Initially, I was working on a book made up of a collection of material from blog posts;
along the way, I got diverted, and ended up writing a book about cloud computing using
Google’s AppEngine tools. The book isn’t finished, but my publisher, the Pragmatic Programmers,
have a program that they call beta books. Once a book is roughly 60% done, you
can buy it at a discount, and download drafts electronically immediately. As more sections
get done, you can download each new version. And when the book is finally finished, you
get a final copy.

We released the first beta version of the book today. You can look at
excerpts, or buy a copy, by going to
the books page
at Pragmatic’s website.

If you’re interested in what cloud computing is, and how to build cloud applications – or if
you just feel like doing something to support you friendly local math-blogger – please take
a look, and consider getting a copy. I’m not going to harp about the book a lot on the blog; you’re
not going to see a ton of posts that are thinly veiled advertisements, or updates tracking
sales, or anything like that. If there’s something that I would have written about anyway,
and it’s appropriate to mention the book, then I’ll feel free to mention it, but I won’t
waste your time hyping it.

In other news, here’s the main reason that things have been dead on this blog since
the weekend:

photo.jpeg

That’s the view from my driveway as of monday morning. Over the weekend,
we had one of the worst windstorms to hit New York in about thirty years. That
mess is two oak trees, each close to 2 meters in diameter, which came down on
our street on saturday. (If you look closely towards the right hand side, you
can see the remains of my neighbors car.) The telephone pole in the picture
was snapped not by getting hit by a tree, but simply by the wind. Since that
pole had our electrical transformer, and those trees took out the wiring that
fed that transformer, we are (obviously) without electricity, internet, or
(most importantly) heat.

Con-ed is promising to restore our electricity by friday. I’m not holding my
breath.

Anyway, back to the happy stuff. The book exists in electronic form! Buy
a copy for yourself, your friends, your neighbors, and your dog! We’ve got lots
of wonderful new expenses to deal with recovering from that storm! 🙂

Zippers: Making Functional "Updates" Efficient

In the Haskell stuff, I was planning on moving on to some monad-related
stuff. But I had a reader write in, and ask me to write another
post on data structures, focusing on a structured called a
zipper.

A zipper is a remarkably clever idea. It’s not really a single data
structure, but rather a way of building data structures in functional
languages. The first mention of the structure seems to be a paper
by Gerard Huet in 1997
, but as he says in the paper, it’s likely that this was
used before his paper in functional code — but no one thought to formalize it
and write it up. (In the original version of this post, I said the name of the guy who first wrote about zippers was “Carl Huet”. I have absolutely no idea where that came from – I literally had his paper on my lap as I wrote this post, and I still managed to screwed up his name. My apologies!)

It also happens that zippers are one of the rare cases of data structures
where I think it’s not necessarily clearer to show code. The concept of
a zipper is very simple and elegant – but when you see a zippered tree
written out as a sequence of type constructors, it’s confusing, rather
than clarifying.

Continue reading

Advanced Haskell Data Structures: Red-Black Trees

unbalanced-trees.jpg

So, we’ve built up some pretty nifty binary trees – we can use the binary tree both as the basis of an implementation of a set, or as an implementation of a dictionary. But our implementation has had one major problem: it’s got absolutely no way to maintain balance.

What that means is that depending on the order in which things are inserted to the tree, we might have excellent performance, or we might be no better than a linear list. For example, look at these trees. As you can see, a tree with the same values can wind up quite different. In a good insert order, you can wind up with a nicely balanced tree: the minimum distance from root to leaf is 3; the maximum is 4. On the other hand, take the same values, and insert them in a different order and you get a rotten tree; the minimum distance from root to leaf is 1, and the maximum is 7. So depending on luck, you can get a tree that gives you good performance, or one that ends up giving you no better than a plain old list. Playing with a bit of randomization can often give you reasonably good performance on average – but if you’re using a tree, it’s probably because O(n) complexity is just too high. You want the O(lg n) complexity that you’ll get from a binary tree – and not just sometimes.

To fix that, you need to change the structure a bit, so that as you insert things, the tree stays balanced. There are several different approaches to how you can do this. The one that we’re going to look at is based on labeling nodes in ways that allow you to very easily detect when a serious imbalance is developing, and then re-arrange the tree to re-balance it. There are two major version of this, called the AVL tree, and the red-black tree. We’re going to look at the red-black. Building a red-black tree is as much a lesson in data structures as it is in Haskell, but along with learning about the structure, we’ll see a lot about how to write code in Haskell, and particularly about how to use pattern-matching for complex structures.

Continue reading