Yes all men

Unless you’ve been hiding under a rock, you know about the horrible events of last friday. A misogynistic creep killed a bunch of people, because he believed that women owed him sex and affection, because in his own opinion, he was a terrific guy, an absolute prince. But they didn’t give him what he deserved, and so, he decided to punish them for their “crimes”, and went off on a killing spree.

Seven dead people and a dozen injured ones later, people started reacting. Many women, quite appropriately, pointed out the misogynistic venom of this guy, and how common it is among men. And predictably, tons of men got annoyed at that, and replied saying “Not all men are like that”, and therefore, not all men are responsible.


Yes, we are responsible. We live in this culture. We sit here, day after day, watching this shit, and doing nothing. Because we maintain that we aren’t guilty. Women on the internet are regularly threatened for the crime of speaking, and we sit by and watch, without doing or saying anything about it.

We are part of the problem, because, for the most part, we don’t care. We aren’t the targets of the abuse. We aren’t the ones who can’t walk down the street without getting harassed. We aren’t the ones who can’t speak without being threatened. And so we just don’t care.

When a man like this guy goes out and murders people because of his hatred of women, our main concern isn’t how common people like him are. It’s not how many women are threatened by people like him, or how many women are actually killed by people like him. It’s about how unfair it is that women talk about the kind of hatred they face from men, without making a specific exception for guys like us. What we worry about isn’t the threats they face – it’s how their reaction to being threatened with actual violence hurts our poor, precious feelings.

Yes, it’s all men who are responsible. Let’s face it: we live in a culture where we are the dominant group. If we got together, stood up, and said “We’re not going to tolerate this shit anymore” – if even a decent-sized minority of us were willing to stand up and say it – the hateful assholes would be driven underground. If we stood up and said “No”, and made sure that any shit-headed bigoted woman-hater actually paid some price in standing in our communities, the threats would end.

If we acknowledged that the violent hatred of women was not just a sickness; that a threat to women is a real threat to other human beings that was serious; that those threats are crimes. That the everyday threats against women are more serious that the threats of terrorism that we’ve used to justify war. If we did that, we’d have to admit that we need to do something about it.

But we don’t. We sit and watch, and downplay the threats. We say that they’re not really serious, that’s just how people act on the internet. We say that the threats don’t matter – those fragile women just need to “man up”, and grow thicker skins. And when women die – as they do every day – we just say it was a crazy person, there’s nothing we can do about it.

3000 people died in a terrorist attack 13 years ago. We were so horrified by it that we started two wars, and virtually rewrote our entire legal system, because it was such a horrible, terrifying threat! We needed to do something to protect ourselves from the terrorists!

The men who threaten women are terrorists too. They’re working from exactly the same motivations: the use of fear to control behavior. Men who threaten women are making threats to force women to behave the way they want them to.

We’re willing to make insane sacrifices to protect ourselves from the threats of terrorists. But we’re not willing to sacrifice anything to protect women from other men.

Until that’s not true anymore – until we stand up and say “Enough!”, and make the behavior truly unacceptable – then we are guilty. All of us.

You can’t even describe most numbers!

(This is a revised version of an old post; you can see the original here.)

Please read the addendum at the bottom – I got carried away with computability, and ended up screwing up quite badly.

In the comments on my last post, a bit of discussion came up about some of the strange properties of real numbers, and that made me want to go back and get this old post about numbers that can’t even be described, and update it.

In those comments, we were talking about whether or not π contains all sorts of interesting subsequences. The fact is, when it comes to π, we just don’t know.

But… We do know that there are many numbers that do. There’s a property called normality. Informally, normality just says that taken to infinity, all sequences of digits are equally likely to occur. In an infinitely long normal number, that means that any finite length subsequence will occur, given enough time. My name, your name, the complete video of the original version of Star Wars, a photograph of the dinner I just ate and never took a photo of – it’s all there, in any normal number.

If you think about that, at first, it might seem profound: there are numbers that contain absolutely everything that ever did exist, and that ever could exist. That seems like it should be amazing. If the numbers were at all special, it might be. But they aren’t. Normality isn’t a rare property that’s only possessed by a special meaningful number like π. It’s a property that’s possessed by almost every number. If you could create a randomly selected list of a billion real numbers (you can’t in reality, but you can mathematically, using a finite version of the axiom of choice), odds are, all of them would be normal – all of them would have this property.

But here’s where it gets really strange. Those numbers, those normal numbers that contain everything? Most of them can’t even be described.

It gets stranger than that. We know that we can’t really write down an irrational number. We can write down successively more and more precise approximations, but any finite representation won’t work. So we can’t actually write down the overwhelming majority of real numbers. But in fact, not only can’t we write them down, we can’t describe the vast majority of numbers.

Numbers are something that we deal with every day, and we think we understand them. Over the ages, people have invented a variety of notations that allow us to write those numbers down: the familiar arabic notation, roman numerals, fractions, decimals, continued fractions, algebraic series, etc. I could easily spend months on this blog just writing about different notations that we use to write numbers, and the benefits and weaknesses of each notation.

But the fact is, when it comes to the vast, overwhelming majority of numbers, all of those notations are utterly useless! That statement seems bizarre at best. As strange as it it seems, though it’s true. In order to understand it, though, we have to start at the very beginning: What does it mean for a number to be describable?

Continue reading You can’t even describe most numbers!

Infinite and Non-Repeating Does Not Mean Unstructured

So, I got in to work this morning, and saw a tweet with the following image:


Pi is an infinite, non-repeating decimal – meaning that every possible number combination exists somewhere in pi. Converted into ASCII text, somewhere in that string of digits is the name of every person you will ever love, the date, time, and manner of your death, and the answers to all the great questions of the universe. Converted into a bitmap, somewhere in that infinite string of digits is a pixel-perfect representation of the first thing you saw on this earth, the last thing you will see before your life leaves you, and all the moments, momentous and mundane, that will occur between those points.

All information that has ever existed or will ever exist, the DNA of every being in the universe.

EVERYTHING: all contaned in the ratio of a circumference and a diameter.

Things like this, that abuse misunderstandings of math in the service of pseudo-mystical rubbish, annoy the crap out of me.

Before I go into detail, let me start with one simple fact: No one knows whether or not π contains every possible finite-length sequence of digits. There is no proof that it does, and there is no proof that it does not. We don’t know. At the moment, no one does. If someone tells you for certain that it does, they’re bullshitting. If someone tell you for certain that it doesn’t,
they’re also bullshitting.

But that’s not really important. What bothers me about this is that it abuses a common misunderstanding of infinity. π is an irrational number. So is e. So are the square roots of most integers. In fact, so are most integral roots of most integers – cube roots, fourth roots, fifth roots, etc. All of these numbers are irrational.

What it means to be irrational is simple, and it can be stated in two different ways:

  1. An irrational number is a number that cannot be written as a ratio (fraction) of two finite integers.
  2. An irrational number is a number whose precise representation in decimal notation is an infinitely long non-repeating sequence of digits.

There are many infinitely long sequences of digits. Some will eventually include every finite sequence of digits; some will not.

For a simple example of a sequence that will, eventually, contain every possible sequence of digits: 0.010203040506070809010011012013014015016…. That is, take the sequence of natural numbers, and write them down after the decimal point with a 0 between them. This will, eventually, contain every possible natural number as a substring – and every finite string of digits is the representation of a natural number.

For a simple example of a sequence that will not contain every possible sequence of digits, consider 0.01011011101111011111… That is, the sequence of natural numbers written in unary form, separated by 0s. This will never include the number combination “2”. It will never contain the number sequence “4”. It will never even contain the digit sequence for four written in binary, because it will never contain a “1” followed by two “0”s. But it never repeats itself. It goes on and on forever, but it never starts repeating – it keeps adding new combinations that never existed before, in the form of longer and longer sequences of “1”s.

Infinite and non-repeating doesn’t mean without pattern, nor does it mean without structure. All that it means is non-repeating. Both of the infinite sequences I described above are infinitely long and non-repeating, but both are also highly structured and predictable. One of those has the property that the original quote talked about; one doesn’t.

That’s the worst thing about the original quotation: it’s taking a common misunderstanding of infinity, and turning it into an implication that’s incorrect. The fact that something is infinitely long and non-repeating isn’t special: most numbers are infinitely long and non-repeating. It doesn’t imply that the number contains all information, because that’s not true. </p.

Hell, it isn’t even close to true. Here’s a simple piece of information that isn’t contained anywhere in π: the decimal representation of e. e is, like π, represented in decimal form as an infinitely long sequence of non-repeating digits. e and π are, in fact, deeply related, via Euler’s equation: e^{i\pi} + 1 = 0. But the digits of e never occur in π, because they can’t: in decimal form, they’re both different infinitely long sequences of digits, so one cannot be contained in the other.

Numbers like π and e are important, and absolutely fascinating. If you take the time to actually study them and understand them, they’re amazing. I’ve writted about both of them: π here and e here. People have spent their entire lives studying them and their properties, and they’re both interesting and important enough to deserve that degree of attention. We don’t need to make up unproven nonsense to make them interesting. We especially don’t need to make up nonsense that teaches people incorrect “fact” about how infinity works.

Parser Combinators Part 2: This time with types!

If you didn’t read my first post on parser combinators, I highly recommend going back and reading it. This post assumes both that you understand parser combinators, and that you’re already familiar with the way I implemented parser combinators in Python.

All of the code discussed in this post is available in a Github repos. It’s still not anything that I’d seriously use for anything important, but it’s good enough to explain stuff.

Continue reading Parser Combinators Part 2: This time with types!

Net Neutrality

A reader sent me a question about the Net Neutrality stuff that’s been in the news lately:

I keep hearing about net neutrality and the FCC’s new rules. What is net neutrality? Is it a good thing or a bad thing? Should I be worried about it?

I’d actually been meaning to write something about it, but hadn’t gotten around to it, so this seems like a good time!

Before I can really explain net neutrality, we need to go back to some basics, so that we understand how the internet works. A few years back, a not-too-bright congresscritter described the internet as a being like a series of tubes. Lots of people made fun of him for it. But honestly, it’s not a bad description! The network is something like a huge collection of tubes. Each connection has a certain amount of capacity, sort of like the diameter of a tube. What you pay for when you get connected to the internet is a connection with a certain amount of capacity, called bandwidth.

Net neutrality is a property of a particular kind of network. Any network is built out of a massively tangled web of wires and other kinds of connections. In the old days of telephones, telephones worked on something called circuit switching. Simplifying massively, in a circuit switched network, when you called your grandmother on the phone, the telephone network flipped a series of switches creating a circuit between your phone and your grandmother’s phone. The network would find some sequence of wires in the telephone network that ran from your phone to the local service center, from there through a sequence of switches leading to the local service center near your grandmother, and from there to your grandmothers phone. That series of wires was assigned to your connection, and the switches were set to turn that into an electrical circuit connecting your phone to your grandmother’s phone.

att-windowless Circuit switching is remarkably inefficient. The wires carrying your conversation can carry a huge quantity of information, but the voice signal that it actually carried consumed a very small portion of it. It also required massive buildings full of nothing but switches! (For example, there’s this windowless skyscraper in Manhattan that used to be the ATT switching station for NYC. That huge building used to contain nothing but thousands upon thousands of electronic switches!)

The internet takes a different approach, called packet switching. Instead of dedicating wires to corrections, all communication on the internet is broken into little chunks, called packets. Each packet contains routing information that says where it’s coming from, and where it’s going. Your computer is really physically connected to a network called a LAN (local area network) that’s got a small number of computers on it. But one of the computers that’s watching on that wire is a special one called a router. The router is connected to both your LAN, and to some other network(s). It takes any packet that’s put on the network that’s supposed to get delivered to a computer that’s not on the LAN, copies it, and sends it to a network that’s closer to the computer it’s supposed to get sent to. That keeps happening – from network to network to network, until eventually it gets to its destination. The end result is a lot like what used to happen on the telephone network – some sequence of wires gets found which forms a path from your computer to the computer it wants to talk to. But instead of physically reconfiguring the network by flipping mechanical switches, some sequence of routers copies your packets, one at a time, from network to network to network until it gets to where it’s supposed to be going.

The key to the whole shebang is that copying step. When a network router gets a packet, and it’s capable of getting that packet closer to its destination, it’s supposed to copy the packet and send it on its way.

Network neutrality is a very simple idea. It says that your router has to treat all packets equally. It’s not allowed to look at a packet, and say, “Oh, that packet is from Joe, and I don’t like him, so I’m not going to send his packet”, or even “I don’t like Joe, so I’ll send his packet down a slower connection”.

It’s important to understand that in this system, there are no free riders. Joe paid his internet provider for a certain amount of bandwidth. In order to give Joe what he paid for, his internet provider paid their upstream for a certain amount of bandwidth. And so on. The bandwidth is paid for.

Suppose, for example, that I want to watch “Arrested Development” on Netflix (they need some competition so check out: ShowBox App Download). To do that, I pay a monthly fee to my local cable company for a high bandwidth internet connection. I’m paying to be able to send and receive a certain amount of data per month, and I’m choosing to use a part of that bandwidth to watch a stupid comedy show.

So my computer sends a message to netflix, saying “Please send me this TV show.” I’m using the bandwidth that I paid for – both the upward bandwidth to send the request, and the downward bandwidth to receive the packets from the video stream – to send and receive packets from my local cable companies network. As part of my agreement with the cable company, they’re not just supposed to provide me with access to computers on their network – they’re supposed to route my packets out onto the broader internet, and to provide me with a certain amount of bandwidth to do that.

My request goes from my computer to the cable company’s computer. From there, it probably gets sent onto a backbone connection – one of the very high bandwidth connections that, as the name suggests, form the backbone of the internet. My cable company needs to pay for that connection. They pay for a certain amount of bandwidth – both up and down, and the service that I paid for is really a slice of that.

From the backbone, the request goes to an Amazon datacenter. (I think Netflix uses Amazon’s EC2 servers.) Amazon paid for a connection to the backbone, with a certain amount of bandwidth. Part of that is downward – to receive messages like mine request to watch a TV show, and part is upward, for them to send me the packets in the video stream that make up the show.

Finally, Netflix pays Amazon for a collection of computers in their datacenter, and for the bandwidth that they use. Netflix receieves the request, finds the video in their storage, and starts to send me the packets of the livestream. To do that, they use the upward bandwidth that they paid for from Amazon, which uses the upward bandwidth that they paid for to send it to the backbone. From the backbone, it gets to the connection to my local cable company, where it uses the downward bandwidth that my cable company paid for, and then from there to my computer, using the downward bandwidth that I paid for.

At no step of this chain is anyone free-riding. Every bit of bandwidth is paid for.

Net Neutrality essentially says three things:

  1. Bandwidth gets paid for once. If I pay my cable company for 1 gigabyte of downward bandwidth, then that bandwidth is paid for: they can’t go to netflix and said “MarkCC paid for bandwidth, but if you want him to be able to download your video, you also need to pay us for that bandwidth”;
  2. When I pay for an internet connection, I pay for bandwidth – that is, for total quantity. I don’t pay for the ability to watch just says that when I pay for an internet connection, I pay for bandwidth. I don’t pay separately for bandwidth from netflix and bandwidth from twitter.
  3. Routers need to treat all traffic equally. They don’t get to say “Oh, yeah, MarkCC paid for his bandwidth, but since he’s downloading from netflix, and they didn’t pay us extra to deliver it to him, we’ll slow it down.”.

Should you be worried about it? Hell yes! Neutrality is the entire reason that the internet, as we know it, exists. Take it away, and it becomes close to impossible for new businesses to set up shop and compete with existing ones. Netflix has lots of money; if Comcast decides to demand that they pay extra for their customers to be able to watch movies, they’ll be able to afford it. But a new company that wants to set up shop and start competing with Netflix won’t have the resources to do it.

But ignore all of that. The real thing to understand about the opposition to net neutrality is that it’s all about double billing. Internet service providers are already highly profitable businesses, and they deserve to be. They’re providing a valuable service that requires a substantial amount of expensive infrastructure. They built that infrastructure, and they’re profiting from it. That’s all fine. But what they’re trying to do now is say that when you pay for a service, they’re not obligated to actually give you that service unless someone else also pays for it.

You’re already paying for your network connection and your bandwidth: what the people opposing neutrality are arguing is that that’s not enough. They want to be able to double-charge for traffic. You pay for your download bandwidth. The services that you use, like Netflix, pay (a lot!) for their upload bandwidth. And now, your internet service provider wants to be able to charge the services again for the download bandwidth that you already paid for.

That’s what it’s all about. Net neutrality is really just saying that you buy bandwidth from your ISP, and they have to provide that service. When they try to claim that they should be able to separately bill a company like Netflix for their traffic, what they’re saying is that they want to hold the service that you already paid for hostage until they get paid for it again.

To close, let’s jump back to the old telephone metaphor. In the old days of telephone service, you used to pay a basic fee every month for your connection to the telephone network. That didn’t cover calls outside of your local phone companies exchanges – those were extra. If you wanted to call your grandmother, you had to pay for the use of the switches and wires that would be used to connect you to her. You paid for your connection to the network. Your grandmother paid for her connection to the network. And you paid for the use of the lines to connect your network to her network for the call.

Under the arguments of the anti-net neutrality people, the telephone company should have been allowed to say “Ok, sure, you paid for the network to connect to and talk to your grandma. But we’re not going to let you talk to her unless she also pays for the connection that you already paid for.”

Depression and Arrogant Assholes

This is somewhat off-topic for the blog, but pretty damned on-topic for my life.

I’ve talked about this before, but it’s the kind of thing that just keeps coming up.

I’ve got a bunch of different medical conditions.

I’m a post-surgical GERD sufferer – for all intents and purposes, I was born without a sphincter at the top of my stomach. That means that acid could easily get out of my stomach, into my esophagus, my vocal chords, and my lungs. Without surgery, I would probably be dead of esophageal cancer as a result. Since the surgery, I’ve had lingering problems with my swallowing reflex being fouled up, being unable to burp or vomit, and suffering from pretty severe chest pain from muscle spasms. I take a variety of medication as a result, to keep it all under control. For over ten years, I took benzodiazapenes every day as part of the treatment regimen for that: benzos are an addictive drug.

I’ve also got some of the worst allergies of anyone I know. When I first got tested for allergies, based on my symptoms, the allergist selected a set of 45 possible allergens to test me with. Most people with bad allergies would show a significant reaction to 15 or so. I came up positive for 42. I needed to get allergy shots for 25 years, and I continue to take antihistamines and inhaled steroids on a daily basis to treat it.

I also have clinical depression, and very severe social anxiety. I take an SSRI every day for the depression, but I’ve yet to find anything that works for the social anxiety.

No one ever gives me any grief about my stomach troubles. I mean, what could I do? The muscle at the top of my stomach didn’t work. There was a huge amount of acid getting into my throat and even my lungs every day! Of course I had to do something about it!

And allergies? Man, that sucks. Everyone feels bad when they see me sniffling, or when I have to pull out an inhaler because I can’t breathe. But you know, luck of the draw, some people get stuck with allergies. No fun, but it’s manageable with medication, right?

But depression? Whoa, baby. Whole different story. Instead of the benign sympathy or indifference that I get from people who hear about my other troubles, I get shit like this:

To be clear, that wasn’t directed at me in particular. No, it was directed at everyone who’s suffering with depression. See, depression isn’t a real illness. People who are living with depression aren’t suffering from a real illness. No, we’re worshipping our depression. All we need to do is stop being such pathetic assholes, and get up and “make strides to improve our life”.

Before I started taking my SSRIs, I didn’t worship my depression. I didn’t even realize that I was depressed. I just felt like the world had gone flat. It’s hard to describe it better than that. I didn’t really feel much of anything. I wasn’t sad. I wasn’t happy. I wasn’t anything. The world was all grays, no colors. Good things happened, and I couldn’t feel good. Bad things happened, and I couldn’t feel bad. My wife was pregnant with our second child, and I wasn’t happy, I wasn’t nervous, I wasn’t anything. I was just flat.

When I finally realized that there was something wrong with me, I got a referral from my doctor, and I saw a psychiatrist, who prescribed medication for me. About two weeks later, I realized that stuff was different, because I was noticing colors. I never stopped seeing them, but they stopped registering – they didn’t mean anything. I literally felt as if I weighed less – like someone had laid an invisible lead blanket on me, and now it had been removed. Something bad happened at work – the project that I’d been working on for the previous two years had not gotten its funding – and I was upset about it!

It was an amazing thing, a total change of the world. The pills didn’t make me happy. With the stuff at work, and the stress of a new child coming, I was probably more unhappy than I was before I started taking them. But the unhappiness was real, it was mine, and I felt it.

After about a year of taking the pills, on the advice of my doctor, I tried stopping it. For people with symptoms like mine, he said that about 40% would relapse pretty quickly, but 60% would be fine without any medication. After about three months without it, the world went back to that flat drab nothingness – I was part of that 40%. So I restarted the medication, and I haven’t stopped since.

I won’t even get into the social anxiety stuff here. That’s even worse that the depression. Depression is commonly viewed more as a personal weakness than an illness; but SA isn’t even seen as that: it’s just a joke.

Contrary to what the assholes out there say, depression isn’t a personal weakness. It’s not the pathetic obsession of weak-souled losers who just need to get off their couch and stop being such a schlub. It’s a real illness. It’s difficult, and it’s painful.

My depression is just as real as my GERD was, or as my allergies and asthma are. It needed to be treated just as seriously as they did. Getting stomach surgery probably saved my life. Getting treated for depression probably did too.

The assholes who try to portray depression as weakness, who mock people for suffering from depression, who make cracks about being nice to our moms and getting off of our butts: they’re not helping. But really, they don’t want to help. They want to feel smug and superior. But they’re doing worse than just not helping. They’re actively making things worse. By reinforcing the stigma against mental illness, they’re making it less likely that people who desperately need help will be able to get it.

So, as I responded to the tweet I quoted above:

Combinator Parsing, part 1

My professional work is focused on making the lives of engineers building software better. I got into that area because I was fascinated – or even obsessed – with programming languages. That led me to the more general area of engineering tools of all types. But programming languages and how they’re implemented remain one of my great fascinations.

If you’re trying to implement a new programming language (or re-implement an old one), the first thing you need to do is called parsing. Parsing consists of reading in a file containing a bunch of text, making sure that it’s correctly structured, and then using its structure to translate it into some kind of data structure.

For example, I implemented a silly programming language called Cruise. In Cruise, a snippet from a program looks like:

actor !Adder {
  behavior :Add() {
    on Plus(Z(),$x, $target) { send $x to $target }
    on Plus(I($x), $y, $target) { send Plus($x,I($y), $target) to $self }
  initial :Add

In the Cruise implementation, the parser reads that input, and translates it into a structure called an abstract syntax tree. For the code in the example above, a part of the syntax tree for that code looks something like the following:

      action=[Send[Var["x"], Var["Target"]],
        Tuple[name="I", Var["x"]],
            Tuple[name="I", Var["y"]]],

When you want to write a parser, you need to start by describing
what you want to parse. Most of the time, we do that using
a context free grammar (CFG). A CFG is a set of rules that
describe how to produce texts with a particular structure. For
example, suppose you wanted to implement a simple calculator (one
of the canonical examples used when teaching parsing!). You’d
need to describe what the expressions you want to parse
look like. A pretty typical version of this would be:

expr := add_expr
add_expr := mult_expr ( ( '+' | '-' ) mult_expr)*
mult_expr := unary_expr ( ('*' | '/') unary_expr)*
unary_expr := '-' simple | simple
simple := number | '(' expr ')'
number: digit+

This grammar is a way of expressing a system for producing valid expressions. The easiest way to see how it works is to look at an example. We know that something like “3 + 4 * (-5 + 6)” is an expression. So how would be produce that using the grammar?

  1. Start with expr. That always expands to add_expr.
  2. Expand add_expr to mult_expr ‘+’ mult_expr.
  3. Expand the first instance of mult_expr to unary_expr, giving us “unary_expr+mult_expr“.
  4. Expand the first unary expr to simple: “simple+mult_expr“.
  5. Expand simple to number: “number+mult_expr“.
  6. Expand number to the number 3: “3+mult_expr“.
  7. Expand the mult_expr to “unary_expr*unary_expr“: “3+unary_expr*unary_expr“.
  8. Expand the first unary_expr to simple, and simple to number, and number to “4”: “3+4*unary_expr“.
  9. Expand the remaining unary_expr to simple, and simple to a parenthesized expression: “3+4*(expr)”.
  10. And so on, until we end up with “3+4*(-5+6)”.

When I started learning to write compilers, there were two ways that people wrote parsers. They either implemented them manually, using a technique called recursive descent, or they used a table-driven parser generator like yacc, where you wrote the grammar using a special syntax, and then a tool translated it into the implementation of a parser.

Writing a parser using recursive descent was a huge pain. It’s very tedious and repetitive, and like any meticulous process, it’s very error-prone. In short, it’s no damn fun at all.

Writing a parser using a generator was usually much easier – but if something didn’t work, it would be a huge pain. The problem is that the code you write doesn’t end up looking very much like the code you run. For example, to write our expression language, in a typical parser generator, you’d write it something like the follows. (The grammar gets re-arranged a bit, to fit the particular underlying parser algorithm that the generator uses).

%start expr
expr: add_expr;
    add_expr '+' mult_expr
  | add_expr '-' mult_expr
  | mult_expr

    mult_expr '*' unary_expr
  | mult_expr '/' unary_expr
  | unary_expr

    '-' simple
  | simple

  | '(' expr ')'

The parser generator reads the grammar that we wrote, and translates it into a set of lookup tables, which get interpreted by semi-generic parser code provided by the generator toolkit. What that means is that when your parser actually executes, there’s nothing called unary_expr. There’s some set of states encoded into magic variables with names like yytable and yytrans. If you got something wrong in your parser, debugging it is painful at best, because the grammar is gone: it’s been translated into something completely incomprehensible to human eyes.

Not all parser generators generate code that’s this extreme when it comes to readability. But it’s the output from the most widely used one, called YACC, which has been translated and used by C/C++, Caml, Go, Fortran, Pascal, D, SML, and more.

But even if you’re not using YACC, it’s not a good scene. Even ANTLR, my favorite parser generator, which makes a heroic effort to generate efficient but human-readable parsers, the fact remains that the code that you’re running and trying to debug isn’t familiar to you: it’s not what you wrote. You’re still stuck trying to work through an unfamiliar translation.

Honestly, both approaches – the manual parser, and the parser generator – suck an awful lot of the time.

Then, a few years ago, a gang of functional programmers came up with a truly brilliant scheme. You can describe a parser as a special kind of function from inputs to outputs. If you do that, you can also combine parsers using higher-order functions. Using those functions, all you need to do is write tiny parsers for the simplest elements of your language, and then you can build the parser for your language by just combining your simple parsers in different ways. When you implement a parser this way, what you get is very close to the original grammar of the language, but it’s an actual executable program: the code that you need to read and debug is exactly the code that you wrote.

For example, suppose I wanted to parse that expression language. Using a Python version of parser combinators, I can write:

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
number = digit.many(1)
parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
simple = number | parens
unary_expr = Parser.match(['-']).opt() & simple
mult_expr = unary_expr &  (Parser.match(['*', '/']) & unary_expr).many()
add_expr = mult_expr & (Parser.match(['-', '+']) & mult_expr).many()
expr = add_expr
Reference.register_named_parser('expr', add_expr)

That’s not a contrived example. That’s actually a real, honest-to-god parser for our expression language! We’ll look at it in more detail later. But first, we need to understand what parsers and parser combinators really are, and how they work.

At this point, it’s time to start getting more concrete. Just what is a parser, precisely?

In keeping with the fact that this approach came from the functional programming community, it makes sense to describe a parser as a function. If you think of it this way, a parser is a special kind of function:

  • It takes a sequence of inputs;
  • It can either succeed or fail;
  • If it succeeds, it consumes part of its input, and produces an output.

That means that we can come up with a sort-of type signature pretty easily. It’s a function from an input to a result value. The result value can be two types: either a success, or a failure. If it’s a success, it contains both the output value produced by the parser and the unconsumed part of the input. If it fails, the failure contains nothing.

In Python, that’s easy to represent in several different ways. We’ll do it in the one that’s closest to a typed approach: an abstract class ParseResult, with subclasses for success or failure.

class ParseResult(object):
  def succeeded(self):
    return False

class Success(ParseResult):
  """A successful parse result, containing the value produced by a parser,
  and the unconsumed part of its input.
  def __init__(self, output, rest):
    self.output = output = rest

  def succeeded(self):
    return True

class Failure(ParseResult):
  def __init__(self):

The inputs are just a sequence of values. The parsers can look at one element of the input at a time. In Python, we’ll represent them using a class.

class ParserInput(object):
  def at_end(self):
    """Return True if there's no remaining input."""

  def first(self):
     """Return the first element of this input."""

  def rest(self):
     """Return a ParserInput consisting of the remainder after the first
     element has been consumed.

With those definitions, a parser becomes an object with one method, which takes a ParserInput, and produces a ParseResult.

class Parser(object):
  def parse(self, inp):
    """inp should be an instance of ParserInput; returns an instance of ParseResult"""

Before we get to the combinators, let’s look at an example of a simple parser. Suppose we want to accept a digit. We can implement that as:

class DigitParser(Parser):
  def parse(self, inp):
    if inp.first().isdigit():
      return Success(inp.first(),
      return Failure()

This looks at the first character of the input stream. This is mostly functional, so it does this without altering the input stream. If the first character is a digit, then it succeeds, returning the digit character and the remaining input; otherwise, it fails.

Simple code like that little snippet are all you need to write yourself. It’s so incredibly simple that looking at it, it seems like it’s almost too simple. How on earth can anything this trivially simple actually be the basis of something as powerful as the infamous YACC?

The answer is combinators: functions that combine parsers togethers. This simple basic, functional mechanism makes it possible to combine simple parsers using simple functions – producing a limitless variety of parsers, which can parse pretty much anything you choose to throw at them. So what kinds of combinators do we need? There are four basics that I used in my implementations, and one that’s an artifact of the implementation.

  1. Sequence: Parse an “A” followed by a “B”. In my Python implementation, this is written A & B. It succeeds only if both “A” and “B” succeed in sequence. The output of this is a list containing the outputs from “A”, “B”, and whatever else is in the sequence with them.
  2. Choice: Parse either an “A” or if that fails, parse a “B”. That’s written A | B. The output is whichever one succeeds, or else fails.
  3. Option: Parse either an “A”, or nothing, written a.opt(). The output is either the output from parsing an “A”, or None.
  4. Repetition: Parse more than one “A”, written a.many(min). The result is a list of the outputs of the invocations of “A”.
  5. Reference: I’m going to tell you what kind of thing to parse here later. For now, it’s just named “A”. The output is, of course, whatever the referenced parser outputs.

The last one definitely needs some extra explanation. Most interesting grammars are, ultimately, recursive. In our calculator’s expression grammar, the rule expr contains an add_expr, which contains a mult_expr, which contains a unary_expr, which contains a simple, which could expand to something that contains an expr. That leads to a problem constructing our parsers: we can’t actually build an instance of an expr, because it needs to contain itself. The Reference combinator is the solution to this: it lets you say Call the parser named “A”; I’ll tell you what parser that is later. If you look at the parser implementation I showed earlier, you can see this:

parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
Reference.register_named_parser('expr', add_expr)

In the definition of the parens rule, we used Reference('expr'), which says “call whatever parser is registered with the name ‘expr'”. Then later on, we registered the add_expr parser with the name ‘expr’. Voila! Recursion.

There’s one more thing that I used in the calculator example: Parser.match(). That’s just a helpful little piece of code that I wrote to create parsers that look for one character from a fixed set. It’s implemented with a class named SetParser. Parser.match(['(']) will only match a “(” in the input; Parser.match(['a', 'b', 'c']) will match either an “a”, a “b”, or a “c”. Implementing it is as easy as you’d guess at this point:

class SetParser(Parser):
  def __init__(self, chars):
    self.chars = chars

  def pr(self):
    return "SetParser%s" % self.chars

  def parse(self, inp):
    if inp.first() in self.chars:
      return Success(inp.first(),
      return Failure()

With these, you should now be able to completely read and understand the implementation of our expression parser. Let’s walk through it.

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']) 
number = digit.many(1)
parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
simple = number | parens
unary_expr = Parser.match(['-']).opt() & simple
mult_expr = unary_expr &  (Parser.match(['*', '/']) & unary_expr).many()
add_expr = mult_expr & (Parser.match(['-', '+']) & mult_expr).many()
expr = add_expr
Reference.register_named_parser('expr', add_expr)
  • digit: this is a SetParser that reads a single digit.
  • number: a number is just a sequence of at least one digit.
  • parens: this is a parenthesized expression: an open paren followed by an expression, followed by a close paren.
  • simple: a simple expression is either a number, or a parenthesized expression.
  • unary: a unary expression is an optional “-“, followed by a simple.
  • mult_expr: a multiplication expression is a unary, followed by a collection of multiplication operators and operands.
  • add_expr: the same thing as a mult_expr, except that it uses add operations. By defining them seperately, with multiplication embedded inside addition, we gave multiplication a higher precedence.
  • expr is just an alias for add_expr.
  • Reference: finally, we bind the reference from the parenthesized expression.

Now that we have a parser, we can run it!

>>> inp = StringParserInput("1+2*(3+5*4)*(6+7)")
>>> print(expr.parse(inp).output)
[None, ['1'], [], [['+', [None, ['2'], [['*', [None, ['(', [None, ['3'], [], [['+', [None, ['5'],
 [['*', [None, ['4']]]]]]]], ')']]], ['*', [None, ['(', [None, ['6'], [], [['+', [None, ['7'],
  []]]]], ')']]]]]]]]

It works, but wow, that output is horrible! It’s basically a tree, formed from lists, that describes the parse process. It’s got all of the information we’d need to evaluate the expressions parsed by our grammar, only it’s in a terrible form.

That brings us to one last thing to add: Actions. In terms of parser combinators, an action is a parser that takes another parser, and a function. The output of the action is the result of applying the function to the embedded parser:

class Action(Parser):
  def __init__(self, parser, act):
    self.parser = parser
    self.action = act

  def pr(self):
    return "Action[%s]" %

  def parse(self, inp):
    result = self.parser.parse(inp)
    if result.succeeded():
      return Success(self.action(result.output),
      return Failure()

Using that, we can turn our expression parser into a calculator!

def digits_to_number(digits, running=0):
  """Convert a list of digits to an integer"""
  if len(digits) == 0:
    return running
    r = (running * 10) + int(digits[0])
    return digits_to_number(digits[1:], r)

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
number = Action(digit.many(1), digits_to_number)

In this version, number now actually outputs an integer, which is the number represented by the list of digits produced by the parser.

We can attach similar actions to all of the other rules. For example, the new version of our addition expression is:

def eval_add(lst):
  """Evaluate an addition expression. For addition rules, the parser will return
  [number, [[op, number], [op, number], ...]]
  To evaluate that, we start with the first element of the list as result value,
  and then we iterate over the pairs that make up the rest of the list, adding
  or subtracting depending on the operator.
  first = lst[0]
  result = first
  for n in lst[1]:
    if n[0] == '+':
      result += n[1]
      result -= n[1]
  return result

add_expr = Action(mult_expr & (Parser.match(['-', '+']) & mult_expr).many(), eval_add)

With similar actions attached to the others, now, it works. (The code for the other actions is in the github repository, linked below):

>>> inp = StringParserInput("1+2*(3+5*4)*(6+7)")
>>> print(expr.parse(inp).output)

That’s about it for doing this in Python. The code is all here on Github, in the “python” subdirectory. The rest of the github project is the Java version. It’s more complicated to do all of this in Java, because of the type system. We’ll take a look at that in the next post. The syntax is more verbose, and the types make things a bit more complicated, but it’s still amazingly easy to build a useful parser this way.

This implementation isn’t quite what I’d call industrial strength:

  1. It’s not super-efficient; particularly when it’s parsing a multiple-precedence expression, it can do a whole lot of backtracking, which makes things very slow. To fix this, the best thing would be to add something called operator precedence parsing for handling infix expressions. Parsec, the Haskell library that introduced my to combinator parsing, does this, and it workes beautifully. It would be nice to add that, but for my purposes (ie, writing this blog post, plus a bit of language parsing for my Apex project), it’s not necessary.
  2. It has terrible error handling. In fact, it doesn’t have error handling at all. If there’s a syntax error in your input, all that will happen is that the parser will fail. To fix this, I’ll need to add a third kind of return value to the basic parser function: a parser should be able to succeed, fail without error, and fail with an error. This is something I really need to add if I ever want to use this code for anything even semi-real.
  3. It doesn’t track positions. In other words, when I get it to the point where it can actually produce a syntax error, it can’t say where the error is. I’d really like to be able to say “There’s an error at line 3 column 7”, but right now, there’s no way to do that. This one is relatively easy to add: the parser input just needs to be able to report a line and column number. I’m definitely planning on adding that.