# The Heartbleed Bug

There’s a lot of panic going around on the internet today, about something called the Heartbleed bug. I’ve gotten questions, so I’m giving answers.

### I’ve heard lots of hype. Is this really a big deal?

Damn right it is!

It’s pretty hard to wrap your head around just how bad this actually is. It’s probably even more of a big deal that the hype has made it out to be!

This bug affects around 90% of all sites on the internet that use secure connections. Seriously: if you’re using the internet, you’re affected by this. It doesn’t matter how reputable or secure the sites you connect to have been in the past: the majority of them are probably vulnerable to this, and some number of them have, in all likelihood, been compromised! Pretty much any website running on linux or netbst, using Apache or NGINX as its webserver is vulnerable. That means just about every major site on the net.

The problem is a bug in a commonly used version of SSL/TLS. So, before I explain what the bug is, I’ll run through a quick background.

### What is SSL/TLS?

When you’re using the internet in the simplest mode, you’re using a simple set of communication protocols called TCP/IP. In basic TCP/IP protocols, when you connect to another computer on the network, the data that gets sent back and forth is not encrypted or obscured – it’s just sent in the clear. That means that it’s easy for anyone who’s on the same network cable as you to look at your connection, and see the data.

For lots of things, that’s fine. For example, if you want to read this blog, there’s nothing confidential about it. Everyone who reads the blog sees the same content. No one is going to see anything private.

But for a lot of other things, that’s not true. You probably don’t want someone to be able to see your email. You definitely don’t want anyone else to be able to see the credit card number you use to order things from Amazon!

To protect communications, there’s another protocol called SSL, the Secure Sockets Layer. When you connect to another site that’s got a URL starting with `https:`, the two computers establish an encrypted connection. Once an SSL connection is established between two computers, all communication between them is encrypted.

Actually, on most modern systems, you’re not really using SSL. You’re using a successor to the original SSL protocol called TLS, which stands for transport layer security. Pretty much everyone is now using TLS, but many people still just say SSL, and in fact the most commonly used implementation of it is in package called OpenSSL.

So SSL/TLS is the basic protocol that we use on the internet for secure communications. If you use SSL/TLS correctly, then the information that you send and receive can only be accessed by you and the computer that you’re talking to.

Note the qualifier: if you use SSL correctly!

SSL is built on public key cryptography. What that means is that a website identifies itself using a pair of keys. There’s one key, called a public key, that it gives away to everyone; and there’s a second key, called a private key, that it keeps secret. Anything that you encrypt with the public key can only be decrypted using the private key; anything encrypted with the private key can only be decrypted using the public key. That means that if you get a message that can be decrypted using the sites public key, you know that no one except the site could have encrypted it! And if you use the public key to encrypt something, you know that no one except that site will be able to decrypt it.

Public key cryptography is an absolutely brilliant idea. But it relies on the fact that the private key is absolutely private! If anyone else can get a copy of the private key, then all bets are off: you can no longer rely on anything about that key. You couldn’t be sure that messages came from the right source; and you couldn’t be sure that your messages could only be read by an authorized person.

### So what’s the bug?

The SSL protocol includes something called a heartbeat. It’s a periodic exchange between the two sides of a connection, to let them know that the other side is still alive and listening on the connection.

One of the options in the heartbeat is an echo request, which is illustrated below. Computer A wants to know if B is listening. So A sends a message to B saying “Here’s X bytes of data. Send them back to me.” Then A waits. If it gets a message back from B containing the same X bytes of data, it knows B was listening. That’s all there is to it: the heartbeat is just a simple way to check that the other side is actually listening to what you say.

The bug is really, really simple. The attacker sends a heartbeat message saying “I’m gonna send you a chunk of data containing 64000 bytes”, but then the data only contains one byte.

If the code worked correctly, it would say “Oops, invalid request: you said you were going to send me 64000 bytes of data, but you only sent me one!” But what the buggy version of SSL does is send you that 1 byte, plus 63,999 bytes of whatever happens to be in memory next to wherever it saved the byte..

You can’t choose what data you’re going to get in response. It’ll just be a bunch of whatever happened to be in memory. But you can do it repeatedly, and get lots of different memory chunks. If you know how the SSL implementation works, you can scan those memory chunks, and look for particular data structures – like private keys, or cookies. Given a lot of time, and the ability to connect multiple times and send multiple heartbeat requests each time you connect, you can gather a lot of data. Most of it will be crap, but some of it will be the valuable stuff.

To make matters worse, the heartbeat is treated as something very low-level which happens very frequently, and which doesn’t transfer meaningful data. So the implementation doesn’t log heartbeats at all. So there’s no way of even identifying which connections to a server have been exploiting this. So a site that’s running one of the buggy versions of OpenSSL has no way of knowing whether or not they’ve been the target of this attack!

See what I mean about it being a big deal?

### Why is it so widespread?

When I’ve written about security in the past, one of the things that I’ve said repeatedly is: if you’re thinking about writing your own implementation of a security protocol, STOP. Don’t do it! There are a thousand ways that you can make a tiny, trivial mistake which completely compromises the security of your code. It’s not a matter of whether you’re smart or not; it’s just a simple statement of fact. If you try to do it, you will screw something up. There are just so many subtle ways to get things wrong: it takes a whole team of experts to even have a chance to get it right.

Most engineers who build stuff for the internet understand that, and don’t write their own cryptosystems or cryptographic protocols. What they do is use a common, well-known, public system. They know the system was implemented by experts; they know that there are a lot of responsible, smart people who are doing their best to find and fix any problems that crop up.

Imagine that you’re an engineer picking an implementation of SSL. You know that you want as many people trying to find problems as possible. So which one will you choose? The one that most people are using! Because that’s the one that has the most people working to make sure it doesn’t have any problems, or to fix any problems that get found as quickly as possible.

The most widely used version of SSL is an open-source software package called OpenSSL. And that’s exactly where the most bug is: in OpenSSL.

### How can it be fixed?

Normally, something like this would be bad. But you’d be able to just update the implementation to a new version without the bug, and that would be the end of the problem. But this case is pretty much the worst possible case: fixing the implementation doesn’t entirely fix the problem. Because even after you’ve fixed the SSL implementation, if someone got hold of your private key, they still have it. And there’s no way to know if anyone stole your key!

To fix this, then, you need to do much more than just update the SSL implementation. You need to cancel all of your keys, and replace them new ones, and you need to get everyone who has a copy of your public key to throw it away and stop using it.

So basically, at the moment, every keypair for nearly every major website in the world that was valid yesterday can no longer be trusted.

Over on twitter, some folks were chatting about the latest big security botch. A major service, called Evernote, had a security breach where a password file was stolen. Evernote has handled the situation quite well, being open about what happened, and explaining the risks.

In their description of the breach, they said that the stolen passwords were “both hashed and salted”. Apparently this sounds funny to people outside of software. (Amazing how jargon becomes so ingrained that I didn’t even notice the fact that it could be interpreted in a funny way until it was pointed out to me!)

Anyway, since discussion of this is going around, I thought I’d explain just what password hashing and salting means.

Let’s start at the beginning. You’re some kind of system that wants to have password security. Obviously, you need to save the passwords somewhere, right?

As we’ll see, that’s only partially correct – but let’s go with it for now. You need to store something that lets you check if a user supplied the right password, so you need to store something.

The most naive approach is create a file (or database, or whatever) that contains the usernames and passwords of all of your users. Something like:

```alice:abc
mark:pass
joe:123
jen:hello
```

Suppose you were a thief, and you wanted to crack this password file, what would you do? You’d try to steal that file! If you can get hold of that password file, then you’d have all of the passwords for all of the users of the system.

That means that this is a terrible way of storing the passwords. One step, and a thief has completely breached your system. We don’t want that. So what should we do?

First, we could encrypt the file.

This might seem like a good idea at first. If a thief were the steal the file, the wouldn’t be able to find your user’s passwords without figuring out the encryption key! It’s going to take a lot of work to figure out that key!

The first problem with this is that any password can be cracked given enough time and power. If there’s only one encryption key for the entire file, then it’s worth investing a lot of time and power into breaking it – and once it’s broken, then everything is revealed.

The second problem is: how does your system check a user’s password? It needs to decrypt the file! That means that the encryption key must be available to your system! So all that a thief needs to do is figure out where your system is getting the key from. You’ve got your entire security system for all of your users set up with a single point of protection, and somewhere in your system, everything that you need to break that protection is available!

What can we do to improve this? The answer is something called crypto graphic hashing.

Cryptographic hashing is a combination of two concepts: hashing, and one-way functions.

A really simple example of a not-very-good hash function of a string would be something like: convert all of the characters in the string to their numeric values, and exclusive-or the binary representation of those bits. With that hash function, you could take a string like “ABCD”, and convert it to the numeric values of the characters ([65, 66, 67, 68]), and then x-or them together (1000001 xor 1000010 xor 1000011 xor 1000100 = 0000100) for a result of 4. Real practical hash functions are more complicated.

For example, at least some versions of Java use the following as the default hash for a string of characters:

$text{hash}(s) = left(sum_{i in text{length}(s)} 31^{length(s) - i - 1}*s[i] right) mod 2^{32}$

There’s a class of mathematical functions called one-way functions. A one way function is a function $f$, where given $x$, it’s easy to compute $f(x)$, but given $f(x)$ it’s extremely difficult (or even impossible) to compute $x$.

If we combine those two, we have what’s called a crpytogrphic hash function: a function that takes an arbitrary input string, and converts it to a number, in a way where it’s very difficult to figure out what the input string that produced the number was. That’s great for storing passwords! We don’t store the password at all. We just store the hash-value produced from the password. Then when a user comes and logs in, we take their password, hash it, and compute the result to the stored hash.

```alice:7a2d28fc
mark:dfd4e1c6
joe:ed849ee1
jen:bb76e739
```

This is much better than storing the encrypted password. There is no encryption key that a thief can use to decrypt the password. Even if a thief knows the hash values of your user’s passwords, they can’t get in to the system! And your system actually never stores the actual values of the user’s passwords – just their hashcodes!

So again, let’s look at this from the perspective of a thief. How can a thief break into a system with hashed passwords?

If they don’t know what hash function you’re using, then they’re completely stuck. Sadly, they can probably figure it out. Designing new crpytographic hash functions is hard. Implementing cryptographic hash functions correctly is hard. As a result, most people just use a hash function from a library. That means that for a thief, it’s usually pretty easy to figure out what hash function is being used by a system.

Once they know what hash function you used, their only choice to break your system is to try to guess the passwords. That is, they can guess passwords, compute their hash codes, and search through your password file to see if any of the users password hashes matches. If they find one, they’re gold!

In fact, there’s a common strategy based on this idea called a rainbow table. A rainbox table is a list of common passwords, and the numeric value that they hash to with a common crptographic hash value. Something like:

pass 1b93eb12
abc 7a2d28fc

If you can somehow steal the passwords file, then with a rainbow table, you can find users with common passwords. For example, in the table above, you can see that the hashcode “7a2d28fc” occurs in the passwords file for the username “alice”, and it’s also in the rainbow table for the password “abc”. So a thief could determing that alice’s password was “abc”. Even with the best crpytographic hash, all it takes is one idiot user who uses “password” as their password, and your system’s security is breached.

Salting passwords addresses that problem. In a salting strategy, you don’t hash a user’s password by itself: you combine it with some additional data, and then hash that combination. The additional information is called the salt..

You can use lots of different things for the salt. There’s a complex set of tradeoffs in the exact salting strategy, which are beyond the scope of this post, but a few examples include:

1. Always use a fixed salt string. This is weak, but better than nothing. It’s got a similar weakness to the encrypted password system: you only need one salt to give you a handle on breaking all of the passwords, and that one salt needs to be in the system.
2. Add a random piece of data for each password. The catch here is that you need to store the salt data for each password. This is what unix passwords used to use. They added 12 random bits to each password. In the passwords file, they stored the salt and the hashed password. The weakness of this is that the salt is right there with the password entry. But because each user has a different salt, that means that any attempt to breach the system needs to look at each user separately.
3. Salt on metadata: that is, take information about the user that isn’t part of their username, and use that as the salt. For example, you could use a person’s birthday as the salt for their account.

If each user has a different salt, then even if you’ve got terrible passwords, a thief needs to do a lot of work to try to break your system. Even with a rainbow-table like strategy, they can’t compute the hashcode for a given common password once, and then search the password hash list for that code – they need to recompute it for each possible salt value!

What salting does is, effectively, increase the amount of effort needed to break the passwords. If you add 12 bits of salt, then a rainbow table needs 4096 times more entries to find common passwords! If your salt is long enough, then it can make it effectively impossible to create a rainbox table at all. If they try to attack you without a rainbow table, a 12 bit salt means that your attacker needs to attack the passwords of each of your users seperately! Even if they know the value of the salt, you’ve made it much harder for them to breach your security.

# Cryptographic Integrity using Message Authentication Codes

I don’t have a lot of time to write; I’m having my fifth (I think) upper endoscopy done tomorrow, which means that the day’s going to be a wash; and Yom Kippur is thursday, and I need to cook, so between the personal crap and work, I’m not going to have much time for blogging. So I’m trying to make use of the time I have to write one short but (hopefully) interesting post.

One thing that I’ve mentioned in passing is the distinction between message confidentiality, and message integrity.

Confidentiality is most of what we’ve been talking about
so far. Confidentially provides a guarantee that when you send an encrypted message, no one but your intended recipient is able