# Introducing Cryptanalysis

To understand why serious encryption algorithms are so complex, and why it’s
so important to be careful with the critical secrets that make an encryption
system work, it’s useful to understand something about how people break
encryption systems. The study of this is called cryptanalysis, and it’s
a fascinating field of applied mathematics. I’m going to be
interspersing information about cryptanalysis with my cryptography posts. One
thing to remember here is that we’ll be talking about it mainly in the context
of how you can break an encryption system. But cryptanalysis is also used for
designing cryptosystems, because you can only design a successful cryptosystem by
thinking about the ways it could be broken, and how it can defeat them.

One caveat: I’m going to be describing cryptanalysis in terms of how I understand it, which is sometimes different from classical descriptions by cryptanalysts. My understanding is strongly rooted in computation and information theory, rather than pure math. So sometimes my presentation will be a bit different, but hopefully by staying on the ground where I’m most comfortable, I can do a better job of making it comprehensible.

# Rotating Ciphers

So, last time, we looked at simple substitution ciphers. In a substitution
cipher, you take each letter, and pick a replacement for it. To encrypt a
message, you just substitute the replacement for each instance of each letter.
As I explained, it’s typically pretty easy to break that encryption – the basic
secret of the encryption is the substitution system, and it’s pretty easy to
figure that out, because the underlying information being encrypted still has a
lot of structure.
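
To make that concrete, here’s a minimal sketch of a simple substitution cipher in Python. The names (`make_key`, `encrypt`, `decrypt`) and the use of a seeded shuffle to build the table are my own assumptions for illustration; Python’s `random` module is not suitable for real secrets.

```python
import random
import string

def make_key(seed):
    """Build a letter-to-letter substitution table: the shared secret."""
    rng = random.Random(seed)  # illustration only, not a secure source of randomness
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def encrypt(text, key):
    # Letters are replaced; spaces and punctuation pass through unchanged.
    return "".join(key.get(ch, ch) for ch in text.lower())

def decrypt(text, key):
    # Decryption is the same lookup with the table turned around.
    inverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(inverse.get(ch, ch) for ch in text)

key = make_key(seed=42)
ciphertext = encrypt("attack at dawn", key)
print(ciphertext)
print(decrypt(ciphertext, key))
```

Notice that the repeated letters in “attack” are still repeated in the output, and the word breaks are still visible. That leftover structure is exactly what makes this kind of cipher easy to break.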

There are a couple of easy improvements on a simple substitution cipher, some of which came up in the comments. Two good ones are:

1. Instead of defining substitutions for single characters, define
substitutions for groups (pairs, triplets) of characters. This improves things,
because it allows you to work with groups that will reduce the visibility of
patterns. Still, because there’s so much structure in human language, given
enough data, an encrypted message is still likely to be easy to decode. So this
is great for short messages, but not for anything bigger.
2. Multiple substitutions: instead of always substituting, say, “x” for “a”,
substitute each letter with a two-digit number. Then for common letters, allow
multiple possible substitutions. By assigning many codes to common letters, and
few codes to uncommon letters, you can make the coded symbols appear with
roughly equal frequency. This can seriously hamper frequency-based analyses.
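
Here’s a rough sketch of the second idea, assuming a pool of exactly 100 two-digit codes (“00” through “99”) and a short sample text standing in for a real frequency table. The function names and the way the codes are divided up (hand the next spare code to whichever letter is most under-served) are my own choices for illustration, not a standard scheme.

```python
import random
import string
from collections import Counter

def build_homophonic_key(sample, rng):
    """Give each letter a share of the codes 00-99, roughly proportional
    to how often it appears in the sample (every letter gets at least one)."""
    letters = string.ascii_lowercase
    counts = Counter(ch for ch in sample.lower() if ch in letters)
    shares = {ch: 1 for ch in letters}
    for _ in range(100 - len(letters)):
        # Hand the next spare code to the letter with the most uses per code.
        neediest = max(letters, key=lambda ch: counts[ch] / shares[ch])
        shares[neediest] += 1
    codes = [f"{n:02d}" for n in range(100)]
    rng.shuffle(codes)
    return {ch: [codes.pop() for _ in range(shares[ch])] for ch in letters}

def encrypt(text, key, rng):
    # Each letter becomes one of its codes, picked at random; non-letters are dropped.
    return "".join(rng.choice(key[ch]) for ch in text.lower() if ch in key)

def decrypt(digits, key):
    inverse = {code: ch for ch, group in key.items() for code in group}
    return "".join(inverse[digits[i:i + 2]] for i in range(0, len(digits), 2))

SAMPLE = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness")
rng = random.Random(7)  # illustration only, not a secure source of randomness
key = build_homophonic_key(SAMPLE, rng)
secret = encrypt("attack at dawn", key, rng)
print(secret)
print(decrypt(secret, key))  # spaces were dropped, so this prints "attackatdawn"
```

Because common letters are spread across several codes, counting how often each code appears tells an attacker much less than counting letters did.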

Both of those changes help. They work particularly well when combined. To do
a two-character version of that, you create a list of all possible two-character
sequences. Then you generate a frequency table for how often each two-character
sequence occurs in a large sample of the kind of text you’re going to encode.
Then, finally, you assign a number of substitutions for each pair so that they
occur with approximately equal frequency. That gives you a pretty good
system.
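
The table-building steps can be sketched like this. It’s only an outline under some assumptions of mine: pairs are counted with a sliding window over the letters of the sample, the code pool holds 10,000 codes (say, four-digit numbers), and the names `pair_counts` and `allocate` are made up for the example.

```python
import string
from collections import Counter
from itertools import product

def pair_counts(sample):
    """Count every two-letter sequence in the sample, ignoring non-letters."""
    letters = [ch for ch in sample.lower() if ch in string.ascii_lowercase]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def allocate(counts, pool_size=10_000):
    """Decide how many codes each of the 676 possible pairs should get:
    one each, plus a share of the spare codes proportional to its count."""
    all_pairs = ["".join(p) for p in product(string.ascii_lowercase, repeat=2)]
    spare = pool_size - len(all_pairs)
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty sample
    return {pair: 1 + spare * counts[pair] // total for pair in all_pairs}

counts = pair_counts("the then")
shares = allocate(counts)
print(counts.most_common(2))
print(shares["th"], shares["zz"])
```

From there, you’d hand out actual codes to each pair according to its share, the same way as in the single-letter version. A real table would need a much larger sample than this toy string.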

Still, it’s not great. Given enough encoded text, it can be cracked with a
relatively small amount of computational power. If I know the basic idea of the
cipher, and I’ve got a decent amount of encoded text, I can write a program that
will figure it out pretty quickly. Plus, it’s really a lot of work to generate
the cipher – you need to generate frequency tables, and work out the number of
substitutions, etc. It’s definitely not trivial to set up, and it’s still pretty
easy to crack.

For that reason, those kinds of solutions aren’t used much – there’s a lot
of prep work, and the secret that you need to share with your partner is large
and complicated. You can get better quality with less effort and a
simple secret using a different scheme called a rotating cipher.

# Simple Encryption: Introduction and Substitution Ciphers

The starting point for talking about encryption is to understand
what the point of it is: what it’s supposed to do, and what problems it’s supposed to avoid.

Encryption is fundamentally about communication: you’ve got two parties who want to communicate, but don’t want anyone else to be able to listen in.

The way that you do that is by sharing a secret. You use that secret to somehow modify the information that you’re going to send, so that it can’t be read by someone who doesn’t have the secret. People often think of encryption as a way of using a password to hide information, but a password is just one of many kinds of secrets that you can use. The secret that you share with your counterpart can be a password, a number, a textbook, or just about anything else you can imagine.

# Encryption, Privacy, and You

As you’ve probably heard, the US customs service has recently asserted the right to confiscate any and all computers and/or digital storage carried by anyone crossing the US border. They further assert the right to demand all passwords, encryption keys, etc., from the owners. They even further assert the right to keep or make copies of any data that they find, and to share it without limit with anyone they choose.

I don’t think I really need to stress how insane this is. Back
when I worked for IBM, I frequently travelled to Canada, because I
worked with development labs in Toronto and Ottawa. When I did that, I
carried a computer full of stuff that IBM considered to be highly
confidential and highly sensitive. (I’ve even still got a wall-plaque
from IBM thanking me for work on a project, where I’m not allowed to
ever tell anyone what I did to earn it!) What this policy
says is that the border service would have the right to turn that
information over to anyone they wanted, without informing me
or IBM that they had done so. Further, some of the information on that
laptop was encrypted, and I did not have the key. It was
encrypted with a system that would only allow it to be opened if the
computer could contact a particular IBM server from inside the IBM
firewall. So not only could the border service have confiscated the
computer and passed on confidential or private information – but they
could have arrested me for refusing to decrypt the information on the
computer – even though I couldn’t decrypt it.

This isn’t news. They’ve been doing this for a while, and we know they’ve been doing it – they’ve made absolutely no attempt to hide it.

The reason that I’m writing about it now is because I just read
something on Salon about how an allegedly knowledgeable and tech-savvy
person recommends coping with this, and I can’t possibly disagree more
strongly. On the Salon Machinist blog, Denise Caruso wrote:

Swire notes that agents at the border are going further than just
taking image copies of people’s hard drives. They’re actually
demanding passwords and encryption keys so they can examine the
contents.

Of course, they promise to destroy the copies and the keys as soon
as they’re done — as long as they don’t find anything illegal, like a
downloaded song you didn’t pay for — so no security worries there,
right? There’s no such thing as a crooked customs or Border Patrol
agent.

spreadsheets, documents and personal financial information like credit
card receipts and photos, nowadays they can also listen to your stored
Skype calls and voice mails.

Not to mention that just having encrypted data on your hard drive
causes suspicion, or at least throws down the gauntlet. If you were
looking for illegal stuff and you ran into a file that looked like
this,

```
qANQR1DBwU4D/TlT68XXuiUQCADfj2o4b4aFYBcWumA7hR1Wvz9rbv2BR6WbEUsy
ZBIEFtjyqCd96qF38sp9IQiJIKlNaZfx2GLRWikPZwchUXxB+AA5+lqsG/ELBvRa
c9XefaYpbbAZ6z6LkOQ+eE0XASe7aEEPfdxvZZT37dVyiyxuBBRYNLN8Bphdr2zv z/9Ak4
/OLnLiJRk05/2UNE5Z0a+3lcvITMmfGajvRhkXqocavPOKiin3hv7+Vx88
```

wouldn’t you immediately need to know what it said? It could be a conspiracy! It could be a list of child pornographers! It could be a copyrighted magazine article! It could be a bootleg Led Zepplin video!

Urgh.

So I figure the best solution is to encode your files rather than
encrypt them, so that you could hide your stuff in plain sight. If
agents don’t know something is encrypted and it looks innocuous, they
won’t compel you to give them the key. “Here’s your laptop, ma’am.
Sorry for the inconvenience.”

That’s the wrong answer. The solution isn’t to try to hide the
fact that you’re taking your own or your employer’s privacy seriously. The answer is to (A) make encryption so absolutely routine that finding encrypted files on a computer can’t be used as a distinguishing characteristic to justify confiscating it, and (B) make it so incredibly painful and laborious for them to get any data off of a computer that they give up.

The first part of the instructions for how to do this is below.