I’m going to write a few posts about programming in machine language. It seems that many more people are interested in learning about the ARM processor, so that’s what I’ll be writing about. In particular, I’m going to be working with the Raspberry Pi running Raspbian Linux. For those who aren’t familiar with it, the Pi is a super-inexpensive computer that’s very easy to program, and very easy to interface with the outside world. It’s a delightful little machine, and you can get one for around $50!
Anyway, before getting started, I wanted to talk about a few things. First of all, why learn machine language? And then, just what the heck is the ARM thing anyway?
Why learn machine code?
My answer might surprise you. Or, if you’ve been reading this blog for a while, it might not.
Let’s start with the wrong reason. Most of the time, people say that you should learn machine language for speed: programming at the machine code level gets you right down to the hardware, eliminating any layers of junk that would slow you down. For example, one of the books that I bought to learn ARM assembly (Raspberry Pi Assembly Language RASPBIAN Beginners: Hands On Guide) said:
even the most efficient languages can be over 30 times
slower than their machine code equivalent, and that’s on a good
This is pure, utter rubbish. I have no idea where he came up with that 30x figure, but it’s got no relationship to reality. (It’s a decent book, if a bit elementary in approach; this silly statement isn’t representative of the book as a whole!)
In modern CPUs – and the ARM definitely does count as modern! – the fact is, for real world programs, writing code by hand in machine language will probably result in slower code!
If you’re talking about writing a single small routine, humans can be very good at that, and they often do beat compilers. But once you get beyond that, and start looking at whole programs, any human advantage in machine language goes out the window. The constraints that actually affect performance have become incredibly complex, too complex for us to juggle effectively. We’ll look at some of these in more detail later, but for now I’ll explain one example.
The CPU needs to fetch instructions from memory. But memory is dead slow compared to the CPU! In the best case, your CPU can execute a couple of instructions in the time it takes to fetch a single value from memory. This leads to an obvious problem: it can execute (or at least start executing) one instruction for each clock tick, but it takes several ticks to fetch an instruction!
To get around this, CPUs play a couple of tricks. Basically, they don’t fetch single instructions, but instead grab entire blocks of instructions; and they start retrieving instructions before they’re needed, so that by the time the CPU is ready to execute an instruction, it’s already been fetched.
So the instruction-fetching hardware is constantly looking ahead, and fetching instructions so that they’ll be ready when the CPU needs them. What happens when your code contains a conditional branch instruction?
The fetch hardware doesn’t know whether the branch will be taken or not. It can make an educated guess by a process called branch prediction. But if it guesses wrong, then the CPU is stalled until the correct instructions can be fetched! So you want to make sure that your code is written so that the CPU’s branch prediction hardware is more likely to guess correctly. Many of the tricks that humans use to hand-optimize code actually have the effect of confusing branch prediction! They shave off a couple of instructions, but by doing so, they also force the CPU to sit idle while it waits for instructions to be fetched. That branch prediction failure penalty frequently outweighs the cycles that they saved!
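To get a feel for how much the pattern of a branch matters, here’s a toy simulation in Python. It is only a sketch: it assumes a 2-bit saturating counter, which is a common textbook prediction scheme, and it isn’t a model of what any particular ARM core actually does. The function names (`count_mispredictions`, `branch_outcomes`) and the threshold of 128 are made up for this illustration.

```python
import random


def count_mispredictions(outcomes):
    """Simulate one 2-bit saturating counter predicting a single branch.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Returns how many times the prediction disagreed with the real outcome.
    """
    state = 2  # start at "weakly taken" (an arbitrary choice)
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Nudge the counter toward what actually happened.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return misses


def branch_outcomes(values, threshold=128):
    """Outcome of the branch `if value >= threshold` for each value, in order."""
    return [v >= threshold for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randrange(256) for _ in range(10_000)]

    # Same data, same branch, different order.
    shuffled = count_mispredictions(branch_outcomes(data))
    in_order = count_mispredictions(branch_outcomes(sorted(data)))

    print("random order:", shuffled, "mispredictions")
    print("sorted order:", in_order, "mispredictions")
```

On sorted data the branch goes one way for a long stretch and then the other way, so this predictor is wrong only a handful of times. On the same data in random order, it guesses wrong about as often as a coin flip would. On real hardware, every one of those wrong guesses means a stall.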
That’s one simple example. There are many more, and they’re much more complicated. And to write efficient code, you need to keep all of those in mind, and fully understand every tradeoff. That’s incredibly hard, and no matter how smart you are, you’ll probably blow it for large programs.
If not for efficiency, then why learn machine code? Because it’s how your computer really works! You might never actually use it, but it’s interesting and valuable to know what’s happening under the covers. Think of it like your car: most of us will never actually modify the engine, but it’s still good to understand how the engine and transmission work.
Your computer is an amazingly complex machine. It’s literally got billions of tiny little parts, all working together in an intricate dance to do what you tell it to. Learning machine code gives you an idea of just how it does that. When you’re programming in another language, understanding machine code lets you understand what your program is really doing under the covers. That’s a useful and fascinating thing to know!
What is this ARM thing?
As I said, we’re going to look at machine language coding on the ARM processor. What is this ARM beast anyway?
It’s probably not the CPU in your laptop. Most desktop and laptop computers today are based on a direct descendant of the first microprocessor: the Intel 4004.
Yes, seriously: the Intel CPUs that drive most PCs are, really, direct descendants of the first CPU designed for desktop calculators! That’s not an insult to the Intel CPUs, but rather a testament to the value of a good design: they’ve just kept on growing and enhancing. It’s hard to see the resemblance unless you follow the design path, where each step follows directly on its predecessors.
The Intel 4004, released in 1971, was a 4-bit processor designed for use in calculators. Nifty chip, state of the art in 1971, but not exactly what we’d call flexible by modern standards. Even by the standards of the day, they recognized its limits. So following on its success, they created an 8-bit version, which they called the 8008. And then they extended the instruction set, and called the result the 8080. The 8080, in turn, yielded successors in the 8088 and 8086 (and the Z80, from a rival chipmaker).
The 8088 was the processor chosen by IBM for its newfangled personal computers. Chip designers kept making the family better, producing the 80286, 386, Pentium, and so on, up to today’s CPUs, like the Core i7 that drives my MacBook.
The ARM comes from a different design path. At the time that Intel was producing the 8008 and 8080, other companies were getting into the same game. From the PC perspective, the most important was the 6502, which was used by the original Apple, Commodore, and BBC microcomputers. The 6502 was, incidentally, the first CPU that I learned to program!
The ARM isn’t a descendant of the 6502, but it is a product of the 6502-based family of computers. In the early 1980s, the BBC decided to create an educational computer to promote computer literacy. They hired a company called Acorn to develop a computer for their program. Acorn developed a beautiful little system that they called the BBC Micro.
The BBC Micro was a huge success. Acorn wanted to capitalize on its success, and try to move it from the educational market to the business market. But the 6502 was underpowered for what they wanted to do. So they decided to add a companion processor: they’d have a computer which could still run all of the BBC Micro programs, but which could do fancy graphics and fast computation with this other processor.
In a typical tech-industry NIH (Not Invented Here) moment, they decided that none of the other commercially available CPUs were good enough, so they set out to design their own. They were impressed by the work done by the Berkeley RISC (Reduced Instruction Set Computer) project, and so they adopted the RISC principles, and designed their own CPU, which they called the Acorn RISC Machine, or ARM.
The ARM design was absolutely gorgeous. It was simple but flexible and powerful, able to operate on very low power and generating very little heat. It had lots of registers and an extremely simple instruction set, which made it a pleasure to program. Acorn built a lovely computer with a great operating system called RiscOS around the ARM, but it never really caught on. (If you’d like to try RiscOS, you can run it on your Raspberry Pi!)
But the ARM didn’t disappear. It didn’t catch on in the desktop computing world, but it rapidly took over the world of embedded devices. Everything from your cellphone to your dishwasher to your iPad is running on an ARM CPU.
Just like the Intel family, the ARM has continued to evolve: the ARM family has gone through 8 major design changes, and dozens of smaller variations. The design is no longer tied to Acorn, either: it’s maintained by a company called ARM Holdings, which licenses it to chipmakers, and ARM chips are now produced by dozens of different manufacturers, including Motorola, Apple, Samsung, and many others.
Recently, they’ve even started to expand beyond embedded platforms: the Chromebook laptops are ARM-based, and several companies are starting to market ARM-based server boxes for datacenters! I’m looking forward to the day when I can buy a nice high-powered ARM laptop.