More Basics: Compilers, Programs, and Languages

After my “what is an OS?” post, a couple of readers asked me to write a similar post about compilers.

Before I can answer what a compiler is, it’s helpful to first answer a different question: what is a program?

And here we get to one of my pet peeves. The most common answer to that question is “a detailed step-by-step sequence of instructions”. For example, here’s what wikipedia says:

A computer program, or just a program, is a sequence of instructions, written to perform a specified task with a computer.

This is wrong.

Back when people first started to study the idea of computing devices, they talked about computing machines as devices that performed a single, specific task. If you think about a basic Turing machine, you normally define Turing machines that perform a single computation. They’ve got a built-in sequence of states, and a built in transition table – the machine can only perform one computation. It took one kind of input, and performed its computation on that input, producing its output.

Building up from these specific machines, they came up with the idea of a universal computing device. A universal computer was a computing machine whose input was a description of a different computing machine. By giving the universal machine different inputs, it could perform different computations.

The point of this diversion is that looking at this history tells us what a program really is: it’s a description of a computing machine. Our computers are universal computing machines; they take programs as input to describe the computing machines we want them to emulate. What we’re doing when we program is describing a computing machine that we’d like to create. Then we feed it into our universal computing machine, and it behaves as if we’d built a custom piece of hardware to do our computation!

The problem is, our computers are simultaneously very primitive and overwhelming complex. They can only work with data expressed in fixed-length sequences of on/off values; to do anything else, we need to find a way of expressing in terms of extremely simple operations on those on/off values. To make them operate efficiently, they’ve got a complex structure: many different kinds of storage (registers, l1 and l2 caches, addressable memory), complicated instruction sets, and a whole lot of tricky perfomance tweaks. It’s really hard to program a computer in terms of its native instructions!

In fact, it’s so hard to program in terms of native instructions that we just don’t do it. What we do is write programs in terms of different machines. That’s the point of a programming language.

Looked at this way, a program language is a way of describing computing machines. The difference between different programming languages is how they describe computing machines. A language like C describes von Neumann machines. Haskell describes machines that work via lambda calculus computations using something like a spineless G-machine. . Prolog describes machines that perform computations in terms of intuitionistic logical inference like a Warren Abstract Machine.

So finally, we can get to the point: what is a compiler? A compiler is a program that takes a description of a computing device defined in one way, and translates into the kind of machine description that can be used by our hardware. A programming language ets us ignore all of the complexities of how our actual hardware is built, and describe our computations in terms of a simple abstraction. A compiler takes that description, and turns it into the form that computer hardware can actually use.

For anyone who’s read this far: I’ve gotten a few requests to talk about assembly language. I haven’t programmed in assembly since the days of the Motorola 68000. This means that to do it, I’ll need to learn something more up-to-date. Would you be more interested in seeing Intel, or ARM?

15 thoughts on “More Basics: Compilers, Programs, and Languages

  1. _Arthur

    ARM! ARM! I vote for ARM!
    ARM is in almost every cell phone and tablet nowadays, and there’s talk of making ARM servers and computers too.

    If I understand right, there’s not one be several ARM machine languages, now ?

    Reply
  2. Carleh

    I vote too for ARM, but that’s a selfish vote, since I already know intel asm, and I’d like to learn something new. Thanks.

    Reply
  3. Martijn Hoekstra

    Interesting you bring up the Wikipedia article, where quiet a lot of discussion went in to the interestingly complex definition. Is the source code of the program the program? Is its executable form the program? Is source code with a syntax error a computer program? Is (hypothetical) source code according to a language specification for which no compiler exists a program? None of these questions have strait forward answers.

    Reply
  4. Peter B

    I vote for ARM. Cortex-M3 specifically.

    I started with the IBM 1620 in what today would be called absolute binary. I key punched one self loading “IBM card” per instruction. At the time I resisted using an assembler because what I knew worked for me. After finally looking at the 1620 assembler I found that it greatly simplified my programming efforts. Assemblers are simply tools to get bits into computers.

    I’ve grown fussy after programming over 20 distinctly processors in assembly. I’m currently doing assembly on ARM Cortex-M3 and Cortex-M4 processors. Its “Thumb-2” instruction set is straightforward. As if it actually means anything, I give the award of the best interrupt controller ever to the ARM Nested Vector Interrupt Controller.

    What’s not to love with ARM. 16 32-bit registers. Even the 3 special registers (program counter, stack pointer and subroutine return register) can be used in almost all instructions.

    ARM machines are RISC which I believe means load/store architecture. All arithmetic is performed in registers. To add 1 to a location in a memory location one must first fetch the contents of that location into a register then add 1 to the register then write it back.

    On the other hand you might go low end and take apart an 8-bit processor, the Microchip PIC 16F57. 72 whole bytes of RAM, inconveniently arranged in 4 pages. On PICs one can add one to memory with a single instruction. So I guess it qualifies as CISC. Such machines are not useless. For example my 16F57 code operates a pellet stove – see http://sierraproductsinc.net/easyfire-pellet-stoves.htm.

    Reply

Leave a Reply to sugarfrosted Cancel reply