Writing Basic Functions in Haskell (edited repost)

(This is a heavily edited repost of the first article in my original
Haskell tutorial.)

(I’ve attempted o write this as a literate haskell program. What that
means is that if you just cut-and-paste the text of this post from your
browser into a file whose name ends with “.lhs”, you should be able to run it
through a Haskell compiler: only lines that start with “>” are treated as
code. The nice thing about this is that this blog post is itself a
compilable, loadable Haskell source file – so I’ve compiled and tested
all of the code in here in exactly this context.)

Haskell is a functional programming language. At the simplest
level, what that means is that you write programs by writing functions – and the
functions are truly mathematical functions: they take a set of inputs, and generate
a set of outputs, and the result of the function call is dependent solely
on the values of the inputs. So, the best way to start out
looking at Haskell is to look at how to write basic functions.

So let’s dive right in a take a look at a very simple Haskell definition
of the factorial function:

> simplest_fact n = if n == 0
>     then 1
>     else n * simplest_fact (n - 1)

This is the classic implementation of the factorial. Some things to
notice:

  1. In Haskell, a function definition uses no keywords. Just “name
    params = impl
  2. Function application is written by putting
    things side by side. So to apply the factorial function to x, we
    just write simplest_fact x. Parens are only used for managing precedence.
    The parens in “fsimplest_act (n - 1)” aren’t there because the function
    parameters need to be wrapped in parens, but because otherwise, the default
    precedence rules would parse it as “(simplest_fact n) - 1“.
  3. Haskell uses infix
    syntax for most mathematical operations. That doesn’t mean that they’re
    special: they’re just an alternative way of writing a binary function
    invocation. You can define your own mathematical operators using normal
    funcion definitions.
  4. There are no explicit grouping constructs in Haskell:
    no “{/}”, no “begin/end”. Haskell’s formal syntax uses braces, but the
    language defines how to translate indentation changes into braces; in
    practice, just indenting the code takes care of grouping.

That implementation of factorial, while correct, is not how most Haskell
programmers would normally implement it. Haskell includes a feature called
pattern matching, and most programmers would use pattern matching
rather than the if/then statement for a case like that. Pattern matching
separates things and often makes them easier to read. The basic idea of
pattern matching in function definitions is that you can write what look
like
multiple versions of the function, each of which uses a different set of
patterns for its parameters (we’ll talk more about what patterns look like in
detail in a later post); the different cases are separated not just by
something like a conditional statement, but they’re completely separated at
the definition level. The case that matches the values at the point of call is
the one that is actually invoked. So here’s a more stylistically correct
version of the factorial:

> pattern_fact 0 = 1
> pattern_fact n = n * pattern_fact (n - 1)
>

In the semantics of Haskell, those two are completely equivalent. In fact,
the deep semantics of Haskell are based on pattern matching – so it’s more
correct to say that the if/then/else version is translated into the pattern
matching version than vice-versa! At a low level, both will basically be
expanded to the Haskell pattern-matching primitive called the
case” statement, which selects from among a list of patterns:

> cfact n = case n of
>              0 -> 1
>              _ -> n * cfact (n - 1)

Another way of writing the same thing is to use Haskell’s list type. Lists
are a very fundamental type in Haskell. They’re similar to lists in Lisp,
except that all of the elements of a list must have the same type. A list is
written between square brackets, with the values in the list written inside,
separated by commas, like:

[1, 2, 3, 4, 5]

As in lisp, the list is actually formed from pairs, where the first
element of the pair is a value in the list, and the second value is the rest
of the list. Pairs are constructed in Haskell using “:“,
so the list could also be written in the following ways:

1 : [2, 3, 4, 5]
1 : 2 : [3, 4, 5]
1 : 2 : 3 : 4 : 5 : []

If you want a list of integers in a specific range, there’s a shorthand
for it using “..“. To generate the list of values from
x to y, you can write:

[ x .. y ]

Getting back to our factorial function, the factorial of a number “n” is
the product of all of the integers from 1 to n. So another way of saying that
is that the factorial is the result of taking the list of all integers from 1
to n, and multiplying them together:

listfact n = listProduct [1 .. n]

But that doesn’t work, because we haven’t defined listProduct
yet. Fortunately, Haskell provides a ton of useful list functions. One of the
useful types of list operations is fold, which comes in two versions:
foldl and foldr. What folds do is iterate over a
list using some functions to combine elements. It takes a function to merge
two values, and an initial value for the merge. Then it takes the first
element in the list, and merges it with the initial value of the result. Then
it takes the second value, and merges it with the result of the first. And so
on.

The difference between foldl and foldr
is the associative order in which they do things: foldl starts
from the left, and foldr starts from the right.

To illustrate, suppose we had a list [1, 2, 3, 4, 5, 6]. If
we did foldl (*) 1 [1, 2, 3, 4, 5, 6], it would evaluate as
“(((((1 * 1) * 2) * 3) * 4) * 5) * 6”. That is, it would start with 1, and
then 2, and then 3, and so on. foldr would group
right-associative; it would do “1*(2*(3*(4*(5*(6*1)))))”. Since *
is associative, the result value doesn’t matter. But it can make a significant
difference in performance; the way that the Haskell ends up evaluating
function calls, you can easily wind up with programs where foldr
can be much faster that foldl.

So, using fold, you can write the factorial using lists:

> listfact n = listProduct [ 1 .. n ]
>        where listProduct lst = foldl (*) 1 lst

One thing that some Haskell programmers like to do is to write
what they call points-free functions: that is, functions
that do not need to name any of their parameters. You can do an amazing
amount of stuff in points-free Haskell. Personally, I’m not a fan of
points-free programming: I find it much harder to read. But some people
love it.

To go points-free, you define a function in terms
of series of compositions of other functions. The elements of
those compositions take care of pushing the values around to
get them to the right place. For factorial, it’s pretty easy. There’s
a function called “enumFromTo“, which takes two
parameters, m and n, and generates a list of the values from m to n. Through
a haskell feature called currying, if you take a two-parameter
function and only give it one parameter, you get back a one-parameter
function. So:

> enumFromOne = enumFromTo 1

is exactly the same as:

enumFromOne n = enumFromTo 1 n

There’s a list function, “product” which takes the product of
all of the elements of a list – it’s basically a shorthand for “zipwith
(*) 1
“. Finally, if you’ve got two functions f and g, you can
write the composition of the two functions as g . f. So,
to write a points-free factorial function, you could just write:

> pointsfree_fact = product . (enumFromTo 1)

I need to explain one more list function before moving on. There’s a
function called “zipWith>” for performing an operation pairwise
on two lists. For example, given the lists “[1,2,3,4,5]” and
[2,4,6,8,10]“, “zipWith (+) [1,2,3,4,5]
[2,4,6,8,10]
” would result in “[3,6,9,12,15]“.

Now we’re going to jump into something that’s going to seem really
strange. One of the fundamental properties of how Haskell runs a program is
that Haskell is a lazy language. What that means is that no
expression is actually evaluated until its value is needed. So you
can do things like create infinite lists – since no part of the list is
computed until it’s needed, you can create something that defines an infinite
list – but since you’ll never access more than a finite number of elements of
it, it’s no problem. Using that, we can do some really clever things, like
define the complete series of fibonnaci numbers:

> fiblist = 0 : 1 : (zipWith (+) fiblist (tail fiblist))

This looks very strange if you’re not used to it. But if we tease it apart a
bit, it’s really pretty simple:

  1. The first element of the fibonacci series is “0”.
  2. The second element of the fibonacci series is “1”.
  3. For the rest of the series, take the full fibonacci list, and
    line up the two copies of it, offset by one (the full list, and the
    list without the first element), and add the values pairwise.

That third step is the tricky one. It relies on the laziness of Haskell:
Haskell won’t compute the nth element of the list until you
explicitly reference it; until then, it just keeps around the
unevaluated code for computing the part of the list you haven’t
looked at. So when you try to look at the nth value, it will compute
the list only up to the nth value. So the actual computed
part of the list is always finite – but you can act as if it wasn’t. You can
treat that list as if it really were infinite – and retrieve any
value from it that you want. Once it’s been referenced, then the list up to
where you looked is concrete – the computations won’t be repeated.
But the last tail of the list will always be an unevaluated expression that
generates the next pair of the list – and that pair will
always be the next element of the list, and an evaluated expression for the
pair after it.

Just to make sure that the way that “zipWith” is working in
fiblist” is clear, let’s look at a prefix of the parameters to
zipWith, and the result. (Remember that those three are all actually the
same list! The diagonals from bottom left moving up and right are the same
list elements.)

fiblist      = [0  1  1  2  3  5  8 ...]
tail fiblist = [1  1  2  3  5  8  ...  ]
zipWith (+)  = [1  2  3  5  8  13 ...  ]

Given that list, we can find the nth element of the list very
easily; the nth element of a list l can be retrieved with
l !! n“, so, the fibonacci function to get the nth
fibonacci number would be:

> list_fib n = fiblist !! n

And using a very similar trick, we can do factorials the same way:

> factlist = 1 : (zipWith (*) [1..] factlist)
> factl n = factlist !! n

The nice thing about doing factorial this way is that the values of all of
the factorials less than n are also computed and remembered, so the
next time you take a factorial, you don’t need to repeat those
multiplications.

0 thoughts on “Writing Basic Functions in Haskell (edited repost)

  1. Nomen Nescio

    ok, i can feed the post (moderately edited) into ghci and play with the functions in the toplevel, and that works. but when i try to add a “main” function and feed it through non-interactive ghc, i typically get some sort of type error. what’s ghci loading that ghc (version 6.8.2) doesn’t link in?

    Reply
  2. Patrick

    Also, Nomen, I was able to run the post just fine without editing it; for a main, I just used
    > main = print (list_fib 10)
    What’re you using?

    Reply
  3. Hugo

    Obligatory nitpicks:
    product = foldl (*) 1 — not zipWith
    list access with (!!) is O(n). To get truly O(1) memoizing solutions for fac and fib you would have to build arrays from those lists. This does have some incoveniences, though, so I don’t think it is appropriate for a short introduction.
    Otherwise, it is always nice to see a blog post about Haskell!

    Reply
  4. Nomen Nescio

    …ah. i was missing the print.
    either my brain’s running low on caffeine after a long workweek, or else i’ve spent so much time in a Python toplevel that i’ve entirely forgot how compiled languages work. :-/ my mistake, sorry.

    Reply
  5. Daniel Sobral

    That last example wasn’t very easy, particularly because the result is the list of factorials from 0, and previous alternatives always concerned themselves with factorials from 1, leaving 0 as a special case.
    By the way, is !! 0-based or 1-based? You mention “nth element”, which suggests it being 1-based, in which case the factorial function isn’t really correct.

    Reply

Leave a Reply to Patrick Cancel reply