In Haskell, there are no looping constructs. Instead, there are two alternatives: there are list iteration constructs (like foldl which we’ve seen before), and tail recursion. Let me say, up front, that in Haskell if you find yourself writing any iteration code on a list or tree-like structure, you  should always look in the libraries; odds are, there’s some generic function in there that can be adapted for your use. But there are always cases where you need to write something like a loop for yourself, and tail recursion is the way to do it in Haskell.
 Tail recursion is a kind of recursion where the recursive call is the very last
thing in the computation of the function. The value of tail recursion is that in a tail
recursive call, the caller does nothing except pass up the value that’s returned
by the the callee; and that, in turn, means that you don’t need to return to the caller
at all! If you think of it in terms of primitive machine-level code, in a tail-recursive call, you can use a direct branch instruction instead of a branch-to-subroutine; the tail-recursive call does *not* need to create a new stack frame. It can just reuse the callers frame. 
 Let’s look at one quick example. Suppose you’ve got a list of integers, and you’d like to
take the sum. If you were writing that in an imperative language like C, you’d write
something like:
int sum(int len, int *array) {
int sum=0;
int i;
for (i=0; i < len; i++) {
sum += array[i];
}
return sum;
}
 What’s nice about that code is that it doesn’t use any stack space for the
iteration. You can call it for an array of essentially any size, and it use the same
amount of memory to produce the sum. The naive way of writing that in Haskell (assuming we were being stupid and not using any of the library functions like `foldr`) would be
something like:
>naiveSumList :: [Integer] -> Integer >naiveSumList (x:xs) = x + (naiveSumList xs) >naiveSumList [] = 0
 The problem with implementing it that way is that it’s very wasteful of space. It ends up
using **O**(n) space in the length of the list! That’s because for each element of the list,
there is a recursive call to the function which allocates a new stack frame. To illustrate this, consider the two following diagrams: one is a stack diagram, where the vertical line
represents the lifetime of the stack frame, and new frames appear to the right of their parents; the other is a lambda-calculus expansion showing how a pure syntactic expansion
would evaluate.

naiveSumList [3, 4, 5, 6] => 3 + (naiveSumList [4,5,6] => 3 + (4 + naiveSumList [5,6])) => 3 + (4 + (5 + (naiveSumList [6]))) => 3 + (4 + (5 + (6 + (naiveSumList [])))) => 3 + (4 + (5 + (6 + 0))) => 3 + (4 + (5 + 6)) => 3 + (4 + 11) => 3 + 15 => 18
 As you can see, the stack frame of each call needs to survive longer than the frame of any functions that it calls – because each call to naiveSumList for a non-empty list needs to get the return value of the recursive call, and add it to its x parameter before it can return. All of the stack frames need to be preserved – because an action needs to be performed in the context of each
frame before that frame can be dropped.
 In a tail-recursive function, the stack frames do not need to be preserved. There is no action that needs to be taken in the context of a frame after it makes its
recursive call. So instead of creating a stack frame for each member of the list, it can
create one stack frame, and just reuse it.
So, let’s look at a tail recursive addition loop in Haskell.
>sumList :: [Integer] -> Integer >sumList lst = sumLoop lst 0 where > sumLoop (x:xs) i = sumLoop xs (i+x) > sumLoop [] i = i
 Now, let’s look at the corresponding diagrams for the tail-recursive version. The only frame that needs to be kept is the frame for the initial call to sumList; it calls sumLoop, which can reuse the same stack frame – which calls itself recursively, reusing the same stack frame; and when it’s
done with the list, sumLoop can return its result directly to the caller of sumList.

sumList [3,4,5,6] => sumLoop [3,4,5,6] 0 => sumLoop [4,5,6] 3 => sumLoop [5,6] 7 sumLoop [6] 12 => sumLoop [] 18 => 18
 The trick that I used to create the tail-recursive version is a very common technique for creating a tail-recursive loop. What would be loop index variables/accumulator variables in an imperative language become parameters in the tail-recursive version. So instead of the line of C code that initializes the accumulator sum to 0, in the Haskell version, sumList calls sumLoop with an initial parameter value of 0. Instead of adding the next list element to the accumulator, we pass the sum as a parameter to the next iteration. 
 How about our binary tree example? Binary tree search is already tail recursive.
Searching checks the value in the current node, and either returns immediately, or returns the
value of calling search on a subtree. What if we wanted to do insert in a tail
recursive way? In the typical implementation, you call insert on a subtree, and then return a new tree node that includes the value of the subtree node: so there's something that needs to
be done after the recursive call returns, which means that the stack frame needs to be kept? We can do it. It's pretty pointless: realistically, we probably wouldn't want to bother doing it tail recursively, because the stack for recursively working down a binary tree will never get that deep, and the information that we need to pass as parameters will be as large as the set of stack frames. But it makes a nice explanatory example, so we'll do it anyway.)
 The trick in converting to tail recursion is to capture whatever information is needed
to generate the result into a parameter that's passed along with the function. For a tree
insert, we need to find the correct place to insert the new value, and then we need to
patch that insertion into the path of nodes up the tree to the root. So what we need to
pass down the recursive calls is the path taken down the tree in the course of searching
for an insertion location.
 That description that I just wrote is pretty clearly a two-phase thing: search for the
insertion location; and then patch the tree. Anytime you see a description like that, it's a
good clue that you probably want to write it as separate functions. We'll split it into
findInsertPath, which searches down the tree, finding the insert location and
generating a list containing the from the insertion point back to the tree root; and
patchTree, which takes a new node for a newly inserted value, and follows the
path created by findInsertPath back up the tree creating the parent nodes for the
new tree with the inserted value. Since the only time we're going to call
patchTree or findInsertPath is inside of ttInsert,
we'll make them local functions  using a where clause.
>data (Ord a) => TailTree a = TTNode a (TailTree a) (TailTree a) > | TTEmpty deriving (Show) > >ttInsert :: (Ord a) => a -> TailTree a -> TailTree a >ttInsert newval tree = > patchPath (TTNode newval TTEmpty TTEmpty) (findInsertPath newval tree []) where > findInsertPath newval node@(TTNode v left right) path > | v > newval = findInsertPath newval left (node:path) > | otherwise = findInsertPath newval right (node:path) > findInsertPath newval TTEmpty path = path > patchPath node [] = node > patchPath node@(TTNode a _ _) ((TTNode v left right):path) > | v > a = patchPath (TTNode v node right) path > | otherwise = patchPath (TTNode v left node) path
 When you're not used to it, that looks pretty weird. But if you take the time to look
through it carefully, it's reasonably easy to follow. The downward half, when
findInsertPath is walking down the tree should be easy to understand. The second
stage, when patchPath is walking back up the tree looks a bit odd, but if you
trace it out on paper you'll see that it's doing exactly the same thing, in exactly the same
order as the return path of a the regular non-tail-recursive version. The only real difference
is that instead of letting the call stack maintain the list of nodes that need to be rebuilt
for the post-insert tree, and then guide the way through rebuilding the tree, we've taken that and made it explicit in the code, doing everything with tail calls.
Let's trace it, just to see. To make it easier to read, I'm going to use a more compact non-Haskell syntax for the trees. For empty nodes, I'll just leave their values empty,
and for non-empty nodes, I'll write {val:left/right}. Let's start with a tree:
{10:{5:{3:/}/{7:/}/{16:{14:{12:/}/}/{18:{17:/}/{21:{19:/}/}}}}}
Drawn in tree form, that's:

 To make things easier, I'll also abbreviate tree nodes sometimes. To reproduce a subtree that appeared before, I'll just write {val:...}. So, for example, I might
abbreviate the tree above as {10:{5:...}/{16:...}}. Lets walk through
the evaluation of ttInsert to add the value 15 to thee xample tree above.
ttInsert 15 {10:{5:{3:/}/{7:/}/{16:{14:{12:/}/}/{18:{17:/}/{21:{19:/}/}}}}} =>
patchPath {15:/} (findInsertPath 15 {10:{5:...}/{16:...}} []) =>
findInsertPath 15 {10:{5:...}/{16:...}} [] =>
findInsertPath 15 {16:{14:...}/{18:...}} [{10:...}] =>
findInsertPath 15 {14:{12:/}/} [{16:...},{10:...}] =>
findInsertPath 15 {} [{14:...},{16:...},{10:...}] =>
patchPath {15:/} [{14:...},{16:...},{10:...}] =>
patchPath {14:{12:/}/{15:/}} [{16:...},{10:...}] =>
patchPath {16:{14:{12:/}/{15:/}}/{18:...}} [{10:...}] =>
patchPath {10:{5:...}/{16:{14:{12:/}/{15:/}}/{18:...}} [] =>
{10:{5:{3:/}/{7:/}}{16:{14:{12:/}/{15:/}}/{18:/{17:/}/{21:{19:/}/}}}}
If you look at the tree parameter in each call of patchPath,
it's precisely the return value of the corresponding recursive call
to insert in the non-tail recursive version. (In the first draft of this,
I screwed up in the third call to findInsertPath. Thanks to Craig Fratrik
for pointing out the error.)

Note though that tail recursion in Haskell is a slight bit tricker to reason about than it is in something like, e.g., scheme because of lazy evaluation.
For example, in Haskell it’s often much more natural and efficient to use foldr instead of foldl, even though the former is not tail recursive and the latter is, or at least it appears so naively.
For example, suppose I want to find the first negative value in a list (I’ll return 0 if it isn’t there). One perfectly efficient way to to this in Haskell is: (watch the comment system screw up my formatting)
firstNeg :: (Num a) => [a] -> a
firstNeg [] = 0
firstNeg x:xs = pickFirstNeg x (firstNeg xs)
where pickFirstNeg a b = if (a < 0) then a else b
Now in Haskell, that ends up being tail recursive. Actually, looking at that, I wonder if perhaps your initial sum function is anywhere near as inefficient as you think it is.
I think your trace made a mistake in the trace at node with value 14, moving to the left sub child instead of inserting on the right sub child. Your resulting tree has
{14:{12:/{15:/}}/}
which I believe is invalid.
Daniel:
foldr it’s efficient only when you can make good use of lazy evaluation, but since both (+) and your pickFirstNeg are strict in their arguments (they need both their arguments to produce the result) you are better off with tail recursion.
Yes, Sanzhiyan is right. In fact, in a naive Haskell implementation, Mark’s sumList uses O(n) stack space!
The reason is that the accumulator is not evaluated until it’s needed. So sumList actually creates a tree of additions which are evaluated at the base case.
I haven’t checked, but GHC should optimise this because sumList is defined on Integers, and GHC knows that Integer addition is strict. For a general Num, however, it can’t know until specialisation time that addition is strict. That’s why the built-in list sum function has extra stuff to ensure strictness.
Pseudonym >
And this issue with strictness is, in fact, the reason why haskell has
Data.List.foldr'andData.List.foldl1': they’re strict (therefore truly tail-recursive) versions offoldlandfoldl1Slightly off topic, but two quick questions: Do you consider pure functional languages to be practical for general purpose programming? I mean this in the sense of being able to elegantly express algorithms (particularly those that are traditionally expressed using mutable state) and training average programmers to program in such languages effectively.
Second, what’s your opinion of a language, whose type system would distinguish between effect free and ‘effected’ expressions (the ‘effect’ bit of a type could be subject to quantification, allowing one to write expressions that are polymorphic with respect to effects. And naturally there would be a subtyping relation allowing coercion from effect-free to effected)? The idea being, that it would allow using the best of both worlds. Are there any such languages?
I love Haskell. But as you’ve nicely demonstrated, it takes practice to learn to reason reliably about memory usage in Haskell 🙂
*Main> sumList [1..1000000]
*** Exception: stack overflow
Flaky:
Actually, that’s three questions, because the first one is a two-parter.
I *do* consider pure languages like Haskell to be practical
for general purpose programming, in the sense of being able to elegantly express algorithms. I don’t worry to much about effectively expressing algorithms based on mutable state, because in most (not all, but most) cases, there is a corresponding version that can be expressed functionally. You’ll see a nice example of that in the next post in the Haskell series, where I’m going to show a red-black tree in Haskell 🙂
Training average programmers is a different question, and a different problem. There are far too many people who are considered professional programmers who have no real education, no real comprehension of what they’re doing. For people like that, learning any new language is a
difficult, even traumatic experience. (One summer, I worked
for an insurance companies IT department. There were 40-odd “professional software developers” working there; two of them had bachelors degrees. The rest were graduates of 3 month “tech schools”; they could write the RPG code necessary for generating the companies reports, and some of them could even write a little bit of COBOL, but that was it. Those guys (and they were pretty much all guys), I don’t
think could learn Haskell if their lives depended on it.
Even the two college guys would have had a tough slog learning Haskell. I think they could have done it, but the way that our undergraduate program was set up, getting out of the imperative state of mind would be tough.
So on a purely technical level, I definitely think that a pure language like Haskell to be practical and useful. The generally low level of education among professional developers makes the social/human factors side of it a much harder question, and I just don’t know. What I do think is that a new startup trying to build applications to compete with the big guys, where the programming is being done by a small group of highly skilled people, would be well-advised to consider non-traditional languages – whether that’s
pure functional like Haskell, a non-pure functional like one of the modern lisps or OCaml, a logic/constraint language like Prolog, etc.
Finally, getting to your last question. Yes, I think that that’s possible, and in fact, I think that it’s name is Haskell. 🙂
What monads do is allow you to capture various kinds of side effects in a typed sub-language. IO based programs in Haskell are, in terms of semantics, generating lists of “IO events”. It happens that those IO events are being interpreted by the runtime to perform actual IO, but that’s hidden behind the monad. You can do similar things with
mutable variables and data structures, etc. The type system captures the kind of effect that you want, and the underlying implementation takes care of making it work. The main problem in Haskell is the difficulty of combining different *kinds* of non-pure effects together, but there is steady progress being made in that area.
Pseudonym, I’m not quite sure that your idea that the sumList creates a tree of additions is correct. I’m pretty new to Haskell, so I may be way off here, but from what I can tell the value
iis evaluated at each call of sumLoop. First, as Sanzhiyan has pointed out,(+)is strict, so theivalue should be evaluated with every(i+x)expression. Second, the Haskell type system should be able to determine from the+operation thatiis a member of the(Num a)class and thus theivalue in(i+x)would need to be evaluated before passing the value into thesumLoopfunction, otherwise a curried function would be passed into thesumLoopfunction and this would not match the expected type.That’s how I see it anyway. Mark, I would love to hear your input on this. As I said, I am just now learning Haskell, and any help in understanding the nuances of the language, such as its use of lazy evaluation, would help out immensely.
Oh, and keep up the good work Mark–love the articles!
If you ask Haskell for the value of a+b it will be forced to evaluate a and b immediately. But if you don’t explicitly call for the value of a+b, a+b will be left as a closure. That’s what ‘strict’ means in this context. The strictness of a function is only relevant when you get to the point of actually evaluating the function. Mark’s code fills the stack with closures. My example sumList [1..1000000] demonstrates this. (Though switching on optimisation would probably cure this.)
Dan P,
Thanks for the explanation, I didn’t even think of using a closure for each of these operations, for some reason I was thinking of the whole topic within the bounds of the language itself and not thinking of how it was implementing everything “under the hood”. Anyway, the explanation really cleared some things up for me. Oh, and by the way, I did recompile the sumList program with optimization turned on and the list cranked up several orders of magnitude and there was no stack overflow problem to be found. Spot on, Dan, thanks.
Sanzhiyan said:
since both (+) and your pickFirstNeg are strict in their arguments (they need both their arguments to produce the result)…
pickFirstNeg most definitely is not strict. Observe:
esau:/tmp$ cat a.hs
pickFirstNeg :: (Num a, Ord a) => a -> a -> a
pickFirstNeg a b = if (a < 0) then a else b
— Leave out base cases so that evaluating it would blow up
someOtherFunc :: Int -> Int
someOtherFunc n = (someOtherFunc (n-1)) + (someOtherFunc (n-2))
esau:/tmp$ hugs
skipping hugs logo
Hugs.Base> :load a.hs
Main> pickFirstNeg (-2) $ someOtherFunc 2
-2
For those following at home, let me mention that your ttInsert is the result of defunctionalizing the continuation-passing-style transformation of a regular, non-tail-recursive tree-insert function. In particular, TailTree is the defunctionalized continuation type.
For ttInsert, why do you need the path list anyway. You could have done something like the following and it would still be tail recursive.
> patchPath node TTEmpty = node
> patchPath node@(TTNode a _ _) (TTNode v left right)
> | v > a = patchPath (TTNode v node right) left
> | otherwise = patchPath (TTNode v left node) right
Isn’t it fascinating that people are convinced that Haskell is just the bestest thing ever, but no two of them can agree on what a simple program means? This is the flip side of my earlier comment that, whereas I agree that Haskell is an interesting language, just about everything that anyone says about it is meaningless or false. Case in point: Haskell is supposedly “pure”, but no one can tell you what that means (because it isn’t true). The situation is worsened by the absence of any sort of definition of Haskell: it is just what ghc happens to be this week. And the coup de grace is laziness, which ruins MCC’s post about tail recursion, which is a perfectly adequate summary of the situation in ML, but not in Haskell! As usual, everything that is said about Haskell is false …. What is it about the language that provokes such confusion?
Anonymous:
(1) I am not convinced that Haskell is “the bestest thing ever”; I think it’s an interesting language that teaches a different way of thinking, and that that’s a good thing.
(2) I have no idea why you think no one can tell you what “pure” means. Pure means that in Haskell, every function call is a true function call: given any specific set of parameters to a function, that function will *always* generate the same result. That’s true about every function call in Haskell – even the monadic operators. What monads accomplish is the ability to capture a state as a parameter,
and to pass that state from call to call in sequence without having to explicitly write it.
(3) Haskell is one of the most well-defined languages I’ve seen. The Haskell language report documents the semantics of the basic language in great, precise detail; the library specifications detail the semantics of the library in great detail. All of the Haskell code that I’ve written runs equally well in Hugs and GHC, which are the two main compilers that I use. I’ve also used nbc and hbi in the past, and had no portability issues with any code. It’s true that most implementations are also used as testbeds for extensions, and that the extensions may not be well documented, but if you use the base language (which you’d do well to stick with for non-experimental code), it’s got better cross-implementation portability than any other language I’ve used. (ML, on the other hand, much as I love it, is a total train-wreck for portability. SML has largely become “Whatever SML-NJ does”, and OCaml is “Whatever the current version of OCaml does”.)
(4) Laziness *is* a difficult concept. It definitely takes effort to really understand how lazy code will end up executing. The fact that *I* screwed up and mis-calculated, thinking that one of my calls was strict when it wasn’t is hardly the fault of the language; and it doesn’t mean that no one can understand what laziness will do. I screw up lots of the time :-). I can remember a spectacular botch that I wrote when I was learning Prolog; but overall, I think Prolog (minus cuts) is one of the clearest languages I’ve seen. I just didn’t *get it* yet. Same with laziness here; I’m learning as I go: I haven’t done that much Haskell programming yet, and I’m still working on fully grasping laziness. My programming experience has been very much on the non-lazy side – C/ C++/ Java/ Scheme/ CommonLisp/ Dylan/ SML/ Ocaml/ Python/… So really grasping laziness is taking some time and effort. Which I consider no biggie – the most valuable languages that I’ve learned have all taken an effort to master. (When I first started ML programming, really grasping the type system took quite a bit of effort; then really getting the *module* system took even more. Well worth the effort: the ML module system stands in my mind as the finest example of what a programming language module system should be. Prolog was a huge effort for me to really grasp what it meant, and to be able to write cut-free code that could work no matter which variables were left unbound; but it changed the way that I program, not just in Prolog, but in other languages.)
If what you say about SML is correct, then it’s a bit sad. I remember that when I first came across it one of the selling points was that it had a formal specification.
One worrying thing about functional languages in general is their instability. What will be in Haskell five years from now? To retain interest, it seems academics must strive to find something the language cannot express easily, and then come up with an extension that “solves” it. When tryingto learn Haskell, I’m sometimes discouraged that there are so many extensions, it feels like I’m aiming at a constantly moving target.
But the discouraging thing is that it’s not only academic languages that are ever-growing. It amazes me that there are still new language features scheduled for C++, for instance. Whatever happened to simplicity and uniformity as a design goal?
I recently put LispMe on my Palm Pilot (an old HandSpring Visor Platinum). LispMe is a free and GPL’ed dialect of Scheme, itself a variant of Lisp. My Lisp exposure is 1977, and for about 5 weeks, and frankly, i never really was comfortable with it. So, for me, the semantic differences between LispMe and Lisp can be ignored.
LispMe claims tail recursion optimization. Despite that, a factorial program like:
(define fact
(lambda (n)
(if (
requires O(n) stack. It has to do the multiply after calling itself. So, the recommendation is:
(define fact2
(lambda (n)
(letrec ((tail-recursive-fact2
(lambda (counter accumulator)
(if (> counter n)
accumulator
(tail-recursive-fact2 (+ counter 1)
(* counter accumulator))))))
(tail-recursive-fact2 1 1))))
which is much more complicated. Still, it’s an idiom that can be used for more complicated situations.
LispMe also has an iterator.
(define factorial
(lambda (x)
(do ((i 1 (+ i 1))
(j 1 (* i j)))
((> i x) j))))
I find this form much easier to grok, and the performance is essentially identical to the more complicated optimized tail recursion version.
OK, i admit it. The C ‘for’ loop version is simpler still. What’s more, since i learned C on a PDP-11 with the Ritchie compiler, i have a good assembly language model for the semantics of the language. Even though there’s no PDP-11 in sight, i can pretend that i’m using one, and know in no uncertain terms what’s going to happen. I’ve never investigated how one would build a Lisp (Scheme, etc.) environment, and am therefore constantly blind sided by unexpected side effects. One might say, well, with modern computers, resources are effectively unlimited – and it just doesn’t matter. But, i’m doing this on a 5 MHz Palm that has 100 KB of free RAM. Tic Tac Toe takes three or four seconds to make it’s move, and i can beat it regularly because it only looks one move ahead. The C language chess program i use on that machine beats the tar out of me even when it’s moves are essentially instant. OK, i suck at chess.
For now, i’m just goofing around with what it can do.
I don’t claim to have written any of the above programs. Consider them public. I did make them run in LispMe.
Stephen:
The initial version of the factorial that you wrote *is not* tail recursive. Tail recursion recursion where the recursive call *is the last thing* that the function does; that factorial function calls itself recursively, *and then* it does a multiply. That’s not tail recursion.
A simpler tail recursive fact in Scheme:
(define (tail-fact n)
(let tail ((num n) (result 1))
(if (= num 0)
result
(tail (- num 1) (* result num)))))
It’s basically the same as what you wrote, but using the
function definition and named-let syntax constructs to
get rid of the lambdas, which spread things out and make it harder to read.
Well, you could look at what Haskell was five and ten years ago, and get an idea of how it’s evolving. As far as rate of change, compare it with the rate of change in Java.
Also notice that not every extension sticks — for instance, the implicit arguments extension (it was too difficult to reason about where the implicit arguments got their values). Even the standards remove features sometimes (usually when a better one comes along).