
 For ages, I’ve been promising to write about finger trees. Finger trees are
an incredibly elegant and simple structure for implementing sequence-based
data structures. They’re primarily used in functional languages, but there’s nothing
stopping an imperative-language programmer from using them as well.
 In functional programming languages, lists are incredibly common. They show up everywhere; they’re just so easy to use for recursive iteration that they’re ubiquitous. I can’t think of
any non-trivial program that I’ve written in Haskell, Lisp, or OCaml that doesn’t use lists. (Heck, I’ve wound up using cons-cell libraries a couple of times in C++.)
 Of course, lists aren’t perfect. In fact, for some things, they absolutely suck. Suppose I’ve got a really long list – like a list of names and phone numbers from a telephone book. My local phone book has 600 pages of names; and the pages are incredibly dense – there’ve got to be at least a couple of hundred names per page. So in a phone book for a suburban region of New York City, the list of phone numbers will have 120,000 names. Now imagine that I want to look
up one name in that list – on average, I’m going to have to chase through 60,000 pointers. 60,000 isn’t a huge number for a computer, but still, chasing 60,000 pointers is a pretty big deal. If I stored them in an array, it would be a lot less convenient for some tasks, but I could use
binary search to find names, and I could find any name in no more than 16 comparisons – and the whole shebang would be contiguous, so I wouldn’t even be chasing pointers.
 What finger trees do is give me a way of representing a list that has both the convenience
of the traditional cons list, and the search efficiency of the array based method.
 The basic idea of the finger tree is amazingly simple. It’s a balanced tree where you store
all of the data in the leaves. The internal nodes are just a structure on which you can hang
annotations, which you can use for optimizing search operations on the tree. What makes
the finger tree so elegant is the way that some very smart people have generalized the idea of
annotations to make finger trees into a single, easily customizable structure that’s useful for so
many different purposes: you customize the annotations that you’re going to store in the internal nodes according to the main use of your tree. 
 Let’s look at a simple example – a sorted list, like the phone book. Below is a two-three tree
with values in the leafs. The list is the sequence of leafs from left to right.

 In a finger tree, we label the internal nodes of the tree with a count: the number of
values stored in the subtree rooted at an internal node is stored in the node – like in the image below. So, now, to look up
the seventh element of the list, we look at the left child of the root: it’s got five children. So
we know that the 7th element is in the right subtree. Then we search the right subtree: the left
child has 2 values – so the leaf we want is going to be in the left subtree: since that node has
leaves as children, it’s the second child of the leaf.

By doing that labeling, we’ve got a list where an array-like index operator takes O(lg n) steps. It’s also O(lg n) for arbitrary inserts. And the time to maintain the counts doesn’t change the algorithmic complexity, and it’s extremely fast in practice.
 But there’s more to like about it than that. It’s downright easy to keep every version of the list that ever occurred in the history of the program. If you think about it, each operation to modify the list of values has a bubble-up effect, changing the internal nodes of the tree above it. Those internal nodes are very small and simple: if you copy them instead of mutate them, then you get very lightweight persistence. A great example of how this works is: if you use a finger tree to represent the text buffer of an editor, you don’t need to implement undo: you just keep the old versions of the text buffer! You can do that by just copying a small number of internal nodes as things change; the text itself, you never change – so the total memory for text is the size of the original buffer plus whatever you inserted. In terms of memory use, the
memory performance of a finger-tree based persistent history is roughly the same as the amount of memory needed to create an undo history. (Depending on your exact implementation, and the particular usage pattern of the editor, either one can come out ahead.)
But we still haven’t even gotten to the most beautiful thing about finger trees!
 The idea of a finger tree is that the internal nodes store some kind of data that makes it
possible for you to find a particular desired value very quickly. In the example we looked at, the
thing we wanted to be able to quickly was array style indexing. But finger trees are good for much
more than that. Another canonical example is a priority queue. In a priority queue, you’ve got a
collection of tasks that need to be performed. Each task has a priority, representing how
important it is. Whenever a processor becomes available, the highest priority task should be
executed. Priorities are generally handled in reverse order: priority 1 is the highest, 2 is lower
priority than 1; 3 is lower than 1 and 2, and so on. (It’s done that way so that it’s always easy
to add something at a lower priority.) The way priorities are handled, the highest priority always
goes first; you should never start a priority 2 when there’s a priority 1 in the queue; but if
you’ve got two priority 2 tasks, they should be executed in the order in which they were
inserted.
 We can create a priority queue as a list of tasks, using a finger-tree based list, just like we did for the array indexing. When we add a new task to the queue, we’ll always add it at the
end of the list. What we want to do is to try to pick the leftmost high-priority task in the
list.
 How can we annotate the tree to make it easy to quickly find the leftmost highest priority
task? We just label each internal node with the highest priority containing in any of its
children. Then we search the tree, looking for the highest-value child at each level; if there’s a
tie, we go to the leftmost. Presto: O(lg n) to find and remove the highest priority task; O(lg n)
to insert a new task. 
 Now, the real question is: what do these two have in common? That is, if we wanted to
implement a generic finger tree, what do the annotations have in common?
 The annotations are monoids over some kind of number. If you remember abstract
algebra, a monoid is a set of values with an associative operation and an identity element.
As long as the annotations are a monoid, you can make a finger tree based on the annotations.
(Actually, you don’t even strictly need them to be monoids over numbers; you need to be able
to do an ordered comparison between the annotation values – so what you really need is
just a monoid over ordered values.)
 In the list of integers, the set of annotation values are natural numbers, the operation
is addition, and the identity is (obviously) 0. In the priority queue, the set of annotations are also natural numbers, but the operation is min; identity is the maximum priority value – which will be the maximum size of the integer type used for priority.
 If you parameterize your finger-tree implementation on the monoid, you can use
exactly the same finger-tree code for both the indexable list and the priority queue. How
can that be? This is where things get subtle: we need to dive in a bit more and see just what the
fact that the annotation is a monoid really means.
 As I said above, a monoid is a set of values with a single, associative operation. (For
this discussion, I’m going to use ♦ for the monoid operation.)  What associativity
means is that given any sequence of values (v1, …, vn),
if you apply ♦ to the set, any way of parenthesizing it will produce the
same result: you can group the list any way you want.
 v1 ♦ v2 ♦ v3 ♦ v4 ♦ v5 ♦ v6 ♦ v7 ♦ v8 ♦ v9 =
((v1 ♦ v2) ♦ (v3 ♦ v4)) ♦ ((v5 ♦ v6) ♦ ((v7 ♦ v8) ♦ v9)) =
((((((((v1 ♦ v2) ♦ v3) ♦ v4) ♦ v5) ♦ v6) ♦ v7) ♦ v8) ♦ v9)
 What that means in terms our finger tree is that the value at the root of any
subtree is a measure of its subtree – and that the grouping of the children
of that subtree doesn’t matter. You can structure a tree any way that you like: so long as you preserve the order of the leaves, it doesn’t affect the annotation value of its root; and the annotation value of the root of any subtree tells you a crucial bit of information
about what’s in the subtree.
And there’s the key: what does it tell you?
 When we go to search a finger tree for some value, we’re searching for a
value with a particular measure. What the monoid does is allow us to describe
that measure not just in terms of the measure of individual nodes, but
in terms of sets of nodes. And the tree gives us a natural way of
dividing the list into a set of candidates, and not-candidates. Put the two
together, and you’ve got a structure that is custom designed for enabling binary
search (or, rather, tree based search; if you use a 2/3 tree as the basis of
your finger tree, you’ve got a 2/3 tree search; if you use a balanced binary
tree, then you’ve got binary search.)
 Let’s look at a couple of examples. First, let’s pull out our array tree.
Look back at the example tree towards the top of the post. We’re going to look
for the seventh element of the list. We start at the root: the measure of the
root is
((1♦1♦1)♦(1♦1))♦((1♦1)♦(1♦1)) –
or (1♦1♦1♦1♦1♦1♦1♦1♦1) – which
is 9. 
 This tells us that the entire tree has 9 leafs. Doesn’t matter how they’re
arranged: every tree with 9 leaves will produce the same measure.
 We want to find the 7th value. But we don’t know where it is. So we take the
sequence of values represented by the tree, and split it, and then use the
monoid measure of the two subtrees to compute the measures for the sets
of values contained by their leafs. We split according to the tree: we could
take either the left half, or the right half. To figure out which, we look at
the measures. The combined measure of the nodes on the left is 5; the combined
measure on the right is 4. So we know that our target is on the right, and we’re
looking for element 2 on the right. And again, we want to split the
remaining set of values, and only search the half that might contain
our target. We again look at subtrees, and find that our
target is on the left. We then have a subtree whose measure is 1♦1=2, so
the target is on the right. And we’ve narrowed it to one node.
If you look at that process – it’s precisely a binary search of the list.
 Now, let’s try looking at a priority queue. Here’s a binary tree priority
queue, which contains a set of tasks A (pri 3), B (pri 9), C(pri 4), D(pri 3),
E(pri 1), F(pri 3), and G (pri 4). As a finger tree, that would look like the
following:

 The monoid set for the priority queue is the set of possible priority
values; the operation is natural-number minimum. So for this queue, it’s
3♦9♦4♦3♦1♦3♦4 – or min(3, 9, 4, 3, 1, 3, 4) =
1.
 Now, we want to to split the queue into two subqueues, and only search the
subqueue that contains the highest priority task. We do that using the tree:
we look at the measure of the left subtree (which is annotated with the measure of the left subset), and the right subtree (which is annotated with the measure of the right subset). The left subset has measure 1, the right has measure 4. So the highest priority task is in the left subset (or subtree).
 Again, we want to divide the set of tasks into two subsets, and only search
the subset that contains the highest priority task. The measure of the left
subtree-subset is 3; the right is 1. So it’s in the right subtree. And so on,
until we wind up at E.

That’s the most useful thing I’ve seen posted on the ‘net this year. I think I’ve been been writing badly designed versions of this structure for years!
The monoid use is important, but you forgot the part which leads to the name finger tree.
A finger tree does not have just the structure of an ordinary balanced tree. In addition, small parts of the ends of the tree are kept at the top, not at the bottom, as “fingers”, so that you can reach them fast, in O(1) (amortized) time, not just O(log n). This means that you can use them as you would use a cons list from either end, with the same algorithmical complexity.
That seems like a lot of pointer chasing considering memory speed will absolutely dominate. How could you make this data structure cache friendly?
Wow. I’ve researched finger trees before, but never could figure out what was so special about them. Then, this paragraph,
“””What that means in terms our finger tree is that the value at the root of any subtree is a measure of its subtree – and that the grouping of the children of that subtree doesn’t matter.”””
was what made everything finally click for me. Maybe I’m dense, but this illustration has allowed me to finally understand where a finger tree can be profitably be used.
Thank you!
Ørjan Johanse is right. You described a monoid-annotated-binary-tree, which is not enough to be a finger tree.
See section 3.1 of the the pdf paper linked from http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
This non-binary-tree structure is what makes them different from binary trees and gives them amortized O(1) efficient double ended queues. This also gives them efficient monoid append operation at the level of finger trees.
Trees are cache friendly, since nodes near the root, and nodes in hot sub-trees will be cached. But there are a couple of tricks that will help:
Nodes should not be smaller than an efficient size of a memory read (e.g. a cache-line). If they are, increase the splay of the nodes and reduce the height of the tree.
Be selective in the data you include in the internal nodes. Putting non-pointer data in the internal nodes will either make them bigger or decrease the number of children they can have.
Re #3:
The trees are binary trees – there are a wide variety of ways of building binary trees that are cache friendly. I drew them as pointer-based trees, because it’s easier to see the structure that way, but they could be arrays of tightly packed nodes. The leafs themselves could (and in fact, really should) be stored in a traditional cons-list, with the usual optimization tricks to pack them tightly together in memory.
These really aren’t finger trees. The direct access to leaves is missing, and that changes the datastructure’s performance characteristics significantly. It would be appropriate for you to change the title of the article – what you’re describing is practical and useful, but you’ve presented it under the wrong name.
Incidentally, also, please run your article through a grammar checker. There are all sorts of errors throughout this. Several times, for example, you say “leafs”. It makes it very difficult to take you seriously, when you have mistakes both in the technical and in the linguistic.
You really should read the paper that a prior commentor linked to.
I still don’t get why the annotations have to be in a monoid. Why is a semigroup not enough? We don’t seem to use the identity element anywhere, and for example natural numbers with min does not have one.
@mattiast:
The base case, an empty tree, should yield some value, for that you need the unit of the monoid.
If you constrain yourself to non-empty trees then you can use an arbitrary semigroup.
However, using a monoid provides the benefit that since you now have an empty tree, left-to-right concatenation of your your finger trees (or in this case, just monoid-annotated binary trees) also forms a monoid! One that you can usually reslice cheaply.
You can of course say that you return a ‘Maybe a’ instead of an ‘a’ as the result of reducing, returning Nothing iff the tree is empty, then that is the same as having artificially added a unit to a semigroup, which is the very function that the Maybe monoid serves in Haskell!
@#8:
http://books.google.com/books?client=firefox-a&q=%22leafs%22+computer+science&um=1&lr=&sa=N&start=10
Dude, there’s absolutely nothing wrong with “leafs”. English’s irregular plurals are all kinds of crazy. I am totally in favor of moving them back to the normal plural structure. I do this consciously for many words. English is as English is spoken, after all, and if we can spread the usages we want, the language itself changes after us.
Oh, and Mark: great article!
Is it too off-topic to ask how you made the graphs?
@Xanthir:
“Leafs” is only correct to people who say “dude” a lot. The plural of leaf is leaves. Educated people don’t use “leafs” in spoken English, either.
I agree this is a very nice article, but it’s not about finger trees.
@#14, check the book links above, all highly educated.
Actually, almost all of the books look to feature non-native writers.
Your last figure isn’t a balanced tree. But I think you implied that finger trees are balanced?
Bob Foster said, “Educated people don’t use ‘leafs’ in spoken English”
Ah, in Canada, everyone says “Toronto Maple Leafs”, no one says “Toronto Maple Leaves”. Perhaps no rule in English is absolute.
BTW, thank you Mark, for the article on immutable trees.