During my Haskell tutorial, I used balanced binary search trees as an example. I’ve had people write to me asking me to write about that in a non-functional programming post, because the Haskell one got too tied up in how to do things without assignments, and with tail recursion.
Binary search trees (BST) are another one of the really fundamental simple data structures. They’re conceptually similar to heaps – they’re also a kind of size-ordered binary tree with O(lg n) performance – but they’ve got a different set of basic invariants to represent the different performance goals.
The goal of a structure like a BST is to have a variable-size stucture where
you can cheaply and easily increase or decrease the size of the structure, and where insert, delete, and searching for values are all inexpensive. BSTs are a good choice for implementing things like dictionaries, where you have a key associated with a value, or where you want to be able to search for an arbitrary member quickly.
BSTs are very widely used – for example, the default ordered collections, sets, and maps in the C++ standard libraries are all implemented as a kind of BST (in fact, the very kind that I’m going to describe below.)
If you have a meaningful ordering operation, then a BST is a good choice: expanding and contracting the structure happen automatically every time you do an insert or remove; and insert, delete, and search are all bounded by lg(n), where N is the number of values in the tree.
The basic structure is really simple. A BST is a tree where each node has a maximum of two children (called left and right), and for any node N, N > leftN, and N ≤ rightN.
For example, the diagram to the right is an example of a binary search tree.
It seems easy, initially, to do something like insert a value into a binary search tree. To sketch it out quickly:
class BinarySearchTree(object): def __init__(self, value): self._left = None self._right = None self._value = value def Insert(self, value): if self._value > value: if self._left is None: self._left = BinarySearchTree(value) else: self._left.Insert(value) else: if self._right is None: self._right = BinarySearchTree(value) else: self_right.Insert(value) def Print(self, indent): for i in range(indent): print " ", print self._value if self._left is None: for i in range(indent+1): print " ", print "None" else: self._left.Print(indent+1) if self._right is None: for i in range(indent+1): print " ", print "None" else: self._right.Print(indent+1)
In prose: to insert a value, you look to see if it’s smaller than the root node. If it is, you insert it into the left subtree, otherwise, you insert it into the right subtree. If the subtree is empty, then you create a new subtree with the new value.
The example above is what we’d get if we did an insert in the order 5, then 6, then 3, then 2, then 4, then 1.
Unfortunately, it’s not that simple. What happens if we insert the numbers 1 through six in order? We get this tree:
That’s basically just a very inefficient way of storing a list. Instead of being a fast tree structure, where we can do things in O(lg n) time, it’s a degenerate form, which ends up doing things as if it were a list – that is, linear O(n) time.
The problem is, the binary search tree only works well if the tree is balanced – that is, if for any node in the tree, the node has roughly the same number of children in its left subtree as it does in its right subtree.
So to get the performance that we’d want, we need to find a way of keeping the tree balanced. There are a bunch of ways of doing that; the earliest that I know of was something called an AVL tree. For the most part, people have settled on something called a red-black tree. A red-black tree is a modified version of a binary tree that keeps the tree in approximate balance, and does it in amortized O(lg n) time for all operations. (You can’t beat O(lg n) in a binary search tree, and red-black trees do less work (by a constant factor) maintaining balance than AVL trees.)
Approximate balance means that the tree is kept in a state where for any node, the size of its left subtree and the size of its right subtree differ by no more than a factor of 2.
Amortized time is a harder notion to explain. Amortized O(lg n) time means that if you do a long series of operations, the average cost per operation will remain O(lg n). Every so often, there will be an operation that is significantly more expensive. But by knowing how much it costs, and how often it happens, you can spread its cost out. The idea is, if you can show that an O(n) operation happens once every N times you invoke insert, then you can spread that cost out, by counting it as one extra step in each regular invocation. Then each invocation has an amortized cost of O(lg n + 1), which is the same as O(lg n).
In the red-black tree, we will periodically need to re-balance the tree. Rebalancing is expensive. But we can show that the rebalance operation will happen infrequently – and that it occurs at a rate which can be amortized. So if you observed the performance of 1000 inserts into the tree, you wouldn’t be able to tell that the performance of any of the inserts was O(lg n) – the time to do 1,000 inserts would be 1000 times the cost of doing one insert – and each one would appear to be O(lg n).
A red-black tree is a BST where each node is assigned a color – either red or black – and the nodes have the following color-related properties:
- The root of the tree is always black.
- All branches of the tree end with empty leafs which are counted as black.
- All children of red nodes are black.
- For all nodes in the tree, all downward paths from the node to a leaf contain the same number of black nodes.
What this means is that the longest possible path from the root to a leaf is a path which alternates red and black; the shortest path from root to leaf is a path which is all black. Since they’ve got the same number of black nodes, that means that in the worst possible case, the longest path is twice the length of the shortest.
To make this work, what we do is insert new nodes as leaves exactly the way that we did in the simple BST written above. New nodes are always red. After the initial insert, we need to re-balance the tree, by possibly doing a series of operations called pivots to make sure that the red-black tree properties are maintained.
A pivot rebalances the tree, by rotating the tree around a node and one of its children. For example, in the example diagrams above, you can see an example of a right pivot. The original root of the tree being pivoted is the node containing 4. A right pivot moves the left child counter-clockwise, pushing the node containing four up, to become the root of the tree. The former root, 8, becomes the right child of 4. The former right child of 4 (5) becomes the left child of 8. Now we’ve got a valid BST, and the height of the left subtree has been reduced by one, and the right subtree increased by one.
Now, what we need to do is to think about how inserts could cause the tree to become invalid, and how we can use pivots to correct it. To do that, we need to consider a set of up to 4 nodes: the new node, it’s parent, the sibling of its parent, which we’ll call it’s uncle., and the parent of its parent, which we’ll call its grandparent. For example, look at the red-black tree below. In this example, if we look at the node “8”, its parent is “7”; it’s grandparent is “5”; it’s uncle is “4”.
All possible violations of the tree balance focus on red nodes: newly inserted nodes are always red, and as we rebalance, we’ll always push the imbalance upwards, with the focus on a red node. If we look at a red node, N, to rebalance:
- If the node we’re re-balancing is the root of the tree (i.e., it has no parent), then we just change its color to black.
- If we’re re-balancing around the red node N, and N’s parent is black, then everything is fine. The tree remains valid, with no invariants violated.
- If the node’s parent is red, then we have a red node with a red parent, and we need to do something to fix it:
- If the uncle is also red, then we can change the colors of both the parent and the uncle to black, and the color of the grandparent to red. Then we need to re-balance around the newly red grandparent.
- If the uncle is black, then we’ll need to do a pivot.
If we need to do a pivot, that means that the tree really is out of balance. It’s not just a matter of book-keeping – we’ve got a real path in the tree from root to leaf that’s more than twice the length of the shortest path.
If the focal node, N, is the left child of its parent, and its parent is the right child of its grandparent, then do a right pivot around N and its parent, and continue to balance around the newly red former parent.
If N is a left child, and N’s parent is the left child of its grandparent, then we don’t pivot N; we do a right pivot of N’s parent and grandparent. In this case, we also do some re-coloring: we swap the colors of N’s parent and grandparent, so that the parent is black and the grandparent is red, and we’re in balance.
IF N is a right child, then we do the mirror image: if N’s parent is a left child, then we pivot N and its parent left, and then focus on the former parent. If N’s parent is also a right child, we pivot N’s parent and grandparent left, recolor, and we’re done.
That sounds scary, but it really isn’t that complicated. Below, there’s a quick Python implementation of a red-black tree, with the pivot annotated. (There were originally a couple of errors in this code, resulting from me changing the code from explicit member access to accessor methods at the last minute. Thanks to commenter Sasa for catching them.)
class RedBlackTree(object): def __init__(self): self._tree = None def Insert(self, n): if self._tree == None: self._tree = RedBlackTreeNode(n) self._tree.SetColor("Black") else: self._tree = self._tree.Insert(n) def Print(self): if self._tree == None: print "Empty" else: self._tree.Print(1) class RedBlackTreeNode(object): def __init__(self, value): self._left = None self._right = None self._value = value self.SetColor("Red") self._parent = None def GetParent(self): return self._parent def SetParent(self, parent): self._parent = parent def GetColor(self): return self._color def SetColor(self, color): self._color = color def GetLeft(self): return self._left def SetLeft(self, left): self._left = left def GetRight(self): return self._right def SetRight(self, right): self._right = right def GetGrandParent(self): if self.GetParent() != None: return self.GetParent().GetParent() else: return None def GetUncle(self): grand = self.GetGrandParent() if grand is not None: if grand.GetLeft() == self.GetParent(): return grand.GetRight() else: return grand.GetLeft() else: return None def Rebalance(self): # WP case 1: tree root if self.GetParent() is None: self.SetColor("Black") return self # WP case 2: The parent of the target node is BLACK, # so the tree is in fine balance shape; just return the # tree root if self.GetParent().GetColor() == "Black": return self.GetRoot() # From here on, we know that parent is Red. # WP Case 3: self, parent, and uncle are all red. if self.GetUncle() is not None and self.GetUncle().GetColor() == "Red": self.GetUncle().SetColor("Black") self.GetParent().SetColor("Black") self.GetGrandParent().SetColor("Red") return self.GetGrandParent().Rebalance() # By now, we know that self and parent are red; and the uncle is black. # We also know that the grandparent is not None, because if it were, the # parent would be root, which must be black. So this means that we # need to do a pivot on the parent return self.PivotAndRebalance() def GetRoot(self): if self.GetParent() is None: return self else: return self.GetParent().GetRoot() def PivotAndRebalance(self): # First, distinguish between the case where where my parent is # a left child or a right child. if self.GetGrandParent().GetLeft() == self.GetParent(): if self.GetParent().GetRight() == self: # WP case 4: I'm the right child of my parent, and my parent is the # left child of my grandparent. Pivot right around me. return self.PivotLeft(False) else: # WP case 5 # If I'm the left child, and my parent is the left child, then # pivot right around my parent. return self.GetParent().PivotRight(True) else: # My parent is the right child. if self.GetParent().GetLeft() == self: # WP case 4, reverse. return self.PivotRight(False) else: # WY case 5 reverse return self.GetParent().PivotLeft(True) def PivotRight(self, recolor): # Hurrah, I'm going to be the new root of the subtree! left = self.GetLeft() right = self.GetRight() parent = self.GetParent() grand = self.GetGrandParent() # move my right child to be the left of my soon-to-be former parent. parent.SetLeft(right) if right is not None: right.SetParent(parent) # Move up, and make my old parent my right child. self.SetParent(grand) if grand is not None: if grand.GetRight(parent) == parent: grand.SetRight(self) else: grand.SetLeft(self) self.SetRight(parent) parent.SetParent(self) if recolor is True: parent.SetColor("Red") self.SetColor("Black") return self.GetRoot() else: # rebalance from the new position of my former parent. return parent.Rebalance() def PivotLeft(self, recolor): # Hurrah, I'm going to be the new root of the subtree! left = self.GetLeft() right = self.GetRight() parent = self.GetParent() grand = self.GetGrandParent() # move my left child to be the right of my soon-to-be former parent. parent.SetRight(left) if left is not None: left.SetParent(parent) # Move up,and make my old parent my right child. self.SetParent(grand) if grand is not None: if grand.GetRight() == parent: grand.SetRight(self) else: grand.SetLeft(self) self.SetLeft(parent) parent.SetParent(self) if recolor is True: parent.SetColor("Red") self.SetColor("Black") return self.GetRoot() else: # rebalance from the position of my former parent. return parent.Rebalance() def Insert(self, value): if self._value > value: if self.GetLeft() is None: self.SetLeft(RedBlackTreeNode(value)) self.GetLeft().SetParent(self) return self.GetLeft().Rebalance() else: return self.GetLeft().Insert(value) else: if self.GetRight() is None: self.SetRight(RedBlackTreeNode(value)) self.GetRight().SetParent(self) return self.GetRight().Rebalance() else: return self.GetRight().Insert(value) def Print(self, indent): for i in range(indent): print " ", print "%s (%s)" % (self._value, self.GetColor()) if self.GetLeft() is None: for i in range(indent+1): print " ", print "None(Black)" else: self.GetLeft().Print(indent+1) if self.GetRight() is None: for i in range(indent+1): print " ", print "None(Black)" else: self.GetRight().Print(indent+1)
For a quick example of creating and populating, here’s code to generate
the red-black tree used an an example above.
b = RedBlackTree() for i in range(10): b.Insert(i) b.Print()