Last post, I described the basic idea of the binary heap data
structure. But I didn’t talk about how to implement it – and there’s
a very cool way of implementing it – optimally space efficient,
very fast, and with respectable cache locality for moderate sized
The idea is to use a compact, array-based implementation of a binary tree. If we put the nodes in a graph into an array in the order in which they would by traversed by a breadth-first traversal of the tree, then you’ll get a
representation of a graph in which the children of the node at position N in the tree will be located at position 2N+1 and 2N+2; and the
parent of the node at position M will be located at either (N-1)/2 (if N is odd) or (N-2)/2 (if N is even). For example, look at
the example heap below; this is the same example heap from yesterday’s post, and beneath it is the array-representation of the heap. I’ve added a couple of children arrows, to let you see how it’s laid out.
Remember how yesterday I alluded to heapsort? It’s amazingly easy to implement it using this form as an in-place sort – that is, a sort that
can sort a list in the original set of memory locations that it occupies. Heapsort needs only enough extra storage to store one integer (the current size of the heap), and two temporary copies of a single value from the
list (for implementing swaps).
The basic idea is, you’ve got a heap of N elements, you know that the
largest value is the one in position 0. So you remove that, and do the
bubbling to reinstate the heap properties. Now you’ve got a temporary copy of
what used to be the largest value in the heap; and you’ve got a heap of size
N-1. So you can put the largest value – the one you just removed – into
position N, because that’s no longer part of the heap. Each time you remove
the largest value, the heap shrinks by one, and you get the correct position
into which you should insert that removed value. When the heap size reaches 0,
your list is now sorted.
Implementing a heap using the array-based storage is also really easy.
Below is a simple implementation of an array-based heap in Python. (This is
not an optimal implementation; it’s written to be easy to read. There’s a bunch of places where its performance could be improved, or where
it could be written more compactly.)
class Heap(object): def __init__(self): self.list =  def Insert(self, val): # to insert, append the new value to the end, # and then bubble it up to its correct position. l = len(self.list) self.list.append(val) self.BubbleUp(l) def BubbleUp(self, pos): if pos == 0: return parent = self._GetParent(pos) if self.list[pos] > self.list[parent]: self._SwapEntries(pos, parent) self.BubbleUp(parent) def _GetParent(self, pos): if pos == 1 or pos == 2: return 0 # if pos is an odd number, then it's a left child; # if it's even, then it's a right child. if pos % 2 == 0: return (pos-2)/2 else: return (pos-1)/2 def RemoveLargest(self): # to remove largest, store the largest, insert the # last element into its position, and then bubble it # down. result = self.list self.list = self.list.pop() self.BubbleDown(0) return result def BubbleDown(self, pos): children = self._GetChildrenOf(pos) # Three cases: leaf, one child, two children if children == : return elif len(children) == 1: if self.list[children] > self.list[pos]: self._SwapEntries(chldren, pos) self.bubbleDown(children) else: # len(children) == 2 left = self.list[children] right = self.list[children] root = self.list[pos] if left > root or right > root: if left > right: self._SwapEntries(children, pos) self.BubbleDown(children) else: self._SwapEntries(children, pos) self.BubbleDown(children) def _SwapEntries(self, p1, p2): tmp = self.list[p1] self.list[p1] = self.list[p2] self.list[p2] = tmp def _GetChildrenOf(self, pos): l = len(self.list) if l <= (2*pos+1): return  elif l == (2*pos+2): return [2*pos+1] else: return [2*pos+1, 2*pos+2] def _PrettyPrint(self, position, indent): children = self._GetChildrenOf(position) for i in range(indent): print " ", print self.list[position] for c in children: self._PrettyPrint(c, indent+1) def PrettyPrint(self): self._PrettyPrint(0, 0) def HeapUseExample(): h = Heap() for i in range(20): h.Insert(i) print "===============================" h.PrettyPrint() print "===============================" h.RemoveLargest() print "===============================" h.PrettyPrint()