# Implementing Compact Binary Heaps

*April 29, 2008*

Last post, I described the basic idea of the binary heap data structure. But I didn't talk about how to implement it, and there's a very cool way of implementing it: optimally space efficient, very fast, and with respectable cache locality for moderate-sized datasets.

The idea is to use a compact, array-based implementation of a binary tree. If we put the nodes of the tree into an array in the order in which they would be visited by a breadth-first traversal, then the children of the node at position N will be located at positions 2N+1 and 2N+2, and the parent of the node at position N will be located at position (N-1)/2 if N is odd, or (N-2)/2 if N is even. For example, look at the example heap below; this is the same example heap from yesterday's post, and beneath it is the array representation of the heap. I've added a couple of children arrows, to let you see how it's laid out.

![heap-array-example.png](https://i0.wp.com/scientopia.org/img-archive/goodmath/img_306.png?resize=362%2C361)

Remember how yesterday I alluded to heapsort?
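Before getting to that, the parent/child index arithmetic is easy to sanity-check in a few lines of Python. (A quick sketch; the helper names `parent_of` and `children_of` are mine, not from the post. Note that `(pos - 1) // 2` covers both the odd and even cases at once, since floor division rounds the even case down to the same parent.)

```python
def parent_of(pos):
    """Index of the parent of the node at index pos (pos > 0)."""
    # Floor division handles both cases: odd pos (left child) and
    # even pos (right child) land on the same parent index.
    return (pos - 1) // 2

def children_of(pos, size):
    """Indices of the children of node pos that exist in a heap of `size` elements."""
    return [c for c in (2 * pos + 1, 2 * pos + 2) if c < size]

# In a 10-element heap, node 3's children sit at indices 7 and 8,
# and both point back to 3 as their parent.
size = 10
assert children_of(3, size) == [7, 8]
assert parent_of(7) == 3 and parent_of(8) == 3
assert parent_of(1) == 0 and parent_of(2) == 0
assert children_of(9, size) == []   # last node is a leaf
```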
It's amazingly easy to use this representation to implement heapsort as an in-place sort, that is, a sort that rearranges a list within the memory locations it already occupies. Heapsort needs only enough extra storage for one integer (the current size of the heap) and a temporary copy or two of single values from the list (for implementing swaps).

The basic idea is: you've got a heap of N elements, and you know that the largest value is the one in position 0. So you remove it, and do the bubbling needed to restore the heap property. Now you've got a temporary copy of what used to be the largest value, and a heap of size N-1 occupying positions 0 through N-2. So you can put the value you just removed into position N-1, because that slot is no longer part of the heap. Each time you remove the largest value, the heap shrinks by one, and the slot it frees up is exactly the correct position for the removed value. When the heap size reaches 0, your list is sorted.

Implementing a heap using the array-based storage is also really easy. Below is a simple implementation of an array-based heap in Python. (This is *not* an optimal implementation; it's written to be easy to read.
There's a bunch of places where its performance could be improved, or where it could be written more compactly.)

```python
class Heap(object):
    def __init__(self):
        self.list = []

    def Insert(self, val):
        # To insert, append the new value to the end,
        # and then bubble it up to its correct position.
        l = len(self.list)
        self.list.append(val)
        self.BubbleUp(l)

    def BubbleUp(self, pos):
        if pos == 0:
            return
        parent = self._GetParent(pos)
        if self.list[pos] > self.list[parent]:
            self._SwapEntries(pos, parent)
            self.BubbleUp(parent)

    def _GetParent(self, pos):
        # If pos is an odd number, it's a left child;
        # if it's even, it's a right child. Floor division
        # gives the same parent index either way.
        if pos % 2 == 0:
            return (pos - 2) // 2
        else:
            return (pos - 1) // 2

    def RemoveLargest(self):
        # To remove the largest, store it, move the last element
        # into its position, and then bubble that element down.
        result = self.list[0]
        last = self.list.pop()
        if self.list:
            self.list[0] = last
            self.BubbleDown(0)
        return result

    def BubbleDown(self, pos):
        children = self._GetChildrenOf(pos)
        # Three cases: leaf, one child, two children.
        if children == []:
            return
        elif len(children) == 1:
            if self.list[children[0]] > self.list[pos]:
                self._SwapEntries(children[0], pos)
                self.BubbleDown(children[0])
        else:  # len(children) == 2
            left = self.list[children[0]]
            right = self.list[children[1]]
            root = self.list[pos]
            if left > root or right > root:
                if left > right:
                    self._SwapEntries(children[0], pos)
                    self.BubbleDown(children[0])
                else:
                    self._SwapEntries(children[1], pos)
                    self.BubbleDown(children[1])

    def _SwapEntries(self, p1, p2):
        tmp = self.list[p1]
        self.list[p1] = self.list[p2]
        self.list[p2] = tmp

    def _GetChildrenOf(self, pos):
        l = len(self.list)
        if l <= (2 * pos + 1):
            return []
        elif l == (2 * pos + 2):
            return [2 * pos + 1]
        else:
            return [2 * pos + 1, 2 * pos + 2]

    def _PrettyPrint(self, position, indent):
        children = self._GetChildrenOf(position)
        print("  " * indent + str(self.list[position]))
        for c in children:
            self._PrettyPrint(c, indent + 1)

    def PrettyPrint(self):
        self._PrettyPrint(0, 0)

def HeapUseExample():
    h = Heap()
    for i in range(20):
        h.Insert(i)
    print("===============================")
    h.PrettyPrint()
    print("===============================")
    h.RemoveLargest()
    print("===============================")
    h.PrettyPrint()
```
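As a closing sketch of the in-place heapsort described earlier, here's a version written against a bare list rather than the `Heap` class above; the function names `heapsort` and `_sift_down` are my own, not the post's. It builds a max-heap bottom-up, then repeatedly swaps the root into the slot freed as the heap shrinks.

```python
def _sift_down(a, pos, size):
    # Bubble a[pos] down until the max-heap property holds within a[:size].
    while True:
        left, right = 2 * pos + 1, 2 * pos + 2
        largest = pos
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == pos:
            return
        a[pos], a[largest] = a[largest], a[pos]
        pos = largest

def heapsort(a):
    n = len(a)
    # Turn the whole array into a max-heap, bottom-up,
    # starting from the last non-leaf node.
    for i in range(n // 2 - 1, -1, -1):
        _sift_down(a, i, n)
    # Repeatedly move the largest element (at index 0) into the
    # slot freed by shrinking the heap by one, then restore the heap.
    for size in range(n - 1, 0, -1):
        a[0], a[size] = a[size], a[0]
        _sift_down(a, 0, size)

data = [7, 5, 9, 1, 8, 3]
heapsort(data)
assert data == [1, 3, 5, 7, 8, 9]
```

True to the description, the only extra storage is the loop indices and the temporaries used by the swaps; the list is sorted in the memory it already occupies.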
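One last note: for real code, Python's standard library already ships an array-based binary heap in the `heapq` module. It maintains the same array layout described above, but as a min-heap, so the *smallest* value sits at index 0:

```python
import heapq

h = []
for v in [7, 5, 9, 1, 8]:
    heapq.heappush(h, v)

# Popping repeatedly yields the values in ascending order.
ordered = [heapq.heappop(h) for _ in range(len(h))]
assert ordered == [1, 5, 7, 8, 9]
```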