{"id":643,"date":"2008-05-28T20:21:25","date_gmt":"2008-05-28T20:21:25","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2008\/05\/28\/the-basic-balanced-search-tree-red-black-trees\/"},"modified":"2023-02-11T12:19:40","modified_gmt":"2023-02-11T17:19:40","slug":"the-basic-balanced-search-tree-red-black-trees","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2008\/05\/28\/the-basic-balanced-search-tree-red-black-trees\/","title":{"rendered":"The Basic Balanced Search Tree: Red-Black trees."},"content":{"rendered":"<p>During my Haskell tutorial, I <a href=\"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/01\/balanced-binary-trees-in-haskell\">used balanced binary search trees as an example<\/a>. I&#8217;ve had people write to me asking me to write about that in a non-functional programming post, because the Haskell one got too tied up in how to do things without assignments, and with tail recursion.<\/p>\n<p>Binary search trees (BST) are another one of the really fundamental simple data structures. They&#8217;re conceptually similar to heaps &#8211; they&#8217;re also a kind of size-ordered binary tree with O(lg n) performance &#8211; but they&#8217;ve got a different set of basic invariants to represent the different performance goals.<\/p>\n<p>The goal of a structure like a BST is to have a variable-size stucture where<br \/>\nyou can cheaply and easily increase or decrease the size of the structure, and where insert, delete, and searching for values are all inexpensive. 
BSTs are a good choice for implementing things like dictionaries, where you have a key associated with a value, or where you want to be able to search for an arbitrary member quickly.<\/p>\n<p>BSTs are very widely used &#8211; for example, the default ordered collections, sets, and maps in the C++ standard libraries are all implemented as a kind of BST (in fact, the very kind that I&#8217;m going to describe below.)<\/p>\n<p>If you have a meaningful ordering operation, then a BST is a good choice: expanding and contracting the structure happen automatically every time you do an insert or remove; and insert, delete, and search are all bounded by lg(n), where n is the number of values in the tree.<\/p>\n<p><!--more--><\/p>\n<p>The basic structure is really simple. A BST is a tree where each node has a maximum of two children (called <em>left<\/em> and <em>right<\/em>), and for any node N, N &gt; left<sub>N<\/sub>, and N \u2264 right<sub>N<\/sub>.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"inset right alignright\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_311.png?resize=182%2C165\" alt=\"insert-ex.png\" width=\"182\" height=\"165\" \/><\/p>\n<p>For example, the diagram to the right is an example of a binary search tree.<\/p>\n<p>It seems easy, initially, to do something like insert a value into a binary search tree. 
To sketch it out quickly:<\/p>\n<pre>class BinarySearchTree(object):\n   def __init__(self, value):\n      self._left = None\n      self._right = None\n      self._value = value\n\n   def Insert(self, value):\n      if self._value &gt; value:\n         if self._left is None:\n            self._left = BinarySearchTree(value)\n         else:\n            self._left.Insert(value)\n      else:\n         if self._right is None:\n            self._right = BinarySearchTree(value)\n         else:\n            self._right.Insert(value)\n\n   def Print(self, indent):\n      for i in range(indent):\n         print \"  \",\n      print self._value\n      if self._left is None:\n         for i in range(indent+1):\n            print \"  \",\n         print \"None\"\n      else:\n         self._left.Print(indent+1)\n      if self._right is None:\n         for i in range(indent+1):\n            print \"  \",\n         print \"None\"\n      else:\n         self._right.Print(indent+1)\n<\/pre>\n<p>In prose: to insert a value, you look to see if it&#8217;s smaller than the root node. If it is, you insert it into the left subtree, otherwise, you insert it into the right subtree. If the subtree is empty, then you create a new subtree with the new value.<\/p>\n<p>The example above is what we&#8217;d get if we did an insert in the order 5, then 6, then 3, then 2, then 4, then 1.<\/p>\n<p>Unfortunately, it&#8217;s not that simple. What happens if we insert the numbers 1 through 6 in order? We get this tree:<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_312.png?resize=222%2C234\" alt=\"insert-in-order.png\" width=\"222\" height=\"234\" \/><\/p>\n<p>That&#8217;s basically just a very inefficient way of storing a list. 
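<\/p>\n<p>You can see the degeneration directly by measuring the height of the tree &#8211; the length of its longest root-to-leaf path. The sketch below is an illustration of mine, not the post&#8217;s code: it uses the same unbalanced insert logic, plus a hypothetical <code>Height<\/code> helper.<\/p>

```python
# Height-comparison sketch: the same unbalanced insert as the simple
# BinarySearchTree, plus a hypothetical Height() helper.
class Node(object):
    def __init__(self, value):
        self._value = value
        self._left = None
        self._right = None

    def Insert(self, value):
        if self._value > value:
            if self._left is None:
                self._left = Node(value)
            else:
                self._left.Insert(value)
        else:
            if self._right is None:
                self._right = Node(value)
            else:
                self._right.Insert(value)

    def Height(self):
        # Longest root-to-leaf path, counted in nodes.
        lh = self._left.Height() if self._left is not None else 0
        rh = self._right.Height() if self._right is not None else 0
        return 1 + max(lh, rh)

def build(values):
    root = Node(values[0])
    for v in values[1:]:
        root.Insert(v)
    return root

print(build([5, 6, 3, 2, 4, 1]).Height())  # 4: reasonably balanced
print(build([1, 2, 3, 4, 5, 6]).Height())  # 6: a degenerate list
```
<p>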
Instead of being a fast tree structure, where we can do things in O(lg n) time, it&#8217;s a degenerate form, which ends up doing things as if it were a list &#8211; that is, linear O(n) time.<\/p>\n<p>The problem is, the binary search tree only works well if the tree is <em>balanced<\/em> &#8211; that is, if for any node in the tree, the node has roughly the same number of nodes in its left subtree as it does in its right subtree.<\/p>\n<p>So to get the performance that we&#8217;d want, we need to find a way of keeping the tree balanced. There are a bunch of ways of doing that; the earliest that I know of was something called an AVL tree. For the most part, people have settled on something called a <em>red-black tree<\/em>. A red-black tree is a modified version of a binary tree that keeps the tree in <em>approximate<\/em> balance, and does it in <em>amortized<\/em> O(lg n) time for all operations. (You can&#8217;t beat O(lg n) in a binary search tree, and red-black trees do less work (by a constant factor) maintaining balance than AVL trees.)<\/p>\n<p><em>Approximate balance<\/em> means that the tree is kept in a state where for any node, the height of its left subtree and the height of its right subtree differ by no more than a factor of 2.<\/p>\n<p>Amortized time is a harder notion to explain. Amortized O(lg n) time means that if you do a long series of operations, the average cost per operation will remain O(lg n). Every so often, there will be an operation that is significantly more expensive. But by knowing how much it costs, and how often it happens, you can spread its cost out. The idea is, if you can show that an O(n) operation happens once every n times you invoke insert, then you can spread that cost out, by counting it as one extra step in each regular invocation. Then each invocation has an <em>amortized<\/em> cost of O(lg n + 1), which is the same as O(lg n).<\/p>\n<p>In the red-black tree, we will periodically need to <em>re-balance<\/em> the tree. 
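<\/p>\n<p>As a toy illustration of that accounting &#8211; with a hypothetical cost model chosen for round numbers, not a measurement of any real tree:<\/p>

```python
import math

# Toy amortization sketch, using a hypothetical cost model: most
# operations cost lg(n) steps, but one operation in every n incurs an
# extra n steps. Spread over the whole sequence, the average cost is
# lg(n) + 1 steps per operation -- still O(lg n).
n = 1024
ordinary = (n - 1) * math.log2(n)   # n-1 cheap operations
rare = math.log2(n) + n             # one operation with an O(n) surcharge
average = (ordinary + rare) / n
print(average)  # 11.0, which is exactly log2(1024) + 1
```
<p>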
Rebalancing is expensive. But we can show that the rebalance operation will happen infrequently &#8211; and that it occurs at a rate which can be amortized. So if you observed the performance of 1000 inserts into the tree, you wouldn&#8217;t be able to pick out the occasional expensive ones &#8211; the time to do 1000 inserts would be roughly 1000 times the cost of doing one insert &#8211; and each one would appear to be O(lg n).<\/p>\n<p>A red-black tree is a BST where each node is assigned a color &#8211; either red or black &#8211; and the nodes have the following color-related properties:<\/p>\n<ol>\n<li>The root of the tree is always black.<\/li>\n<li>All branches of the tree end with empty leaves, which are counted as black.<\/li>\n<li>All children of red nodes are black.<\/li>\n<li>For all nodes in the tree, all downward paths from the node to a leaf contain the same number of black nodes.<\/li>\n<\/ol>\n<p>What this means is that the longest possible path from the root to a leaf is a path which alternates red and black; the shortest path from root to leaf is a path which is all black. Since they&#8217;ve got the same number of black nodes, that means that in the worst possible case, the longest path is twice the length of the shortest.<\/p>\n<p>To make this work, what we do is insert new nodes as leaves exactly the way that we did in the simple BST written above. New nodes are always red. After the initial insert, we need to <em>re-balance<\/em> the tree, by possibly doing a series of operations called <em>pivots<\/em> to make sure that the red-black tree properties are maintained.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_313.png?resize=410%2C215\" alt=\"rotate-right.png\" width=\"410\" height=\"215\" \/><\/p>\n<p>A pivot rebalances the tree, by rotating the tree around a node and one of its children. 
For example, the diagrams above show a right pivot. The original root of the tree being pivoted is the node containing 8. A right pivot rotates the tree clockwise, pushing the left child &#8211; the node containing 4 &#8211; up to become the root of the tree. The former root, 8, becomes the <em>right<\/em> child of 4. The former right child of 4 (5) becomes the left child of 8. Now we&#8217;ve got a valid BST, and the height of the left subtree has been reduced by one, and the right subtree increased by one.<\/p>\n<p>Now, what we need to do is to think about how inserts could cause the tree to become invalid, and how we can use pivots to correct it. To do that, we need to consider a set of up to 4 nodes: the new node, its parent, the sibling of its parent, which we&#8217;ll call its <em>uncle<\/em>, and the parent of its parent, which we&#8217;ll call its <em>grandparent<\/em>. For example, look at the red-black tree below. In this example, if we look at the node &#8220;8&#8221;, its parent is &#8220;7&#8221;; its grandparent is &#8220;5&#8221;; its uncle is &#8220;4&#8221;.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_314.png?resize=332%2C231\" alt=\"red-black-example.png\" width=\"332\" height=\"231\" \/><\/p>\n<p>All possible violations of the tree balance focus on red nodes: newly inserted nodes are always red, and as we rebalance, we&#8217;ll always push the imbalance upwards, with the focus on a red node. If we look at a red node, N, to rebalance:<\/p>\n<ol>\n<li>If the node we&#8217;re re-balancing is the root of the tree (i.e., it has no parent), then we just change its color to black.<\/li>\n<li>If we&#8217;re re-balancing around the red node N, and N&#8217;s parent is black, then everything is fine. 
The tree remains valid, with no invariants violated.<\/li>\n<li>If the node&#8217;s parent is red, then we have a red node with a red parent, and we need to do something to fix it:\n<ol>\n<li>If the uncle is also red, then we can change the colors of both the parent and the uncle to black, and the color of the grandparent to red. Then we need to re-balance around the newly red grandparent.<\/li>\n<li>If the uncle is black, then we&#8217;ll need to do a pivot.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>If we need to do a pivot, that means that the tree really is out of balance. It&#8217;s not just a matter of book-keeping &#8211; we&#8217;ve got a real path in the tree from root to leaf that&#8217;s more than twice the length of the shortest path.<\/p>\n<p>If the focal node, N, is the left child of its parent, and its parent is the <em>right<\/em> child of its grandparent, then do a right pivot around N and its parent, and continue to re-balance around the former parent, which is still red.<\/p>\n<p>If N is a left child, and N&#8217;s parent is the left child of its grandparent, then we don&#8217;t pivot N; we do a right pivot of N&#8217;s parent and grandparent. In this case, we also do some re-coloring: we swap the colors of N&#8217;s parent and grandparent, so that the parent is black and the grandparent is red, and we&#8217;re in balance.<\/p>\n<p>If N is a right child, then we do the mirror image: if N&#8217;s parent is a left child, then we pivot N and its parent left, and then focus on the former parent. If N&#8217;s parent is also a right child, we pivot N&#8217;s parent and grandparent left, recolor, and we&#8217;re done.<\/p>\n<p>That sounds scary, but it really isn&#8217;t that complicated. Below, there&#8217;s a quick Python implementation of a red-black tree, with the pivot annotated. <em>(There were originally a couple of errors in this code, resulting from me changing the code from explicit member access to accessor methods at the last minute. 
Thanks to commenter Sasa for catching them.)<\/em><\/p>\n<pre>class RedBlackTree(object):\n   def __init__(self):\n      self._tree = None\n\n   def Insert(self, n):\n      if self._tree == None:\n         self._tree = RedBlackTreeNode(n)\n         self._tree.SetColor(\"Black\")\n      else:\n         self._tree = self._tree.Insert(n)\n\n   def Print(self):\n      if self._tree == None:\n         print \"Empty\"\n      else:\n         self._tree.Print(1)\n\nclass RedBlackTreeNode(object):\n   def __init__(self, value):\n      self._left = None\n      self._right = None\n      self._value = value\n      self.SetColor(\"Red\")\n      self._parent = None\n\n   def GetParent(self):\n      return self._parent\n\n   def SetParent(self, parent):\n      self._parent = parent\n\n   def GetColor(self):\n      return self._color\n\n   def SetColor(self, color):\n      self._color = color\n\n   def GetLeft(self):\n      return self._left\n   def SetLeft(self, left):\n      self._left = left\n   def GetRight(self):\n      return self._right\n   def SetRight(self, right):\n      self._right = right\n   def GetGrandParent(self):\n      if self.GetParent() != None:\n         return self.GetParent().GetParent()\n      else:\n         return None\n\n   def GetUncle(self):\n      grand = self.GetGrandParent()\n      if grand is not None:\n         if grand.GetLeft() == self.GetParent():\n            return grand.GetRight()\n         else:\n            return grand.GetLeft()\n      else:\n         return None\n\n   def Rebalance(self):\n      # WP case 1: tree root\n      if self.GetParent() is None:\n         self.SetColor(\"Black\")\n         return self\n      # WP case 2: The parent of the target node is BLACK,\n      #   so the tree is in fine balance shape; just return the\n      #   tree root\n      if self.GetParent().GetColor() == \"Black\":\n         return self.GetRoot()\n      # From here on, we know that parent is Red.\n      # WP Case 3:  self, parent, and uncle are all 
red.\n      if self.GetUncle() is not None and self.GetUncle().GetColor() == \"Red\":\n         self.GetUncle().SetColor(\"Black\")\n         self.GetParent().SetColor(\"Black\")\n         self.GetGrandParent().SetColor(\"Red\")\n         return self.GetGrandParent().Rebalance()\n\n      # By now, we know that self and parent are red; and the uncle is black.\n      # We also know that the grandparent is not None, because if it were, the\n      # parent would be root, which must be black. So this means that we\n      # need to do a pivot on the parent.\n      return self.PivotAndRebalance()\n\n   def GetRoot(self):\n      if self.GetParent() is None:\n         return self\n      else:\n         return self.GetParent().GetRoot()\n\n   def PivotAndRebalance(self):\n      # First, distinguish between the cases where my parent is\n      # a left child or a right child.\n      if self.GetGrandParent().GetLeft() == self.GetParent():\n         if self.GetParent().GetRight() == self:\n            # WP case 4: I'm the right child of my parent, and my parent is the\n            # left child of my grandparent. 
Pivot left around my parent, moving me up.\n            return self.PivotLeft(False)\n         else:\n            # WP case 5\n            # If I'm the left child, and my parent is the left child, then\n            # pivot right around my parent.\n            return self.GetParent().PivotRight(True)\n      else: # My parent is the right child.\n         if self.GetParent().GetLeft() == self:\n            # WP case 4, reverse: pivot right around my parent, moving me up.\n            return self.PivotRight(False)\n         else:\n            # WP case 5, reverse.\n            return self.GetParent().PivotLeft(True)\n\n   def PivotRight(self, recolor):\n      # Hurrah, I'm going to be the new root of the subtree!\n      left = self.GetLeft()\n      right = self.GetRight()\n      parent = self.GetParent()\n      grand = self.GetGrandParent()\n      # move my right child to be the left of my soon-to-be former parent.\n      parent.SetLeft(right)\n      if right is not None:\n         right.SetParent(parent)\n      # Move up, and make my old parent my right child.\n      self.SetParent(grand)\n      if grand is not None:\n         if grand.GetRight() == parent:\n            grand.SetRight(self)\n         else:\n            grand.SetLeft(self)\n      self.SetRight(parent)\n      parent.SetParent(self)\n      if recolor is True:\n         parent.SetColor(\"Red\")\n         self.SetColor(\"Black\")\n         return self.GetRoot()\n      else:\n         # rebalance from the new position of my former parent.\n         return parent.Rebalance()\n\n   def PivotLeft(self, recolor):\n      # Hurrah, I'm going to be the new root of the subtree!\n      left = self.GetLeft()\n      right = self.GetRight()\n      parent = self.GetParent()\n      grand = self.GetGrandParent()\n      # move my left child to be the right of my soon-to-be former parent.\n      parent.SetRight(left)\n      if left is not None:\n         left.SetParent(parent)\n      # Move up, and make my old 
parent my left child.\n      self.SetParent(grand)\n      if grand is not None:\n         if grand.GetRight() == parent:\n            grand.SetRight(self)\n         else:\n            grand.SetLeft(self)\n      self.SetLeft(parent)\n      parent.SetParent(self)\n      if recolor is True:\n         parent.SetColor(\"Red\")\n         self.SetColor(\"Black\")\n         return self.GetRoot()\n      else:\n         # rebalance from the position of my former parent.\n         return parent.Rebalance()\n\n   def Insert(self, value):\n      if self._value &gt; value:\n         if self.GetLeft() is None:\n            self.SetLeft(RedBlackTreeNode(value))\n            self.GetLeft().SetParent(self)\n            return self.GetLeft().Rebalance()\n         else:\n            return self.GetLeft().Insert(value)\n      else:\n         if self.GetRight() is None:\n            self.SetRight(RedBlackTreeNode(value))\n            self.GetRight().SetParent(self)\n            return self.GetRight().Rebalance()\n         else:\n            return self.GetRight().Insert(value)\n\n   def Print(self, indent):\n      for i in range(indent):\n         print \"  \",\n      print \"%s (%s)\" % (self._value, self.GetColor())\n      if self.GetLeft() is None:\n         for i in range(indent+1):\n            print \"  \",\n         print \"None(Black)\"\n      else:\n         self.GetLeft().Print(indent+1)\n      if self.GetRight() is None:\n         for i in range(indent+1):\n            print \"  \",\n         print \"None(Black)\"\n      else:\n         self.GetRight().Print(indent+1)\n<\/pre>\n<p>For a quick example of creating and populating, here&#8217;s code to generate<br \/>\nthe red-black tree used as an example above.<\/p>\n<pre>b = RedBlackTree()\nfor i in range(10):\n   b.Insert(i)\nb.Print()\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>During my Haskell tutorial, I used balanced binary search 
trees as an example. I&#8217;ve had people write to me asking me to write about that in a non-functional programming post, because the Haskell one got too tied up in how to do things without assignments, and with tail recursion. Binary search trees (BST) are another [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[83],"tags":[110,128,248],"class_list":["post-643","post","type-post","status-publish","format-standard","hentry","category-data-structures","tag-balance","tag-coding","tag-tree"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-an","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/643","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=643"}],"version-history":[{"count":2,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/643\/revisions"}],"predecessor-version":[{"id":4025,"href":"http:\/\/www.goodmath.or
g\/blog\/wp-json\/wp\/v2\/posts\/643\/revisions\/4025"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=643"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=643"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=643"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}