{"id":759,"date":"2009-03-27T09:55:13","date_gmt":"2009-03-27T09:55:13","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/03\/27\/two-three-trees-a-different-approach-to-balance\/"},"modified":"2009-03-27T09:55:13","modified_gmt":"2009-03-27T09:55:13","slug":"two-three-trees-a-different-approach-to-balance","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2009\/03\/27\/two-three-trees-a-different-approach-to-balance\/","title":{"rendered":"Two-Three Trees: a different approach to balance"},"content":{"rendered":"<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_369.png?resize=238%2C131\" width=\"238\" height=\"131\" alt=\"23-thumb.png\" class=\"inset right\" \/><\/div>\n<p> This post is very delayed, but things have been busy.<\/p>\n<p> I&#8217;m working my way up to finger trees, which are a wonderful<br \/>\nfunctional data structure. They&#8217;re based on trees &#8211; and like many<br \/>\ntree-based structures, their performance relies heavily on the balance<br \/>\nof the tree. The more balanced the tree is, the better they perform.<\/p>\n<p> In general, my preference when I need a balanced tree is a red-black<br \/>\ntree, which I wrote about before. But there are other kinds of balanced trees,<br \/>\nand for finger trees, many people prefer a kind of tree called a 2\/3 tree.<\/p>\n<p> A two-three tree is basically a B-tree with a maximum of two values per node.<br \/>\nIf you don&#8217;t know  what a B-tree is, don&#8217;t worry. I&#8217;m going to explain all the details.<\/p>\n<p> A two-three tree is a tree where each node contains either one or two values. If<br \/>\nthe node isn&#8217;t a leaf, then its number of children is one more than its number<br \/>\nof values. 
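<\/p>
<p> To make this concrete before diving into the algorithms, here&#8217;s a minimal Python sketch of such a tree. The names are my own invention, and it covers only lookup and the insert-with-split procedure that the rest of this post walks through; it&#8217;s an illustrative sketch, not a full implementation (deletion, in particular, is omitted):<\/p>

```python
# A minimal two-three tree sketch, in my own names: each node holds one or
# two sorted values, an internal node has one more child than values, and
# every leaf sits at the same depth. Lookup and insert only.

class Node:
    def __init__(self, values, children=None):
        self.values = values              # one or two sorted values
        self.children = children or []    # [] for a leaf, else len(values) + 1

    def is_leaf(self):
        return not self.children

def search(node, v):
    # Standard search-tree descent: smaller values left, larger values right.
    if node is None:
        return False
    if v in node.values:
        return True
    if node.is_leaf():
        return False
    for i, val in enumerate(node.values):
        if val > v:
            return search(node.children[i], v)
    return search(node.children[-1], v)

def insert(root, v):
    # Insert v and return the (possibly new) root.
    if root is None:
        return Node([v])
    result = _insert(root, v)
    if isinstance(result, tuple):          # the root itself split: the tree
        mid, left, right = result          # grows one level at the top, which
        return Node([mid], [left, right])  # keeps every leaf at the same depth
    return result

def _insert(node, v):
    # Insert into the subtree; return the node, or (mid, left, right) on a split.
    if node.is_leaf():
        node.values = sorted(node.values + [v])
    else:
        idx = len(node.values)
        for i, val in enumerate(node.values):
            if val > v:
                idx = i
                break
        result = _insert(node.children[idx], v)
        if isinstance(result, tuple):      # the child split: absorb its middle
            mid, left, right = result      # value, and adopt its two halves
            node.values.insert(idx, mid)
            node.children[idx:idx + 1] = [left, right]
    if len(node.values) == 3:              # overflow: push the middle value up
        left = Node([node.values[0]], node.children[:2])
        right = Node([node.values[2]], node.children[2:])
        return (node.values[1], left, right)
    return node
```

<p> The design trick is that the recursive insert returns either a node or a (middle, left, right) triple; the triple is how a split propagates upward. When a split reaches the root, the whole tree gains a level at the top, which is the only way the height ever changes, and is why every leaf stays at the same depth.<\/p>
<p>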
The basic value constraints are the same as in a binary search tree:<br \/>\nsmaller values to the left, larger values to the right.<\/p>\n<p> The big difference in the 2\/3 tree is balance: it&#8217;s got a much stronger<br \/>\nbalance requirement. In a 2\/3 tree, <em>every<\/em> path from the tree root to the leaves<br \/>\nis exactly the same length. <\/p>\n<p><!--more--><\/p>\n<p> Before getting into the insert algorithm (which is where the beauty and<br \/>\nthe trickiness of the structure reside), let&#8217;s get a bit more formal about what a two-three tree is. The rules of a two-three<br \/>\ntree are:<\/p>\n<ul>\n<li> Every node has either one or two values.<\/li>\n<li> Every node has either 0, 2, or 3 children.<\/li>\n<li> For a given node n, the 1st child subtree contains only values<br \/>\nless than the 1st value of n; the 2nd child subtree contains values<br \/>\ngreater than or equal to the 1st value of n but less than the second;<br \/>\nand the 3rd child subtree contains values greater than or equal to<br \/>\nthe second value in n.<\/li>\n<li> The leaves of the tree are all on the same level.<\/li>\n<\/ul>\n<p> How can it do that? The concept is pretty simple, but the implementation is<br \/>\ntricky. (B-trees in general are incredibly error-prone to implement. There are a lot<br \/>\nof subtleties in getting the implementation correct. They&#8217;re not difficult, but there<br \/>\nare a lot of tricky cases.)<\/p>\n<p> To insert a value, you do a straightforward search for the correct leaf to<br \/>\ninsert it into. Then you insert it into the correct position in that leaf. If, after<br \/>\ndoing that, the leaf has three values, you take that node and split it. You take<br \/>\nthe middle value, and make a new single-value tree node. Then you take the smallest value<br \/>\nin the node, make it into a single-value node, and make it the left child of the middle node. 
Similarly, with the largest value, you make a new single-value node, and make it the right child of the middle node. That gives you a simple three-node two-three tree where you<br \/>\nused to have a single node.<\/p>\n<p> Now you need to re-attach it to the tree. To do that, you go to the parent of the node where you did the insert. You remove the pointer that used to point to the node where you did the insert, and then you splice in the new node at that position: the parent absorbs the new node&#8217;s value, and adopts its two children. If that results in<br \/>\nthe parent node having more than two values, then you split the parent the same way, and continue up the tree.<\/p>\n<p> As usual, this is much clearer with an example. Here&#8217;s a two-three tree.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_370.png?resize=431%2C181\" width=\"431\" height=\"181\" alt=\"two-three-initial.png\" \/><\/div>\n<p> Now, suppose we want to insert the value 15. The insert node is going to be the third child &#8211; the rightmost node in the tree, which has two values, 14 and 17. So we go ahead and insert it into that node.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_371.png?resize=455%2C181\" width=\"455\" height=\"181\" alt=\"two-three-insert-intermediate1.png\" \/><\/div>\n<p> Now we&#8217;ve got a leaf node with three values: 14, 15, and 17. That&#8217;s no good. So we split<br \/>\nit out. We make a new node with the value 15, give it a left child containing 14 and a right child containing 17, and then insert that into the parent.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_372.png?resize=479%2C196\" width=\"479\" height=\"196\" alt=\"two-three-insert-intermediate2.png\" \/><\/div>\n<p> So now we&#8217;ve got a root node with three values: 6, 12, and 15. 
So we need to split it.<br \/>\nWe do the same basic thing: take the middle value (12), and make it a new node. Then we<br \/>\nmake 6 its left child, and 15 its right. And now we&#8217;ve got a two-three tree that satisfies all the rules, and is beautifully balanced.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_373.png?resize=498%2C248\" width=\"498\" height=\"248\" alt=\"twothree-insert-final.png\" \/><\/div>\n<p> Deleting a value uses similar tricks, but based on merging nodes instead of splitting.<br \/>\nThe trick in deletes is making sure that you don&#8217;t lose levels. For example, suppose<br \/>\nyou wanted to remove &#8220;6&#8221;.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_374.png?resize=498%2C248\" width=\"498\" height=\"248\" alt=\"two-three-cut0.png\" \/><\/div>\n<p> Looking at our tree, if you deleted &#8220;6&#8221;, the easiest way would be to merge 4 and 7 into a single node, which becomes the left child of 12, as in this diagram.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_375.png?resize=398%2C248\" width=\"398\" height=\"248\" alt=\"two-three-cut1.png\" \/><\/div>\n<p> But when you do that, you&#8217;ve removed a level from the leftmost branch of the tree: the left branch of the root node would have a depth of two, and the right branch would have a depth of three. So to do the delete successfully, you need to shuffle things around a bit. Since you&#8217;re removing a level from the left subtree, you need to also remove one from the right. 
You do that by merging the root of the right subtree into the tree root:<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_376.png?resize=425%2C166\" width=\"425\" height=\"166\" alt=\"two-three-cut2.png\" \/><\/div>\n<p> Finally, if the merge of the right branch had caused the root to contain three values, we would have needed to do a split, exactly the way we did in insert.<\/p>\n<p> It&#8217;s a beautiful structure. But as I said, implementing it is a pain. Personally,<br \/>\nI&#8217;d rather take the relatively small performance penalty of the red-black tree,<br \/>\nwhich I find a lot easier to get right.  In general, two-three trees are beautiful in concept,<br \/>\nbut more trouble than they&#8217;re worth in practice. But at least some functional folks love<br \/>\nthem; and I suppose it&#8217;s possible that in a functional language with strong formal<br \/>\nproperties, it&#8217;s easier to get them right than in a typical imperative language.<\/p>\n<p> I&#8217;m not going to show you an implementation of two-three trees &#8211; as I said, the implementation gets rather hairy, covering all of the cases. I put one together, but looking at it, I don&#8217;t think it&#8217;s got much in the way of explanatory value. In the next data structure post, I&#8217;m going to finally talk about finger trees. One of the canonical implementations of them is based on a Haskell implementation of two-three trees &#8211; so if you really want to<br \/>\nsee that, there&#8217;ll be a link in the next post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is very delayed, but things have been busy. I&#8217;m working my way up to finger trees, which are a wonderful functional data structure. They&#8217;re based on trees &#8211; and like many tree-based structures, their performance relies heavily on the balance of the tree. 
The more balanced the tree is, the better they perform. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[83],"tags":[],"class_list":["post-759","post","type-post","status-publish","format-standard","hentry","category-data-structures"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-cf","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/759","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=759"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/759\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=759"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=759"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\
/tags?post=759"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}