{"id":745,"date":"2009-02-18T21:03:03","date_gmt":"2009-02-18T21:03:03","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/02\/18\/gap-buffers-or-dont-get-tied-up-with-ropes\/"},"modified":"2009-02-18T21:03:03","modified_gmt":"2009-02-18T21:03:03","slug":"gap-buffers-or-dont-get-tied-up-with-ropes","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2009\/02\/18\/gap-buffers-or-dont-get-tied-up-with-ropes\/","title":{"rendered":"Gap Buffers, or, Don&#039;t Get Tied Up With Ropes?"},"content":{"rendered":"<p> Last week, I promised to continue my discussion of ropes. I&#8217;m going to<br \/>\nbreak that promise. But it&#8217;s in a good cause.<\/p>\n<p> If you&#8217;re a decent engineer, one of the basic principles you should<br \/>\nfollow is keeping this as simple as possible. One of the essential skills<br \/>\nfor keeping things simple is realizing when you&#8217;re making something<br \/>\noverly complicated &#8211; even if it&#8217;s cool, and fun, and interesting.<\/p>\n<p> Working out the rope code for more complex operations, I found myself<br \/>\nthinking that it really seemed complex for what I was doing. What I&#8217;m<br \/>\ndoing is writing myself a text editor. I&#8217;m not writing a string<br \/>\nprocessing framework for managing gigabytes of data; I&#8217;m just writing a<br \/>\ntext editor.<\/p>\n<p> So I decided to try, for the purposes of testing, to look at one of<br \/>\nthe simpler data structures. A classic structure for implementing editor<br \/>\nbuffers is called a <em>gap buffer<\/em>. I&#8217;ll explain the details below.<br \/>\nThe drawback of a gap buffer is that large cursor motions are relatively<br \/>\nexpensive &#8211; the cost of moving the cursor is proportional to the distance<br \/>\nthat you move it.<\/p>\n<p> So I wrote a test. I implemented a basic gap buffer. Then I built a<br \/>\ntest that did 300,000 inserts of five characters; then slid the cursor<br \/>\nfrom the end, to the beginning, back to the end, back to the beginning,<br \/>\nand back to the end again. Total execution time? 150 milliseconds. And<br \/>\nthat includes resizing the buffer 12 times, because of a deliberately<br \/>\nstupid buffer sizing and copying strategy.<\/p>\n<p> And with that, out the window went the ropes. Because <em>in Java<\/em>,<br \/>\nusing a slapped together, sloppy piece of test code, which does things in a dumb<br \/>\nbrute force way, it can do something much more complicated than anything an editor<br \/>\nwill encounter in routine use, the gap buffer achieves performance that is more than<br \/>\nadequately fast. (Based on some experiments I did back at IBM, delays of 1\/10th<br \/>\nof a second are roughly when people start to notice that an editor is slow. If you can respond is less than 1\/10th of a second, people don&#8217;t perceive a troublesome delay.)<br \/>\nIf the gap buffer can achieve the needed perfomance that easily, then there&#8217;s<br \/>\nabsolutely <em>no reason<\/em> to do anything more complicated.<\/p>\n<p><!--more--><\/p>\n<p> Ok. So, what&#8217;s a gap buffer?<\/p>\n<p> It&#8217;s a fascinatingly simple data structure &#8211; exactly the kind of<br \/>\nthing that appeals to my sense of software aesthetics. It&#8217;s a great big<br \/>\nbuffer &#8211; an array of characters. It&#8217;s got two segments of text &#8211; one at<br \/>\nthe beginning of the buffer, which grows upwards; and one at the end of<br \/>\nthe buffer, which grows downwards. The space between those two segments<br \/>\nis called the <em>gap<\/em>. The gap <em>is<\/em> the cursor in the<br \/>\nbuffer. To move the cursor forwards, you just <em>move the gap<\/em>: move one character from the<br \/>\nbeginning of the end segment to the end of the beginning segment. To<br \/>\ninsert a character at the cursor, you add a character to the beginning<br \/>\nsegment. To delete the character before the cursor, you just subtract one<br \/>\nfrom the length of the beginning segment.<\/p>\n<p> For example, below is a diagram of a gap buffer. In the first version, it contains the text &#8220;Hello there readers&#8221;, with the cursor<br \/>\nafter the &#8220;r&#8221; in &#8220;readers&#8221;.<\/p>\n<div><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_365.png?resize=538%2C244\" width=\"538\" height=\"244\" alt=\"hgaps.png\" \/><\/div>\n<p> The next version<br \/>\nmoves the cursor backwards one step. So we copy one character from<br \/>\nthe end of the pre-gap region to the beginning of the post-gap region,<br \/>\nand then adjust the pointers. We don&#8217;t even bother to erase the buffer<br \/>\nposition within the gap that used to contain the &#8220;r&#8221; &#8211; we just move<br \/>\nthe pre-gap pointer.<\/p>\n<p> In the third version, we insert the string &#8220;my&#8221; &#8211; just write the characters &#8220;m&#8221; and &#8220;y&#8221; into the next positions in the pre-gap region,<br \/>\nand advance the pre pointer. The resulting buffer contains the text<br \/>\n&#8220;Hello there myreaders&#8221;.<\/p>\n<p> That&#8217;s all there is to it. A gap buffer is just a character array, with two pointers\/indices to track the position of the beginning and ending segments. It&#8217;s incredibly simple &#8211; and yet, it&#8217;s also<br \/>\nquite surprising. I certainly wouldn&#8217;t have thought of it on my own!<\/p>\n<p> For its simplicity, it still manages to achieve outstanding performance on all of the most common actions in an edit buffer:<\/p>\n<ul>\n<li> Local cursor motions are extremely fast; it&#8217;s always exactly<br \/>\nO(1) cost for a single character cursor move, O(n) for arbitrary<br \/>\ncursor motions.<\/li>\n<li> Character inserts are really fast. They&#8217;ve got an amortized<br \/>\naverage cost of O(1). (The reason for that amortized cost is<br \/>\nbecause you sometimes need to expand the buffer, which<br \/>\nrequires a lot of copying.)<\/li>\n<li> Character deletes are super fast &#8211; just increment or decrement<br \/>\na pointer &#8211; O(1).<\/li>\n<li> Search\/scan is reasonable. Stream oriented operations like<br \/>\nsearch have an absolute linear-time lower bound, which is<br \/>\nobviously achieved by the gap buffer. And as important,<br \/>\nimplementing stream-oriented operations is extremely<br \/>\neasy to do correctly on a gap buffer.<\/li>\n<\/ul>\n<p> Finally, I&#8217;ll close with some code. This is a basic gap buffer<br \/>\nimplementation in <a href=\"http:\/\/www.scala-lang.org\/\">Scala<\/a>, my new favorite programming language. (I&#8217;m<br \/>\nlong been a huge fan of OCaml. Scala is very much like OCaml adapted<br \/>\nfor the JVM, with a better syntax.)<\/p>\n<pre>\n\/\/ Copyright 2009, Mark C. Chu-Carroll\npackage com.scienceblogs.goodmath.apex.buffer\nclass GapBuffer(size : Int) {\nvar _buf : Array[Char] = new Array[Char](size)\nvar _size = size\nvar _presize = 0;\nvar _postsize = 0;\n\/**\n* Move the cursor one character forward.\n*\/\ndef stepCursorForward() = {\nif (_postsize &gt; 0) {\n_buf(_presize) = _buf(_size - _postsize)\n_presize = _presize + 1\n_postsize = _postsize + 1\n}\n}\n\/**\n* Move the cursor one character backward.\n*\/\ndef stepCursorBackward() = {\nif (_presize &gt; 0) {\nval c = _buf(_presize - 1)\n_buf(_size - _postsize - 1) = c\n_postsize = _postsize + 1\n_presize = _presize - 1\n}\n}\n\/**\n* Move the cursor.\n* @param distance the distance to move the cursor.\n*\/\ndef moveCursor(distance : Int) = {\nif (distance &gt; 0) {\nfor (i &lt;- 0 until distance) {\nstepCursorForward()\n}\n} else {\nval d = - distance\nfor (i &lt;- 0 until d) {\nstepCursorBackward()\n}\n}\n}\n\/**\n* Move the cursor to a specific position.\n* @param loc the desired cursor position.\n*\/\ndef moveCursorTo(loc : Int) = {\nmoveCursor(loc - _presize)\n}\n\/**\n* Insert a character immediately after the cursor position.\n* @param c the character to insert.\n*\/\ndef insertChar(c : Char) = {\nif (_presize + _postsize == _size) {\nexpand()\n}\n_buf(_presize) = c\n_presize = _presize + 1\n}\n\/**\n* Insert a string after the cursor position, and advance\n* the cursor to the position after the inserted string.\n* @param s the string to insert\n*\/\ndef insertString(s : String) = {\nfor (i &lt;- 0 until s.size) {\ninsertChar(s.charAt(i))\n}\n}\n\/**\n* Double the size of the character storage buffer.\n*\/\ndef expand() = {\nval newsize = _size * 2\nvar newbuf = new Array[Char](newsize)\nfor (i &lt;- 0 until _presize) {\nnewbuf(i) = _buf(i)\n}\nfor (i &lt;- 0 until _postsize) {\nnewbuf(newsize - i - 1) = _buf(_size - i - 1)\n}\n_buf = newbuf\n_size = newsize\n}\n}\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Last week, I promised to continue my discussion of ropes. I&#8217;m going to break that promise. But it&#8217;s in a good cause. If you&#8217;re a decent engineer, one of the basic principles you should follow is keeping this as simple as possible. One of the essential skills for keeping things simple is realizing when you&#8217;re [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[83],"tags":[],"class_list":["post-745","post","type-post","status-publish","format-standard","hentry","category-data-structures"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-c1","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/745","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=745"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/745\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=745"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}