{"id":374,"date":"2007-04-07T12:08:06","date_gmt":"2007-04-07T12:08:06","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/04\/07\/basics-binary-search\/"},"modified":"2007-04-07T12:08:06","modified_gmt":"2007-04-07T12:08:06","slug":"basics-binary-search","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2007\/04\/07\/basics-binary-search\/","title":{"rendered":"Basics: Binary Search"},"content":{"rendered":"<p> For the basics, I wrote a bunch of stuff about sorting. It seems worth taking a moment<br \/>\nto talk about something related: binary search. Binary search is one of the most important<br \/>\nand fundamental algorithms, and it shows up in sorts of places.<\/p>\n<p> It also has the amazing property that despite being simple and ubiquitous, it&#8217;s virtually<br \/>\n<em>always<\/em> written wrong.  There&#8217;s a bit of subtlety in implementing it correctly, and virtually<br \/>\neveryone manages to put off-by-one indexing errors into their implementations. (Including me; last time I implemented a binary search, the first version included one of the classic errors.) The errors are so ubiquitous that even in a textbook that discusses the fact that most programmers get it wrong, they <em>got it wrong<\/em> in their example code!<\/p>\n<p><!--more--><\/p>\n<p> In its simplest form, you have a list of values L={l<sub>1<\/sub>,&#8230;,l<sub>n<\/sub>}, in sorted order; and you have a value, x. You want to find out which, if any, of the l<sub>i<\/sub>s is equal to x.<\/p>\n<p> The algorithm is a simple recursive one: <\/p>\n<pre>\nbinary_search(L=[L<sub>1<\/sub>,...,L<sub>n<\/sub>], x)\nif L=[] then return not_found\nlet a = n\/2\nif L<sub>a<\/sub> == x then\nreturn found(a)\nelse if L<sub>a<\/sub> &gt; x then\nreturn binary_search([L<sub>i<\/sub> | i &lt; a ], x)\nelse\nreturn binary_search([L<sub>i<\/sub> | i &gt; a], x)\n<\/pre>\n<p> You pick a position, i, in the middle of the list. If L<sub>i<\/sub> is equal to x, then you&#8217;ve found x at position i. If L<sub>i<\/sub> is greater than x, then you know that if x is in the list, it must be in the half of the list that&#8217;s smaller than L<sub>i<\/sub>. If L<sub>i<\/sub> is less than x, then you know that if x is in the list, then it must be in the half of the list that&#8217;s larger than L<sub>i<\/sub>. If in any of the recursions, you wind up with an empty list, then you know that x wasn&#8217;t in the list.<\/p>\n<p> Before moving on to point out some of the interesting places this pops up, let&#8217;s take a moment to think about how complex it is. Each step of the algorithm, we either find the number we&#8217;re looking for, or we eliminate half of the list. So if in the first step, we&#8217;re searching a list of N elements, then in the second, we&#8217;ll be searching a list of N\/2 elements; in the third, we&#8217;ll be searching a list of N\/4 elements; and so on. So it takes O(lg N) steps in the worst case. <\/p>\n<p> Is this optimal in the number of comparisons? Yes &#8211; and it&#8217;s easy to prove, using Kolmogorov-Chaitin complexity theory. If X is in the list, it could be in any one of N positions in a list of size N. How much information do we need to identify exactly one position in N? We need an integer with at least lg(N+1) bits. (Why N+1? because we also need a way to say &#8220;It wasn&#8217;t there&#8221;.) So if we&#8217;re using comparisons, which generate one bit of information per comparison, and the algorithm needs to generate lg(N+1) bits, then in the worst case, any comparison-based algorithm will need to do lg(N+1) comparisons &#8211; which is O(lg N). <\/p>\n<p> Aside from searching sorted lists, variants of binary search pop up all over the place. For example, think about Newton&#8217;s method for finding the roots of an equation. You start with a guess; check how close it is to zero, and use the result of that to prune off part of the search space &#8211; it&#8217;s a variant binary search; constantly narrowing down the range in which the root can be found: the only real change is that you can cleverly use the derivative to narrow down the answer more quickly than a direct binary pruning. But the basic concept is the same as binary search.<\/p>\n<p> There are a few neat examples from my research area. I do work with SCM systems &#8211; which<br \/>\nnormal people sometimes call &#8220;version control systems&#8221;. A version control system is a<br \/>\nsystem that programmers use that keep a complete record of every change they&#8217;ve made to<br \/>\ntheir program &#8211; so using it, they can go back to old versions, check the history of the<br \/>\ndevelopment of their systems, etc. One thing that comes up in SCM systems is what we call<br \/>\n&#8220;blame identification&#8221;. The idea is, we know that at some time, one particular piece of<br \/>\ncode was added, and we later discovered that that piece of code introduced a bug to the<br \/>\nsystem, and we want to know exactly <em>when<\/em> in was introduced, and who was<br \/>\nresponsible for it. How do we figure out when it got added? <\/p>\n<p> We look at the history of the system, and we pick a point halfway through the history, and look to see if it contains that change. If it doesn&#8217;t, then we know the change was added later than it &#8211; so then we go to search the changes that happened later. It&#8217;s exactly binary search over the history of the program.<\/p>\n<p> John Bentley&#8217;s <em>brilliant<\/em> book <a href=\"http:\/\/www.amazon.com\/gp\/product\/0201657880?ie=UTF8&amp;tag=goodmathbadma-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0201657880\">&#8220;Programming Pearls&#8221;<\/a><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.assoc-amazon.com\/e\/ir?t=goodmathbadma-20&amp;l=as2&amp;o=1&amp;a=0201657880\" width=\"1\" height=\"1\" border=\"0\" alt=\"\" style=\"border:none !important;margin:0px !important\" \/> has a great discussion of some of the other places that binary search comes up. It&#8217;s really an astonishingly common algorithm; I was totally blown away when I read Bentley&#8217;s book, and realized how many problems could be understood as variants of binary search.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.assoc-amazon.com\/s\/noscript?tag=goodmathbadma-20\" alt=\"\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For the basics, I wrote a bunch of stuff about sorting. It seems worth taking a moment to talk about something related: binary search. Binary search is one of the most important and fundamental algorithms, and it shows up in sorts of places. It also has the amazing property that despite being simple and ubiquitous, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[74,80,54],"tags":[],"class_list":["post-374","post","type-post","status-publish","format-standard","hentry","category-basics","category-computational-complexity","category-programming"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-62","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=374"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/374\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=374"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}