{"id":1433,"date":"2011-05-03T14:16:07","date_gmt":"2011-05-03T18:16:07","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/?p=1433"},"modified":"2011-05-03T14:16:07","modified_gmt":"2011-05-03T18:16:07","slug":"the-perils-of-premature-optimization","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2011\/05\/03\/the-perils-of-premature-optimization\/","title":{"rendered":"The Perils of Premature Optimization"},"content":{"rendered":"<p> A reader who&#8217;s known me for a while just sent me a note asking me to say something about premature optimization. The reason that I mention that he&#8217;s known me for a while (in fact, someone who used to be a coworker) is because this is a topic that I&#8217;m prone to rant about in person, but I&#8217;ve never mentioned it on the blog. So he knows that it&#8217;s something that I&#8217;ve got a strong opinion about. Basically, he&#8217;s having trouble dealing with an annoying coworker doing this, and he wants a rant-by-proxy. I&#8217;m OK with that. :-).<\/p>\n<p>When you&#8217;re writing a new piece of software, particularly on a modern computer, one of the unintuitive things that frequently happens is that the performance of your system is really quite different from what you&#8217;d expect. And even when everything is as expected, most people don&#8217;t have a particularly good sense of tradeoffs &#8211; if I make <em>this<\/em> thing faster, what effect will it have on <em>that<\/em>? And if it really does improve the performance, how much will it improve it, and at what cost? 
If you can&#8217;t answer that &#8211; and answer it precisely, with supporting evidence &#8211; then you have no business optimizing.<\/p>\n<p> So when you sit down to write some new code, what you should really do is write code that&#8217;s algorithmically efficient &#8211; that is, you pick an algorithm that has good performance in asymptotic time &#8211; and implement it in a straightforward way.<\/p>\n<p> But what many people do &#8211; in fact, what pretty much all of us do at least  some of the time &#8211; is try to be clever. We try to find opportunities to change the code to make it faster. Doing that <em>before<\/em> you understand where the computer actually spends its time when it&#8217;s running the program is what we call premature optimization.<\/p>\n<p> Why not? <\/p>\n<p><!--more--><\/p>\n<p> There are a bunch of problems with premature optimization. The two big ones are that it can make the program <em>slower<\/em> rather than faster, and it can make the program more complex.  <\/p>\n<p>  How can it make things slower?  <\/p>\n<ol>\n<li>  Compilers are really smart, and they can perform transformations of a program that a human wouldn&#8217;t think of. The catch is that in order to perform  an optimizing transformation, the compiler needs to be able to  <em> prove <\/em>  that the transformation won&#8217;t change the results of the program. If you&#8217;re very clever about writing something in a cool, subtle way, you&#8217;re making it harder for the compiler to figure out if it can do an optimization. For example, a lot of common tricks for supposedly optimizing code in C or C++ do clever things with pointers. By doing that, they save what looks like a bit of time. But by mucking with pointers, they make the program completely inscrutable to the compiler &#8211; and higher-level optimizations that would have had a great impact 
on the actual performance of the system can&#8217;t be done, because the compiler can&#8217;t prove that the transformation would be correct. <\/li>\n<li>  Hardware issues are often subtle. Modern CPUs are astonishingly complex, and performance tuning can be thoroughly counterintuitive. For a simple example, a hand-optimization of code might add a couple of instructions in order to swap an expensive operation for a couple of simple ones. The simple ones might be faster in theory &#8211; but by making the instruction sequence longer, it no longer  fits perfectly into the processor&#8217;s instruction cache. And that can have a dramatic impact on performance. (The cache is a special region of very fast  memory in the processor. When you access memory, the CPU copies a chunk of memory called a <em>block<\/em> into the cache so that it can access it quickly. If you access something that isn&#8217;t in the cache, the CPU will discard something in the cache to make room. The cache operates in terms of blocks; if you can make either the program code or a particular data structure fit into a single cache block, the performance of that block\/data structure will be really good. If you make it a tiny bit larger so that it no longer fits in a block, you get a dramatic change in performance.) <\/li>\n<\/ol>\n<p>  Complexity is a more obvious issue. Changing your code to make it faster is often complicated. It makes the code harder to understand, harder to debug, and harder to maintain. It&#8217;s frequently better to let the code be a little bit  slower, in order to make it easier to be sure that it&#8217;s correct, and easier to understand when someone comes back to look at it six months, or a year, or a decade after it was written.  <\/p>\n<p>  There&#8217;s nothing wrong with hand-optimizing your code. 
It&#8217;s a good thing, and every good engineer spends time making sure that the crucial code of a  system is as fast as it can be, without having an adverse impact on maintainability. But the catch &#8211; and it&#8217;s a biggie &#8211; is that you need to be sure that you&#8217;re optimizing the right thing, and that the impact of that optimization is actually a net positive.  <\/p>\n<p>  Premature optimization is optimizing your code before you really know its performance characteristics. When you try to optimize too soon, you don&#8217;t know what parts of the code are actually too slow. Just because you  <em> think <\/em> that something is going to be a performance bottleneck doesn&#8217;t mean that it really is. For reasons described above (and a variety of others), what you  <em>think<\/em>  is crucial to performance might not be. And until you know where the bottlenecks are, there&#8217;s a damn good chance that you&#8217;re wasting your time,  making the code harder to understand, harder to maintain, and possibly even slower, by tweaking the wrong things.  <\/p>\n<p>  The bit of this that I personally find most frustrating is that there&#8217;s a huge amount of &#8220;common knowledge&#8221; that revolves around performance issues that people use to make decisions about how to write their code that&#8217;s just plain <em>wrong<\/em>. <\/p>\n<p>  For example, think of the common loop in C++:  <\/p>\n<pre>\nfor (int i = 0; i < 1000; i++) {\n   \/\/ do something\n}\n<\/pre>\n<p>  A huge number of C++ programmers will swear to you that using <code>++i<\/code> is faster than using <code>i++<\/code>.  <\/p>\n<p>  <code>++i<\/code> means &#8220;add one to i, and then return the new value of i&#8221;. <code>i++<\/code> means &#8220;add one to i, and then return the old value of i&#8221;. So in the <code>i++<\/code> case, you need to save the old value; in <code>++i<\/code>, you don&#8217;t. 
So <code>++i<\/code> must be faster,  right?  <\/p>\n<p>  Not necessarily. It depends on the machine you&#8217;re going to run the code on. For example, let&#8217;s see what the code would look like on some imaginary stack machine.  <\/p>\n<pre>\n\/\/ i++\n1: PUSH i          \/\/ push the address of i onto the stack.\n2: FETCH           \/\/ fetch the value stored in the address on top of the stack.\n3: DUP             \/\/ duplicate the value on top of the stack\n4: PUSH 1          \/\/ push the value 1 onto the stack.\n5: ADD             \/\/ add the top two values of the stack.\n6: PUSH i          \/\/ push the address of i onto the stack.\n7: STORE           \/\/ store the value in the second position of the stack into\n                   \/\/ the address on top of the stack.\n\n\/\/ ++i\n1: PUSH i\n2: FETCH\n3: PUSH 1\n4: ADD\n5: DUP\n6: PUSH i\n7: STORE\n<\/pre>\n<p> They&#8217;re exactly the same length. In fact, they&#8217;re exactly the same instructions &#8211; there&#8217;s just a tiny change to the order of the instructions.  On the other hand, if we were using a register machine, it would look like: <\/p>\n<pre>\n\/\/ i++; \"result\" stored in R0.\n1: LOAD R0, @I    \/\/ load R0 with the contents of address I.\n2: LOAD R1, R0    \/\/ load R1 with the contents of R0\n3: ADD R1, 1      \/\/ add 1 to R1\n4: STORE R1, @I   \/\/ store R1 to address I\n\n\/\/ ++i\n1: LOAD R0, @I\n2: ADD R0, 1\n3: STORE R0, @I\n<\/pre>\n<p>  In the register machine, <code>++i<\/code> is one instruction shorter than <code>i++<\/code>. So on a register machine (which is, really, a more accurate model of a real computer than a stack machine), <code>++i<\/code> really is faster.  <\/p>\n<p>  Except that it&#8217;s usually not.  <\/p>\n<p>  Because if the value of R0 (the saved copy of i before incrementing) is never referenced, then any compiler that isn&#8217;t a total piece of garbage will merge the R0 and R1 registers, dropping the extra save of the pre-increment value. 
So they&#8217;ll end up being exactly the same.  In reality, the <em>only<\/em> place where the performance of <code>++i<\/code> is really different from <code>i++<\/code> is when it&#8217;s embedded in a complex context &#8211; and in a case like that, using the &#8220;++&#8221; operation is almost always the wrong thing to do, because any performance benefit is outweighed by the loss of clarity caused by a side-effecting arithmetic operator embedded in an expression.<\/p>\n<p>  There are a lot of places like that, where &#8220;obvious&#8221; savings don&#8217;t save anything. Another really common example is inlining. Normally, when you&#8217;re programming, if you&#8217;ve got some piece of code that&#8217;s going to be used in several different places, you turn it into a simple function, and call it from each of those places. Inlining is taking the calls to a function, and replacing them with copies of the code that would be called.  <\/p>\n<p>  A trivial example:  <\/p>\n<pre>\n\/\/ Not inlined:\nvoid mult2(int& v) {\n   v = 2 * v;\n}\n\nint main() {\n  for (int i = 0; i < 100; i++) {\n     int x = i;\n     mult2(x);\n     cout << x;\n  }\n}\n\n\/\/ Inlined:\nfor (int i = 0; i < 100; i++) {\n   int x = i;\n   x = 2 * x;\n   cout << x;\n}\n<\/pre>\n<p>  In theory, inlining saves you the cost of actually doing a function call. Normally, to call a function, you need to push the parameters to the call onto the stack, store the return address onto the stack, and then do a jump to the start of the function. That's not a huge cost, but it's not trivial either. If you get rid of the call, you can get rid of the jump into the function and the jump back; and you can get rid of the pushes and pops of the parameters. So if you've got code in, say, a loop, where it's going to get called a bunch of times, it's faster if the code is inlined.  <\/p>\n<p>  Inlining makes code harder to read. 
If you call a function from multiple places, then you instantly know that the program is doing the same thing inside of those calls. But if it's inlined, you need to look at the duplicate code in both places, and figure out that they're the same. Inlining makes code harder to maintain. If you find a bug in some code that's called as a function, then if you fix the function, you've fixed every place that calls the code. But if  you inlined it, then you need to fix every copy separately - and it's all too easy to miss a copy.  <\/p>\n<p>  The particularly frustrating thing about inlining is that it's exactly the kind of optimization that compilers are really good at. It's really easy to show that a function is inlinable. And it's easy to transform the code to do an automated inlining. And it's even easy to tell if a function should be inlined:  if you call the function many times from a loop, or if the function is only ever called from one place, then you inline it. And yet, people will dedicate huge amounts of time to figuring out if they should inline a function by hand.  <\/p>\n<p>  This kind of thing isn't just frustrating. It's counterproductive. You end up wasting time doing things that the computer could do for you, and that it's better at than you are. That time isn't free - it either delays your project, or it gets done in place of something more valuable - like writing tests to make sure that your code is correct. Or like optimizing the code that really  <em> needs <\/em>  to be optimized.  <\/p>\n<p>  The reason that we call it premature optimization is simply because it's done too soon. You hand-optimize code when you know that it's a problem. You don't optimize because you  <em> think <\/em>  that this  <em> probably <\/em>  will be a problem. You wait until you're  <em> sure <\/em>  that it's a problem. 
If you can't  <em> prove <\/em>  that there's going to be a performance problem, then you <em> should <\/em>  be focused only on  <em> algorithmic <\/em>  complexity, correctness, and clarity.  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>A reader who&#8217;s known me for a while just sent me a note asking me to say something about premature optimization. The reason that I mention that he&#8217;s known me for a while (in fact, someone who used to be a coworker) is because this is a topic that I&#8217;m prone to rant about in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[54],"tags":[208,212,313,222],"class_list":["post-1433","post","type-post","status-publish","format-standard","hentry","category-programming","tag-optimization","tag-pet-peeves","tag-programming","tag-reader-requests"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-n7","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-j
son\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=1433"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1433\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=1433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=1433"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=1433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}