Rounding and Bias

Another alert reader sent me a link to a moderately interesting YouTube video.
The video itself is a deliberate joke, but it does demonstrate a worthwhile point. It’s about rounding.

The overwhelming majority of us were taught how to round decimals back in either elementary or middle school. (I don’t even recall exactly when.) The rule that most of us were taught is:

  1. If the first digit after the rounding point is 0, 1, 2, 3, or 4, then round the previous digit down;
  2. If the first digit after the rounding point is 5, 6, 7, 8, or 9, then round the
    previous digit up.

Here’s the problem: those rules are wrong.

The problem is that if the first digit after the rounding point is zero, you’re not really rounding; that is, you’re not doing anything that changes the value of the data point. But if the first digit after the rounding point is 5, then it’s exactly halfway in between: it’s no closer to the rounded-up value than to the rounded-down value. Always rounding 5 up creates a bias, because it takes the point in the middle and shifts it as if it were closer to the upward side.

To demonstrate, let’s try an easy example. Suppose we’ve got the following set
of numbers: {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5}. Let’s compute the mean
of those numbers: 22.5/10 = 2.25.

Now, let’s round them off: {0, 1, 1, 2, 2, 3, 3, 4, 4, 5}; and then compute the mean: 25/10 = 2.5.

With the standard rounding rule, we’ve biased the numbers upwards enough to shift the mean from 2.25 to 2.5, which is a significant error!
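
The arithmetic above can be checked with a short Python sketch. It assumes the standard library's `decimal` module, with its `ROUND_HALF_UP` mode standing in for the schoolbook rule; the names `values`, `round_half_up`, and `mean_half_up` are made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# The example set from the post: 0, 0.5, 1, ..., 4.5
values = [Decimal(k) / 2 for k in range(10)]

def round_half_up(x):
    """Schoolbook rule: round to a whole number, sending exact halves up."""
    return x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

rounded_up = [round_half_up(v) for v in values]

mean_original = sum(values) / len(values)
mean_half_up = sum(rounded_up) / len(rounded_up)

print(mean_original)  # 2.25
print(mean_half_up)   # 2.5
```

Using `Decimal` rather than floats keeps the halves exact, so the only error in the result comes from the rounding rule itself.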

The correct way to round is to randomly round 5s either up or down, so that neither direction is favored. In practice, the standard rule, used in most scientific settings, is to pick either odd or even as the “preferred” outcome, and to always round 5s towards the preferred outcome. If we try that with our example, using preferred even, the rounding is {0, 0, 1, 2, 2, 2, 3, 4, 4, 4}. Taking the mean of that, we get 22/10 = 2.2, which is off by only 0.05 and so significantly closer to the mean of the original numbers than the mean we got by rounding 5s up. The practice of rounding up adds a systematic bias to the data. It’s a very small systematic bias, but it’s a real one.
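
Here is a minimal Python sketch of the preferred-even rule on the same example. It assumes `decimal.ROUND_HALF_EVEN` as the implementation of "prefer even"; the variable names are illustrative only. Python 3's built-in `round()` also rounds exact halves to the even neighbor, which the last lines use as a cross-check.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Same example set as in the post: 0, 0.5, 1, ..., 4.5
values = [Decimal(k) / 2 for k in range(10)]

# Round to whole numbers, sending exact halves to the even neighbor.
rounded_even = [v.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) for v in values]
mean_half_even = sum(rounded_even) / len(rounded_even)

print([int(r) for r in rounded_even])  # [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]
print(mean_half_even)                  # 2.2

# Cross-check: the built-in round() uses the same tie-breaking rule.
builtin_rounded = [round(float(v)) for v in values]
print(builtin_rounded == [int(r) for r in rounded_even])  # True
```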

Does it matter? Not usually. As the commentary to the video points out, over the space of a couple of years, that systematic error in rounding gas prices amounts to about a dime. For most things in our daily experience, the difference between random rounding and upward rounding for 5s is just not significant. But if you’re doing statistical analysis of
large quantities of data, or you’re doing computations that rely on a high degree of
precision, then it can introduce enough error to foul your results. If you’re doing statistical analysis, it can do things like make an insignificant result appear to be statistically significant. If you’re doing high precision computations for things like
navigation of a space probe through a gravitational slingshot, it can introduce enough error
to crash your probe.
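
To get a feel for why the bias matters on large data sets, here is a rough Python sketch comparing the two rules on a made-up data set: every value from 0.0 to 999.9 in steps of 0.1. The data set and the helper `mean_error` are assumptions for illustration, not anything taken from the video.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Hypothetical data set: 0.0, 0.1, 0.2, ..., 999.9 (10,000 values).
values = [Decimal(k) / 10 for k in range(10_000)]

def mean_error(mode):
    """Average of (rounded - original) when rounding to whole numbers."""
    total = sum(v.quantize(Decimal("1"), rounding=mode) - v for v in values)
    return total / len(values)

bias_half_up = mean_error(ROUND_HALF_UP)      # average shift of 0.05 upward
bias_half_even = mean_error(ROUND_HALF_EVEN)  # upward and downward ties cancel

print(bias_half_up)
print(bias_half_even)
```

On this data the average error under round-half-up stays at 0.05 no matter how many values there are, while under round-half-even it is zero. Adding more data does not wash the bias out, which is why it can matter in statistical work.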

16 thoughts on “Rounding and Bias”

  1. The Science Pundit

    Mark,
    Your explanation only works if you’re removing exactly one significant digit when rounding (i.e., written as real numbers, the 0 or 5 that gets chopped off is followed by an infinite string of 0’s). If you assume that you are very likely to encounter a non-zero digit somewhere beyond the digit that you’re rounding off, then lopping off a 0 is indeed (almost) always rounding down, and also the “rounded up” value is (almost) always closer to the “true” value than if you just lopped off the 5 and rounded down.

  2. kevwalsh

    #1 – nonsense. By what process in the world do we produce truncated numbers like you suggest? You have this odd idea that if I take some measurement, and get 2.5 as a value, then the true value is of the form 2.5xxxxx where I just don’t know what the x’s happen to be (i.e. the measured value is just a truncated version of the true value). But it’s not. If our instruments are good, we think the value is near 2.5. Maybe a little above or a little below. There just ain’t many ways to produce data where we know all the digits are true in the truncation sense.
    -kevin

  3. AntaresTrader

    Comments #1-3 indicate it is time for a post on significant figures. Here is the quick version:
    2.5 actually means a value between 2.45 and 2.55. If this is not what you want it to mean, you could perhaps write 2.50 or 2.5 +/- 0.2. If you want exactly two and a half, it is properly written 25 * 10^-1; note that the lack of any decimal point makes a number exact.
    WRT Gas: The pump at my station measures price to 4 sig figs and volume to 5. I think that means that it only rounds wrong one time in 200.

  4. Karl

    “If you’re doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe.”
    In that case maybe you shouldn’t be rounding.

  5. wng_z3r0

    #6, It’s the basic problem of computation. You can’t store infinite length numbers on a computer, except symbolically. The second you can’t store, and calculate, everything symbolically, you have to account for the need to truncate or round. And it’s not even numbers you would think need special handling. 0.1 is a classic example of a number that cannot be stored exactly using IEEE 754 floating point, because 0.1 is a repeating (non-terminating) fraction in base 2. This is why you need to use proper numeric methods to guarantee N digits of accuracy, and this post about rounding is an example of how to reduce the error of the least significant digit.

  6. Brian

    Often you have more than one non-significant digit, i.e., digits you want to round away. In those cases #1 is correct. It’s not that you have 2.5xxxx where you don’t know x, it’s that you have 2.534 and you don’t care about anything after the decimal point.
    Also, I would say that taking 1.0000(etc) to 1 actually IS rounding, it’s just rounding with a no-op, in the same way that dividing something by one is still dividing… But that’s a definitions thing…

  7. kevwalsh

    Re #4:
    $1.09 rounded to the nearest dollar? $1
    24-bit sample rounded to 16-bits? my argument applies.
    How about this: $1.05 rounded to the nearest dollar. Mark is right. There is no single “nearest” dollar. They are both equally near. #1 tried to imply that $1.05 really stands for a true value of $1.05+delta, with delta>=0, and therefore the “nearest” dollar should more likely be $2. This is nonsense. It is almost always going to be $1.05+/-delta for any kind of real sampling or measurements.

  8. Brian

    Just to be pedantic, I assume that when you say 1.05 to the nearest dollar you mean 1.50. But the point is that 1.50, ok, is equal, but 1.5x where x>0 is NOT equal, no matter

  9. Paul Carpenter

    Yeah I assume that’s what #9 meant.
    #8 – I think any rounding algorithm would have to lose accuracy in general.

  10. John Green

    Mark is right about rounding (for the record, so am I, although it turns out I might be wrong re. gas pump rounding, but no one really knows, because depending on who you ask the machines are either much less or much more accurate than I gave them credit for in the video).
    It is kind of my lifelong dream to be deemed moderately interesting by people who like math (I went so far as to write a novel about such people), so I appreciate the link and the thoughtful commentary. -John

  11. Brian

    #12: I still maintain that you and Mark are ONLY correct about (in the case of rounding to the next integer) xxxx.5 EXACTLY. If you prefer evens, and you round 4.51 down to 4, you are doing it wrong.

  12. Tree

    I agree a post on significant digits is needed. If you measure 3.52, what you know is that what you are measuring is 3.5xxx…, where 0.0xxx… is close to 0.02. (how close depends on your tool, and should be specified.)

  13. John Green

    Well, sure, but 3.51 isn’t 3.5. Obviously this is only relevant if the calculation being done ends either by 0’ing out or if the calculator in question rounds wrongly.
    (Example: I was taught in third grade that 3.3345 rounded to the nearest penny would be 3.34, because you have to round up the 4 and then you round up the 3, which is totally ludicrous. But I have heard, although with no confirmation from the nice people at Exxon, that gas pumps regularly round this way.)

  14. Brian

    re: 15, wait, so you’re saying that 3.3344444444445 gets rounded to 3.34??? that’s dumb. If you were taught that in 3rd grade your 3rd grade teacher should be fired. from a cannon.
    Rounding isn’t a recursive process. You pick a point, and round.

  15. William Wallace

    “If you’re doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe.”

    Especially if readings are processed in recursive equations, where little errors can accumulate over time.
    Rounding is a form of quantization. And quantization can be done in various ways (truncation, rounding, rounding toward 0, rounding toward infinity, etc.). And quantization error (quantized value – actual value) can be handled by adding noise (dither). And dither can have a Gaussian PDF, or other PDFs, e.g., triangular, depending upon the application.
    Anyway, MarkCC is mostly correct, and even in the case when he is less than correct, I get his point.

  16. RoaldFalcon

    Thanks for all the great information, MarkCC. I always enjoy reading your blog.
    I would also like to join those asking for a post about the concepts and methods regarding significant digits.
    I have tried to read material about it from NIST and others in the past, but my understanding is still very low, and I would appreciate your treatment of this subject, if it’s something that would interest you.
    Thanks.
