Issue41198

Created on **2020-07-02 20:06** by **Carlos Neves**, last changed **2020-07-03 20:42** by **tim.peters**. This issue is now **closed**.

Messages (10)

msg372878 | Author: Carlos Neves (Carlos Neves) | Date: 2020-07-02 20:06

Hi, I am observing unexpected behavior from the built-in round() function with respect to (1) significant figures in analytical methods and (2) the number of odd and even numbers the function produces.

https://docs.python.org/3/library/functions.html#round
https://en.wikipedia.org/wiki/Significant_figures

1. Significant Figures
======================

For example, when I write 1.20 in analytical methods, I am confident about the last digit, the zero. It has a meaning. But when I use Python,

    >>> round(1.203, 2)
    1.2

the zero does not appear. This does not happen when the second decimal digit is not zero:

    >>> round(1.213, 2)
    1.21

The zero should be printed like the other digits, to be consistent with significant figures. Other functions may show the same behavior.

2. Rounding procedure
=====================

I wrote the following code to count the odd and even numbers produced by a rounding procedure. I should get half odd and half even numbers, but the result using the round function is different: we observed 5 more even numbers and 5 fewer odd numbers. This behavior causes a systematic error.

https://en.wikipedia.org/wiki/Rounding

I hope this contributes to improving the code. Thanks in advance.

    ######################################################
    # This code counts the odd and even numbers obtained with different
    # procedures: truncation, simple rounding and the round function.
    # Test condition: rounding to one digit after the decimal point.
    import numpy as np

    even_0 = 0
    odd_0 = 0
    even_1 = 0
    odd_1 = 0
    even_2 = 0
    odd_2 = 0
    even_3 = 0
    odd_3 = 0

    # generate 1000 numbers from 0.000 up to 1 with step of 0.001
    x = np.arange(0, 1, 0.001)

    # round each value three ways, count parities, and print each row
    for i in range(len(x)):
        x_truncated = int((x[i]*10)+0.0)/10       # no rounding
        x_rounded_simple = int((x[i]*10)+0.5)/10  # rounding up at 5
        x_rounded_function = round(x[i], 1)       # rounding by function with one digit after the decimal point

        # counting odd and even numbers
        if int(x[i]*1000) % 2 == 0:
            even_0 += 1
        else:
            odd_0 += 1
        if int(x_truncated*10) % 2 == 0:
            even_1 += 1
        else:
            odd_1 += 1
        if int(x_rounded_simple*10) % 2 == 0:
            even_2 += 1
        else:
            odd_2 += 1
        if int(x_rounded_function*10) % 2 == 0:
            even_3 += 1
        else:
            odd_3 += 1
        print("{0:.3f} {1:.1f} {2:.1f} {3:.1f}".format(x[i], x_truncated, x_rounded_simple, x_rounded_function))

    print("Result:")
    print("Raw: Even={0}, Odd={1}".format(even_0, odd_0))
    print("Truncated: Even={0}, Odd={1}".format(even_1, odd_1))
    print("Rounded simple: Even={0}, Odd={1}".format(even_2, odd_2))
    print("Rounded Function: Even={0}, Odd={1}".format(even_3, odd_3))
    ######################################################

Output:

    ...
    0.995 0.9 1.0 1.0
    0.996 0.9 1.0 1.0
    0.997 0.9 1.0 1.0
    0.998 0.9 1.0 1.0
    0.999 0.9 1.0 1.0
    Result:
    Raw: Even=500, Odd=500
    Truncated: Even=500, Odd=500
    Rounded simple: Even=500, Odd=500
    Rounded Function: Even=505, Odd=495

msg372884 | Author: Tim Peters (tim.peters) * | Date: 2020-07-02 21:22

For the first, your hardware's binary floating point has no concept of significant trailing zeroes. If you need such a thing, use Python's `decimal` module instead, which does support a "significant trailing zero" concept. You would need an entirely new data type to graft such a notion onto Python's (or numpy's!) binary floats.

For the second, we'd have to dig into exactly what numpy's `arange()` does. Very few of the numbers you're working with, other than 0.0, are exactly representable in binary floating point. For example, "0.001" is approximated by a binary float whose exact decimal value is

    0.001000000000000000020816681711721685132943093776702880859375

Sometimes the rounded (by machine float arithmetic) multiples of that are exactly representable, but usually not. For example,

    >>> 0.001 * 250
    0.25

rounds to the exactly representable 1/4, and

    >>> 0.001 * 750
    0.75

to the exactly representable 3/4. However, `round()` uses round-to-nearest/even, and then

    >>> round(0.25, 1)
    0.2
    >>> round(0.75, 1)
    0.8

both resolve the tie to the closest even value (although neither of those _results_ is exactly representable in binary floating point; still, if you go on to multiply them by 10.0, they do round (in hardware) to exactly 2.0 and 8.0).

Note that numpy's arange() docs do warn you against using it ;-)

"""
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.
"""
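A minimal sketch of both points using only the standard library (the variable name `measured` is illustrative): `Decimal.quantize` keeps the trailing zero as a significant digit, a fixed-precision format such as `format(x, ".2f")` only changes how a float is displayed, and `Decimal(float)` converts exactly, so it reveals the binary value that "0.001" really becomes.

```python
from decimal import Decimal

# decimal keeps trailing zeros: quantizing to two places gives 1.20, not 1.2.
# (1.203 is below the halfway point, so the rounding mode does not matter here.)
measured = Decimal("1.203").quantize(Decimal("0.01"))
print(measured)                  # 1.20

# A float has no notion of significant trailing zeros; a fixed-precision
# format only controls how many digits are displayed.
print(round(1.203, 2))           # 1.2
print(format(1.203, ".2f"))      # 1.20

# Decimal(float) is an exact conversion, exposing the binary approximation.
print(Decimal(0.001))
# 0.001000000000000000020816681711721685132943093776702880859375

# These two multiples land exactly on representable ties, which round()
# resolves to the even neighbour.
print(0.001 * 250 == 0.25, round(0.25, 1))  # True 0.2
print(0.001 * 750 == 0.75, round(0.75, 1))  # True 0.8
```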

msg372886 | Author: Carlos Neves (Carlos Neves) | Date: 2020-07-02 21:45

Hi Peters, I will pay more attention to the Python docs :) Thank you for your direction. Carlos A. Neves

msg372887 | Author: Steven D'Aprano (steven.daprano) * | Date: 2020-07-02 21:45

Thank you for your long and detailed bug report, but please post one issue per bug report. Tim, we agree that the notion of significant figures is irrelevant; is Carlos' even/odd test sufficiently flawed that we should close this bug report, or keep it open to investigate the rounding bias issue? My feeling is that it is sufficiently flawed that we can just close this, but it's too early in the morning for me to articulate why :-) |

msg372888 | Author: Tim Peters (tim.peters) * | Date: 2020-07-02 21:56

I assumed Mark would tell us what's up with the arange() oddity, so let's see whether he does. There is no truly good way to generate "evenly spaced" binary floats using a non-representable conceptual decimal delta. The dumbass ;-) way doesn't show a discrepancy in pure Python:

    >>> num = ne = no = 0
    >>> d = 0.001
    >>> while num < 1.0:
    ...     digit = int(round(num, 1) * 10)
    ...     if digit & 1:
    ...         no += 1
    ...     else:
    ...         ne += 1
    ...     num += d
    >>> ne, no
    (500, 500)

However, a somewhat less naive way does show a discrepancy, but less so than what arange() apparently does:

    >>> ne = no = 0
    >>> for i in range(1000):
    ...     digit = int(round(i * d, 1) * 10)
    ...     if digit & 1:
    ...         no += 1
    ...     else:
    ...         ne += 1
    >>> ne, no
    (501, 499)

I assume that's because of the specific nearest/even behavior I already showed for multipliers i=250 and i=750.
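That guess can be checked directly with a small sketch (standard library only; `exact_ties` is an illustrative name). `fractions.Fraction` converts a float exactly, so it can tell which products `i * 0.001` land *exactly* on a one-decimal tie, meaning a value x for which 20*x is an odd integer. Only those values are affected by the tie-breaking rule; every other product has a unique nearest one-decimal neighbour.

```python
from fractions import Fraction

d = 0.001
exact_ties = []
for i in range(1000):
    # Fraction(float) is exact: no decimal rounding is involved.
    scaled = Fraction(i * d) * 20
    # x is an exact tie for rounding to one decimal iff 20*x is an odd
    # integer, i.e. x = (2k + 1) / 20, halfway between k/10 and (k + 1)/10.
    if scaled.denominator == 1 and scaled.numerator % 2 == 1:
        exact_ties.append(i)

print(exact_ties)                             # [250, 750]
print([round(i * d, 1) for i in exact_ties])  # [0.2, 0.8]
```

Under round-half-to-even both ties go to even digits (2 and 8), whereas round-half-up would send 0.25 to the odd 0.3. That is exactly one count shifted toward even, which fits the (501, 499) result.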

msg372890 | Author: Steven D'Aprano (steven.daprano) * | Date: 2020-07-02 22:11

If you change the starting point of the rounding away from zero, the bias flips back and forth, which is exactly what I would expect from Banker's Rounding:

    def check_bias(start):
        d = 0.001
        ne = no = 0
        for i in range(1000):
            digit = int(round(start + i * d, 1) * 10)
            if digit & 1:
                no += 1
            else:
                ne += 1
        return ne, no

    # Python 3.7
    >>> check_bias(0.0)
    (501, 499)
    >>> check_bias(0.1)
    (500, 500)
    >>> check_bias(0.2)
    (499, 501)
    >>> check_bias(0.3)
    (499, 501)
    >>> check_bias(0.4)
    (500, 500)
    >>> check_bias(0.5)
    (499, 501)
    >>> check_bias(0.6)
    (501, 499)

I ran the same check_bias in Python 2.7, which doesn't use banker's rounding, and the bias is consistently in one direction:

    # Python 2.7
    >>> check_bias(0.0)
    (500, 500)
    >>> check_bias(0.1)
    (499, 501)
    >>> check_bias(0.2)
    (498, 502)
    >>> check_bias(0.3)
    (498, 502)
    >>> check_bias(0.4)
    (499, 501)
    >>> check_bias(0.5)
    (498, 502)
    >>> check_bias(0.6)
    (500, 500)
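The trade-off between the two tie rules can be seen without any binary error by using the `decimal` module on the ten exact decimal ties 0.05, 0.15, ..., 0.95. In this sketch `ROUND_HALF_EVEN` is the rule Python 3's round() applies, and `ROUND_HALF_UP` (ties away from zero) stands in for Python 2's documented tie rule; `rounding_stats` is an illustrative helper, not a library function.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# The ten exact decimal ties 0.05, 0.15, ..., 0.95, computed in decimal
# arithmetic, so there is no binary representation error.
ties = [Decimal(2 * k + 1) / 20 for k in range(10)]


def rounding_stats(mode):
    """Return (total signed rounding error, even count, odd count)
    after rounding every tie to one decimal place with `mode`."""
    total_error = Decimal(0)
    even = odd = 0
    for x in ties:
        r = x.quantize(Decimal("0.1"), rounding=mode)
        total_error += r - x
        if int(r * 10) % 2 == 0:   # parity of the last kept digit
            even += 1
        else:
            odd += 1
    return total_error, even, odd


print(rounding_stats(ROUND_HALF_EVEN))  # total error 0, 10 even, 0 odd
print(rounding_stats(ROUND_HALF_UP))    # total error 0.50, 5 even, 5 odd
```

Half-even makes the signed errors on ties cancel but always lands on an even digit; half-up balances the parity but pushes every tie in the same direction.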

msg372894 | Author: Tim Peters (tim.peters) * | Date: 2020-07-03 00:09

Cool! So the only thing surprising to me here is just how far off balance the arange() run was. So I'd like to keep this open long enough for Mark to notice, just in case it's pointing to something fishy in numpy. |

msg372941 | Author: Mark Dickinson (mark.dickinson) * | Date: 2020-07-03 16:23

One note: in the original post, not only are the values being tested coming from NumPy's arange, but round(x[i], 1) is testing *NumPy's* rounding functionality, not Python's. x[i] has type np.float64, and while np.float64 does inherit from Python's float, it also overrides `float.__round__` with its own implementation (that essentially amounts to scale-by-power-of-ten, round-to-nearest-int, unscale, just like Python used to do in the bad old days). So errors from arange plus NumPy's non-correctly-rounded round mean that all bets are off on what happens for values that _look_ as though they're ties when shown in decimal, but aren't actually ties thanks to the what-you-see-is-not-what-you-get nature of binary floating point.

On arange in particular, I've never looked closely into the implementation; it's never noticeably not been "close enough" (i.e., accurate to within a few ulps either way), and I've never needed it or expected it to be perfectly correctly rounded. Now that it's been brought up, I'll take a look. (But that shouldn't keep this issue open, since that's a pure NumPy issue.)

Honestly, given the usual floating-point imprecision issues, I'm surprised that the balance is coming out as evenly as it is in Tim's and Steven's experiments. I can see why it might work for a single binade, but I'm at a loss to explain why you'd expect a perfect balance across several binades.

For example: if you're looking at values of the form 0.xx5 in the binade [0.5, 1.0], and rounding those to two decimal places, you'd expect perfect parity, because if you pair the values from [0.5, 0.75] with the reverse of the values from [0.75, 1.0], in each pair exactly one of the two values will round up, and one down (the paired values always add up to *exactly* 1.5, with no rounding, so the errors from the decimal-to-binary rounding will always go in opposite directions). For example, 0.505 rounds up, and dually 0.995 rounds down. (But whether the pair gives (up, down) or (down, up) will depend on exactly which way the rounding went when determining the nearest binary64 float, so will be essentially unpredictable.)

    >>> test_values = [x/1000 for x in range(505, 1000, 10)]
    >>> len(test_values)  # total count of values
    50
    >>> sum(round(val, 2) > val for val in test_values)  # number rounding up
    25

But then you need to make a similar argument for the next binade down, [0.25, 0.5] (which doesn't work at all in this case, because that binade contains an odd number of values). Nevertheless, this *does* seem to work, and I haven't yet found a good explanation why. Any offers?

    >>> k = 8
    >>> test_values = [x/10**(k+1) for x in range(5, 10**(k+1), 10)]
    >>> sum(round(val, k) > val for val in test_values)
    50000000

BTW, yes, I think this issue can be closed.
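The contrast with NumPy's scale-and-unscale approach is that Python's float `round()` is correctly rounded: it rounds the *exact* binary value, ties to even. A minimal sketch checks that against a `decimal`-based reference on the first set of test values above; `reference_round` is a hypothetical helper written for this illustration (it assumes `ndigits >= 0`), not a library function.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def reference_round(x: float, ndigits: int) -> float:
    """Hypothetical helper: correctly rounded round(x, ndigits), ndigits >= 0.

    Rounds the exact binary value of x to ndigits decimals (ties to even),
    then converts the decimal result back to the nearest float.
    """
    exact = Decimal(x)                    # exact value of the binary64 float
    quantum = Decimal(f"1e-{ndigits}")    # e.g. Decimal('0.01') for ndigits=2
    return float(exact.quantize(quantum, rounding=ROUND_HALF_EVEN))


test_values = [x / 1000 for x in range(505, 1000, 10)]

# Python's float round() should agree with the exact-value reference everywhere.
mismatches = [v for v in test_values if round(v, 2) != reference_round(v, 2)]

# Count values whose correctly rounded result lies above the value itself.
rounded_up = sum(round(v, 2) > v for v in test_values)

print(len(test_values), len(mismatches), rounded_up)  # 50 0 25
```

This only confirms that `round()` acts on the exact binary value; it does not explain the cross-binade balance asked about above.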

msg372952 | Author: Mark Dickinson (mark.dickinson) * | Date: 2020-07-03 18:53

Just for fun, I posted a Stack Overflow question: https://stackoverflow.com/q/62721186/270986 Let's close this here. |

msg372954 | Author: Tim Peters (tim.peters) * | Date: 2020-07-03 20:42

Thanks, Mark! I didn't even know `__round__` had become a dunder method. For the rest, I'll follow StackOverflow; I don't have an instant answer, and the instant answers I _had_ didn't survive second thoughts ;-)

History

Date | User | Action | Args |
---|---|---|---|
2020-07-03 20:42:11 | tim.peters | set | messages: + msg372954 |
2020-07-03 18:53:01 | mark.dickinson | set | status: open -> closed; resolution: not a bug; messages: + msg372952; stage: resolved |
2020-07-03 16:23:27 | mark.dickinson | set | messages: + msg372941 |
2020-07-03 00:09:27 | tim.peters | set | messages: + msg372894 |
2020-07-02 22:11:37 | steven.daprano | set | messages: + msg372890 |
2020-07-02 21:56:36 | tim.peters | set | messages: + msg372888 |
2020-07-02 21:45:53 | steven.daprano | set | nosy: + steven.daprano; messages: + msg372887 |
2020-07-02 21:45:37 | Carlos Neves | set | messages: + msg372886 |
2020-07-02 21:22:30 | tim.peters | set | nosy: + tim.peters; messages: + msg372884 |
2020-07-02 20:06:26 | Carlos Neves | create | |