Message 372941 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mark.dickinson
Recipients	Carlos Neves, lemburg, mark.dickinson, rhettinger, steven.daprano, stutzbach, tim.peters
Date	2020-07-03.16:23:26
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1593793407.08.0.721002665363.issue41198@roundup.psfhosted.org>
In-reply-to

Content
One note: in the original post, not only are the values being tested coming from NumPy's arange, but round(x[i],1) is testing NumPy's rounding functionality, not Python's. x[i] has type np.float64, and while np.float64 does inherit from Python's float, it also overrides float.__round__ with its own implementation (that essentially amounts to scale-by-power-of-ten, round-to-nearest-int, unscale, just like Python used to do in the bad old days). So errors from arange plus NumPy's non-correctly-rounded round means that all bets are off on what happens for values that _look_ as though they're ties when shown in decimal, but aren't actually ties thanks to the what-you-see-is-not-what-you-get nature of binary floating-point. On arange in particular, I've never looked closely into the implementation; it's never noticeably not been "close enough" (i.e., accurate to within a few ulps either way), and I've never needed it or expected it to be perfectly correctly rounded. Now that it's been brought up, I'll take a look. (But that shouldn't keep this issue open, since that's a pure NumPy issue.) Honestly, given the usual floating-point imprecision issues, I'm surprised that the balance is coming out as evenly as it is in Tim's and Steven's experiments. I can see why it might work for a single binade, but I'm at a loss to explain why you'd expect a perfect balance across several binades. For example: if you're looking at values of the form 0.xx5 in the binade [0.5, 1.0], and rounding those to two decimal places, you'd expect perfect parity, because if you pair the values from [0.5, 0.75] with the reverse of the values from [0.75, 1.0], in each pair exactly one of the two values will round up, and one down (the paired values always add up to exactly 1.5, with no rounding, so the errors from the decimal-to-binary rounding will always go in opposite directions). For example 0.505 rounds up, and dually 0.995 rounds down. (But whether the pair gives (up, down) or (down, up) will depend on exactly which way the rounding went when determining the nearest binary64 float, so will be essentially unpredictable.) >>> test_values = [x/1000 for x in range(505, 1000, 10)] >>> len(test_values) # total count of values 50 >>> sum(round(val, 2) > val for val in test_values) # number rounding up 25 But then you need to make a similar argument for the next binade down: [0.25, 0.5] (which doesn't work at all in this case, because that binade contains an odd number of values). Nevertheless, this does seem to work, and I haven't yet found a good explanation why. Any offers? >>> k = 8 >>> test_values = [x/10(k+1) for x in range(5, 10(k+1), 10)] >>> sum(round(val, k) > val for val in test_values) 50000000 BTW, yes, I think this issue can be closed.

One note: in the original post, not only are the values being tested coming from NumPy's arange, but round(x[i],1) is testing *NumPy's* rounding functionality, not Python's. x[i] has type np.float64, and while np.float64 does inherit from Python's float, it also overrides float.__round__ with its own implementation (that essentially amounts to scale-by-power-of-ten, round-to-nearest-int, unscale, just like Python used to do in the bad old days). So errors from arange plus NumPy's non-correctly-rounded round means that all bets are off on what happens for values that _look_ as though they're ties when shown in decimal, but aren't actually ties thanks to the what-you-see-is-not-what-you-get nature of binary floating-point.

On arange in particular, I've never looked closely into the implementation; it's never noticeably not been "close enough" (i.e., accurate to within a few ulps either way), and I've never needed it or expected it to be perfectly correctly rounded. Now that it's been brought up, I'll take a look. (But that shouldn't keep this issue open, since that's a pure NumPy issue.)

Honestly, given the usual floating-point imprecision issues, I'm surprised that the balance is coming out as evenly as it is in Tim's and Steven's experiments. I can see why it might work for a single binade, but I'm at a loss to explain why you'd expect a perfect balance across several binades. 

For example: if you're looking at values of the form 0.xx5 in the binade [0.5, 1.0], and rounding those to two decimal places, you'd expect perfect parity, because if you pair the values from [0.5, 0.75] with the reverse of the values from [0.75, 1.0], in each pair exactly one of the two values will round up, and one down (the paired values always add up to *exactly* 1.5, with no rounding, so the errors from the decimal-to-binary rounding will always go in opposite directions). For example 0.505 rounds up, and dually 0.995 rounds down. (But whether the pair gives (up, down) or (down, up) will depend on exactly which way the rounding went when determining the nearest binary64 float, so will be essentially unpredictable.)

    >>> test_values = [x/1000 for x in range(505, 1000, 10)]
    >>> len(test_values)  # total count of values
    50
    >>> sum(round(val, 2) > val for val in test_values)  # number rounding up
    25

But then you need to make a similar argument for the next binade down: [0.25, 0.5] (which doesn't work at all in this case, because that binade contains an odd number of values).

Nevertheless, this *does* seem to work, and I haven't yet found a good explanation why. Any offers?

    >>> k = 8
    >>> test_values = [x/10**(k+1) for x in range(5, 10**(k+1), 10)]
    >>> sum(round(val, k) > val for val in test_values)
    50000000

BTW, yes, I think this issue can be closed.

History
Date	User	Action	Args
2020-07-03 16:23:27	mark.dickinson	set	recipients: + mark.dickinson, lemburg, tim.peters, rhettinger, stutzbach, steven.daprano, Carlos Neves
2020-07-03 16:23:27	mark.dickinson	set	messageid: <1593793407.08.0.721002665363.issue41198@roundup.psfhosted.org>
2020-07-03 16:23:27	mark.dickinson	link	issue41198 messages
2020-07-03 16:23:26	mark.dickinson	create