
classification
Title: Round built-in function does not show zeros according to significant figures and produces different counts of odd and even numbers
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Carlos Neves, lemburg, mark.dickinson, rhettinger, steven.daprano, stutzbach, tim.peters
Priority: normal Keywords:

Created on 2020-07-02 20:06 by Carlos Neves, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (10)
msg372878 - Author: Carlos Neves (Carlos Neves) Date: 2020-07-02 20:06
Hi,

I am observing unexpected behavior from the built-in round function regarding (1) significant figures as used in analytical methods and (2) the number of odd and even results the function produces.

https://docs.python.org/3/library/functions.html#round

https://en.wikipedia.org/wiki/Significant_figures

1. Significant Figures
======================

For example, when I write 1.20 in analytical methods, I am confident about the last digit, the zero. It has a meaning. But when I use Python,

>>> round(1.203, 2)
1.2
>>>

the zero does not appear. This does not happen when the second digit is nonzero.

>>> round(1.213, 2)
1.21
>>>

The zero should be printed like any other digit, to be consistent with significant figures. Other functions may show the same behavior.

2. Rounding procedure
=====================

I wrote the following code to count the odd and even numbers produced by a rounding procedure. I should get half odd and half even numbers, but the result from the round function is different: we observed 5 more even numbers and 5 fewer odd numbers than expected. This behavior causes a systematic error.

https://en.wikipedia.org/wiki/Rounding

I hope this contributes to the improvement of the code.

Thanks in advance.


######################################################
# This code counts the odd and even numbers produced by different procedures: truncation, simple rounding, and the round function
# Test condition: Rounding with one digit after the decimal point.

import numpy as np

even_0 = 0
odd_0 = 0

even_1 = 0
odd_1 = 0

even_2 = 0
odd_2 = 0

even_3 = 0
odd_3 = 0

# generate 1000 numbers from 0.000 up to 1 with step of 0.001
x = np.arange(0,1,0.001) 

# apply each procedure to every value, count odd and even results, and print them
for i in range(len(x)): 

	x_truncated = int((x[i]*10)+0.0)/10 # no rounding
	x_rounded_simple = int((x[i]*10)+0.5)/10 # rounding up at 5
	x_rounded_function = round(x[i],1) # rounding by function with one digit after the decimal point

	# counting odd and even numbers
	if int(x[i]*1000) % 2 == 0:
		even_0 += 1
	else:
		odd_0 += 1

	if int(x_truncated*10) % 2 == 0:
		even_1 += 1
	else:
		odd_1 += 1

	if int(x_rounded_simple*10) % 2 == 0:
		even_2 += 1
	else:
		odd_2 += 1

	if int(x_rounded_function*10) % 2 == 0:
		even_3 += 1
	else:
		odd_3 += 1

	print ("{0:.3f} {1:.1f} {2:.1f} {3:.1f}".format((x[i]), x_truncated, x_rounded_simple, x_rounded_function))

print ("Result:")
print ("Raw: Even={0}, Odd={1}".format(even_0,odd_0))	
print ("Truncated: Even={0}, Odd={1}".format(even_1,odd_1))
print ("Rounded simple: Even={0}, Odd={1}".format(even_2,odd_2))
print ("Rounded Function: Even={0}, Odd={1}".format(even_3,odd_3))

######################################################

Output
...
0.995 0.9 1.0 1.0
0.996 0.9 1.0 1.0
0.997 0.9 1.0 1.0
0.998 0.9 1.0 1.0
0.999 0.9 1.0 1.0
Result:
Raw: Even=500, Odd=500
Truncated: Even=500, Odd=500
Rounded simple: Even=500, Odd=500
Rounded Function: Even=505, Odd=495

----
msg372884 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-07-02 21:22
For the first, your hardware's binary floating-point has no concept of significant trailing zeroes. If you need such a thing, use Python's `decimal` module instead, which does support a "significant trailing zero" concept. You would need an entirely new data type to graft such a notion onto Python's (or numpy's!) binary floats.
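A minimal sketch of that suggestion, assuming the sample value 1.203 from the original report (the variable names are only illustrative): `Decimal.quantize` keeps the trailing zero as part of the value, while plain string formatting can at least display it for an ordinary float.

```python
from decimal import Decimal

# A Decimal stores its exponent, so the trailing zero survives rounding.
d = Decimal("1.203").quantize(Decimal("0.01"))
print(d)  # 1.20

# A binary float cannot store the zero, but formatting can still show it.
f = round(1.203, 2)
print(f)                 # 1.2
print(format(f, ".2f"))  # 1.20
```

Which of the two fits depends on whether the zero has to travel with the value (use `decimal`) or only has to appear in the output (use formatting).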

For the second, we'd have to dig into exactly what numpy's `arange()` does. Very few of the numbers you're working with are exactly representable in binary floating point except for 0.0. For example, "0.001" is approximated by a binary float whose exact decimal value is

0.001000000000000000020816681711721685132943093776702880859375

Sometimes the rounded (by machine float arithmetic) multiples of that are exactly representable, but usually not. For example,

>>> 0.001 * 250
0.25

rounds to the exactly representable 1/4, and

>>> 0.001 * 750
0.75

to the exactly representable 3/4. However, `round()` uses round-to-nearest/even, and then

>>> round(0.25, 1)
0.2
>>> round(0.75, 1)
0.8

both resolve the tie to the closest even value (although neither of those _results_ are exactly representable in binary floating-point - although if you go on to multiply them by 10.0, they do round (in hardware) to exactly 2.0 and 8.0).
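These claims can be checked from pure Python with the standard `decimal` and `fractions` modules, both of which convert a float exactly. The following is only an illustrative sketch of the values discussed above:

```python
from decimal import Decimal
from fractions import Fraction

# Converting a float (not a string) to Decimal is exact, so this prints
# the true value of the binary float nearest to 0.001.
print(Decimal(0.001))

# 0.25 and 0.75 are stored exactly, so they are genuine ties at one decimal place.
print(Fraction(0.25), Fraction(0.75))  # 1/4 3/4

# Ties go to the even digit.
print(round(0.25, 1), round(0.75, 1))  # 0.2 0.8

# The results are only approximations of 1/5 and 4/5 ...
print(Fraction(0.2) == Fraction(1, 5))  # False

# ... yet multiplying by 10.0 rounds back to exact integers.
print(round(0.25, 1) * 10.0, round(0.75, 1) * 10.0)  # 2.0 8.0
```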

Note that numpy's arange() docs do warn you against using it ;-)

"""
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.
"""
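NumPy is not needed to see why the way the inputs are generated matters. The sketch below is a pure-Python illustration, not a description of how `arange` or `linspace` are implemented: dividing an integer index by 1000 gives the correctly rounded float for each decimal value, whereas multiplying by the already-inexact float 0.001 sometimes lands on a neighbouring float.

```python
n = 1000

# One correctly rounded division per value: each result is the float
# closest to the decimal i/1000, identical to parsing the literal.
by_division = [i / n for i in range(n)]
literals = [float(f"0.{i:03d}") for i in range(n)]
print(by_division == literals)  # True

# Multiplying by the float 0.001 starts from a step that is already rounded.
by_multiplication = [i * 0.001 for i in range(n)]
mismatches = sum(a != b for a, b in zip(by_division, by_multiplication))
print(mismatches > 0)  # True

# One concrete case where the two approaches disagree.
print(9 / 1000, 9 * 0.001)
```

Any test of rounding bias therefore measures the input generator as much as it measures `round()`.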
msg372886 - Author: Carlos Neves (Carlos Neves) Date: 2020-07-02 21:45
Hi Peters,

I will pay more attention to the Python docs :)
Thank you for your direction.

Carlos A. Neves

msg372887 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-07-02 21:45
Thank you for your long and detailed bug report, but please post one issue per bug report.

Tim, we agree that the notion of significant figures is irrelevant; is Carlos' even/odd test sufficiently flawed that we should close this bug report, or keep it open to investigate the rounding bias issue? My feeling is that it is sufficiently flawed that we can just close this, but it's too early in the morning for me to articulate why :-)
msg372888 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-07-02 21:56
I assumed Mark would tell us what's up with the arange() oddity, so let's see whether he does.  There is no truly good way to generate "evenly spaced" binary floats using a non-representable conceptual decimal delta.  The dumbass ;-) way doesn't show a discrepancy in pure Python:

>>> num = ne = no = 0
>>> d = 0.001
>>> while num < 1.0:
...     digit = int(round(num, 1) * 10)
...     if digit & 1:
...         no += 1
...     else:
...         ne += 1
...     num += d
>>> ne, no
(500, 500)

However, a somewhat less naive way does show a discrepancy, but less so than what arange() apparently does:

>>> ne = no = 0
>>> for i in range(1000):
...     digit = int(round(i * d, 1) * 10)
...     if digit & 1:
...         no += 1
...     else:
...         ne += 1
>>> ne, no
(501, 499)

I assume that's because of the specific nearest/even behavior I already showed for multipliers i=250 and i=750.
msg372890 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-07-02 22:11
If you change the starting point of the rounding away from zero, the bias flips back and forth, which is exactly what I would expect from Banker's Rounding:


    def check_bias(start):
        d = 0.001
        ne = no = 0
        for i in range(1000):
            digit = int(round(start + i * d, 1) * 10)
            if digit & 1:
                no += 1
            else:
                ne += 1
        return ne, no


    # Python 3.7
    >>> check_bias(0.0)
    (501, 499)
    >>> check_bias(0.1)
    (500, 500)
    >>> check_bias(0.2)
    (499, 501)
    >>> check_bias(0.3)
    (499, 501)
    >>> check_bias(0.4)
    (500, 500)
    >>> check_bias(0.5)
    (499, 501)
    >>> check_bias(0.6)
    (501, 499)


I ran the same check_bias in Python 2.7, which doesn't use banker's rounding, and the bias is consistently in one direction:

    # Python 2.7
    >>> check_bias(0.0)
    (500, 500)
    >>> check_bias(0.1)
    (499, 501)
    >>> check_bias(0.2)
    (498, 502)
    >>> check_bias(0.3)
    (498, 502)
    >>> check_bias(0.4)
    (499, 501)
    >>> check_bias(0.5)
    (498, 502)
    >>> check_bias(0.6)
    (500, 500)
msg372894 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-07-03 00:09
Cool! So the only thing surprising to me here is just how far off balance the arange() run was.  So I'd like to keep this open long enough for Mark to notice, just in case it's pointing to something fishy in numpy.
msg372941 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-07-03 16:23
One note: in the original post, not only are the values being tested coming from NumPy's arange, but round(x[i],1) is testing *NumPy's* rounding functionality, not Python's. x[i] has type np.float64, and while np.float64 does inherit from Python's float, it also overrides float.__round__ with its own implementation (that essentially amounts to scale-by-power-of-ten, round-to-nearest-int, unscale, just like Python used to do in the bad old days). So errors from arange plus NumPy's non-correctly-rounded round means that all bets are off on what happens for values that _look_ as though they're ties when shown in decimal, but aren't actually ties thanks to the what-you-see-is-not-what-you-get nature of binary floating-point.
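To illustrate the difference, here is a rough sketch of the scale, round-to-nearest-int, unscale scheme described above. The helper `naive_round` is hypothetical and written only for this comparison; it is not NumPy's actual code. The value 2.675 is chosen because it looks like a tie in decimal but is stored slightly below one.

```python
def naive_round(x, ndigits):
    """Hypothetical scale / round-to-nearest-int / unscale rounding."""
    scale = 10 ** ndigits
    return round(x * scale) / scale

x = 2.675  # stored as 2.67499999999999982236431605997495353221893310546875

# Python's built-in rounds the stored value correctly: it is below the tie.
print(round(x, 2))        # 2.67

# Scaling first rounds the product to exactly 267.5, creating a tie that
# did not exist in the stored value; the tie then resolves to 268.
print(x * 100 == 267.5)   # True
print(naive_round(x, 2))  # 2.68
```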

On arange in particular, I've never looked closely into the implementation; it's never noticeably not been "close enough" (i.e., accurate to within a few ulps either way), and I've never needed it or expected it to be perfectly correctly rounded. Now that it's been brought up, I'll take a look. (But that shouldn't keep this issue open, since that's a pure NumPy issue.)

Honestly, given the usual floating-point imprecision issues, I'm surprised that the balance is coming out as evenly as it is in Tim's and Steven's experiments. I can see why it might work for a single binade, but I'm at a loss to explain why you'd expect a perfect balance across several binades. 

For example: if you're looking at values of the form 0.xx5 in the binade [0.5, 1.0], and rounding those to two decimal places, you'd expect perfect parity, because if you pair the values from [0.5, 0.75] with the reverse of the values from [0.75, 1.0], in each pair exactly one of the two values will round up, and one down (the paired values always add up to *exactly* 1.5, with no rounding, so the errors from the decimal-to-binary rounding will always go in opposite directions). For example 0.505 rounds up, and dually 0.995 rounds down. (But whether the pair gives (up, down) or (down, up) will depend on exactly which way the rounding went when determining the nearest binary64 float, so will be essentially unpredictable.)

    >>> test_values = [x/1000 for x in range(505, 1000, 10)]
    >>> len(test_values)  # total count of values
    50
    >>> sum(round(val, 2) > val for val in test_values)  # number rounding up
    25
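The pairing argument itself can be checked directly. This is only a sketch of that check, using `fractions.Fraction` to read the stored float values exactly; the names `pairs`, `all_sum_exactly` and `one_up_each` are illustrative.

```python
from fractions import Fraction

# Values 0.505, 0.515, ..., 0.745 from the lower half of the binade,
# each paired with its mirror image 0.995, 0.985, ..., 0.755.
pairs = [(x / 1000, (1500 - x) / 1000) for x in range(505, 750, 10)]

# Fraction(float) is exact, so this tests the stored values, not the decimals.
all_sum_exactly = all(Fraction(a) + Fraction(b) == Fraction(3, 2) for a, b in pairs)

# In every pair, exactly one of the two values rounds up.
one_up_each = all((round(a, 2) > a) != (round(b, 2) > b) for a, b in pairs)

print(len(pairs), all_sum_exactly, one_up_each)  # 25 True True
```

The pair (0.625, 0.875) is a special case: both values are exactly representable, so they are true ties, and round-half-even sends one down (0.62) and the other up (0.88).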

But then you need to make a similar argument for the next binade down: [0.25, 0.5] (which doesn't work at all in this case, because that binade contains an odd number of values).

Nevertheless, this *does* seem to work, and I haven't yet found a good explanation why. Any offers?

    >>> k = 8
    >>> test_values = [x/10**(k+1) for x in range(5, 10**(k+1), 10)]
    >>> sum(round(val, k) > val for val in test_values)
    50000000

BTW, yes, I think this issue can be closed.
msg372952 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-07-03 18:53
Just for fun, I posted a Stack Overflow question: https://stackoverflow.com/q/62721186/270986

Let's close this here.
msg372954 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-07-03 20:42
Thanks, Mark! I didn't even know __round__ had become a dunder method.

For the rest, I'll follow StackOverflow - I don't have an instant answer, and the instant answers I _had_ didn't survive second thoughts ;-)
History
Date User Action Args
2022-04-11 14:59:33 admin set github: 85370
2020-07-03 20:42:11 tim.peters set messages: + msg372954
2020-07-03 18:53:01 mark.dickinson set status: open -> closed; resolution: not a bug; messages: + msg372952; stage: resolved
2020-07-03 16:23:27 mark.dickinson set messages: + msg372941
2020-07-03 00:09:27 tim.peters set messages: + msg372894
2020-07-02 22:11:37 steven.daprano set messages: + msg372890
2020-07-02 21:56:36 tim.peters set messages: + msg372888
2020-07-02 21:45:53 steven.daprano set nosy: + steven.daprano; messages: + msg372887
2020-07-02 21:45:37 Carlos Neves set messages: + msg372886
2020-07-02 21:22:30 tim.peters set nosy: + tim.peters; messages: + msg372884
2020-07-02 20:06:26 Carlos Neves create