Message 326581 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mark.dickinson
Recipients	MJH, ezio.melotti, jab, jyasskin, mark.dickinson, njs, serhiy.storchaka, tim.peters, vstinner
Date	2018-09-27.18:48:13
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538074093.88.0.545547206417.issue32956@psf.upfronthosting.co.za>
In-reply-to

Content
[Joshua] > 1. Update the round() docs to make the documentation of this behavior less buried, Sounds reasonable to me; I'm definitely open to documentation improvements. Though it doesn't seem all that buried to me: the round-ties-to-even behaviour is described in the third sentence in the first place I'd look for round documentation (https://docs.python.org/3/library/functions.html#round). It would be misleading to move the information earlier, because the use of round-ties-to-even is specific to the builtin types: user-defined types can do whatever they like via the __round__ magic method. > 2. include a (brief) justification (possibly even just a link to http://wiki.c2.com/?BankersRounding or some more-authoritative document), and Sure, a link to a source on bankers rounding could work. > 3. link to where else this change in Python 3 was discussed more, if anywhere, or else confirm this change was made based on no additional analysis that we can find written down. I'm not aware of much discussion beyond the thread that Serhiy already pointed to. There's a little bit more (but not much) on rounding the py3k mailing list (try a Google search for "site:mail.python.org/pipermail/python-3000 rounding"). > It'd also be interesting to hear if this is something we wish we'd done differently now, but that shouldn't distract from 1, 2, and 3. I can't speak for anyone else, but it's certainly not something I think should have been done differently, with one caveat: the silent and subtle change in behaviour from Python 2 to Python 3 was a bit unpleasant, and a possible source of late-discovered (or undiscovered) bugs. > so maybe changing from round_half_up to round_half_even was necessary for the other improvements [...] No. The change was independent of other fixes and changes. There _is_ quite a history of round changes: fixes for the single-argument round function in odd corner cases (earlier versions of Python used the simple add-half-and-chop algorithm, with gives the wrong answer for 0.4999999999999999 and for 4503599627370497.0 thanks to FPU-level rounding in the add-half step); making two-argument round correctly-rounded in all cases in Python 2.7 and 3.1 via the same dtoa.c machinery used for str<->float conversions; changing the return type of single-argument round in Python 3; making round generic via the __round__ magic method, etc. But none of these required the change in rounding mode. We need to recognise that there are various different contexts where the idea of "rounding" comes into play in a general-purpose language. Some examples: 1. FPU-level rounding for basic floating-point operations (addition, multiplication, sqrt, etc.) 2. Conversion of source-code decimal numeric literals (e.g., in "bad_pi = 3.14") to the _nearest_ exactly representable binary float/double; the notion of _nearest_ needs some way to break ties. 3. Formatting a float for output as a string (format(my_float, ".2f")) 4. Rounding a float to the nearest integer (Python's single-argument "round") 5. Rounding a binary float to some number of decimal places (two-argument round), which is a rather more subtle operation than it might seem at first sight For 1., there's decades of numerical evidence that round-ties-to-even is what you want to do, and that's why IEEE 754 makes it the default rounding mode, and why it's the rounding mode you're likely to be using for numeric work out of the box in any mainstream language. [For one demonstration of where the unbiasedness of round-ties-to-even can matter, see https://stackoverflow.com/a/45245802/270986. Apologies for linking to my own answer here, but it was easily accessible. I'm sure there are many better demonstrations out there.] Case 2 is really a special case of 1. Though not (usually) FPU-supported: you can think of conversion from decimal string to binary floating-point as another primitive floating-point operation, and it's one that's covered by IEEE 754; round-ties-to-even (or at least, some precision- or algorithm-limited _approximation_ to round-ties-to-even) is again a common default across languages and operating systems. Case 3 is also covered by IEEE 754, and I believe that "most" languages use round-ties-to-even here, too. C's fprintf (for example) specifies that e-style, f-style, and g-style formatting should be "correctly rounded" (C99 7.19.6.1p13), where "correctly rounded" means "[...] nearest in value, subject to the current rounding mode [...]" (C99 3.9); in practice, that's usually round-ties-to-even. Java's DecimalFormat uses round-ties-to-even by default (source: https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html). I haven't checked other languages, but I expect that many of them do something similar. Cases 4 and 5 are mostly what we're arguing about in this issue. It's much less clear to me that the numerical benefits are significant at this level (compared to FPU-level last-bit-rounding, where those benefits are really unarguable). But note that these cases are really just floatified versions of case 3. Indeed, Python 3's current two-argument round algorithm is based directly on the string conversion code used for string formatting. And the use of round-ties-to-even for case 3 is already well established (and was already established long before Python 3.) What happens for these 5 cases in Java? It _looks_ to me as though the first three cases use round-ties-to-even, the fourth uses round-ties-to-away by default, and the last isn't directly supported by the language. (But it's been a long time since I dabbled in Java.) Like I said, I'm not totally convinced about the numerical benefits of round-ties-to-even for user-level round-to-n-decimal-places operations as opposed to FPU-level rounding (though I'm open to persuasion). That's partly because round-to-two-decimal-places (for example) is actually quite a peculiar operation to be doing on a binary float in the first place, and in practice ties don't really appear or affect the behaviour that often. (It might look as though you have a value "2.675" in your dataframe, but on a typical machine that value is actually being stored as "2.67499999999999982236431605997495353221893310546875", so it doesn't matter one whit whether you're using round-ties-to-even or round-ties-to-away: under correct rounding, both are going to give you the surprising result of 2.67 when you round to two decimal places). What I really like about Python's choice is the consistency. In Python, since Python 3, all five cases of rounding described above use round-ties-to-even. In Python 2, float formatting used round-ties-to-even (most of the time in practice, though for Python 2.6 and earlier the exact behaviour depended on the system), while "round" used round-ties-to-away for a very closely-related operation, and there are bug reports and StackOverflow questions from users surprised by the discrepancy between float formatting and two-argument round. In Python 3, we have the pleasant situation that "round" and string formatting agree.

[Joshua]

> 1. Update the round() docs to make the documentation of this behavior less buried,

Sounds reasonable to me; I'm definitely open to documentation improvements. Though it doesn't seem all that buried to me: the round-ties-to-even behaviour is described in the third sentence in the first place I'd look for round documentation (https://docs.python.org/3/library/functions.html#round). It would be misleading to move the information earlier, because the use of round-ties-to-even is specific to the builtin types: user-defined types can do whatever they like via the __round__ magic method.

> 2. include a (brief) justification (possibly even just a link to http://wiki.c2.com/?BankersRounding or some more-authoritative document), and

Sure, a link to a source on bankers rounding could work.

> 3. link to where else this change in Python 3 was discussed more, if anywhere, or else confirm this change was made based on no additional analysis that we can find written down.

I'm not aware of much discussion beyond the thread that Serhiy already pointed to. There's a little bit more (but not much) on rounding the py3k mailing list (try a Google search for "site:mail.python.org/pipermail/python-3000 rounding").

> It'd also be interesting to hear if this is something we wish we'd done differently now, but that shouldn't distract from 1, 2, and 3.

I can't speak for anyone else, but it's certainly not something I think should have been done differently, with one caveat: the silent and subtle change in behaviour from Python 2 to Python 3 was a bit unpleasant, and a possible source of late-discovered (or undiscovered) bugs.

> so maybe changing from round_half_up to round_half_even was necessary for the other improvements [...]

No. The change was independent of other fixes and changes. There _is_ quite a history of round changes: fixes for the single-argument round function in odd corner cases (earlier versions of Python used the simple add-half-and-chop algorithm, with gives the wrong answer for 0.4999999999999999 and for 4503599627370497.0 thanks to FPU-level rounding in the add-half step); making two-argument round correctly-rounded in all cases in Python 2.7 and 3.1 via the same dtoa.c machinery used for str<->float conversions; changing the return type of single-argument round in Python 3; making round generic via the __round__ magic method, etc. But none of these required the change in rounding mode.

We need to recognise that there are various different contexts where the idea of "rounding" comes into play in a general-purpose language. Some examples:

1. FPU-level rounding for basic floating-point operations (addition, multiplication, sqrt, etc.)
2. Conversion of source-code decimal numeric literals (e.g., in "bad_pi = 3.14") to the _nearest_ exactly representable binary float/double; the notion of _nearest_ needs some way to break ties.
3. Formatting a float for output as a string (format(my_float, ".2f"))
4. Rounding a float to the nearest integer (Python's single-argument "round")
5. Rounding a binary float to some number of decimal places (two-argument round), which is a rather more subtle operation than it might seem at first sight

For 1., there's decades of numerical evidence that round-ties-to-even is what you want to do, and that's why IEEE 754 makes it the default rounding mode, and why it's the rounding mode you're likely to be using for numeric work out of the box in any mainstream language. [For one demonstration of where the unbiasedness of round-ties-to-even can matter, see https://stackoverflow.com/a/45245802/270986. Apologies for linking to my own answer here, but it was easily accessible. I'm sure there are many better demonstrations out there.]

Case 2 is really a special case of 1. Though not (usually) FPU-supported: you can think of conversion from decimal string to binary floating-point as another primitive floating-point operation, and it's one that's covered by IEEE 754; round-ties-to-even (or at least, some precision- or algorithm-limited _approximation_ to round-ties-to-even) is again a common default across languages and operating systems.

Case 3 is also covered by IEEE 754, and I believe that "most" languages use round-ties-to-even here, too. C's fprintf (for example) specifies that e-style, f-style, and g-style formatting should be "correctly rounded" (C99 7.19.6.1p13), where "correctly rounded" means "[...] nearest in value, subject to the current rounding mode [...]" (C99 3.9); in practice, that's usually round-ties-to-even. Java's DecimalFormat uses round-ties-to-even by default (source:  https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html). I haven't checked other languages, but I expect that many of them do something similar.

Cases 4 and 5 are mostly what we're arguing about in this issue. It's much less clear to me that the numerical benefits are significant at this level (compared to FPU-level last-bit-rounding, where those benefits are really unarguable). But note that these cases are really just floatified versions of case 3. Indeed, Python 3's current two-argument round algorithm is based directly on the string conversion code used for string formatting. And the use of round-ties-to-even for case 3 is already well established (and was already established long before Python 3.)

What happens for these 5 cases in Java? It _looks_ to me as though the first three cases use round-ties-to-even, the fourth uses round-ties-to-away by default, and the last isn't directly supported by the language. (But it's been a long time since I dabbled in Java.)

Like I said, I'm not totally convinced about the numerical benefits of round-ties-to-even for user-level round-to-n-decimal-places operations as opposed to FPU-level rounding (though I'm open to persuasion). That's partly because round-to-two-decimal-places (for example) is actually quite a peculiar operation to be doing on a binary float in the first place, and in practice ties don't really appear or affect the behaviour that often. (It might *look* as though you have a value "2.675" in your dataframe, but on a typical machine that value is actually being stored as "2.67499999999999982236431605997495353221893310546875", so it doesn't matter one whit whether you're using round-ties-to-even or round-ties-to-away: under correct rounding, both are going to give you the surprising result of 2.67 when you round to two decimal places).

What I really like about Python's choice is the consistency. In Python, since Python 3, all five cases of rounding described above use round-ties-to-even. In Python 2, float formatting used round-ties-to-even (most of the time in practice, though for Python 2.6 and earlier the exact behaviour depended on the system), while "round" used round-ties-to-away for a very closely-related operation, and there are bug reports and StackOverflow questions from users surprised by the discrepancy between float formatting and two-argument round. In Python 3, we have the pleasant situation that "round" and string formatting agree.

History
Date	User	Action	Args
2018-09-27 18:48:14	mark.dickinson	set	recipients: + mark.dickinson, tim.peters, vstinner, jyasskin, ezio.melotti, njs, jab, serhiy.storchaka, MJH
2018-09-27 18:48:13	mark.dickinson	set	messageid: <1538074093.88.0.545547206417.issue32956@psf.upfronthosting.co.za>
2018-09-27 18:48:13	mark.dickinson	link	issue32956 messages
2018-09-27 18:48:13	mark.dickinson	create