Issue 1600: str.format() produces different output on different platforms (Py30a2)

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45941

classification

Title:	str.format() produces different output on different platforms (Py30a2)
Type:	behavior	Stage:
Components:	Interpreter Core	Versions:	Python 3.0, Python 2.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, christian.heimes, eric.smith, gvanrossum, mark, mark.dickinson
Priority:	normal	Keywords:

Created on 2007-12-12 09:04 by mark, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
pystrtod.diff	mark, 2007-12-16 19:10
pystrtod.c	mark, 2007-12-16 19:10
pystrtod.c	mark, 2007-12-17 01:09
test_float.diff	mark, 2007-12-17 08:40
test_float.py	mark, 2007-12-17 08:40
pystrtod.c	mark, 2007-12-17 08:56
pystrtod.diff	mark, 2007-12-17 08:56
pystrtod.c.diff	mark, 2007-12-18 08:47
test_float.py.diff	mark, 2007-12-18 08:47

Messages (20)
msg58485 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-12 09:04
I don't know if this is a bug, but it is certainly a difference in behavior between platforms:

Python 3.0a2 on linux2:
>>> "{0:.3e}".format(123.45678901)
'1.235e+02'

Python 3.0a2 on win32:
>>> "{0:.3e}".format(123.45678901)
'1.235e+002'

It seems to me that str.format() should produce consistent results across platforms, but I don't think the PEP says anything either way.
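For readers who want to repeat the comparison from this report, here is a minimal sketch. The helper name `exponent_digits` is hypothetical and not part of Python. On a current CPython, where this issue is fixed, the script should print the same result on every platform.

```python
import re


def exponent_digits(text):
    """Return the number of exponent digits in a formatted float, or None."""
    match = re.search(r"[eE][+-](\d+)$", text)
    return len(match.group(1)) if match else None


# The exact expression from the original report.
formatted = "{0:.3e}".format(123.45678901)
print(formatted, exponent_digits(formatted))
```

Running the same script on each target platform and comparing the printed lines is enough to spot a difference of the kind reported here.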
msg58497 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-12-12 15:13
Again, a (not unreasonable) feature request. AFAIK %e behaves the same way. I'm sure if you submitted a patch it would be accepted happily.
msg58499 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-12 16:41
On 2007-12-12, Guido van Rossum wrote:
> Guido van Rossum added the comment:
>
> Again, a (not unreasonable) feature request. AFAIK %e behaves the same
> way. I'm sure if you submitted a patch it would be accepted happily.

Unfortunately, I can't---I haven't programmed C in more than a decade, and don't know Python's C API, so I doubt I could write anything in C that would actually work! Nowadays I only program in Python and C++:-)
msg58665 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-12-15 21:14
Guido is right. On Linux the system's sprintf() family prints %e, %g and %f with two or three digits while Windows always uses three digits:

Linux
>>> "%e" % 1e1
'1.000000e+01'
>>> "%e" % 1e10
'1.000000e+10'
>>> "%e" % 1e100
'1.000000e+100'

Windows
>>> "%e" % 1e1
'1.000000e+001'
>>> "%e" % 1e10
'1.000000e+010'
>>> "%e" % 1e100
'1.000000e+100'

The output could be changed in any of the functions:
Objects/floatobject.h:format_double()
Python/pystrtod.c:PyOS_ascii_formatd()
Python/mysnprint.c:PyOS_snprintf()
msg58666 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-15 21:27
On 2007-12-15, Christian Heimes wrote:
> Christian Heimes added the comment:
>
> Guido is right. On Linux the system's sprintf() family prints %e, %g and
> %f with two or three digits while Windows always uses three digits:
>
> Linux
> >>> "%e" % 1e1
> '1.000000e+01'
> >>> "%e" % 1e10
> '1.000000e+10'
> >>> "%e" % 1e100
> '1.000000e+100'
>
> Windows
> >>> "%e" % 1e1
> '1.000000e+001'
> >>> "%e" % 1e10
> '1.000000e+010'
> >>> "%e" % 1e100
> '1.000000e+100'
>
> The output could be changed in any of the functions:
> Objects/floatobject.h:format_double()
> Python/pystrtod.c:PyOS_ascii_formatd()
> Python/mysnprint.c:PyOS_snprintf()

It seems to me that Python should provide consistent results across platforms wherever possible and that this is a gratuitous inconsistency that makes cross-platform testing less convenient than it need be.

I'll take a look at those functions next week.
msg58667 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-12-15 22:59
Mark Summerfield wrote:
> It seems to me that Python should provide consistent results across
> platforms wherever possible and that this is a gratuitous inconsistency
> that makes cross-platform testing less convenient than it need be.
>
> I'll take a look at those functions next week.

It should be fixed in the trunk and merged into py3k. 2.6 suffers from the same problem.

By the way I have another pending patch which adds consistent handling of "nan" and "inf" on all platforms to float.

Christian
msg58674 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-16 19:10
On 2007-12-15, Christian Heimes wrote:
> Christian Heimes added the comment:
>
> Mark Summerfield wrote:
> > It seems to me that Python should provide consistent results across
> > platforms wherever possible and that this is a gratuitous inconsistency
> > that makes cross-platform testing less convenient than it need be.
> >
> > I'll take a look at those functions next week.
>
> It should be fixed in the trunk and merged into py3k. 2.6 suffers from
> the same problem.
>
> By the way I have another pending patch which adds consistent handling
> of "nan" and "inf" on all platforms to float.

Hi Christian,

I've added some code to pystrtod.c's PyOS_ascii_formatd() function that ensures that the exponent is always at least 3 digits, so long as the buffer passed in has room.

Although I have svn access, this was granted to me by Georg Brandl only for doing documentation edits, so I don't feel that I can submit code patches myself---and in any case my C is rusty, so I would prefer my code was peer reviewed anyway. Would you be willing to add the patch for me, assuming you are happy with it?

I've attached my modified pystrtod.c and also pystrtod.diff which shows the diff against Python 30a2. My code is at the end of the function all in one lump so it is easy to see what I've done. (I've assumed ANSI C, so have declared some local variables in my code block rather than at the top of the function: start, exponent_digit_count, and zeros; they could all be moved if necessary.)

I hope this helps:-)
msg58680 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-17 01:09
On 2007-12-15, Christian Heimes wrote:
> Christian Heimes added the comment:
>
> Mark Summerfield wrote:
> > It seems to me that Python should provide consistent results across
> > platforms wherever possible and that this is a gratuitous inconsistency
> > that makes cross-platform testing less convenient than it need be.
> >
> > I'll take a look at those functions next week.
>
> It should be fixed in the trunk and merged into py3k. 2.6 suffers from
> the same problem.
>
> By the way I have another pending patch which adds consistent handling
> of "nan" and "inf" on all platforms to float.

Hi Christian,

I made two mistakes (that I know of)---(1) I forgot that 'g' format can produce an exponent string, and (2) I did a wrong calculation to ensure that I didn't overflow the buffer. (Even with those mistakes Python's test_float and test_fpformat passed fine, as did my own tests.)

Anyway, here's the fixed and hopefully final block of code. The first correction affects the first if statement, and the second correction affects the third if statement.

    /* Ensure that the exponent is at least 3 digits,
       providing the buffer is large enough for the extra zeros. */
    if (format_char == 'e' || format_char == 'E' ||
        format_char == 'g' || format_char == 'G') {
        p = buffer;
        while (*p && *p != 'e' && *p != 'E')
            ++p;
        if (*p && (*(p + 1) == '-' || *(p + 1) == '+')) {
            p += 2;
            char *start = p;
            int exponent_digit_count = 0;
            while (*p && isdigit((unsigned char)*p)) {
                ++p;
                ++exponent_digit_count;
            }
            int zeros = 3 - exponent_digit_count;
            if (exponent_digit_count && zeros > 0 &&
                start + zeros + exponent_digit_count + 1
                < buffer + buf_len) {
                p = start;
                memmove(p + zeros, p, exponent_digit_count + 1);
                int i = 0;
                for (; i < zeros; ++i)
                    *p++ = '0';
            }
        }
    }

I've also attached the complete pystrtod.c file with the corrections.
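The logic of the C block above can be sketched in Python for readers who want to experiment with it. This is an illustration only, not the attached patch: the function name `pad_exponent` and the `min_digits` parameter are hypothetical, and `buf_len` stands in for the size of the C buffer (the `+ 1` in the size check mirrors the trailing NUL byte that `memmove` carries along in the C code).

```python
def pad_exponent(text, buf_len, min_digits=3):
    """Zero-pad the exponent of `text` to `min_digits` if it fits in `buf_len`."""
    # Locate the exponent marker, as the first while loop in the C code does.
    pos = next((i for i, ch in enumerate(text) if ch in "eE"), -1)
    if pos == -1 or pos + 1 >= len(text) or text[pos + 1] not in "+-":
        return text  # no signed exponent, nothing to do

    start = pos + 2  # index of the first exponent digit
    digits = text[start:]
    if not digits or any(ch not in "0123456789" for ch in digits):
        return text

    zeros = min_digits - len(digits)
    # Same bound as the C check: start + zeros + digit count + 1 < buf_len.
    if zeros > 0 and start + zeros + len(digits) + 1 < buf_len:
        return text[:start] + "0" * zeros + digits
    return text


print(pad_exponent("1.235e+02", 32))  # exponent padded to three digits
print(pad_exponent("1.235e+02", 10))  # buffer too small, left unchanged
```

As in the C version, a string is returned unchanged when it has no exponent, when the exponent already has enough digits, or when the padded result would not fit.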
msg58685 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-17 08:40
Attached is a new version of test_float.py with a few tests to check str.format() with exponent formats, plus a diff. They test that the exponent is always 3 digits and that the case of the e in the format is respected.
msg58687 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-17 08:56
My C is rusty! Attached is a new pystrtod.c and diff, this time using memset() instead of looping to pad with zeros.
msg58691 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-12-17 12:41
Hi Mark!

In general the patch is fine but it has some small issues.

* Your patches are all reversed. They remove (-) the new lines instead of adding (+) them. Why aren't you using svn diff > file.patch?
* You are mixing tabs with spaces. All 2.6 C files and most 3.0 C files are still using tabs.
* You forgot about %f. For large values the format characters f and F are using the exponent display, too: "%f" % 1e60 == '1e+60'
* You cannot assume that char is unsigned. Use Py_CHARMAP(char) instead. I think that you can make the code more readable when you do format_char = tolower(Py_CHARMAP(format_char)); first.
* The code is not C89 conform. The standards dictate that you cannot declare a var in the middle of a block. New var must be declared right after the {
msg58729 - (view)	Author: Mark Summerfield (mark) *	Date: 2007-12-18 08:47
On 2007-12-17, Christian Heimes wrote:
> Christian Heimes added the comment:
>
> Hi Mark!
>
> In general the patch is fine but it has some small issues.
>
> * Your patches are all reversed. They remove (-) the new lines instead
> of adding (+) them. Why aren't you using svn diff > file.patch?

I didn't know about that. Have now used it.

> * You are mixing tabs with spaces. All 2.6 C files and most 3.0 C files
> are still using tabs.

Okay, have now switched to tabs.

> * You forgot about %f. For large values the format characters f and F
> are using the exponent display, too "%f" % 1e60 == '1e+60'

Good point; I now search for 'e' or 'E' in any number.

> * You cannot assume that char is unsigned. Use Py_CHARMAP(char) instead.
> I think that you can make the code more readable when you do format_char
> = tolower(Py_CHARMAP(format_char)); first.

I don't refer to format_char any more.

> * The code is not C89 conform. The standards dictate that you cannot
> declare a var in the middle of a block. New var must be declared right
> after the {

I didn't know that. I've now moved the variable declarations.

I've attached the diff you asked for, plus a diff for the test_float.py file -- and I've done the changes in relation to 2.6 trunk since there's nothing 3.0-specific.

Hope this is now okay.
msg62506 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-02-17 21:41
Eric, because of this issue the Windows buildbots turned red. Does the proposed patch still apply? Or should we make the tests more tolerant?
msg62508 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2008-02-17 22:34
The PEP 3101 float formatting code (in Objects/stringlib/formatter.h) uses PyOS_ascii_formatd for all specifier codes except 'n'. So applying the patch would fix the issue that was originally brought up in msg58485. I think the approach of the patch (if not its content, I haven't inspected it yet) is correct. Fix the underlying code and benefit from this everywhere. I don't think we should change the tests to be more tolerant, I think we should be consistent across platforms. My only concern is breaking code in the wild. This seems like a change with wide-reaching implications. I think 'n' should also be addressed. It calls PyOS_snprintf directly. I'll review the patch and comment back here. Unfortunately, I don't have a Windows box set up for testing.
msg62542 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-02-18 19:44
I know I'm coming a bit late to this discussion, but I wanted to point out that the C99 standard does actually specify how many digits should be in the exponent of a "%e"-formatted number: In section 7.19.6, in the documentation for fprintf, it says: "The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent." Not that that's necessarily a reason for Python to do the same :)
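The quoted C99 rule can be expressed as a small check. The sketch below is illustrative only; the function name `follows_c99_exponent_rule` is hypothetical, and it assumes the exponent is the trailing `e`/`E`, sign, and digits of an already formatted string.

```python
import re

_EXPONENT = re.compile(r"[eE][+-](\d+)$")


def follows_c99_exponent_rule(text):
    """Check: at least two exponent digits, and no more than necessary."""
    match = _EXPONENT.search(text)
    if match is None:
        return True  # no exponent present, so the rule does not apply
    digits = match.group(1)
    if len(digits) < 2:
        return False
    # More than two digits are allowed only when needed, i.e. no leading zero.
    return len(digits) == 2 or digits[0] != "0"


for sample in ("1.000000e+01", "1.000000e+001", "1.000000e+100"):
    print(sample, follows_c99_exponent_rule(sample))
```

Under this check the Linux outputs shown earlier in the thread pass, while the three-digit Windows outputs for small exponents do not.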
msg62548 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2008-02-18 22:33
Given Mark Dickinson's input, I think we should follow it. That effectively means leaving the Linux/MacOS output as is, and modifying the Windows output. I'll work up a patch, but I'd still like to get some input on changing the output of existing, working code.
msg62556 - (view)	Author: Mark Summerfield (mark) *	Date: 2008-02-19 07:58
On 2008-02-18, Mark Dickinson wrote:
> Mark Dickinson added the comment:
>
> I know I'm coming a bit late to this discussion, but I wanted to point
> out that the C99 standard does actually specify how many digits should
> be in the exponent of a "%e"-formatted number:
>
> In section 7.19.6, in the documentation for fprintf, it says:
>
> "The exponent always contains at least two digits, and only as many more
> digits as necessary to represent the exponent."
>
> Not that that's necessarily a reason for Python to do the same :)

I don't really see why Python shouldn't use as few digits as are needed:-)

The patch I submitted just made the exponent at least three digits. But my aim was cross-platform consistency, and I still think (whether using the fewest digits, the fewest but at least 2, or whatever other logic) that the same logic should be used on all platforms since this makes it easier to test cross-platform applications that output numbers in exponential form.
msg62565 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2008-02-19 16:46
I would like Python to follow the C99 rule here. It is practical and Python has a long tradition of following C where it makes sense.
msg62607 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2008-02-20 23:36
Checked in as r60909. I started with Mark's patch, but added code to both increase or decrease the number of zeros, as needed.
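The idea of adding or removing zeros as needed can be sketched in a few lines of Python. This is not the code from r60909, which is C and is not reproduced in this thread; the function name `normalize_exponent` and the `min_digits` parameter are hypothetical and serve only to illustrate the approach.

```python
import re


def normalize_exponent(text, min_digits=2):
    """Rewrite a trailing exponent to the shortest form with `min_digits`."""

    def fix(match):
        marker, sign, digits = match.groups()
        # Drop surplus leading zeros, then pad back up to the minimum width.
        stripped = digits.lstrip("0")
        return marker + sign + stripped.rjust(min_digits, "0")

    return re.sub(r"([eE])([+-])(\d+)$", fix, text)


print(normalize_exponent("1.235e+002"))  # three digits reduced to two
print(normalize_exponent("1e+5"))        # one digit padded to two
```

Handling both directions means the same routine covers a platform that prints too many exponent digits and one that prints too few.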
msg63430 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2008-03-10 00:17
Issue closed with commit r60909. Fixed as suggested by Mark Dickinson: "The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent."

History
Date	User	Action	Args
2022-04-11 14:56:28	admin	set	github: 45941
2008-03-10 00:17:11	eric.smith	set	status: open -> closed resolution: fixed messages: + msg63430
2008-02-20 23:36:11	eric.smith	set	messages: + msg62607
2008-02-19 16:46:28	gvanrossum	set	messages: + msg62565
2008-02-19 07:58:47	mark	set	messages: + msg62556
2008-02-18 22:33:01	eric.smith	set	messages: + msg62548
2008-02-18 19:44:42	mark.dickinson	set	nosy: + mark.dickinson messages: + msg62542
2008-02-17 22:34:19	eric.smith	set	messages: + msg62508
2008-02-17 21:41:46	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc, eric.smith messages: + msg62506
2008-01-06 22:29:44	admin	set	keywords: - py3k versions: Python 2.6, Python 3.0
2007-12-18 08:47:51	mark	set	files: + pystrtod.c.diff, test_float.py.diff messages: + msg58729
2007-12-17 12:41:27	christian.heimes	set	messages: + msg58691
2007-12-17 08:56:18	mark	set	files: + pystrtod.c, pystrtod.diff messages: + msg58687
2007-12-17 08:40:58	mark	set	files: + test_float.diff, test_float.py messages: + msg58685
2007-12-17 01:09:50	mark	set	files: + pystrtod.c messages: + msg58680
2007-12-16 19:10:58	mark	set	files: + pystrtod.diff, pystrtod.c messages: + msg58674
2007-12-15 22:59:57	christian.heimes	set	messages: + msg58667
2007-12-15 21:27:29	mark	set	messages: + msg58666
2007-12-15 21:14:18	christian.heimes	set	keywords: + py3k nosy: + christian.heimes messages: + msg58665 versions: + Python 2.6
2007-12-12 16:41:07	mark	set	messages: + msg58499
2007-12-12 15:13:39	gvanrossum	set	priority: normal nosy: + gvanrossum messages: + msg58497
2007-12-12 09:04:02	mark	create