classification
Title: str.format() produces different output on different platforms (Py30a2)
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, christian.heimes, eric.smith, gvanrossum, mark, mark.dickinson
Priority: normal Keywords:

Created on 2007-12-12 09:04 by mark, last changed 2008-03-10 00:17 by eric.smith. This issue is now closed.

Files
File name Uploaded Description
pystrtod.diff mark, 2007-12-16 19:10
pystrtod.c mark, 2007-12-16 19:10
pystrtod.c mark, 2007-12-17 01:09
test_float.diff mark, 2007-12-17 08:40
test_float.py mark, 2007-12-17 08:40
pystrtod.c mark, 2007-12-17 08:56
pystrtod.diff mark, 2007-12-17 08:56
pystrtod.c.diff mark, 2007-12-18 08:47
test_float.py.diff mark, 2007-12-18 08:47
Messages (20)
msg58485 - Author: Mark Summerfield (mark) * Date: 2007-12-12 09:04
I don't know if this is a bug, but it is certainly a difference in
behavior between platforms:

Python 3.0a2 on linux2:
>>> "{0:.3e}".format(123.45678901)
'1.235e+02'

Python 3.0a2 on win32:
>>> "{0:.3e}".format(123.45678901)
'1.235e+002'

It seems to me that str.format() should produce consistent results
across platforms, but I don't think the PEP says anything either way.
msg58497 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-12 15:13
Again, a (not unreasonable) feature request.  AFAIK %e behaves the same
way.  I'm sure if you submitted a patch it would be accepted happily.
msg58499 - Author: Mark Summerfield (mark) * Date: 2007-12-12 16:41
Unfortunately, I can't---I haven't programmed C in more than a decade,
and don't know Python's C API, so I doubt I could write anything in C
that would actually work! Nowadays I only program in Python and C++:-)
msg58665 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-15 21:14
Guido is right. On Linux the system's sprintf() family prints %e, %g and
%f with two or three digits while Windows always uses three digits:

Linux
>>> "%e" % 1e1
'1.000000e+01'
>>> "%e" % 1e10
'1.000000e+10'
>>> "%e" % 1e100
'1.000000e+100'

Windows
>>> "%e" % 1e1
'1.000000e+001'
>>> "%e" % 1e10
'1.000000e+010'
>>> "%e" % 1e100
'1.000000e+100'

The output could be changed in any of the functions:
Objects/floatobject.c:format_double()
Python/pystrtod.c:PyOS_ascii_formatd()
Python/mysnprintf.c:PyOS_snprintf()
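A quick way to see which convention a given C library follows is to format a few values with %e and count the digits after the exponent sign. The following is a minimal sketch under that assumption; exponent_digits() and libc_exponent_digits() are hypothetical helpers, not part of Python's C API, and the count reported for small exponents depends on the platform's printf() implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Python's C API): return the number of
   digits after the exponent sign in a printf-style number such as
   "1.000000e+01", or 0 if the string has no signed exponent. */
static int exponent_digits(const char *s)
{
    const char *p;
    int count = 0;

    assert(s != NULL);
    p = strpbrk(s, "eE");
    if (p == NULL || (p[1] != '+' && p[1] != '-'))
        return 0;
    for (p += 2; isdigit((unsigned char)*p); ++p)
        ++count;
    return count;
}

/* Format x with "%e" using the platform's C library and report how many
   exponent digits that library chose. */
static int libc_exponent_digits(double x)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%e", x);
    return exponent_digits(buf);
}
```

For 1e100 every library must print three exponent digits; for 1e1 a library that pads to three digits reports 3, while one that uses the minimum width reports 2.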
msg58666 - Author: Mark Summerfield (mark) * Date: 2007-12-15 21:27
It seems to me that Python should provide consistent results across
platforms wherever possible and that this is a gratuitous inconsistency
that makes cross-platform testing less convenient than it need be.

I'll take a look at those functions next week.
msg58667 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-15 22:59
It should be fixed in the trunk and merged into py3k. 2.6 suffers from
the same problem.

By the way, I have another pending patch that adds consistent handling
of "nan" and "inf" to float on all platforms.

Christian
msg58674 - Author: Mark Summerfield (mark) * Date: 2007-12-16 19:10
Hi Christian,

I've added some code to pystrtod.c's PyOS_ascii_formatd() function that
ensures that the exponent is always at least 3 digits, so long as the
buffer passed in has room.

Although I have svn access, this was granted to me by Georg Brandl only
for doing documentation edits, so I don't feel that I can submit code
patches myself---and in any case my C is rusty, so I would prefer my
code was peer reviewed anyway. Would you be willing to add the patch for
me, assuming you are happy with it?

I've attached my modified pystrtod.c and also pystrtod.diff which shows
the diff against Python 3.0a2. My code is at the end of the function, all
in one lump so it is easy to see what I've done. (I've assumed ANSI C,
so have declared some local variables in my code block rather than at
the top of the function: start, exponent_digit_count, and zeros; they
could all be moved if necessary.)

I hope this helps:-)
msg58680 - Author: Mark Summerfield (mark) * Date: 2007-12-17 01:09
Hi Christian,

I made two mistakes (that I know of)---(1) I forgot that 'g' format can
produce an exponent string, and (2) I did a wrong calculation to ensure
that I didn't overflow the buffer. (Even with those mistakes Python's
test_float and test_fpformat passed fine, as did my own tests.) Anyway,
here's the fixed and hopefully final block of code. The first correction
affects the first if statement, and the second correction affects the
third if statement.

        /* Ensure that the exponent is at least 3 digits,
	   providing the buffer is large enough for the extra zeros. */
        if (format_char == 'e' || format_char == 'E' ||
	    format_char == 'g' || format_char == 'G') {
            p = buffer;
            while (*p && *p != 'e' && *p != 'E')
                ++p;
            if (*p && (*(p + 1) == '-' || *(p + 1) == '+')) {
		p += 2;
                char *start = p;
                int exponent_digit_count = 0;
                while (*p && isdigit((unsigned char)*p)) {
                    ++p;
                    ++exponent_digit_count;
                }
                int zeros = 3 - exponent_digit_count;
                if (exponent_digit_count && zeros > 0 &&
		    start + zeros + exponent_digit_count + 1
		    < buffer + buf_len) {
                    p = start;
                    memmove(p + zeros, p, exponent_digit_count + 1);
                    int i = 0;
                    for (; i < zeros; ++i)
                        *p++ = '0';
                }
            }
        }

I've also attached the complete pystrtod.c file with the corrections.
msg58685 - Author: Mark Summerfield (mark) * Date: 2007-12-17 08:40
Attached is a new version of test_float.py with a few tests to check
str.format() with exponent formats, plus a diff. They test that the
exponent is always 3 digits and that the case of the e in the format is
respected.
msg58687 - Author: Mark Summerfield (mark) * Date: 2007-12-17 08:56
My C is rusty! Attached are a new pystrtod.c and diff, this time using
memset() instead of looping to pad with zeros.
msg58691 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-17 12:41
Hi Mark!

In general the patch is fine but it has some small issues.

* Your patches are all reversed. They remove (-) the new lines instead
of adding (+) them. Why aren't you using svn diff > file.patch?
* You are mixing tabs with spaces. All 2.6 C files and most 3.0 C files
are still using tabs.
* You forgot about %f. For large values the format characters f and F
are using the exponent display, too "%f" % 1e60 == '1e+60'
* You cannot assume that char is unsigned. Use Py_CHARMASK(char) instead.
I think that you can make the code more readable when you do format_char
= tolower(Py_CHARMASK(format_char)); first.
* The code is not C89 conformant. The standard dictates that you cannot
declare a variable in the middle of a block. New variables must be declared
right after the {
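For comparison, here is a minimal sketch of how the padding block from msg58680 might look once the review points above are applied. All declarations sit at the top of the function (C89). The format character is no longer consulted, so any output containing 'e' or 'E' (including large %f values) is handled. The zeros are written with memset() as in the later revision, and a plain (unsigned char) cast stands in for the CPython masking macro so the example compiles on its own. ensure_min_exponent_digits() and its parameters are hypothetical; in the patch this logic sat inline at the end of PyOS_ascii_formatd().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-alone version of the padding logic: make sure the
   exponent in buffer (if there is one) has at least min_digits digits,
   inserting leading zeros only if buf_len leaves room for them.
   C89 style: all variables are declared at the top of the block. */
static void ensure_min_exponent_digits(char *buffer, size_t buf_len,
                                       size_t min_digits)
{
    char *p;
    char *start;
    size_t digit_count = 0;
    size_t zeros;

    assert(buffer != NULL);

    /* Search any output for 'e' or 'E', not only %e/%g results, because
       large %f values can also be displayed with an exponent. */
    p = strpbrk(buffer, "eE");
    if (p == NULL || (p[1] != '-' && p[1] != '+'))
        return;

    start = p + 2;
    for (p = start; isdigit((unsigned char)*p); ++p)
        ++digit_count;
    if (digit_count == 0 || digit_count >= min_digits)
        return;

    zeros = min_digits - digit_count;
    /* strlen(buffer) + 1 bytes are in use (including the NUL); leave the
       string untouched if the padded result would not fit. */
    if (strlen(buffer) + 1 + zeros > buf_len)
        return;

    /* Shift the exponent digits and everything after them (including the
       NUL) right by the number of zeros, then fill the gap with '0'. */
    memmove(start + zeros, start, strlen(start) + 1);
    memset(start, '0', zeros);
}
```

Passing 3 for min_digits reproduces the three-digit behaviour proposed at this point in the discussion; the case of the 'e' is preserved because only the digits are touched.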
msg58729 - Author: Mark Summerfield (mark) * Date: 2007-12-18 08:47
On 2007-12-17, Christian Heimes wrote:
> Christian Heimes added the comment:
>
> Hi Mark!
>
> In general the patch is fine but it has some small issues.
>
> * Your patches are all reversed. They remove (-) the new lines instead
> of adding (+) them. Why aren't you using svn diff > file.patch?

I didn't know about that. Have now used it.

> * You are mixing tabs with spaces. All 2.6 C files and most 3.0 C files
> are still using tabs.

Okay, have now switched to tabs.

> * You forgot about %f. For large values the format characters f and F
> are using the exponent display, too "%f" % 1e60 == '1e+60'

Good point; I now search for 'e' or 'E' in any number.

> * You cannot assume that char is unsigned. Use Py_CHARMASK(char) instead.
> I think that you can make the code more readable when you do format_char
> = tolower(Py_CHARMASK(format_char)); first.

I don't refer to format_char any more.

> * The code is not C89 conformant. The standard dictates that you cannot
> declare a variable in the middle of a block. New variables must be declared
> right after the {

I didn't know that. I've now moved the variable declarations.

I've attached the diff you asked for, plus a diff for the test_float.py
file -- and I've done the changes in relation to 2.6 trunk since there's
nothing 3.0-specific.

Hope this is now okay.
msg62506 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-02-17 21:41
Eric, because of this issue the Windows buildbots turned red.
Does the proposed patch still apply? Or should we make the tests more
tolerant?
msg62508 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-02-17 22:34
The PEP 3101 float formatting code (in Objects/stringlib/formatter.h)
uses PyOS_ascii_formatd for all specifier codes except 'n'.  So applying
the patch would fix the issue that was originally brought up in msg58485.

I think the approach of the patch (if not its content, I haven't
inspected it yet) is correct.  Fix the underlying code and benefit from
this everywhere.  I don't think we should change the tests to be more
tolerant, I think we should be consistent across platforms.

My only concern is breaking code in the wild.  This seems like a change
with wide-reaching implications.

I think 'n' should also be addressed.  It calls PyOS_snprintf directly.

I'll review the patch and comment back here.  Unfortunately, I don't
have a Windows box set up for testing.
msg62542 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-02-18 19:44
I know I'm coming a bit late to this discussion, but I wanted to point
out that the C99 standard does actually specify how many digits should
be in the exponent of a "%e"-formatted number:

In section 7.19.6, in the documentation for fprintf, it says:

"The exponent always contains at least two digits, and only as many more
digits as necessary to represent the exponent."

Not that that's necessarily a reason for Python to do the same :)
msg62548 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-02-18 22:33
Given Mark Dickinson's input, I think we should follow it.  That
effectively means leaving the Linux/MacOS output as is, and modifying the
Windows output.  I'll work up a patch, but I'd still like to get some
input on changing the output of existing, working code.
msg62556 - Author: Mark Summerfield (mark) * Date: 2008-02-19 07:58
I don't really see why Python shouldn't use as few digits as are needed:-)

The patch I submitted just made the exponent at least three digits.

But my aim was cross-platform consistency, and I still think (whether
using the fewest digits, the fewest but at least 2, or whatever other
logic) that the same logic should be used on all platforms since this
makes it easier to test cross-platform applications that output numbers
in exponential form.
msg62565 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-02-19 16:46
I would like Python to follow the C99 rule here. It is practical and
Python has a long tradition of following C where it makes sense.
msg62607 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-02-20 23:36
Checked in as r60909.  I started with Mark's patch, but added code to
either increase or decrease the number of zeros, as needed.
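The committed change is only described briefly here, so the following is not the r60909 code but a minimal sketch of adjusting the zeros in both directions to meet the C99 rule quoted earlier in the thread (at least two digits, and only as many more as necessary). normalize_exponent() and MIN_EXPONENT_DIGITS are hypothetical names; the function assumes a NUL-terminated buffer of buf_len bytes and reports failure rather than overflowing when there is no room to pad.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MIN_EXPONENT_DIGITS 2  /* C99: "at least two digits" */

/* Hypothetical helper: rewrite the exponent of a printf-style number in
   place so it has at least MIN_EXPONENT_DIGITS digits and no surplus
   leading zeros, e.g. "1.000000e+001" -> "1.000000e+01" and
   "1.5e+5" -> "1.5e+05".  buf_len is the total size of buffer.
   Returns 1 on success, 0 if there was not enough room to pad. */
static int normalize_exponent(char *buffer, size_t buf_len)
{
    char *p;
    char *start;      /* first exponent digit */
    char *first_sig;  /* first digit that must be kept */
    size_t digits;    /* exponent digits currently present */
    size_t keep;      /* digits left after dropping surplus zeros */
    size_t tail;      /* bytes from first_sig through the NUL */

    assert(buffer != NULL);
    p = strpbrk(buffer, "eE");
    if (p == NULL || (p[1] != '+' && p[1] != '-'))
        return 1;                       /* no exponent: nothing to do */
    start = p + 2;
    for (p = start, digits = 0; isdigit((unsigned char)*p); ++p)
        ++digits;
    if (digits == 0)
        return 1;

    /* Skip leading zeros, but never go below the minimum width. */
    first_sig = start;
    keep = digits;
    while (keep > MIN_EXPONENT_DIGITS && *first_sig == '0') {
        ++first_sig;
        --keep;
    }
    tail = strlen(first_sig) + 1;

    if (keep < MIN_EXPONENT_DIGITS) {
        /* Too few digits: shift right and pad with '0' if it fits. */
        size_t zeros = MIN_EXPONENT_DIGITS - keep;
        if ((size_t)(start - buffer) + zeros + tail > buf_len)
            return 0;
        memmove(start + zeros, start, tail);
        memset(start, '0', zeros);
    } else if (first_sig != start) {
        /* Surplus leading zeros: shift the remaining text left. */
        memmove(start, first_sig, tail);
    }
    return 1;
}
```

With this rule the Windows-style '1.000000e+001' becomes '1.000000e+01', matching the Linux output shown in msg58665, while exponents of 100 or more keep all three digits.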
msg63430 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-03-10 00:17
Issue closed with commit r60909.  Fixed as suggested by Mark Dickinson:
"The exponent always contains at least two digits, and only as many more
digits as necessary to represent the exponent."
History
Date User Action Args
2008-03-10 00:17:11 eric.smith set status: open -> closed
resolution: fixed
messages: + msg63430
2008-02-20 23:36:11 eric.smith set messages: + msg62607
2008-02-19 16:46:28 gvanrossum set messages: + msg62565
2008-02-19 07:58:47 mark set messages: + msg62556
2008-02-18 22:33:01 eric.smith set messages: + msg62548
2008-02-18 19:44:42 mark.dickinson set nosy: + mark.dickinson
messages: + msg62542
2008-02-17 22:34:19 eric.smith set messages: + msg62508
2008-02-17 21:41:46 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, eric.smith
messages: + msg62506
2008-01-06 22:29:44 admin set keywords: - py3k
versions: Python 2.6, Python 3.0
2007-12-18 08:47:51 mark set files: + pystrtod.c.diff, test_float.py.diff
messages: + msg58729
2007-12-17 12:41:27 christian.heimes set messages: + msg58691
2007-12-17 08:56:18 mark set files: + pystrtod.c, pystrtod.diff
messages: + msg58687
2007-12-17 08:40:58 mark set files: + test_float.diff, test_float.py
messages: + msg58685
2007-12-17 01:09:50 mark set files: + pystrtod.c
messages: + msg58680
2007-12-16 19:10:58 mark set files: + pystrtod.diff, pystrtod.c
messages: + msg58674
2007-12-15 22:59:57 christian.heimes set messages: + msg58667
2007-12-15 21:27:29 mark set messages: + msg58666
2007-12-15 21:14:18 christian.heimes set keywords: + py3k
nosy: + christian.heimes
messages: + msg58665
versions: + Python 2.6
2007-12-12 16:41:07 mark set messages: + msg58499
2007-12-12 15:13:39 gvanrossum set priority: normal
nosy: + gvanrossum
messages: + msg58497
2007-12-12 09:04:02 mark create