classification
Title: Numeric formatting inconsistent between int, float and Decimal
Type: behavior
Stage: resolved
Components:
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, facundobatista, mamrhein, mark.dickinson, rhettinger, skrah
Priority: normal
Keywords:

Created on 2019-12-17 17:48 by mamrhein, last changed 2019-12-19 10:19 by mamrhein. This issue is now closed.

Messages (13)
msg358561 - Author: Michael Amrhein (mamrhein) Date: 2019-12-17 17:48
The __format__ methods of int, float and Decimal (C and Python implementation) do not interpret the Format Specification Mini-Language in the same way:

>>> import decimal as cdec
... cdec.__file__
...
'/usr/lib64/python3.6/decimal.py'
>>> import _pydecimal as pydec
... pydec.__file__
...
'/usr/lib64/python3.6/_pydecimal.py'

>>> i = -1234567890
... f = float(i)
... d = cdec.Decimal(i)
... p = pydec.Decimal(i)
...
>>> # Case 1: no fill, no align, no zeropad
... fmt = "28,"
>>> format(i, fmt)
'              -1,234,567,890'
>>> format(f, fmt)
'            -1,234,567,890.0'
>>> format(d, fmt)
'              -1,234,567,890'
>>> format(p, fmt)
'              -1,234,567,890'

>>> # Case 2: no fill, no align, but zeropad
... fmt = "028,"
>>> format(i, fmt)
'-000,000,000,001,234,567,890'
>>> format(f, fmt)
'-0,000,000,001,234,567,890.0'
>>> format(d, fmt)
'-000,000,000,001,234,567,890'
>>> format(p, fmt)
'-000,000,000,001,234,567,890'

>>> # Case 3: no fill, but align '>' + zeropad
... fmt = ">028,"
>>> format(i, fmt)
'00000000000000-1,234,567,890'
>>> format(f, fmt)
'000000000000-1,234,567,890.0'
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Alignment conflicts with '0' in format specifier: >028,

>>> # Case 4: no fill, but align '=' + zeropad
... fmt = "=028,"
>>> format(i, fmt)
'-000,000,000,001,234,567,890'
>>> format(f, fmt)
'-0,000,000,001,234,567,890.0'
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Alignment conflicts with '0' in format specifier: =028,

>>> # Case 5: fill '0', align '=' + zeropad
... fmt = "0=028,"
>>> format(i, fmt)
'-000,000,000,001,234,567,890'
>>> format(f, fmt)
'-0,000,000,001,234,567,890.0'
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Fill character conflicts with '0' in format specifier: 0=028,

>>> # Case 6: fill ' ', align '=' + zeropad
... fmt = " =028,"
>>> format(i, fmt)
'-              1,234,567,890'
>>> format(f, fmt)
'-            1,234,567,890.0'
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Fill character conflicts with '0' in format specifier:  =028,

>>> # Case 7: fill ' ', align '>' + zeropad
... fmt = " >028,"
>>> format(i, fmt)
'              -1,234,567,890'
>>> format(f, fmt)
'            -1,234,567,890.0'
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Fill character conflicts with '0' in format specifier:  >028,

>>> # Case 8: fill ' ', no align, but zeropad
... fmt = " 028,"
>>> format(i, fmt)
'-000,000,000,001,234,567,890'
>>> format(f, fmt)
'-0,000,000,001,234,567,890.0'
>>> format(d, fmt)
'-000,000,000,001,234,567,890'
>>> format(p, fmt)
'-000,000,000,001,234,567,890'

>>> # Case 9: fill '_', no align, but zeropad
... fmt = "_028,"
>>> format(i, fmt)
ValueError: Invalid format specifier
>>> format(f, fmt)
ValueError: Invalid format specifier
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Invalid format specifier: _028,

>>> # Case 10: fill '_', no align, no zeropad
... fmt = "_28,"
>>> format(i, fmt)
ValueError: Invalid format specifier
>>> format(f, fmt)
ValueError: Invalid format specifier
>>> format(d, fmt)
ValueError: invalid format string
>>> format(p, fmt)
ValueError: Invalid format specifier: _28,

>>> # Case 11: fill '0', align '>', no zeropad
... fmt = "0>28,"
>>> format(i, fmt)
'00000000000000-1,234,567,890'
>>> format(f, fmt)
'000000000000-1,234,567,890.0'
>>> format(d, fmt)
'00000000000000-1,234,567,890'
>>> format(p, fmt)
'00000000000000-1,234,567,890'

>>> # Case 12: fill '0', align '<', no zeropad
... fmt = "0<28,"
>>> format(i, fmt)
'-1,234,567,89000000000000000'
>>> format(f, fmt)
'-1,234,567,890.0000000000000'
>>> format(d, fmt)
'-1,234,567,89000000000000000'
>>> format(p, fmt)
'-1,234,567,89000000000000000'

>>> # Case 13: fixed-point notation w/o precision
... fmt = "f"
>>> format(f, fmt)
'-1234567890.000000'
>>> format(d, fmt)
'-1234567890'
>>> format(p, fmt)
'-1234567890'

Case 1 & 2:
For a format string not giving a type ("None") the spec says: "Similar to 'g', except that fixed-point notation, when used, has at least one digit past the decimal point." float does follow this rule, Decimal does not.
While this may be regarded as reasonable, it should be noted in the doc. 

Cases 3 to 7:
Neither implementation of Decimal allows combining align and zeropad, while int and float do. When a fill is also given, int and float ignore zeropad; when it is not, they use '0' as the fill instead of the default ' '.
(For an exception see the following case.)
The spec says: "When no explicit alignment is given, preceding the width field by a zero ('0') character enables sign-aware zero-padding for numeric types. This is equivalent to a fill character of '0' with an alignment type of '='." That does not explicitly give a rule for align + zeropad together, but IMHO it suggests using zeropad *only* if no align is given, and that it should *not* override the default fill ' '.

Cases 8 - 10:
The syntax given by the spec IMHO says: no fill without align! There is no mention of an exception for a blank as fill.

Case 11 & 12:
While all implementations "agree" here, combining '0' as fill with an align other than '=' gives really odd results.
See also https://bugs.python.org/issue17247.

Case 13:
For fixed-point notation the spec says: "The default precision is 6." float does follow this rule, Decimal does not.
While this may be regarded as reasonable, it should be noted in the doc.
msg358565 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-12-17 19:04
Thanks for the report. I think most of this is a documentation issue: we either need to make clear that the formatting documentation applies only to the float type and that Decimal follows its own rules (usually for good reason, namely that it's required to follow Mike Cowlishaw's General Decimal Arithmetic Specification), or adjust the main string formatting documentation to make sure it covers the Decimal type as well as float.

Michael: thank you for including both the _pydecimal and decimal results here. Just to double check, I'm not seeing any differences between just those two here (other than the exact exception messages); are you seeing differences just between those two types that you think shouldn't exist?

In more detail:

Cases 1-2: these look like a documentation issue. The decimal behaviour is definitely desirable here: when possible, we want the string representation to preserve the information about the decimal exponent.

Case 3: This looks like a bug in the float formatting to me: the "0" prefix for the width implies "=" for the alignment, which conflicts with the explicit ">". But making something that previously worked a ValueError is hazardous; perhaps we can accept that in this case the explicitly-given alignment overrides the implicit "=", and modify decimal accordingly.

Case 4: This is a weaker case than case 3, where the implicit alignment matches the explicitly given one; I think we should fix decimal to accept this, since there's no ambiguity.

Case 5: Same reasoning as case 4: let's fix decimal.

Case 6: Like case 3: there's a conflict between the implicit fill of "0" and the explicitly given fill of " ". Again, it seems reasonable to let the explicit win over the implicit here.

Case 7: like case 3 and case 6 combined; if we fix those, we might as well also fix this one for consistency, even though at that point the "0" prefix for the width is doing nothing at all.

Cases 8-10: the space in case 8 isn't being interpreted as a fill character here; it's being interpreted as the sign character. I don't think there's anything to fix for these cases.

Cases 11-12: I don't think there's anything to be fixed here: yes, padding on the right with zeros creates misleading results, but I don't think it's Python's job to prevent the user from doing that.

Case 13: This is a doc issue; without a precision, the Decimal output again tries to preserve the exponent information, while also ensuring that the value is printed in a form that doesn't use the exponent.
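
To illustrate the "preserve the exponent" point made for cases 1-2 and 13, here is a minimal sketch. The sample values are chosen for illustration only; they are not taken from the report.

```python
from decimal import Decimal

# Same numeric value, three different stored exponents.
values = [
    Decimal("1234567890"),
    Decimal("1234567890.0"),
    Decimal("1234567890.00"),
]

for d in values:
    # No presentation type: the digits after the point follow the stored
    # exponent, so no ".0" is added when there is none.
    # 'f' without a precision: again the stored exponent decides, rather
    # than a default precision of 6.
    print(repr(d), format(d, ","), format(d, "f"))

# A float carries no such exponent, so the documented defaults apply.
print(format(1234567890.0, ","), format(1234567890.0, "f"))
```

Each Decimal line shows as many fractional digits as the literal it was built from, while the float line shows the trailing ".0" and the six-digit default.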


So cases 3-7 look like the only ones where we should consider changing the behaviour; the issue 17247 that you pointed to proposed tightening the behaviour for float to match Decimal, but I think it would be just as reasonable to loosen the Decimal behaviour to match float.
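
The reading of cases 8-10 above (a leading space is the sign option, not a fill) can be checked with a short sketch. The value 42 and the width 6 are arbitrary choices for illustration.

```python
# " 06" parses as sign=' ', zero-pad flag, width=6. No fill is involved.
print(repr(format(42, " 06")))    # a space takes the place of the sign
print(repr(format(-42, " 06")))   # negative values show '-' as usual

# With an explicit align, the character in front of it is a fill.
print(repr(format(42, "_>6")))

# Without an align, a leading '_' is not accepted as a fill and the
# spec is rejected.
try:
    format(42, "_6")
except ValueError as exc:
    print("ValueError:", exc)
```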
msg358576 - Author: Michael Amrhein (mamrhein) Date: 2019-12-17 21:06
Mark, I mostly agree with your classifications / proposals.
Regarding cases 3-7 I'd like to suggest a slightly different resolution:
Following my interpretation of the spec ("use zeropad *only* if no align is given"), "<020", ">020", "^020" and "=020" would be treated as equivalent to "<20", ">20", "^20" and "=20":

format(-12345, "<020") -> '-12345              ', not '-1234500000000000000'
format(-12345, ">020") -> '              -12345', not '00000000000000-12345'
format(-12345, "^020") -> '       -12345       ', not '0000000-123450000000'
format(-12345, "=020") -> '-              12345', not '-0000000000000012345'

For '<', '>' and '^' I can't imagine any code depending on the current behaviour of int and float, so this change is unlikely to break anything. 
For '=' it might be reasonable to make an exception (and state it in the doc), so that "020", "=020", "0=020" and "0=20" are treated as equivalent.
For Decimal this would mean loosening the behaviour, as you proposed.
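
For int, the four spellings named above already produce the same output, which a quick check can confirm. This is only a sketch of the current int behaviour, not of the proposed change.

```python
value = -12345
specs = ["020", "=020", "0=020", "0=20"]

# Format the same value with each spelling and collect the results.
results = {spec: format(value, spec) for spec in specs}
for spec, text in results.items():
    print(f"{spec!r:>8} -> {text!r}")

# A single distinct result means the spellings are interchangeable for int.
print("all equal:", len(set(results.values())) == 1)
```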
msg358581 - Author: Michael Amrhein (mamrhein) Date: 2019-12-17 21:22
Mark, to answer your question regarding the two implementations of Decimals:
No, in the cases I stated above there are no differences in their behaviour.

In order to dig a bit deeper, I wrote a little test:

import decimal as cdec
import _pydecimal as pydec

d = cdec.Decimal("1234567890.1234")
p = pydec.Decimal("1234567890.1234")
for fill in ('', ' ', '_'):
    for align in ('', '<', '>', '^', '='):
        for sign in ('', '-', '+', ' '):
            for zeropad in ('', '0'):
                for min_width in ('', '25'):
                    for thousands_sep in ('', ','):
                        for precision in ('', '.3', '.5'):
                            for ftype in ('', 'e', 'f', 'g'): 
                                fmt = f"{fill}{align}{sign}{zeropad}{min_width}{thousands_sep}{precision}{ftype}"
                                try:
                                    df = format(d, fmt)
                                except ValueError:
                                    df = "<ValueError>"
                                try:
                                    pf = format(p, fmt)
                                except ValueError:
                                    pf = "<ValueError>"
                                if df != pf:
                                    print(fmt, df, pf)

It did not reveal any differences. The two implementations are equivalent regarding the tested combinations.
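
The same sweep can be written without the deep nesting by using itertools.product. This is a sketch of an equivalent structure; the names FIELDS, iter_specs, try_format and mismatches are introduced here for illustration and are not part of the original test.

```python
import itertools
import decimal as cdec        # uses the C implementation when available
import _pydecimal as pydec    # pure-Python implementation

# One tuple of candidates per field, in format-spec order.
FIELDS = [
    ("", " ", "_"),            # fill
    ("", "<", ">", "^", "="),  # align
    ("", "-", "+", " "),       # sign
    ("", "0"),                 # zeropad
    ("", "25"),                # min_width
    ("", ","),                 # thousands_sep
    ("", ".3", ".5"),          # precision
    ("", "e", "f", "g"),       # ftype
]


def iter_specs():
    """Yield every combination of the fields as one format spec string."""
    for parts in itertools.product(*FIELDS):
        yield "".join(parts)


def try_format(value, spec):
    """Return the formatted string, or a marker if the spec is rejected."""
    try:
        return format(value, spec)
    except ValueError:
        return "<ValueError>"


def mismatches(text, specs):
    """List (spec, c_result, py_result) where the two Decimals disagree."""
    d, p = cdec.Decimal(text), pydec.Decimal(text)
    found = []
    for spec in specs:
        df, pf = try_format(d, spec), try_format(p, spec)
        if df != pf:
            found.append((spec, df, pf))
    return found
```

Calling mismatches("1234567890.1234", iter_specs()) runs the full sweep; an empty list corresponds to the "no differences" outcome reported above.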
msg358583 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-12-17 21:30
> Regarding cases 3-7 I'd like to suggest a slightly different resolution:

Hmm, yes. I was concentrating on the Decimal results, but I agree that these int/float results are disturbing:

>>> format(12345, "<020")
'12345000000000000000'
>>> format(12345.0, "<020")
'12345.00000000000000'
>>> format(12345, "^020")
'00000001234500000000'

I'm fine with an explicit *fill* character of zero producing misleading results; the user just gets what they ask for in that case. (And the filling could be happening in generic code that isn't aware of the numeric context any more, so it could be tricky to change.)

But having the pre-width 0 be interpreted this way is questionable. Eric: thoughts?
msg358587 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-12-17 21:45
I'm not sure what you mean by the "pre-width 0". Is that the "0" here:

format_spec ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]

?

And now that I write out the question, I'm sure that's what you mean.

PEP 3101 says "If the width field is preceded by a zero ('0') character, this enables zero-padding. This is equivalent to an alignment type of '=' and a fill character of '0'.". I don't see any other discussion of it in the PEP. In particular, what if you specify a different alignment type with the pre-width 0?

I believe this all originated in C's printf, via PEP 3101. Has anyone checked what C does?

But in any event, I don't think we can change the int formatting, in particular. There's no doubt someone who's relying on every little quirk. I'm less concerned about float and decimal, although we'd still need to be very careful.
msg358590 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-12-17 21:54
[Eric]

> Is that the "0" here: [...]

Yes, sorry; that could have been clearer.

> In particular, what if you specify a different alignment type with the pre-width 0?

Right, that's the critical question here. For floats and ints, an explicitly-specified alignment type overrides the implicit "=". But Decimal raises. The Decimal behaviour seems more reasonable, but the float and int behaviours are more historically baked-in, and riskier to change.

And then there's a parallel question with the fill character: should an explicitly-given fill override the "0"-fill character that's implicit in that "[0]"? int and float say "yes". Decimal says "ValueError".
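
The two questions above (explicit align versus the implicit "=", explicit fill versus the implicit "0") can be probed side by side with a small helper. This is a sketch; try_format is a hypothetical helper name, and the value -42 and width 10 are arbitrary. The Decimal results reported in this issue are for Python 3.6 to 3.8 and may differ on other versions, so the helper records a rejection instead of assuming one.

```python
from decimal import Decimal


def try_format(value, spec):
    """Return the formatted string, or a marker carrying the ValueError."""
    try:
        return format(value, spec)
    except ValueError as exc:
        return f"<ValueError: {exc}>"


# Specs that pair the "0" flag with an explicit align and/or fill.
specs = [">010", "=010", " =010", " >010"]

for spec in specs:
    for value in (-42, -42.0, Decimal(-42)):
        name = type(value).__name__
        print(f"{spec!r:8} {name:8} {try_format(value, spec)!r}")
```

For int and float the explicit align and fill win, so the zero flag only supplies the fill character when none is given.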
msg358595 - Author: Michael Amrhein (mamrhein) Date: 2019-12-17 22:35
>
> ... Has anyone checked what C does?
>

#include <stdio.h>
int main() {
    int i = -12345;
    double f = -12345.0;
    printf("%-020d\n", i);
    printf("%020d\n", i);
    printf("%20d\n", i);
    printf("%-020f\n", f);
    printf("%020f\n", f);
    printf("%20f\n", f);
    return 0;
}

Output:

-12345             
-0000000000000012345
              -12345
-12345.000000      
-000000012345.000000
       -12345.000000

https://en.cppreference.com/w/c/io/fprintf:

Each conversion specification has the following format:

  introductory % character 

  (optional) one or more flags that modify the behavior of the conversion: 

    -: the result of the conversion is left-justified within the field (by default it is right-justified)
    +: the sign of signed conversions is always prepended to the result of the conversion (by default the result is preceded by minus only when it is negative)
    space: if the result of a signed conversion does not start with a sign character, or is empty, space is prepended to the result. It is ignored if + flag is present.
    # : alternative form of the conversion is performed. See the table below for exact effects otherwise the behavior is undefined.
    0 : for integer and floating point number conversions, leading zeros are used to pad the field instead of space characters. For integer numbers it is ignored if the precision is explicitly specified. For other conversions using this flag results in undefined behavior. It is ignored if - flag is present. 

The last sentence means that zero-padding is only done when the output is right-aligned. I can't find an equivalent for Python's '=' align option.
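
Python's own printf-style % operator follows the C rule quoted above (the '-' flag overrides '0'), which makes for a direct comparison with format(). A minimal sketch using the same value as the C program:

```python
i = -12345

# printf-style formatting: '-' wins over '0', so the padding is spaces.
print(repr("%-020d" % i))
# '0' alone: sign-aware zero padding.
print(repr("%020d" % i))
# No flag: plain right-justification with spaces.
print(repr("%20d" % i))

# format() with left alignment plus the zero flag keeps the zero fill,
# which is the behaviour questioned in this issue.
print(repr(format(i, "<020")))
```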
msg358599 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-12-17 23:54
I agree that a quick glance in the rear view mirror shows that the design isn't awesome. But I just don't see how we can change it at this point. It is what it is.

And it's no surprise that int and float have the same behavior: they share the same code in Python/formatter_unicode.h. Which means that even if we did want to change one but not the other, we'd have to duplicate a lot of code and live with the maintenance hassle forever.
msg358619 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-12-18 10:56
Thanks, Eric. I'm now convinced that we shouldn't weaken the Decimal behaviour, and I agree that it's risky to change the float and int behaviour. So it's sounding as though we're looking at a "won't fix" resolution here.

There are still the documentation issues: the trailing ".0" and the 6-digits after the point for "f" with no precision. Michael: would you be willing to open a separate bug report for those? (It's awkward to track two different issues, potentially needing different resolutions, in a single bugs.python.org issue.)
msg358620 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-12-18 11:04
I agree with your approach, Mark.

And Michael: thanks for your report on the C behavior. I just wish we'd thought to look at this 13 years ago when .format() was being discussed.
msg358629 - Author: Michael Amrhein (mamrhein) Date: 2019-12-18 16:53
Mark, Eric,
sometimes the pressure to be backwards compatible is more of a curse than a blessing. But I can live with your decision.
And, yes, I will create two separate issues regarding the docs.
msg358668 - Author: Michael Amrhein (mamrhein) Date: 2019-12-19 10:19
Created new issue for tracking the deficiencies in the documentation:
https://bugs.python.org/issue39096.
History
Date                 User            Action  Args
2019-12-19 10:19:41  mamrhein        set     status: open -> closed; resolution: wont fix; messages: + msg358668; stage: resolved
2019-12-18 16:53:16  mamrhein        set     messages: + msg358629
2019-12-18 11:04:00  eric.smith      set     messages: + msg358620
2019-12-18 10:56:02  mark.dickinson  set     messages: + msg358619
2019-12-17 23:54:22  eric.smith      set     messages: + msg358599
2019-12-17 22:35:43  mamrhein        set     messages: + msg358595
2019-12-17 21:54:56  mark.dickinson  set     messages: + msg358590
2019-12-17 21:45:28  eric.smith      set     messages: + msg358587
2019-12-17 21:30:35  mark.dickinson  set     messages: + msg358583
2019-12-17 21:22:45  mamrhein        set     messages: + msg358581
2019-12-17 21:06:49  mamrhein        set     messages: + msg358576
2019-12-17 19:08:13  mark.dickinson  set     messages: - msg358567
2019-12-17 19:07:50  mark.dickinson  set     nosy: + rhettinger, facundobatista, skrah; messages: + msg358567
2019-12-17 19:04:04  mark.dickinson  set     messages: + msg358565
2019-12-17 18:29:58  mark.dickinson  set     nosy: + mark.dickinson
2019-12-17 18:19:21  eric.smith      set     nosy: + eric.smith
2019-12-17 17:48:02  mamrhein        create