classification
Title: string formatting: normalize negative zero
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: John Belmonte, eric.smith, jbelmonte, mark.dickinson, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2021-12-06 12:03 by John Belmonte, last changed 2022-01-22 23:18 by John Belmonte.

Pull Requests
URL Status Linked Edit
PR 30049 open jbelmonte, 2021-12-11 13:24
Messages (21)
msg407792 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-06 12:03
proposal: add a string formatting option to normalize negative 0 values to 0

use case:  rounded display of a float that is nominally 0, where the distraction of a flashing minus sign from minute changes around 0 is unwanted

example:
>>> '%~5.1f' % -.00001
'  0.0'

format spec before:
  format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
after:
  format_spec     ::=  [[fill]align][sign][~][#][0][width][grouping_option][.precision][type]
  where '~' is only allowed for number types

implementation: if '~' is present in the spec, add 0 to the value after applying precision
msg407793 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-06 12:15
To normalize negative 0.0 to 0.0 you can just add 0.0. It will work with any method of converting floats to string, there is no need to change all formatting specifications.

>>> x = -0.0
>>> x
-0.0
>>> x + 0.0
0.0
msg407796 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-06 12:30
> To normalize negative 0.0 to 0.0 you can just add 0.0.

yes, I'm aware-- see "implementation" in the original post

> there is no need to change all formatting specifications

For adding 0 to work, it must be done after the rounding.  That means if you want make use of anything in the current formatting spec regarding precision, normalizing negative zero would need to be a proper option of the formatting spec.
msg407810 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-12-06 14:18
It was decided long ago that % formatting would not be enhanced with new features. I think that it is supposed to match the standard C formatting codes, and nothing else.

So this is unlikely to be approved for % formatting.

It *might* be approved for the format method and f-strings. (I suspect that unless you get immediate and uncontroversial agreement from multiple core developers, you may need to take it for further discussion and perhaps even a PEP.)

What you call a "distraction" I consider to be critical part of the display. The numbers you are displaying actually are negative, and rounding them for display does not change that. So they ought to show the minus sign. So I'm not really very sympathetic to this feature request.
msg407822 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-06 15:10
Here is the same proposal made for C++ `std::format`:

  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1496r2.pdf

It makes fair arguments for the feature's use, and explains why the problem is hard to work around.

It was withdrawn by the author for C++20, but a consensus proposal is promised for C++23.
msg407857 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-12-06 19:03
I'd support having this functionality available for `format` and for f-strings. (As Steven says, changing %-formatting doesn't seem viable.) It really _is_ awkward to do this in any other way, and I'm reliably informed that normal people don't expect to see negative zeros in formatted numeric output.

It did take me a few minutes to get my head around the idea that `f"{-0.01:+.1f}"` would return `"+0.0"` rather than `"-0.0"` or `" 0.0"` or just plain `"0.0"` under this proposal, but I agree that it seems like the only thing that can be consistent and make sense.

I'm not 100% convinced by the particular spelling proposed, but I don't have anything better to suggest. If C++ might be going with a "z", would it make sense to do the same for Python?

I don't forsee any implementation difficulties for float and complex types. For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that. Once we've done that, again the implementation doesn't seem onerous.
msg407905 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-07 09:54
Well, it makes sense for negative zero produced by rounding.

But if we add a special support for this case, it would be useful to have some control on the type of rounding. Currently floats are rounded to the nearest decimal number, but in some cases it would be better to round up, down, toward zero or infinity (seed for example issue44884).

You can round explicitly before formatting, but this solution is also applicable for this issue:

>>> '%5.1f' % (round(-.00001, 1) + 0.0)
'  0.0'
msg407928 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-07 12:22
> changing %-formatting doesn't seem viable

I'm concerned about treating %-formatting specially.  As far as float/complex, the logical and efficient place to put this change seems to be PyOS_double_to_string(), which affects all three formatting options.

For example, the dtoa case is as simple as this change to format_float_short():

    /* coerce negative zero to positive */
    if (sign == 1 && ((digits_len == 0 && decpt == -1) ||
                      (digits_len == 1 && digits[0] == '0'))) {
        sign = 0;
    }
msg407929 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-12-07 12:53
Sorry John, I don't understand your comment about "treating %-formatting 
specifically". Isn't the point here not to change %-formatting at all?
msg407930 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-12-07 12:55
%-formatting already doesn't support some formats that float.__format__ does, for example ','.

So I agree we shouldn't modify %-formatting. I don't have much of an opinion on whether changing __format__ is a good idea or not.
msg407934 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-07 13:14
I see now.  PyOS_double_to_string() could gain the extra flag to coerce negative zero but, out of the three formatting methods, only format() and f-string would use the flag.
msg407973 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-12-07 21:48
PyOS_double_to_string is part of the stable ABI. I don't recall if we're allowed to add new bitfield flags to a stable ABI function. We'd use a new Py_DTSF_NORMALIZE_NEGATIVE_0 flag for this feature.

I suspect we can't add a flag, due to comparability reasons (new code setting the flag being used in old versions of python without it), but we'd need to research. I saw a similar discussion within the last few years, but of course now I can't find it. Maybe old versions would correctly ignore the new bit being set.

This proposal becomes less interesting if we'd need to add a new function to support it. Although I guess we could do something that's internal-only.
msg407990 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-08 01:51
I'll share a draft PR soon (excluding Decimal), so far it's still looking straightforward.

> Maybe old versions would correctly ignore the new bit being set.

That's one of the benefits of using bit flags in an ABI: backward-compatible extensibility.  (An implementation can defeat it by intentionally failing if unknown bits are encountered, but the code in question doesn't appear to be doing this.)

> You can round explicitly before formatting
>
>    >>> '%5.1f' % (round(-.00001, 1) + 0.0)

Yes, I have experience with it.

  1. even as a one-off, it's questionable.  If someone accidentally changes the precision in only one of the spec string or round call, that's a bug.
  2. since applications and libraries may pass around format specs, and because of (1), you'll try to make a programmatic solution.  Now you're parsing format spec strings.
  3. while a programmatic solution can be done for a function API like format() that takes a separate spec and value, there is no sane way to wrap f-strings
msg408367 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-12 08:31
> For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that.

After some digging, I believe this is the background on forking pieces of mpdecimal (and why the existing source copy inside Python doesn't count as a fork):
  https://bugs.python.org/issue45708#msg405895
  https://github.com/python/cpython/pull/29438

If I understand correctly, the PR for supporting underscore separators in Decimal formatting is only taking control of generating a mpd_spec_t from the spec string.  Formatting itself is still done by mpd_qformat_spec().

So there's outstanding work to also pull the formatting code itself into _decimal.c.  (And this is wanted anyway to reconcile existing libmpdec formatting modifications:  https://github.com/python/cpython/commit/298131a44896a4fec1ea829814ad52409d59aba5)

And this is all because vendors have the crazy practice of unbundling libmpdec from Python.  (If a project is bundling the source of another, there may be some reason...?)
msg408413 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-12 23:52
potential short-term solution for Decimal:

  if negative zero option is set and sign is negative:
      pre-round into a temp using mpd_qrescale()
      if mpd_iszero(temp):
          change sign to positive
msg408529 - (view) Author: John Belmonte (John Belmonte) Date: 2021-12-14 13:25
implemented float and Decimal-- PR is ready for review
msg408800 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-12-17 16:49
Thanks, John. I should have time to review within the next week or so.
msg410838 - (view) Author: John Belmonte (John Belmonte) Date: 2022-01-18 00:45
Mark, would you give it a review this month?  (PR has been marked stale.)
msg411243 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-22 10:00
[John]

> Mark, would you give it a review this month?

Apologies; my holiday-break free time was nobbled from unexpected quarters. I can't promise to find time this month, but I can promise to try. I did at least skim through the PR, and while there are still likely some iterations needed I'm satisfied that this is technically feasible.

But I'm afraid that's the easy part. If this is to go in, the other problem we still have to solve is achieving some consensus among the core developers that this is worth doing. Right now, judging by comments on this issue, I think I'm the only core dev who thinks this is a good idea; others are lukewarm at best, and I'm not willing to unilaterally approve and merge these changes without something closer to a consensus. There are a couple of ways forward here:

- Post the proposal on python-ideas to get wider visibility and feedback. If everyone agrees this is a great idea (from experience, this seems an unlikely outcome), then we can go ahead and merge. Otherwise we'd likely need a PEP to move forward.

- Bypass the python-ideas step, write the PEP, discuss in the appropriate forums, and then submit to the SC for approval / rejection.

- Convince Eric Smith. :-) With apologies to Eric for singling him out: Eric could reasonably be described as the steward/maintainer of the formatting machinery, so if he's persuaded, that's good enough for me.

The fact that you've already created a working implementation so that people can experiment is a bonus when it comes to trying to sell this to others.

I don't have the bandwidth to write a PEP, but I would be happy to act as PEP sponsor.
msg411295 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-01-22 22:17
Wow, thanks, Mark!

I'm generally in favor. The selling points to me are that it needs to happen post-rounding, and the C++ discussion. It would be better if this were already accepted in C++. I'll note that the paper is proposing a 'z' modifier to the sign, so I guess for us that would translate to: [sign[optional-z]] instead of just sign. I'd have to noodle through the differences between that the proposed [sign][~]. I guess this would all be worked out in a PEP.

My only reservation is Mark's comment:  """For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that."""

If Mark is okay with that (right back at you, Mark!), then I think a PEP is the next step. It doesn't need to be huge, sort of like PEP 378.
msg411302 - (view) Author: John Belmonte (John Belmonte) Date: 2022-01-22 23:18
Thank you Mark and Eric.

> I'll note that the paper is proposing a 'z' modifier to the sign, so I guess for us that would translate to: [sign[optional-z]] instead of just sign.  I'd have to noodle through the differences between that the proposed [sign][~].

The C++ paper proposes [sign][z] (i.e. you can have the `z` alone without an explicit +/-), and this is what I implemented in the Python PR.  My original proposal with tilde was discarded.

> My only reservation is Mark's comment:  """For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that."""

In the PR I was able to avoid taking that on by preprocessing the format string before handing it to mpdecimal.  The code was already doing such things to handle the NULL fill character.
History
Date User Action Args
2022-01-22 23:18:50John Belmontesetmessages: + msg411302
2022-01-22 22:17:16eric.smithsetmessages: + msg411295
2022-01-22 10:00:59mark.dickinsonsetmessages: + msg411243
2022-01-18 00:45:26John Belmontesetmessages: + msg410838
2021-12-17 16:49:58mark.dickinsonsetassignee: mark.dickinson
2021-12-17 16:49:46mark.dickinsonsetmessages: + msg408800
2021-12-14 13:25:50John Belmontesetmessages: + msg408529
2021-12-12 23:52:33John Belmontesetmessages: + msg408413
2021-12-12 08:31:27John Belmontesetmessages: + msg408367
2021-12-11 13:24:12jbelmontesetkeywords: + patch
nosy: + jbelmonte

pull_requests: + pull_request28274
stage: patch review
2021-12-08 01:51:58John Belmontesetmessages: + msg407990
2021-12-07 21:48:16eric.smithsetmessages: + msg407973
2021-12-07 13:14:12John Belmontesetmessages: + msg407934
2021-12-07 12:55:35eric.smithsetmessages: + msg407930
2021-12-07 12:53:02steven.dapranosetmessages: + msg407929
2021-12-07 12:22:14John Belmontesetmessages: + msg407928
2021-12-07 09:54:42serhiy.storchakasetmessages: + msg407905
2021-12-06 19:03:35mark.dickinsonsetmessages: + msg407857
2021-12-06 18:16:16mark.dickinsonsetnosy: + eric.smith
2021-12-06 15:10:31John Belmontesetmessages: + msg407822
2021-12-06 14:18:04steven.dapranosetnosy: + steven.daprano

messages: + msg407810
versions: + Python 3.11
2021-12-06 12:30:06John Belmontesetmessages: + msg407796
2021-12-06 12:15:51serhiy.storchakasetnosy: + mark.dickinson, serhiy.storchaka
messages: + msg407793
2021-12-06 12:03:46John Belmontecreate