This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PEP 515-style formatting with underscores does not seem to work for Decimal
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, eric.smith, mark.dickinson, rhettinger, serhiy.storchaka, sndrtj, sobolevn, tim.peters
Priority: normal Keywords: patch

Created on 2021-11-04 09:01 by sndrtj, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 29438 open mark.dickinson, 2021-11-06 10:42
Messages (14)
msg405667 - (view) Author: Sander Bollen (sndrtj) Date: 2021-11-04 09:01
Hello,

It appears that Decimal does not support PEP-515 style formatting with underscores as thousands separators. 

```
>>> from decimal import Decimal
>>> f"{Decimal('5000'):,}"
'5,000'
>>> f"{Decimal('5000'):_}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid format string

```

This does work for all the other types mentioned in PEP515

```
>>> f"{5000:_}"
'5_000'
>>> f"{5000.0:_}"
'5_000.0'
>>> f"{complex(5000, 1):_}"
'(5_000+1j)'
```

I have tried this on python 3.8, 3.9 and 3.10 on a Mac.
msg405679 - (view) Author: Nikita Sobolev (sobolevn) * (Python triager) Date: 2021-11-04 11:23
It is not hard to fix (https://github.com/python/cpython/blob/2c045bd5673d56c3fdde7536da9df1c7d6f270f0/Modules/_decimal/libmpdec/io.c#L857-L863):

```c
    /* thousands separator */
    if (*cp == ',' || *cp == '_') {
        spec->dot = ".";
        if (*cp == ',') {
            spec->sep = ",";
        } else {
            spec->sep = "_";
        }
        spec->grouping = "\003\003";
        cp++;
    }
```

But, there's some context to it: https://bugs.python.org/issue44267 / https://mail.python.org/pipermail/python-dev/2016-March/143556.html

I am going to research this further.
msg405683 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-04 11:58
I think the two main reasons that applied to not implementing the parsing part of PEP 515 for the Decimal type (speed, compliance with the IBM specification) don't apply to the formatting side.

We do need to think about the implications of making local changes to our copy of the externally-maintained libmpdec library, though.

Changing Python versions: this is a new feature, so could only go into Python 3.11.
msg405684 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-11-04 12:03
I agree with Mark.

Also, if we're going to change the C implementation, the Python implementation should agree with it.
msg405685 - (view) Author: Nikita Sobolev (sobolevn) * (Python triager) Date: 2021-11-04 12:04
More context
------------

Original commit with `_` format: https://github.com/python/cpython/commit/89e1b1aae0775341735de6bc5e97b3c1e9cea0fa
msg405724 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-11-04 15:00
Tim, do you have any thoughts on whether Decimal should extend beyond the specification in this case?
msg405739 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-04 17:14
> whether Decimal should extend beyond the specification in this case

We already go way beyond the original specification for string formatting. The spec doesn't go further than specifying to-scientific-string and to-engineering-string, neither of which even get into limiting precision.
msg405791 - (view) Author: Sander Bollen (sndrtj) Date: 2021-11-05 12:41
Thanks for looking into this so swiftly!

As a user, I have found PEP515 to be worded a little ambiguously with regards to formatting using underscores as thousands separators. While it enumerates a collection of types as far as the constructor is concerned, it does not do so for formatting. Instead, it says to support `_` as thousands separator for formatting wherever `,` is allowed. Since `,` does work for Decimal, I have taken that to imply `_` should work for Decimal as well.
msg405795 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-11-05 13:21
This is a duplicate of issue43624. It was also discussed in Discuss.
msg405853 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-06 09:39
Serhiy: this is not a duplicate of #43624. That issue is about underscores in the *fractional* part of a (float / complex / Decimal) number, and the changes to the formatting mini-language syntax that would be necessary to support that. This issue is simply about bringing Decimal into line with int and float and allowing inclusion of underscores in the *integral* part of the formatted result.

Raymond: the "General Decimal Arithmetic" specification that the decimal module is based on isn't relevant here. It has nothing to say on the subject of formatting. We moved beyond the specification the moment we allowed `format(some_decimal, 'f')`, let alone `format(some_decimal, '.3f')` or `format(some_decimal, ',')`.

As Sander Bollen noted, we've already added ","-form thousands separators to Decimal formatting. I can't see any good reason for supporting "," but not supporting "_" as a thousands separator for Decimal.
msg405854 - (view) Author: Nikita Sobolev (sobolevn) * (Python triager) Date: 2021-11-06 10:34
> We do need to think about the implications of making local changes to our copy of the externally-maintained libmpdec library, though.

It looks like quite similar changes have already been made: https://github.com/python/cpython/commit/298131a44896a4fec1ea829814ad52409d59aba5

I will send a PR, so we can see what exactly it touches / changes.
msg405855 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-06 10:44
> It looks like quite similar changes have already been made: 

Yes, I think this isn't something that needs to be resolved for this issue, but it is something we need to think about. (Though perhaps the resolution is just "Don't worry about it until we need to.")

> I will send a PR, so we can see what exactly it touches / changes.

Ah, sorry; I already made one before reading your message. I'd be happy to get your input on that PR, though. (Or to review a PR from you.)
msg405895 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-11-07 09:27
Christian Heimes pointed out in the PR discussion that we can't simply modify libmpdec, since some vendors unbundle the mpdecimal library.

So some options are:

0. Do nothing.
1. Request that this feature to be added upstream, so that it eventually makes its way into core Python.
2. Bypass mpd_parse_fmt_str and do our own format string parsing in _decimal.c (e.g., by copying and adapting the code in mpdecimal).
3. Wrap mpd_parse_fmt_str and do our own pre- and post- processing in _decimal.c (pre-process to convert "_" to "," in the format string, then post-process the formatted string to convert "," back to "_").

Option 2 makes sense to me from the point of view of separation of concerns: libmpdec aims to implement Cowlishaw's specification, and formatting lies outside of that specification. The decimal specification is pretty much set in stone, but the formatting mini-language could change again in the future, and when that happens we should be able to update the CPython code accordingly. (This brings to mind Robert Martin's Single Responsibility Principle: "Gather together those things that change for the same reason, and separate those things that change for different reasons.") 

I've updated the PR (and turned it into a draft) to show what option 2 looks like. The duplication is a little ugly.
msg405907 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-11-07 16:29
I think your option 2 makes the most sense.
History
Date User Action Args
2022-04-11 14:59:52adminsetgithub: 89871
2021-11-07 16:29:51eric.smithsetmessages: + msg405907
2021-11-07 09:27:32mark.dickinsonsetnosy: + christian.heimes
messages: + msg405895
2021-11-06 14:26:00rhettingersetassignee: rhettinger ->
2021-11-06 10:44:41mark.dickinsonsetmessages: + msg405855
2021-11-06 10:42:22mark.dickinsonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request27692
2021-11-06 10:34:21sobolevnsetmessages: + msg405854
2021-11-06 09:49:54serhiy.storchakasetstatus: closed -> open
superseder: Add underscore as a decimal separator for string formatting ->
resolution: duplicate ->
stage: resolved -> (no value)
2021-11-06 09:39:09mark.dickinsonsetnosy: + mark.dickinson
messages: + msg405853
2021-11-05 13:21:33serhiy.storchakasetstatus: open -> closed
superseder: Add underscore as a decimal separator for string formatting
messages: + msg405795

resolution: duplicate
stage: resolved
2021-11-05 12:41:28sndrtjsetmessages: + msg405791
2021-11-05 09:37:41mark.dickinsonsetnosy: - mark.dickinson
2021-11-05 04:36:08rhettingersetnosy: + serhiy.storchaka
2021-11-04 17:14:58mark.dickinsonsetmessages: + msg405739
2021-11-04 15:00:58rhettingersetassignee: rhettinger

messages: + msg405724
nosy: + rhettinger, tim.peters
2021-11-04 12:04:47sobolevnsetmessages: + msg405685
2021-11-04 12:03:12eric.smithsetmessages: + msg405684
2021-11-04 11:58:12mark.dickinsonsettype: enhancement
messages: + msg405683
versions: + Python 3.11, - Python 3.8, Python 3.9, Python 3.10
2021-11-04 11:50:39mark.dickinsonsetnosy: + mark.dickinson, eric.smith
2021-11-04 11:23:29sobolevnsetnosy: + sobolevn
messages: + msg405679
2021-11-04 09:01:14sndrtjcreate