msg405667 - (view) |
Author: Sander Bollen (sndrtj) |
Date: 2021-11-04 09:01 |
Hello,
It appears that Decimal does not support PEP-515 style formatting with underscores as thousands separators.
```
>>> from decimal import Decimal
>>> f"{Decimal('5000'):,}"
'5,000'
>>> f"{Decimal('5000'):_}"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid format string
```
This does work for all the other types mentioned in PEP515
```
>>> f"{5000:_}"
'5_000'
>>> f"{5000.0:_}"
'5_000.0'
>>> f"{complex(5000, 1):_}"
'(5_000+1j)'
```
I have tried this on python 3.8, 3.9 and 3.10 on a Mac.
|
msg405679 - (view) |
Author: Nikita Sobolev (sobolevn) *  |
Date: 2021-11-04 11:23 |
It is not hard to fix (https://github.com/python/cpython/blob/2c045bd5673d56c3fdde7536da9df1c7d6f270f0/Modules/_decimal/libmpdec/io.c#L857-L863):
```c
/* thousands separator */
if (*cp == ',' || *cp == '_') {
spec->dot = ".";
if (*cp == ',') {
spec->sep = ",";
} else {
spec->sep = "_";
}
spec->grouping = "\003\003";
cp++;
}
```
But, there's some context to it: https://bugs.python.org/issue44267 / https://mail.python.org/pipermail/python-dev/2016-March/143556.html
I am going to research this further.
|
msg405683 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-11-04 11:58 |
I think the two main reasons that applied to not implementing the parsing part of PEP 515 for the Decimal type (speed, compliance with the IBM specification) don't apply to the formatting side.
We do need to think about the implications of making local changes to our copy of the externally-maintained libmpdec library, though.
Changing Python versions: this is a new feature, so could only go into Python 3.11.
|
msg405684 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2021-11-04 12:03 |
I agree with Mark.
Also, if we're going to change the C implementation, the Python implementation should agree with it.
|
msg405685 - (view) |
Author: Nikita Sobolev (sobolevn) *  |
Date: 2021-11-04 12:04 |
More context
------------
Original commit with `_` format: https://github.com/python/cpython/commit/89e1b1aae0775341735de6bc5e97b3c1e9cea0fa
|
msg405724 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2021-11-04 15:00 |
Tim, do you have any thoughts on whether Decimal should extend beyond the specification in this case?
|
msg405739 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-11-04 17:14 |
> whether Decimal should extend beyond the specification in this case
We already go way beyond the original specification for string formatting. The spec doesn't go further than specifying to-scientific-string and to-engineering-string, neither of which even get into limiting precision.
|
msg405791 - (view) |
Author: Sander Bollen (sndrtj) |
Date: 2021-11-05 12:41 |
Thanks for looking into this so swiftly!
As a user, I have found PEP515 to be worded a little ambiguously with regards to formatting using underscores as thousands separators. While it enumerates a collection of types as far as the constructor is concerned, it does not do so for formatting. Instead, it says to support `_` as thousands separator for formatting wherever `,` is allowed. Since `,` does work for Decimal, I have taken that to imply `_` should work for Decimal as well.
|
msg405795 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2021-11-05 13:21 |
This is a duplicate of issue43624. It was also discussed in Discuss.
|
msg405853 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-11-06 09:39 |
Serhiy: this is not a duplicate of #43624. That issue is about underscores in the *fractional* part of a (float / complex / Decimal) number, and the changes to the formatting mini-language syntax that would be necessary to support that. This issue is simply about bringing Decimal into line with int and float and allowing inclusion of underscores in the *integral* part of the formatted result.
Raymond: the "General Decimal Arithmetic" specification that the decimal module is based on isn't relevant here. It has nothing to say on the subject of formatting. We moved beyond the specification the moment we allowed `format(some_decimal, 'f')`, let alone `format(some_decimal, '.3f')` or `format(some_decimal, ',')`.
As Sander Bollen noted, we've already added ","-form thousands separators to Decimal formatting. I can't see any good reason for supporting "," but not supporting "_" as a thousands separator for Decimal.
|
msg405854 - (view) |
Author: Nikita Sobolev (sobolevn) *  |
Date: 2021-11-06 10:34 |
> We do need to think about the implications of making local changes to our copy of the externally-maintained libmpdec library, though.
It looks like quite similar changes have already been made: https://github.com/python/cpython/commit/298131a44896a4fec1ea829814ad52409d59aba5
I will send a PR, so we can see what exactly it touches / changes.
|
msg405855 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-11-06 10:44 |
> It looks like quite similar changes have already been made:
Yes, I think this isn't something that needs to be resolved for this issue, but it is something we need to think about. (Though perhaps the resolution is just "Don't worry about it until we need to.")
> I will send a PR, so we can see what exactly it touches / changes.
Ah, sorry; I already made one before reading your message. I'd be happy to get your input on that PR, though. (Or to review a PR from you.)
|
msg405895 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-11-07 09:27 |
Christian Heimes pointed out in the PR discussion that we can't simply modify libmpdec, since some vendors unbundle the mpdecimal library.
So some options are:
0. Do nothing.
1. Request that this feature to be added upstream, so that it eventually makes its way into core Python.
2. Bypass mpd_parse_fmt_str and do our own format string parsing in _decimal.c (e.g., by copying and adapting the code in mpdecimal).
3. Wrap mpd_parse_fmt_str and do our own pre- and post- processing in _decimal.c (pre-process to convert "_" to "," in the format string, then post-process the formatted string to convert "," back to "_").
Option 2 makes sense to me from the point of view of separation of concerns: libmpdec aims to implement Cowlishaw's specification, and formatting lies outside of that specification. The decimal specification is pretty much set in stone, but the formatting mini-language could change again in the future, and when that happens we should be able to update the CPython code accordingly. (This brings to mind Robert Martin's Single Responsibility Principle: "Gather together those things that change for the same reason, and separate those things that change for different reasons.")
I've updated the PR (and turned it into a draft) to show what option 2 looks like. The duplication is a little ugly.
|
msg405907 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2021-11-07 16:29 |
I think your option 2 makes the most sense.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:52 | admin | set | github: 89871 |
2021-11-07 16:29:51 | eric.smith | set | messages:
+ msg405907 |
2021-11-07 09:27:32 | mark.dickinson | set | nosy:
+ christian.heimes messages:
+ msg405895
|
2021-11-06 14:26:00 | rhettinger | set | assignee: rhettinger -> |
2021-11-06 10:44:41 | mark.dickinson | set | messages:
+ msg405855 |
2021-11-06 10:42:22 | mark.dickinson | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request27692 |
2021-11-06 10:34:21 | sobolevn | set | messages:
+ msg405854 |
2021-11-06 09:49:54 | serhiy.storchaka | set | status: closed -> open superseder: Add underscore as a decimal separator for string formatting -> resolution: duplicate -> stage: resolved -> (no value) |
2021-11-06 09:39:09 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg405853
|
2021-11-05 13:21:33 | serhiy.storchaka | set | status: open -> closed superseder: Add underscore as a decimal separator for string formatting messages:
+ msg405795
resolution: duplicate stage: resolved |
2021-11-05 12:41:28 | sndrtj | set | messages:
+ msg405791 |
2021-11-05 09:37:41 | mark.dickinson | set | nosy:
- mark.dickinson
|
2021-11-05 04:36:08 | rhettinger | set | nosy:
+ serhiy.storchaka
|
2021-11-04 17:14:58 | mark.dickinson | set | messages:
+ msg405739 |
2021-11-04 15:00:58 | rhettinger | set | assignee: rhettinger
messages:
+ msg405724 nosy:
+ rhettinger, tim.peters |
2021-11-04 12:04:47 | sobolevn | set | messages:
+ msg405685 |
2021-11-04 12:03:12 | eric.smith | set | messages:
+ msg405684 |
2021-11-04 11:58:12 | mark.dickinson | set | type: enhancement messages:
+ msg405683 versions:
+ Python 3.11, - Python 3.8, Python 3.9, Python 3.10 |
2021-11-04 11:50:39 | mark.dickinson | set | nosy:
+ mark.dickinson, eric.smith
|
2021-11-04 11:23:29 | sobolevn | set | nosy:
+ sobolevn messages:
+ msg405679
|
2021-11-04 09:01:14 | sndrtj | create | |