This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Add underscore as a decimal separator for string formatting
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Terry Davis, domdfcoding, eric.smith, mark.dickinson, rhettinger, serhiy.storchaka, vstinner
Priority: normal
Keywords:

Created on 2021-03-25 17:19 by Terry Davis, last changed 2022-04-11 14:59 by admin.

Messages (17)
msg389508 - (view) Author: Terry Davis (Terry Davis) Date: 2021-03-25 17:19
Proposal:
Enable this
>>> format(12_34_56.12_34_56, '_._f')
'123_456.123_456'

Where now only this is possible
>>> format(12_34_56.12_34_56, '_.f')
'123_456.123456'


Based on the discussion in the Ideas forum, three core devs support this addition.
https://discuss.python.org/t/add-underscore-as-a-thousandths-separator-for-string-formatting/7407

I'm willing to give this a try if someone points me to where to add tests and where the float formatting code is. This would be my first CPython contribution.

The feature freeze for 3.10 is 2021-05-03.
https://www.python.org/dev/peps/pep-0619/#id5
msg389512 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-25 17:56
IIRC there is an ISO standard recommending that, after the decimal point, digits be arranged in groups of five.  I think that is also how printed reference tables are typically formatted.
msg389517 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-25 18:45
Some brief research
===================

""" in numbers four or more digits long, use commas to set off groups of three digits, counting leftward from the decimal point, in the standard American style. For long decimal numbers, do not use any digit-group separators to the right of the decimal point."""
-- Google Style Guide https://developers.google.com/style/numbers

The CRC math handbook uses groups of five after the decimal point.
See §1.2.4 in
http://dl.icdst.org/pdfs/files/2a2cbcfc89598fd83c315ce45c1ee663.pdf


NIST Guide for using SI units:  """The digits of numerical values having more than four digits on either side of the decimal marker are separated into groups of three using a thin, fixed space counting from both the left and right of the decimal marker. For example, 15 739.012 53 is highly preferred to 15739.01253. Commas are not used to separate digits into groups of three. (See Sec. 10.5.3.)"""
-- page vi in https://physics.nist.gov/cuu/pdf/sp811.pdf#10.5.2

StackExchange question on the topic:
https://math.stackexchange.com/questions/182775/convention-of-digit-grouping-after-decimal-point

The important reference, ISO 80000-1, discusses this in section 7, "Printing rules", but the standard is not publicly available.
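
For illustration, the conventions cited above (groups of three per NIST, groups of five per the CRC handbook) can be sketched with a small helper. `group_decimal` is a hypothetical name, not an existing API. It assumes an unsigned, plain digit string, and it uses an ordinary space where NIST calls for a thin space.

```python
def group_decimal(text: str, size: int = 3, sep: str = " ") -> str:
    """Group digits on both sides of the decimal marker.

    Hypothetical helper: assumes `text` is an unsigned digit string with
    at most one "." and no exponent.
    """
    whole, dot, frac = text.partition(".")
    # Integer digits are grouped counting leftward from the decimal marker,
    # so any short group ends up at the far left.
    head = len(whole) % size
    whole_groups = [whole[:head]] if head else []
    whole_groups += [whole[i:i + size] for i in range(head, len(whole), size)]
    # Fractional digits are grouped counting rightward from the marker,
    # so any short group ends up at the far right.
    frac_groups = [frac[i:i + size] for i in range(0, len(frac), size)]
    return sep.join(whole_groups) + dot + sep.join(frac_groups)


print(group_decimal("15739.01253"))               # NIST style, groups of three
print(group_decimal("3.14159265358979", size=5))  # CRC style, groups of five
```

With the defaults, the NIST example above comes out as "15 739.012 53".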
msg389529 - (view) Author: Dominic Davis-Foster (domdfcoding) * Date: 2021-03-25 21:04
ISO 80000-1:2009 recommends groups of three digits either side of the decimal sign.
msg389534 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-03-26 01:40
If we do anything for float, we should do the same for decimal.Decimal.
msg389546 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-26 11:58
> If we do anything for float, we should do the same for decimal.Decimal.

and complex ;-)
msg389547 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-26 12:09
How backward incompatible and annoying would it be to modify the behavior of the existing "_f" format?

Do you see use cases which only want to group digits in the integer part but not the fractional part?

According to https://discuss.python.org/t/add-underscore-as-a-thousandths-separator-for-string-formatting/7407 discussion, grouping digits was first designed for integers, and the fractional part of floats was simply ignored/forgotten. I mean, it doesn't sound like a deliberate choice to not group digits in the fractional part.

The advantage of changing "_f" format is to keep backward compatibility: Python 3.9 and older would not group digits in the fractional part, but at least they don't fail with an error. If you write code with "_._f" format, you need a fallback code path for Python 3.9 and older:

if sys.version_info >= (3, 10):
   text = f"my {...} very {...} long {...} and {...} complex {...} format string: x={x:_._f}"
else:
   text = f"my {...} very {...} long {...} and {...} complex {...} format string: x={x:_f}"

Or:

text = f"my {...} very {...} long {...} and {...} complex {...} format string: " + (f"x={x:_._f}" if sys.version_info >= (3, 10) else f"x={x:_f}")

Or many other variants.
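
One more variant, sketched here as an assumption rather than anything proposed in this thread, is feature detection instead of a version check: try the new spec and fall back when the interpreter rejects it. `format_grouped` is a hypothetical helper name. The only behavior it relies on is that an unsupported format spec raises ValueError.

```python
def format_grouped(value: float) -> str:
    """Format with "_._f" where supported, otherwise fall back to "_f".

    Hypothetical helper: "_._f" is the spec proposed in this issue and is
    not assumed to be accepted by any particular Python version.
    """
    try:
        return format(value, "_._f")
    except ValueError:
        # The interpreter rejected the proposed spec; group the integer
        # part only, which is the existing behavior.
        return format(value, "_f")


print(format_grouped(1234.1234))
```

This keeps the format string itself free of version checks, at the cost of one extra function call per formatted value.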

The main drawback is the risk of breaking tests that rely on the exact output.

About the separator character and the number of digits per group, IMO there is no standard that works in all countries and all languages. But since we have a strict rule of 3 digits with "_" separator, I am fine with doing the same for the fractional part. It's an "arbitrary" choice, but at least, it's consistent.

People wanting a different format per locale/language should write their own function. Once enough people agree on such an API, we can consider adding it to the stdlib. But for now, IMO 3 digits with "_" is good enough.

By the way, I agree that it's hard to read numbers with many digits in the decimal part ;-)

>>> f"{1/7:_.30f}"
'0.142857142857142849212692681249'

>>> f"{10**10+1/7:_.10f}"
'10_000_000_000.1428565979'
msg389574 - (view) Author: Terry Davis (Terry Davis) Date: 2021-03-26 23:52
Good point Victor, though I wonder how likely it is that a person using 3.10 would only use this particular new feature, and have an otherwise backwards-compatible codebase.

This isn't something that I asked about out of necessity, and there hasn't been any other discussion of this idea that anyone can remember.

On the other hand, I suppose it would be possible to have a feature flag that can be used to disable decimal underscores in 3.10 to prevent test failures. Just spitballing...
msg389687 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 12:06
> On the other hand, I suppose it would be possible to have a feature flag that can be used to disable decimal underscores in 3.10 to prevent test failures. Just spitballing...

I wrote PEP 606 -- Python Compatibility Version https://www.python.org/dev/peps/pep-0606/ and it was rejected.
msg389708 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-29 15:34
I prefer Terry's original proposal which is backwards compatible and gives the user control over whether separator is to be applied to the fractional component.   

>>> format(12_34_56.12_34_56, '_._f')   # Whole and fractional
'123_456.123_456'
>>> format(12_34_56.12_34_56, '_.f')    # Whole component only
'123_456.123456'
msg389709 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-03-29 15:37
I agree with Raymond. We can't make a change that would modify existing program output. Which is unfortunate, but such is life.

And I'd prefer to see groupings of 5 on the right, but I realize I might be in the minority.
msg389735 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 20:16
'_.f' would be the same as '_f'?

Should "._f" be allowed to only add underscores in the fractional part? (for consistency?)
msg389736 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 20:18
Raymond:
> I prefer Terry's original proposal which is backwards compatible (...)

Well ok, that's what I expected. Backward compatibility usually wins all other arguments in Python :-) But I had to ask the question :-)
msg389744 - (view) Author: Terry Davis (Terry Davis) Date: 2021-03-29 20:39
Victor,
> '_.f' would be the same as '_f'?
No, the example in my original post is wrong, '_.f' isn't allowed now.
The proposal should use '_f' to describe the current behavior.

> Should "._f" be allowed to only add underscores in the fractional part? (for consistency?)

Yes, but not for consistency with the above usage; instead, it's so that fractional and integral underscores can each be specified on their own.

Here is my attempt at updating the format spec. The only problem I have with it is that it allows a naked '.'; I don't know how to specify "dot must be followed by one or both of 'float_grouping' and 'precision'".

Current:
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]

Proposed:
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.[float_grouping][precision]][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
float_grouping  ::=  "_"
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
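
One possible way to rule out the naked '.', offered as a sketch and not as the grammar CPython would adopt, is to write the optional part as ["." (float_grouping [precision] | precision)]. The regular expression below encodes that rule with a lookahead. `PROPOSED_SPEC` and `parse_spec` are hypothetical names, and the group names simply mirror the productions above.

```python
import re

# Hypothetical regex for the proposed grammar. The lookahead after the dot
# requires either float_grouping ("_") or a precision digit to follow.
PROPOSED_SPEC = re.compile(r"""
    \A
    (?:(?P<fill>.)?(?P<align>[<>=^]))?
    (?P<sign>[-+ ])?
    (?P<alt>\#)?
    (?P<zero>0)?
    (?P<width>\d+)?
    (?P<grouping_option>[_,])?
    (?:\.
       (?=[_\d])
       (?P<float_grouping>_)?
       (?P<precision>\d+)?
    )?
    (?P<type>[bcdeEfFgGnosxX%])?
    \Z
""", re.VERBOSE | re.DOTALL)


def parse_spec(spec: str):
    """Return the named fields of `spec`, or None if it does not match."""
    match = PROPOSED_SPEC.match(spec)
    return match.groupdict() if match else None


for candidate in ("_._f", "._4f", "_f", ".f", "_.f"):
    print(repr(candidate), "accepted" if parse_spec(candidate) else "rejected")
```

Under this sketch '_._f', '._4f' and '_f' are accepted, while '.f' and '_.f' are rejected.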
msg389754 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-29 21:12
I'm now confused. Would you mind giving examples of all proposed formats and the expected output?
msg389762 - (view) Author: Terry Davis (Terry Davis) Date: 2021-03-29 22:24
Current behavior:

>>> format(1234.1234, '_f')
'1_234.123400'
>>> format(1234.1234, ',f')
'1,234.123400'

New behavior:
>>> format(1234.1234, ',._f')
'1,234.123_400'
>>> format(1234.1234, '_._f')
'1_234.123_400'
>>> format(1234.1234, '._f')
'1234.123_400'
>>> format(1234.1234, '._4f')
'1234.123_4'
>>> format(1234.1234, '.f')  # still not allowed
>>> format(1234.1234, '_.f')  # still not allowed
msg392911 - (view) Author: Terry Davis (Terry Davis) Date: 2021-05-04 15:37
If no one else has any comments, I'll assume there is consensus and start working on this. I have not contributed to CPython before, nor have I worked on production C code, so it may be a while before I get anywhere.
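
As a rough illustration of the outputs listed in msg389762, the proposed behavior can be emulated in pure Python by formatting with an existing spec and then grouping the fractional digits. This is a sketch only, not the CPython implementation. `format_proposed` is a hypothetical name, and it assumes the "f" presentation type, no width or padding, and no "." used as a fill character.

```python
import re


def format_proposed(value: float, spec: str) -> str:
    """Emulate the proposed "._" fractional grouping for "f" formats.

    Hypothetical helper: rewrites the proposed spec into one that format()
    already accepts, then inserts "_" every three fractional digits.
    """
    match = re.search(r"\._(\d*)", spec)
    if match is None:
        # No fractional grouping requested; defer to the built-in behavior.
        return format(value, spec)
    precision = match.group(1)
    # "._4" becomes ".4"; a bare "._" is dropped so the default precision applies.
    replacement = "." + precision if precision else ""
    base_spec = spec[:match.start()] + replacement + spec[match.end():]
    whole, dot, frac = format(value, base_spec).partition(".")
    # Group fractional digits in threes, counting rightward from the point.
    groups = [frac[i:i + 3] for i in range(0, len(frac), 3)]
    return whole + dot + "_".join(groups)


for spec in (",._f", "_._f", "._f", "._4f", "_f"):
    print(repr(spec), "->", format_proposed(1234.1234, spec))
```

Because the grouping is applied after formatting, a width in the spec would be computed before the underscores are added. A real implementation would need to handle that inside the formatter.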
History
Date | User | Action | Args
2022-04-11 14:59:43 | admin | set | github: 87790
2021-11-06 09:49:54 | serhiy.storchaka | unlink | issue45708 superseder
2021-11-05 13:21:33 | serhiy.storchaka | link | issue45708 superseder
2021-05-04 15:37:03 | Terry Davis | set | messages: + msg392911; versions: + Python 3.11, - Python 3.10
2021-03-29 22:24:52 | Terry Davis | set | messages: + msg389762
2021-03-29 21:12:30 | vstinner | set | messages: + msg389754
2021-03-29 20:39:49 | Terry Davis | set | messages: + msg389744
2021-03-29 20:18:27 | vstinner | set | messages: + msg389736
2021-03-29 20:16:46 | vstinner | set | messages: + msg389735
2021-03-29 15:37:48 | eric.smith | set | messages: + msg389709
2021-03-29 15:34:44 | rhettinger | set | messages: + msg389708
2021-03-29 12:06:58 | vstinner | set | messages: + msg389687
2021-03-26 23:52:26 | Terry Davis | set | messages: + msg389574
2021-03-26 12:09:26 | vstinner | set | messages: + msg389547
2021-03-26 11:58:04 | vstinner | set | messages: + msg389546
2021-03-26 01:40:45 | eric.smith | set | messages: + msg389534
2021-03-26 01:38:46 | eric.smith | set | nosy: + eric.smith
2021-03-25 21:04:42 | domdfcoding | set | nosy: + domdfcoding; messages: + msg389529
2021-03-25 19:22:50 | vstinner | set | nosy: + vstinner
2021-03-25 18:45:13 | rhettinger | set | messages: + msg389517
2021-03-25 17:56:02 | rhettinger | set | nosy: + rhettinger, mark.dickinson, serhiy.storchaka; messages: + msg389512
2021-03-25 17:19:07 | Terry Davis | create |