Author vstinner
Recipients vstinner
Date 2019-01-09.11:31:48
Message-id <1547033508.44.0.197270282125.issue35697@roundup.psfhosted.org>
In-reply-to
Content
The decimal module does not support formatting a number with the "n" format type if the LC_NUMERIC locale uses a different encoding than the LC_CTYPE locale.

Example with attached decimal_locale.py on Fedora 29 with Python 3.7.2:

$ python3 decimal_locale.py 
LC_NUMERIC locale: uk_UA.koi8u
decimal_point: ',' = ',' = U+002c
thousands_sep: '\xa0' = '\xa0' = U+00a0
Traceback (most recent call last):
  File "/home/vstinner/decimal_locale.py", line 16, in <module>
    text = format(num, "n")
ValueError: invalid decimal point or unsupported combination of LC_CTYPE and LC_NUMERIC
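
The attached decimal_locale.py is not reproduced in this message. A minimal sketch of an equivalent reproducer might look like the following. The helper name format_with_numeric_locale and the sample value are hypothetical; only the format(num, "n") call comes from the traceback above. The sketch leaves LC_CTYPE untouched, so the error is only expected when the LC_CTYPE encoding differs from the LC_NUMERIC one on an affected Python version.

```python
import locale
from decimal import Decimal


def format_with_numeric_locale(numeric_locale, value="1234567.89"):
    """Format a Decimal with the "n" type under the given LC_NUMERIC locale.

    Hypothetical helper: returns the formatted text, or None if the
    requested locale is not installed on this system.
    """
    # Query the current LC_NUMERIC setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, numeric_locale)
    except locale.Error:
        return None
    try:
        conv = locale.localeconv()
        print("LC_NUMERIC locale:", numeric_locale)
        for key in ("decimal_point", "thousands_sep"):
            print(f"{key}: {conv[key]!r}")
        # On an affected version this call raises ValueError when the
        # separators cannot be decoded with the LC_CTYPE encoding.
        return format(Decimal(value), "n")
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


if __name__ == "__main__":
    print(format_with_numeric_locale("uk_UA.koi8u"))
```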

The attached PR modifies the _decimal module to support this corner case.

Note: I already wrote PR 5191 last year, but I abandoned it in the meantime.

--

Supporting a non-ASCII decimal point and thousands separator has a long history and a list of now-fixed issues:

* bpo-7442
* bpo-13706
* bpo-25812
* bpo-28604 (LC_MONETARY)
* bpo-31900
* bpo-33954

I even wrote an article about these bugs :-)
https://github.com/python/cpython/pull/5191

Python 3.7.2 now supports different encodings for the LC_NUMERIC, LC_MONETARY and LC_CTYPE locales. format(int, "n") temporarily sets LC_CTYPE to LC_NUMERIC to decode decimal_point and thousands_sep from the correct encoding. The LC_CTYPE locale is only changed if it is different from the LC_NUMERIC locale and if the decimal point and/or thousands separator is non-ASCII. This is implemented in the following function:

int
_Py_GetLocaleconvNumeric(struct lconv *lc,
                         PyObject **decimal_point, PyObject **thousands_sep)

This function is used by locale.localeconv() and format() (for the "n" type).
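
The condition described above can be illustrated in Python. This is only a sketch of the decision logic, not the C implementation: the names needs_ctype_switch and decode_separator are hypothetical, the separators are assumed to arrive as raw bytes, and the byte 0x9A is assumed to be how KOI8-U encodes U+00A0.

```python
def needs_ctype_switch(ctype_locale, numeric_locale, decimal_point, thousands_sep):
    """Return True if LC_CTYPE would have to be changed temporarily.

    Hypothetical helper: decimal_point and thousands_sep are the raw
    bytes reported for the LC_NUMERIC locale.
    """
    # Same locale: the bytes can already be decoded with LC_CTYPE.
    if ctype_locale == numeric_locale:
        return False
    # Pure ASCII separators decode identically in every supported encoding.
    return not (decimal_point.isascii() and thousands_sep.isascii())


def decode_separator(raw, numeric_encoding):
    """Decode a separator using the encoding of the LC_NUMERIC locale."""
    return raw.decode(numeric_encoding)


if __name__ == "__main__":
    raw_sep = b"\x9a"  # assumed KOI8-U byte for NO-BREAK SPACE
    print(needs_ctype_switch("en_US.UTF-8", "uk_UA.koi8u", b",", raw_sep))
    print(ascii(decode_separator(raw_sep, "koi8_u")))
```

Decoding the same byte as UTF-8 would fail, which is why the separators have to be decoded with the LC_NUMERIC encoding rather than the LC_CTYPE one.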

I decided to fix this bug while I was fixing other locale bugs, because we have now received enough bug reports.

Copy of my msg309980:

"""
> I would not consider this a bug in Python, but rather in the locale settings passed to setlocale().

For the past 10 years, I have repeated to every single user I met that "Python 3 is right, your system setup is wrong". But that's a waste of time. People continue to associate Python 3 and Unicode with annoying bugs, because they don't understand how locales work.

Instead of having to repeat to each user that "hum, maybe your config is wrong", I prefer to support this unconventional setup so that it works as expected ("it just works"). With my latest implementation, setlocale() is only called when LC_CTYPE and LC_NUMERIC are different, which is the corner case that "shouldn't occur in practice".
"""