classification
Title: _decimal: Implement the previously rejected changes from #7442.
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: skrah Nosy List: rhettinger, skrah, vstinner
Priority: normal Keywords: patch, patch, patch

Created on 2019-01-09 11:31 by vstinner, last changed 2019-01-13 04:21 by rhettinger.

Files
File name Uploaded Description Edit
decimal_locale.py vstinner, 2019-01-09 11:31
Pull Requests
URL Status Linked Edit
PR 11474 open vstinner, 2019-01-09 11:42
PR 11474 open vstinner, 2019-01-09 11:42
PR 11474 open vstinner, 2019-01-09 11:43
Messages (12)
msg333302 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 11:31
The decimal module support formatting a number in the "n" formatting type if the LC_NUMERIC locale uses a different encoding than the LC_CTYPE locale. 

Example with attached decimal_locale.py on Fedora 29 with Python 3.7.2:

$ python3 decimal_locale.py 
LC_NUMERIC locale: uk_UA.koi8u
decimal_point: ',' = ',' = U+002c
thousands_sep: '\xa0' = '\xa0' = U+00a0
Traceback (most recent call last):
  File "/home/vstinner/decimal_locale.py", line 16, in <module>
    text = format(num, "n")
ValueError: invalid decimal point or unsupported combination of LC_CTYPE and LC_NUMERIC

Attached PR modify the _decimal module to support this corner case.

Note: I already wrote PR 5191 last year, but I abandoned the PR in the meanwhile.

--

Supporting non-ASCII decimal point and thousands separator has a long history and a list of now fixed issues:

* bpo-7442
* bpo-13706
* bpo-25812
* bpo-28604 (LC_MONETARY)
* bpo-31900
* bpo-33954

I even wrote an article about these bugs :-)
https://github.com/python/cpython/pull/5191

Python 3.7.2 now supports different encodings for LC_NUMERIC, LC_MONETARY and LC_CTYPE locales. format(int, "n") sets temporarily LC_CTYPE to LC_NUMERIC to decode decimal_point and thousands_sep from the correct encoding. The LC_CTYPE locale is only changed if it's different than LC_NUMERIC locale and if the decimal point and/or thousands separator is non-ASCII. It's implemented in this function:

int
_Py_GetLocaleconvNumeric(struct lconv *lc,
                         PyObject **decimal_point, PyObject **thousands_sep)

Function used by locale.localeconv() and format() (for "n" type).

I decided to fix the bug when I was fixing other locale bugs because we now got enough bug reports.

Copy of my msg309980:

"""
> I would not consider this a bug in Python, but rather in the locale settings passed to setlocale().

Past 10 years, I repeated to every single user I met that "Python 3 is right, your system setup is wrong". But that's a waste of time. People continue to associate Python3 and Unicode to annoying bugs, because they don't understand how locales work.

Instead of having to repeat to each user that "hum, maybe your config is wrong", I prefer to support this non convential setup and work as expected ("it just works"). With my latest implementation, setlocale() is only done when LC_CTYPE and LC_NUMERIC are different, which is the corner case which "shouldn't occur in practice".
"""
msg333303 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 11:33
Oh, I was wrong: bpo-25812 has not been fixed yet.
msg333304 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-09 11:41
Since #7442 (again, *I* discovered this and it is *mentioned* in the
_decimal sources), there have been zero bug reports about decimal.
msg333306 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 11:47
Ok, I wrote PR 11474. Correct result with this PR:

$ ./python decimal_locale.py 
LC_NUMERIC locale: uk_UA.koi8u
decimal_point: ',' = ',' = U+002c
thousands_sep: '\xa0' = '\xa0' = U+00a0
format: '1\xa0200,5' = 1 200,5 = U+0031 U+00a0 U+0032 U+0030 U+0030 U+002c U+0035
msg333307 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 11:50
Stefan Krah:
> Since #7442 (again, *I* discovered this and it is *mentioned* in the
_decimal sources), ...

I closed bpo-7442 in the meanwhile, so I opened this new issue specific to the decimal module.

> ... there have been zero bug reports about decimal.

Well, here you have :-) A bug report about decimal.

I let you decide what to do with this bug. I wrote a fix. It's up to you to merge it, reject it or do nothing :-)

I wanted to make sure that the bug is fixed in all parts of the Python stdlib.
msg333310 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-09 12:00
Don't you find it strange to close #7442 in mutual agreement and now
mention the word "bug" 50 times?
msg333311 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-09 12:39
Also Marc-Andre does not consider this a bug in #31900.  The
presentation of this issue is increasingly bizarre.
msg333313 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 13:07
Stefan Krah:
> Don't you find it strange to close #7442 in mutual agreement and now
mention the word "bug" 50 times?

We agreed to close the bug in 2014. In the meanwhile, more and more people reported the same bug (multiple similar bug reports and more and more frequent messages). So I decided to fix the bug instead of explaining to users that they must not do that :-)


> Also Marc-Andre does not consider this a bug in #31900.  The
presentation of this issue is increasingly bizarre.

Aha, maybe I misunderstood him when he wrote (msg309981):

"Sounds like a good compromise :-)"

--

I'm not sure of what you are asking here. You are the maintainer of the decimal module. If you consider that it's not worth it, just close the issue. It's up to you ;-)
msg333316 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 13:15
Oh, I forgot to mention the context: I reported this issue as follow-up to discussions on bpo-35638: "Introduce fixed point locale aware format type for floating point numbers".
msg333317 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 13:20
Extract of Stefan Krah's msg333296 of bpo-35638:

> So why would you think that I'm not aware of that issue?

Oh, I never said that.

I wrote "By the way, the decimal module doesn't support properly the following corner case: LC_NUMERIC using an encoding different than LC_CTYPE encoding. I wrote https://github.com/python/cpython/pull/5191 but I abandonned my change." as a reminder for myself. I found again the bug when I wrote my article, and I realized that I abandoned my PR when I had my burnout, and not because the fix was "officially" rejected.

So I opened this issue to get an official statement :-)


> It has low priority for me and I hesitate to depend on the official locale functions in decimal because I don't want to be involved in additional issue reports in that area.

As I wrote in my PR, I'm not very happy of my proposed implementation. But let's discuss options to fix it :-)

I don't understand "I don't want to be involved in additional issue reports in that area". If the bug is fixed, why do you expect more bug reports? You wrote that nobody reported any issue related to formatting a decimal number using the locale since 2009.
msg333321 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-01-09 14:12
I mean issue reports like #33954 or #35195.  These are just
two examples, and I'm NOT claiming that they are related.

But if functions like _PyUnicode_InsertThousandsGrouping()
*were* used in _decimal, I'd feel compelled to investigate.

Now I don't have to. I'd investigate #35195 for example, perhaps
just to find out that it has an entirely different cause.


Sometimes not being dependent on API functions is a virtue if
the code does not change very often.
msg333542 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-01-13 04:21
> Sometimes not being dependent on API functions is a virtue if the code does not change very often.

That is a reasonable position to take.
History
Date User Action Args
2019-01-13 04:21:04rhettingersetkeywords: patch, patch, patch
nosy: + rhettinger
messages: + msg333542

2019-01-09 14:12:36skrahsetkeywords: patch, patch, patch

messages: + msg333321
2019-01-09 13:20:55vstinnersetkeywords: patch, patch, patch

messages: + msg333317
2019-01-09 13:15:15vstinnersetkeywords: patch, patch, patch

messages: + msg333316
2019-01-09 13:07:59vstinnersetkeywords: patch, patch, patch

messages: + msg333313
2019-01-09 12:39:16skrahsetkeywords: patch, patch, patch

messages: + msg333311
2019-01-09 12:18:09skrahsetkeywords: patch, patch, patch
title: decimal: formatter error if LC_NUMERIC uses a different encoding than LC_CTYPE -> _decimal: Implement the previously rejected changes from #7442.
2019-01-09 12:00:25skrahsetkeywords: patch, patch, patch

messages: + msg333310
2019-01-09 11:50:11vstinnersetkeywords: patch, patch, patch

messages: + msg333307
2019-01-09 11:47:21vstinnersetkeywords: patch, patch, patch

messages: + msg333306
2019-01-09 11:43:07vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request10979
2019-01-09 11:43:02vstinnersetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10978
2019-01-09 11:42:56vstinnersetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10977
2019-01-09 11:41:28skrahsetmessages: + msg333304
2019-01-09 11:36:37skrahsetassignee: skrah

nosy: + skrah
2019-01-09 11:33:50vstinnersetmessages: + msg333303
2019-01-09 11:31:48vstinnercreate