classification
Title: localeconv() doesn't support LC_MONETARY encoding different than LC_CTYPE encoding
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Guillaume Pasquet (Etenil), cstratak, lemburg, loewis, schwab, serhiy.storchaka, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2016-11-03 21:26 by Guillaume Pasquet (Etenil), last changed 2018-11-28 16:52 by vstinner. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 10606  merged  vstinner, 2018-11-20 12:36
PR 10619  merged  vstinner, 2018-11-20 20:14
PR 10621  merged  vstinner, 2018-11-20 21:08
Messages (11)
msg280023 - (view) Author: Guillaume Pasquet (Etenil) (Guillaume Pasquet (Etenil)) Date: 2016-11-03 21:26
This issue was originally reported on Fedora's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1391280

Description of problem:
After switching the monetary locale (LC_MONETARY) to en_GB, Python raises an exception when locale.localeconv() is called.

Version-Release number of selected component (if applicable):
3.5.2-4.fc25

How reproducible:
Every time

Steps to Reproduce:
1. Write a python3 script or open the interactive interpreter with "python3"
2. Enter the following
import locale
locale.setlocale(locale.LC_MONETARY, 'en_GB')
locale.localeconv()
3. Observe that python raises an encoding exception

Actual results:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/locale.py", line 110, in localeconv
    d = _localeconv()
UnicodeDecodeError: 'locale' codec can't decode byte 0xa3 in position 0: Invalid or incomplete multibyte or wide character

Expected results:
A dictionary of locale data similar to (for en_US):
{'mon_thousands_sep': ',', 'currency_symbol': '$', 'negative_sign': '-', 'p_sep_by_space': 0, 'frac_digits': 2, 'int_frac_digits': 2, 'decimal_point': '.', 'mon_decimal_point': '.', 'positive_sign': '', 'p_cs_precedes': 1, 'p_sign_posn': 1, 'mon_grouping': [3, 3, 0], 'n_cs_precedes': 1, 'n_sign_posn': 1, 'grouping': [3, 3, 0], 'thousands_sep': ',', 'int_curr_symbol': 'USD ', 'n_sep_by_space': 0}

Note:
This was reproduced on Linux Mint 18 (python 3.5.2), and also on Fedora with python 3.4 and python 3.6 (compiled).
msg280028 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-03 22:21
I suspect this issue is similar to issue25812. en_GB has a non-UTF-8 encoding (likely ISO-8859-1). The currency symbol £ is encoded in this encoding as b'\xa3', but Python tries to decode b'\xa3' with the encoding determined by a different locale setting (LC_CTYPE).
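
The mismatch described above can be reproduced with plain codecs, without touching the process locale. This is a minimal sketch, assuming LC_CTYPE selects UTF-8 and the en_GB locale data is ISO-8859-1; the variable names are illustrative and do not come from the locale module.

```python
# Byte returned for the en_GB currency symbol when the locale data is
# encoded as ISO-8859-1 (assumption taken from the comment above).
raw_symbol = b"\xa3"

# Decoding with the LC_MONETARY encoding yields the pound sign.
decoded_ok = raw_symbol.decode("iso8859-1")

# Decoding with a UTF-8 LC_CTYPE encoding fails: 0xa3 is a continuation
# byte and cannot start a UTF-8 sequence.
try:
    raw_symbol.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(ascii(decoded_ok), utf8_failed)
```

The real failure is raised by the 'locale' codec rather than by Python's own UTF-8 codec, so the error message differs, but the cause is the same byte being interpreted in the wrong encoding.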
msg303419 - (view) Author: Andreas Schwab (schwab) * Date: 2017-09-30 19:24
This causes test_float.py to fail with glibc > 2.26.

ERROR: test_float_with_comma (__main__.GeneralFloatCases)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/abuild/rpmbuild/BUILD/Python-3.6.2/Lib/test/support/__init__.py", line 1590, in inner
    return func(*args, **kwds)
  File "Lib/test/test_float.py", line 150, in test_float_with_comma
    if not locale.localeconv()['decimal_point'] == ',':
  File "/home/abuild/rpmbuild/BUILD/Python-3.6.2/Lib/locale.py", line 110, in localeconv
    d = _localeconv()
UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Invalid or incomplete multibyte or wide character

----------------------------------------------------------------------
msg330128 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 12:27
Example of the bug:

import locale
# LC_CTYPE: latin1 encoding
locale.setlocale(locale.LC_ALL, "en_GB")
# LC_MONETARY: utf8 encoding
locale.setlocale(locale.LC_MONETARY, "ar_SA.UTF-8")
lc = locale.localeconv()
for attr in (
    "mon_grouping",
    "int_curr_symbol",
    "currency_symbol",
    "mon_decimal_point",
    "mon_thousands_sep",
):
    print(f"{attr}: {lc[attr]!a}")

Python 3.7 output:

mon_grouping: []
int_curr_symbol: 'SAR '
currency_symbol: '\xd8\xb1.\xd8\xb3'
mon_decimal_point: '.'
mon_thousands_sep: ''

Expected output:

mon_grouping: []
int_curr_symbol: 'SAR '
currency_symbol: '\u0631.\u0633'
mon_decimal_point: '.'
mon_thousands_sep: ''

Tested on Fedora 29.
msg330129 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 12:47
See also bpo-33954: float.__format__('n') fails with a _PyUnicode_CheckConsistency assertion error for locales with a non-ASCII thousands separator. It may be nice to fix these two bugs at the same time, since they are related :-)
msg330131 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 14:10
I manually tested PR 10606:

LC_ALL= LC_CTYPE=xxx LC_MONETARY=xxx ./python -c 'import locale; locale.setlocale(locale.LC_ALL, ""); print(ascii(locale.localeconv()["currency_symbol"]))'
'\xa3'

Result (bug = result/error without the fix):


* LC_CTYPE=en_GB, LC_MONETARY=ar_SA.UTF-8: currency_symbol='\u0631.\u0633' (bug: '\xd8\xb1.\xd8\xb3')
* LC_CTYPE=en_GB, LC_MONETARY=fr_FR.UTF-8: currency_symbol='\u20ac' (bug: '\xe2\x82\xac')
* LC_CTYPE=en_GB, LC_MONETARY=uk_UA.koi8u: currency_symbol='\u0433\u0440\u043d.' (bug: '\xc7\xd2\xce.')
* LC_CTYPE=fr_FR.UTF-8, LC_MONETARY=uk_UA.koi8u: currency_symbol='\u0433\u0440\u043d.' (bug: UnicodeDecodeError)

Locale encodings:

* en_GB: latin1
* ar_SA.UTF-8: utf8
* fr_FR.UTF-8: utf8
* uk_UA.koi8u: KOI8-U
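
The "bug" values in the list above are what one gets when the LC_MONETARY bytes are decoded with the LC_CTYPE encoding. The following sketch replays the four cases with Python codecs only. The helper name decode_with_ctype is hypothetical, and Python's codecs stand in for the C library's decoder, so this approximates the pre-fix behavior rather than calling localeconv().

```python
# (LC_CTYPE encoding, LC_MONETARY encoding, expected currency symbol)
cases = [
    ("latin-1", "utf-8", "\u0631.\u0633"),        # en_GB / ar_SA.UTF-8
    ("latin-1", "utf-8", "\u20ac"),               # en_GB / fr_FR.UTF-8
    ("latin-1", "koi8_u", "\u0433\u0440\u043d."),  # en_GB / uk_UA.koi8u
    ("utf-8", "koi8_u", "\u0433\u0440\u043d."),    # fr_FR.UTF-8 / uk_UA.koi8u
]


def decode_with_ctype(symbol, ctype_encoding, monetary_encoding):
    """Hypothetical helper: encode as LC_MONETARY would, decode as LC_CTYPE."""
    raw = symbol.encode(monetary_encoding)
    try:
        return raw.decode(ctype_encoding)
    except UnicodeDecodeError:
        return "UnicodeDecodeError"


results = [decode_with_ctype(sym, ctype, mon) for ctype, mon, sym in cases]
for value in results:
    print(ascii(value))
```

With a Latin-1 LC_CTYPE every byte decodes to some character, which is why the first three cases silently produce mojibake, while a UTF-8 LC_CTYPE rejects the KOI8-U bytes outright.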
msg330132 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 15:20
New changeset 02e6bf7f2025cddcbde6432f6b6396198ab313f4 by Victor Stinner in branch 'master':
bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606)
https://github.com/python/cpython/commit/02e6bf7f2025cddcbde6432f6b6396198ab313f4
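
The changeset is C code and is not reproduced in this thread. For interpreters without the fix, one possible workaround is sketched below; it is an assumption and is not taken from the patch. It temporarily points LC_CTYPE at the LC_MONETARY locale while reading the monetary fields. The helper names ctype_matching_monetary and monetary_conventions are hypothetical. Changing the locale affects the whole process, so this is not thread-safe.

```python
import contextlib
import locale

MONETARY_KEYS = (
    "int_curr_symbol",
    "currency_symbol",
    "mon_decimal_point",
    "mon_thousands_sep",
)


@contextlib.contextmanager
def ctype_matching_monetary():
    """Hypothetical helper: temporarily set LC_CTYPE to the LC_MONETARY locale."""
    saved_ctype = locale.setlocale(locale.LC_CTYPE)   # query only
    monetary = locale.setlocale(locale.LC_MONETARY)   # query only
    try:
        if monetary != saved_ctype:
            locale.setlocale(locale.LC_CTYPE, monetary)
        yield
    finally:
        # Always restore the original LC_CTYPE locale.
        locale.setlocale(locale.LC_CTYPE, saved_ctype)


def monetary_conventions():
    """Return only the monetary string fields, decoded under a matching LC_CTYPE."""
    with ctype_matching_monetary():
        conv = locale.localeconv()
    # Numeric fields are left out: they belong to LC_NUMERIC, whose
    # encoding may differ from the temporary LC_CTYPE.
    return {key: conv[key] for key in MONETARY_KEYS}


print(monetary_conventions())
```

On an unfixed interpreter, localeconv() may still raise if the LC_NUMERIC fields contain non-ASCII bytes in yet another encoding, so this sketch only addresses the LC_MONETARY mismatch.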
msg330153 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 21:06
New changeset 6eff6b8eecd7a8eccad16419269fa18ec820922e by Victor Stinner in branch '3.7':
bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619)
https://github.com/python/cpython/commit/6eff6b8eecd7a8eccad16419269fa18ec820922e
msg330155 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 21:36
New changeset df3051b53fd7f2862a4087f5449e811d8421347a by Victor Stinner in branch '3.6':
bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619) (GH-10621)
https://github.com/python/cpython/commit/df3051b53fd7f2862a4087f5449e811d8421347a
msg330191 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-21 11:26
It seems like my change introduced a regression: bpo-35290.
msg330609 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-28 16:52
See also bpo-31900: localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding.
History
Date User Action Args
2018-11-28 16:52:24  vstinner  set  messages: + msg330609
2018-11-28 16:51:47  vstinner  set  title: Exception raised by python3.5 when using en_GB locale -> localeconv() doesn't support LC_MONETARY encoding different than LC_CTYPE encoding
2018-11-21 11:26:12  vstinner  set  messages: + msg330191
2018-11-20 21:37:25  vstinner  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-11-20 21:36:19  vstinner  set  messages: + msg330155
2018-11-20 21:08:55  vstinner  set  pull_requests: + pull_request9869
2018-11-20 21:06:25  vstinner  set  messages: + msg330153
2018-11-20 20:14:32  vstinner  set  pull_requests: + pull_request9867
2018-11-20 15:20:28  vstinner  set  messages: + msg330132
2018-11-20 14:10:13  vstinner  set  versions: + Python 3.8, - Python 3.5
2018-11-20 14:10:05  vstinner  set  messages: + msg330131
2018-11-20 12:47:44  vstinner  set  messages: + msg330129
2018-11-20 12:36:21  vstinner  set  keywords: + patch; stage: patch review; pull_requests: + pull_request9849
2018-11-20 12:27:56  vstinner  set  messages: + msg330128
2018-10-01 13:53:35  xtreak  set  nosy: + xtreak
2018-09-24 12:30:14  petr.viktorin  set  nosy: + vstinner
2017-09-30 19:24:02  schwab  set  nosy: + schwab; messages: + msg303419
2016-11-04 10:50:40  cstratak  set  nosy: + cstratak
2016-11-03 22:21:49  serhiy.storchaka  set  nosy: + loewis, serhiy.storchaka, lemburg; messages: + msg280028; versions: + Python 3.7, - Python 3.4
2016-11-03 21:26:34  Guillaume Pasquet (Etenil)  create