localeconv() doesn't support LC_MONETARY encoding different than LC_CTYPE encoding #72790

Closed
GuillaumePasquetEtenil mannequin opened this issue Nov 3, 2016 · 11 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@GuillaumePasquetEtenil
Mannequin

GuillaumePasquetEtenil mannequin commented Nov 3, 2016

BPO 28604
Nosy @malemburg, @loewis, @vstinner, @serhiy-storchaka, @andreas-schwab, @stratakis, @tirkarthi
PRs
  • bpo-28604: Fix localeconv() for different LC_MONETARY #10606
  • [3.7] bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) #10619
  • [3.6] bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619) #10621
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-20.21:37:25.308>
    created_at = <Date 2016-11-03.21:26:34.660>
    labels = ['interpreter-core', '3.8', 'type-bug', '3.7']
    title = "localeconv() doesn't support LC_MONETARY encoding different than LC_CTYPE encoding"
    updated_at = <Date 2018-11-28.16:52:24.298>
    user = 'https://bugs.python.org/GuillaumePasquetEtenil'

    bugs.python.org fields:

    activity = <Date 2018-11-28.16:52:24.298>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-20.21:37:25.308>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2016-11-03.21:26:34.660>
    creator = 'Guillaume Pasquet (Etenil)'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 28604
    keywords = ['patch']
    message_count = 11.0
    messages = ['280023', '280028', '303419', '330128', '330129', '330131', '330132', '330153', '330155', '330191', '330609']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'loewis', 'vstinner', 'serhiy.storchaka', 'schwab', 'cstratak', 'Guillaume Pasquet (Etenil)', 'xtreak']
    pr_nums = ['10606', '10619', '10621']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28604'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @GuillaumePasquetEtenil
    Mannequin Author

    GuillaumePasquetEtenil mannequin commented Nov 3, 2016

    This issue was originally reported on Fedora's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1391280

    Description of problem:
    After switching the monetary locale to en_GB, python then raises an exception when calling locale.localeconv()

    Version-Release number of selected component (if applicable):
    3.5.2-4.fc25

    How reproducible:
    Every time

    Steps to Reproduce:

    1. Write a python3 script or open the interactive interpreter with "python3"
    2. Enter the following
      import locale
      locale.setlocale(locale.LC_MONETARY, 'en_GB')
      locale.localeconv()
    3. Observe that python raises an encoding exception

    Actual results:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.5/locale.py", line 110, in localeconv
        d = _localeconv()
    UnicodeDecodeError: 'locale' codec can't decode byte 0xa3 in position 0: Invalid or incomplete multibyte or wide character

    Expected results:
    A dictionary of locale data similar to (for en_US):
    {'mon_thousands_sep': ',', 'currency_symbol': '$', 'negative_sign': '-', 'p_sep_by_space': 0, 'frac_digits': 2, 'int_frac_digits': 2, 'decimal_point': '.', 'mon_decimal_point': '.', 'positive_sign': '', 'p_cs_precedes': 1, 'p_sign_posn': 1, 'mon_grouping': [3, 3, 0], 'n_cs_precedes': 1, 'n_sign_posn': 1, 'grouping': [3, 3, 0], 'thousands_sep': ',', 'int_curr_symbol': 'USD ', 'n_sep_by_space': 0}

    Note:
    This was reproduced on Linux Mint 18 (python 3.5.2), and also on Fedora with python 3.4 and python 3.6 (compiled).

    @GuillaumePasquetEtenil GuillaumePasquetEtenil mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 3, 2016
    @serhiy-storchaka
    Member

    I suspect this issue is similar to bpo-25812. The en_GB locale uses a non-UTF-8 encoding (likely ISO-8859-1), in which the currency symbol £ is encoded as b'\xa3'. But Python tries to decode b'\xa3' with the encoding determined by a different locale setting (LC_CTYPE).
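
A minimal sketch of the mismatch described above. It uses Python's own codecs as a stand-in for the C library's locale decoding, so the error message differs from the traceback in the report; the variable names are illustrative only, and UTF-8 is assumed here as the LC_CTYPE encoding.

```python
# The pound sign as an ISO-8859-1 locale would store it in currency_symbol.
raw_currency_symbol = "£".encode("iso8859-1")
print(raw_currency_symbol)  # a single byte, 0xa3

# Decoding with the encoding the bytes were written in works.
decoded_ok = raw_currency_symbol.decode("iso8859-1")

# Decoding with a different encoding (UTF-8 here, standing in for the
# LC_CTYPE encoding) fails: 0xa3 cannot start a UTF-8 sequence.
try:
    raw_currency_symbol.decode("utf-8")
    decode_failed = False
    failing_byte = None
except UnicodeDecodeError as exc:
    decode_failed = True
    failing_byte = exc.object[exc.start]

print(decoded_ok, decode_failed, hex(failing_byte))
```

The failing byte is the same 0xa3 that appears in the reported UnicodeDecodeError.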

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 3, 2016
    @andreas-schwab
    Mannequin

    andreas-schwab mannequin commented Sep 30, 2017

    This causes test_float.py to fail with glibc > 2.26.

    ERROR: test_float_with_comma (__main__.GeneralFloatCases)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/abuild/rpmbuild/BUILD/Python-3.6.2/Lib/test/support/__init__.py", line 1590, in inner
        return func(*args, **kwds)
      File "Lib/test/test_float.py", line 150, in test_float_with_comma
        if not locale.localeconv()['decimal_point'] == ',':
      File "/home/abuild/rpmbuild/BUILD/Python-3.6.2/Lib/locale.py", line 110, in localeconv
        d = _localeconv()
    UnicodeDecodeError: 'locale' codec can't decode byte 0xa0 in position 0: Invalid or incomplete multibyte or wide character

    @vstinner
    Member

    Example of the bug:

    import locale
    # LC_CTYPE: latin1 encoding
    locale.setlocale(locale.LC_ALL, "en_GB")
    # LC_MONETARY: utf8 encoding
    locale.setlocale(locale.LC_MONETARY, "ar_SA.UTF-8")
    lc = locale.localeconv()
    for attr in (
        "mon_grouping",
        "int_curr_symbol",
        "currency_symbol",
        "mon_decimal_point",
        "mon_thousands_sep",
    ):
        print(f"{attr}: {lc[attr]!a}")

    Python 3.7 output:

    mon_grouping: []
    int_curr_symbol: 'SAR '
    currency_symbol: '\xd8\xb1.\xd8\xb3'
    mon_decimal_point: '.'
    mon_thousands_sep: ''

    Expected output:

    mon_grouping: []
    int_curr_symbol: 'SAR '
    currency_symbol: '\u0631.\u0633'
    mon_decimal_point: '.'
    mon_thousands_sep: ''

    Tested on Fedora 29.
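
The difference between the two outputs above is consistent with UTF-8 bytes being decoded as Latin-1. The following sketch reproduces that with plain codecs, without touching the process locale. The final round trip is only an illustration of why the wrong string is still recoverable in this particular case, not a recommended workaround.

```python
expected = "\u0631.\u0633"  # the currency symbol shown under "Expected output"

# What the buggy output corresponds to: UTF-8 bytes read as Latin-1.
mojibake = expected.encode("utf-8").decode("latin-1")
print(ascii(mojibake))

# Latin-1 maps every byte to a code point, so the wrong decoding is
# lossless and can be reversed by hand.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(ascii(repaired))
```

Because Latin-1 never rejects a byte, this combination yields a silently wrong string instead of an exception.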

    @vstinner
    Member

    See also bpo-33954: float.__format__('n') fails with a _PyUnicode_CheckConsistency assertion error for locales with a non-ASCII thousands separator. It would be nice to fix these two bugs at the same time, since they are related :-)

    @vstinner
    Member

    I manually tested PR 10606:

    LC_ALL= LC_CTYPE=xxx LC_MONETARY=xxx ./python -c 'import locale; locale.setlocale(locale.LC_ALL, ""); print(ascii(locale.localeconv()["currency_symbol"]))'
    '\xa3'

    Result (bug = result/error without the fix):

    • LC_CTYPE=en_GB, LC_MONETARY=ar_SA.UTF-8: currency_symbol='\u0631.\u0633' (bug: '\xd8\xb1.\xd8\xb3')
    • LC_CTYPE=en_GB, LC_MONETARY=fr_FR.UTF-8: currency_symbol='\u20ac' (bug: '\xe2\x82\xac')
    • LC_CTYPE=en_GB, LC_MONETARY=uk_UA.koi8u: currency_symbol='\u0433\u0440\u043d.' (bug: '\xc7\xd2\xce.')
    • LC_CTYPE=fr_FR.UTF-8, LC_MONETARY=uk_UA.koi8u: currency_symbol='\u0433\u0440\u043d.' (bug: UnicodeDecodeError)

    Locale encodings:

    • en_GB: latin1
    • ar_SA.UTF-8: utf8
    • fr_FR.UTF-8: utf8
    • uk_UA.koi8u: KOI8-U
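
The four "bug" values in the list above can be cross-checked without installing any locales. This is a sketch under the assumption that the unfixed behavior amounts to decoding LC_MONETARY bytes with the LC_CTYPE encoding; `decode_like_bug` is a hypothetical helper, not part of CPython.

```python
def decode_like_bug(symbol, monetary_encoding, ctype_encoding):
    """Encode as the LC_MONETARY locale would, decode as LC_CTYPE would."""
    raw = symbol.encode(monetary_encoding)
    try:
        return raw.decode(ctype_encoding)
    except UnicodeDecodeError:
        return "UnicodeDecodeError"


# (correct symbol, LC_MONETARY encoding, LC_CTYPE encoding)
cases = [
    ("\u0631.\u0633", "utf-8", "latin-1"),         # en_GB + ar_SA.UTF-8
    ("\u20ac", "utf-8", "latin-1"),                # en_GB + fr_FR.UTF-8
    ("\u0433\u0440\u043d.", "koi8_u", "latin-1"),  # en_GB + uk_UA.koi8u
    ("\u0433\u0440\u043d.", "koi8_u", "utf-8"),    # fr_FR.UTF-8 + uk_UA.koi8u
]

results = [decode_like_bug(*case) for case in cases]
for result in results:
    print(ascii(result))
```

Only the last case raises, because Latin-1 accepts any byte sequence while UTF-8 rejects the KOI8-U bytes.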

    @vstinner vstinner added the 3.8 only security fixes label Nov 20, 2018
    @vstinner
    Member

    New changeset 02e6bf7 by Victor Stinner in branch 'master':
    bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606)

    @vstinner
    Member

    New changeset 6eff6b8 by Victor Stinner in branch '3.7':
    bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619)

    @vstinner
    Member

    New changeset df3051b by Victor Stinner in branch '3.6':
    bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619) (GH-10621)

    @vstinner
    Member

    It seems like my change introduced a regression: bpo-35290.

    @vstinner vstinner changed the title from "Exception raised by python3.5 when using en_GB locale" to "localeconv() doesn't support LC_MONETARY encoding different than LC_CTYPE encoding" Nov 28, 2018
    @vstinner
    Member

    See also bpo-31900: localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022