locale.getlocale() fails if locale name doesn't include encoding #64287

Open
serhiy-storchaka opened this issue Dec 28, 2013 · 4 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

BPO 20088
Nosy @malemburg, @loewis, @tjguk, @zware, @serhiy-storchaka, @eryksun, @zooba, @Kristinita

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2013-12-28.09:44:02.941>
labels = ['3.8', 'type-bug', 'library', '3.9', '3.10']
title = "locale.getlocale() fails if locale name doesn't include encoding"
updated_at = <Date 2021-03-04.15:21:05.592>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2021-03-04.15:21:05.592>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2013-12-28.09:44:02.941>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 20088
keywords = []
message_count = 4.0
messages = ['207026', '235823', '235840', '388095']
nosy_count = 9.0
nosy_names = ['lemburg', 'loewis', 'tim.golden', 'Arfrever', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'nervov_fan']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20088'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@serhiy-storchaka
Member Author

>>> import locale, _locale
>>> _locale.setlocale(locale.LC_CTYPE, 'en_AG')
'en_AG'
>>> _locale.setlocale(locale.LC_CTYPE)
'en_AG'
>>> locale.getlocale(locale.LC_CTYPE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 575, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 484, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: en_AG

One solution is proposed in bpo-20079: map every locale name that glibc supports without an encoding to the corresponding locale name with an encoding. But see bpo-20087. Also, the default encoding can differ on systems that are not based on glibc.

Another solution is to not guess the encoding at all, but to use locale.nl_langinfo(locale.CODESET) in locale.getlocale(), and to leave only nonstandard mappings in the locale alias table (such as english_uk -> en_GB.ISO8859-1 and sr_yu.iso88595 -> sr_CS.ISO8859-5).
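A minimal sketch of that second approach, for illustration only. The helper names (`split_localename`, `query_codeset`, `getlocale_no_guess`) are hypothetical and not part of the `locale` module, and the sketch simply drops any `@modifier` suffix rather than interpreting it. It assumes the category queried is LC_CTYPE, since `nl_langinfo(CODESET)` reports the codeset of the current LC_CTYPE locale.

```python
import locale


def split_localename(localename, codeset_lookup):
    """Split 'language.encoding@modifier' without guessing the encoding.

    codeset_lookup is called only when the name carries no encoding part.
    """
    if localename in ("C", "POSIX"):
        return None, None
    # Drop an optional '@modifier' suffix (ignored in this sketch).
    name, _, _modifier = localename.partition("@")
    language, dot, encoding = name.partition(".")
    if not dot:
        # No encoding in the name: ask the caller-supplied lookup instead
        # of consulting an alias table.
        encoding = codeset_lookup()
    return language, encoding or None


def query_codeset():
    """Return the current LC_CTYPE codeset, or None where nl_langinfo is missing."""
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    if nl_langinfo is None:  # e.g. on Windows
        return None
    return nl_langinfo(locale.CODESET)


def getlocale_no_guess(category=locale.LC_CTYPE):
    """Hypothetical getlocale() variant that never guesses the encoding."""
    # setlocale() with no locale argument only queries the current setting.
    return split_localename(locale.setlocale(category), query_codeset)
```

With this shape, a name such as 'en_AG' no longer has to appear in the alias table: the language part is returned as-is and the encoding comes from the C library.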

@serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 28, 2013
@BreamoreBoy
Mannequin

BreamoreBoy mannequin commented Feb 12, 2015

With bpo-20079 closed but bpo-20087 still open, where do we stand with this issue?

@eryksun
Contributor

eryksun commented Feb 12, 2015

For 3.5 this affects Windows as well, since the new CRT supports RFC 1766 language codes, but only without a codepage spec:

    Python 3.5.0a1 (v3.5.0a1:5d4b6a57d5fd, Feb  7 2015, 18:15:14) 
    [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, 'en-GB')
    'en-GB'

    >>> locale.getlocale()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files\Python35\lib\locale.py", line 578, in getlocale
        return _parse_localename(localename)
      File "C:\Program Files\Python35\lib\locale.py", line 487, in _parse_localename
        raise ValueError('unknown locale: %s' % localename)
    ValueError: unknown locale: en-GB

On Vista+ (since 3.5 drops XP support) the codepage can be queried easily via GetLocaleInfoEx:

    >>> from ctypes import *
    >>> LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    >>> GetLocaleInfoEx = WinDLL('kernel32').GetLocaleInfoEx
    >>> info = (c_wchar * 100)()
    >>> GetLocaleInfoEx("en-GB", LOCALE_IDEFAULTANSICODEPAGE, info, len(info)) 
    5
    >>> info.value
    '1252'
    >>> GetLocaleInfoEx("zh-CN", LOCALE_IDEFAULTANSICODEPAGE, info, len(info))
    4
    >>> info.value                                                            
    '936'

Note that Windows follows the RFC spec here (not POSIX), using a hyphen instead of an underscore.
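The interactive query above could be wrapped in a small function along these lines. This is only a sketch: `default_ansi_codepage` is a hypothetical name, the 100-character buffer is carried over from the session above, and the function returns None on non-Windows platforms or when the call fails.

```python
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004


def default_ansi_codepage(locale_name):
    """Return the default ANSI code page for a name such as 'en-GB' as a string.

    Returns None when not running on Windows or when the lookup fails.
    """
    if sys.platform != "win32":
        return None

    # Imported here so the module still loads on other platforms.
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx
    # Declare the prototype explicitly instead of relying on ctypes defaults.
    GetLocaleInfoEx.argtypes = (
        wintypes.LPCWSTR,  # lpLocaleName
        wintypes.DWORD,    # LCType
        wintypes.LPWSTR,   # lpLCData
        ctypes.c_int,      # cchData
    )
    GetLocaleInfoEx.restype = ctypes.c_int

    buf = ctypes.create_unicode_buffer(100)
    written = GetLocaleInfoEx(
        locale_name, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf)
    )
    if written == 0:  # 0 signals failure
        return None
    return buf.value
```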

This is a bit of a tangent, but for the full Windows language_country.codepage form, the X11-based locale_alias dict is generally useless. So, contrary to the docs, getlocale on Windows doesn't return the language code in RFC 1766 form. In some cases it does, but only by chance.

@eryksun
Contributor

eryksun commented Mar 4, 2021

The locale_alias database was extended to support "en_AG" and many others, but I'd still prefer Serhiy's suggestion to not guess the codeset when checking the default LC_CTYPE category. Use locale.nl_langinfo(locale.CODESET), if it's available.

In Windows, I'd prefer to never guess since the encoding for a BCP-47 locale name can be directly queried. But this issue can be restricted to POSIX. What to do in Windows is already being considered in more recent issues: bpo-23425, bpo-37945, and bpo-43115.

@eryksun eryksun added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes and removed OS-windows labels Mar 4, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022