classification
Title: locale.getlocale() fails if locale name doesn't include encoding
Type: behavior Stage:
Components: Library (Lib), Windows Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, BreamoreBoy, eryksun, lemburg, loewis, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2013-12-28 09:44 by serhiy.storchaka, last changed 2015-02-12 15:35 by eryksun.

Messages (3)
msg207026 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-28 09:44
>>> import locale, _locale
>>> _locale.setlocale(locale.LC_CTYPE, 'en_AG')
'en_AG'
>>> _locale.setlocale(locale.LC_CTYPE)
'en_AG'
>>> locale.getlocale(locale.LC_CTYPE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 575, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 484, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: en_AG

One solution is proposed in issue20079: map every locale name that glibc supports without an encoding to the corresponding name with an encoding. But see issue20087. Also, the default encoding can differ on other systems (those not based on glibc).

The other solution is not to guess an encoding, but to use locale.nl_langinfo(locale.CODESET) in locale.getlocale(), and to leave only nonstandard mappings in the locale alias table (such as english_uk -> en_GB.ISO8859-1 and sr_yu.iso88595 -> sr_CS.ISO8859-5).
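
A rough sketch of the second idea, for illustration only. The helper names getlocale_with_codeset and split_localename are hypothetical and are not part of locale.py. The sketch assumes a simple language[_territory][.codeset][@modifier] name and does not handle composite LC_ALL values. Note that nl_langinfo(CODESET) reports the codeset of the LC_CTYPE locale currently in effect, so the fallback is only meaningful when the name being parsed is the current LC_CTYPE setting.

```python
import locale


def split_localename(localename):
    """Hypothetical helper: split a locale name into (code, encoding)
    without consulting the locale alias table."""
    # The C/POSIX locale carries no language or encoding information.
    if localename in ('C', 'POSIX'):
        return None, None
    # Drop an optional @modifier, e.g. 'ca_ES.UTF-8@valencia'.
    name = localename.partition('@')[0]
    code, sep, encoding = name.partition('.')
    if not sep:
        # No encoding in the name: ask the C library instead of guessing.
        # nl_langinfo is missing on some platforms (e.g. Windows), in
        # which case this sketch simply reports no encoding.
        nl_langinfo = getattr(locale, 'nl_langinfo', None)
        encoding = nl_langinfo(locale.CODESET) if nl_langinfo else None
    return code, encoding or None


def getlocale_with_codeset(category=locale.LC_CTYPE):
    """Hypothetical replacement for locale.getlocale()."""
    # Calling setlocale() with no locale argument only queries the setting.
    return split_localename(locale.setlocale(category))
```

With this approach a name such as 'en_AG' would no longer raise ValueError. The encoding part would come from the C library rather than from a static table.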
msg235823 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-02-12 14:17
With #20079 closed but #20087 still open, where do we stand with this issue?
msg235840 - (view) Author: Eryk Sun (eryksun) * Date: 2015-02-12 15:35
For 3.5 this affects Windows as well, since the new CRT supports RFC1766 language codes, but only without a codepage spec:

    Python 3.5.0a1 (v3.5.0a1:5d4b6a57d5fd, Feb  7 2015, 18:15:14) 
    [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, 'en-GB')
    'en-GB'

    >>> locale.getlocale()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files\Python35\lib\locale.py", line 578, in getlocale
        return _parse_localename(localename)
      File "C:\Program Files\Python35\lib\locale.py", line 487, in _parse_localename
        raise ValueError('unknown locale: %s' % localename)
    ValueError: unknown locale: en-GB

On Vista+ (since 3.5 drops XP support) the codepage can be queried easily via GetLocaleInfoEx:

    >>> from ctypes import *
    >>> LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    >>> GetLocaleInfoEx = WinDLL('kernel32').GetLocaleInfoEx
    >>> info = (c_wchar * 100)()
    >>> GetLocaleInfoEx("en-GB", LOCALE_IDEFAULTANSICODEPAGE, info, len(info)) 
    5
    >>> info.value
    '1252'
    >>> GetLocaleInfoEx("zh-CN", LOCALE_IDEFAULTANSICODEPAGE, info, len(info))
    4
    >>> info.value                                                            
    '936'

Note that Windows follows the RFC spec here (not POSIX), using a hyphen instead of an underscore. 
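
The query above could be wrapped roughly as follows. This is only a sketch: the helper names to_windows_locale_name and default_ansi_codepage are made up for illustration, and the 'cp' prefix is an assumption about how the result would be turned into a Python codec name. The function returns None on non-Windows platforms and when the call fails.

```python
import ctypes
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004


def to_windows_locale_name(name):
    """Hypothetical helper: 'en_GB' -> 'en-GB' (RFC 1766 style hyphen)."""
    return name.replace('_', '-')


def default_ansi_codepage(name):
    """Hypothetical helper: return e.g. 'cp1252' for 'en-GB', else None."""
    if sys.platform != 'win32':
        return None  # GetLocaleInfoEx only exists on Windows (Vista+).
    GetLocaleInfoEx = ctypes.WinDLL('kernel32').GetLocaleInfoEx
    GetLocaleInfoEx.argtypes = (ctypes.c_wchar_p, ctypes.c_uint32,
                                ctypes.c_wchar_p, ctypes.c_int)
    GetLocaleInfoEx.restype = ctypes.c_int
    buf = ctypes.create_unicode_buffer(16)
    # The return value is the number of characters written, including
    # the terminating null, or 0 on failure.
    n = GetLocaleInfoEx(to_windows_locale_name(name),
                        LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
    if n == 0 or buf.value == '0':
        # Assumption: '0' means the locale has no ANSI codepage of its own.
        return None
    return 'cp' + buf.value
```

Setting argtypes and restype explicitly avoids relying on ctypes' default argument conversion, which the interactive session above gets away with.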

This is a bit of a tangent, but for the full Windows language_country.codepage form, the X11-based locale_alias dict is generally useless. So, contrary to the docs, getlocale on Windows doesn't return the language code in RFC 1766 form. In some cases it does, but only by chance.
History
Date | User | Action | Args
2015-02-12 15:35:47 | eryksun | set | nosy: + tim.golden, eryksun, zach.ware, steve.dower; messages: + msg235840; components: + Windows
2015-02-12 14:17:07 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg235823; versions: + Python 3.5, - Python 3.3
2013-12-28 21:12:54 | Arfrever | set | nosy: + Arfrever
2013-12-28 09:44:02 | serhiy.storchaka | create |