classification
Title: Windows getlocale unix-like with french, german, portuguese, spanish
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, fomcl@yahoo.com, r.david.murray
Priority: normal Keywords:

Created on 2015-02-09 20:55 by fomcl@yahoo.com, last changed 2015-02-17 10:35 by fomcl@yahoo.com.

Messages (6)
msg235630 - Author: albertjan (fomcl@yahoo.com) Date: 2015-02-09 20:55
getlocale() is supposed to (?) return a locale two-tuple in a platform-specific notation. However, on *Windows* 7 64-bit, with Python 3.4, 3.3 and 2.7, a *unix-like*, abbreviated lang_territory notation is used for French, German, Portuguese and Spanish. In other words: in these four cases, the output of setlocale is not equal to ".".join(locale.getlocale()).

## Code that demonstrates the differences
from __future__ import print_function
import locale
import collections
import pprint

languages = ("chinese czech danish dutch english finnish french german greek "
             "hungarian icelandic italian japanese korean norwegian polish "
             "portuguese russian slovak spanish swedish turkish")
d = collections.defaultdict(list)
t = collections.namedtuple("Locale", "lang setlocale getlocale")
for language in languages.split():
    sloc = locale.setlocale(locale.LC_ALL, language)
    gloc = locale.getlocale()
    record = t(language, sloc, gloc)
    if gloc[0][2] == "_":
        d["unix-like"].append(record)
    else:
        d["windows-like"].append(record)
     
pprint.pprint(dict(d))

## output
n:\>C:\Miniconda3\python.exe N:\temp\loc.py
-------------------------------------------------------------
{'unix-like': [Locale(lang='french', setlocale='French_France.1252', getlocale=('fr_FR', 'cp1252')),
               Locale(lang='german', setlocale='German_Germany.1252', getlocale=('de_DE', 'cp1252')),
               Locale(lang='portuguese', setlocale='Portuguese_Brazil.1252', getlocale=('pt_BR', 'cp1252')),
               Locale(lang='spanish', setlocale='Spanish_Spain.1252', getlocale=('es_ES', 'cp1252'))],
-------------------------------------------------------------
 'windows-like': [Locale(lang='chinese', setlocale="Chinese (Simplified)_People's Republic of China.936", getlocale=("Chinese (Simplified)_People's Republic of China", '936')),
                  Locale(lang='czech', setlocale='Czech_Czech Republic.1250', getlocale=('Czech_Czech Republic', '1250')),
                  Locale(lang='danish', setlocale='Danish_Denmark.1252', getlocale=('Danish_Denmark', '1252')),
                  Locale(lang='dutch', setlocale='Dutch_Netherlands.1252', getlocale=('Dutch_Netherlands', '1252')),
                  Locale(lang='english', setlocale='English_United States.1252', getlocale=('English_United States', '1252')),
                  Locale(lang='finnish', setlocale='Finnish_Finland.1252', getlocale=('Finnish_Finland', '1252')),
                  Locale(lang='greek', setlocale='Greek_Greece.1253', getlocale=('Greek_Greece', '1253')),
                  Locale(lang='hungarian', setlocale='Hungarian_Hungary.1250', getlocale=('Hungarian_Hungary', '1250')),
                  Locale(lang='icelandic', setlocale='Icelandic_Iceland.1252', getlocale=('Icelandic_Iceland', '1252')),
                  Locale(lang='italian', setlocale='Italian_Italy.1252', getlocale=('Italian_Italy', '1252')),
                  Locale(lang='japanese', setlocale='Japanese_Japan.932', getlocale=('Japanese_Japan', '932')),
                  Locale(lang='korean', setlocale='Korean_Korea.949', getlocale=('Korean_Korea', '949')),
                  Locale(lang='norwegian', setlocale='Norwegian (Bokmål)_Norway.1252', getlocale=('Norwegian (Bokmål)_Norway', '1252')),
                  Locale(lang='polish', setlocale='Polish_Poland.1250', getlocale=('Polish_Poland', '1250')),
                  Locale(lang='russian', setlocale='Russian_Russia.1251', getlocale=('Russian_Russia', '1251')),
                  Locale(lang='slovak', setlocale='Slovak_Slovakia.1250', getlocale=('Slovak_Slovakia', '1250')),
                  Locale(lang='swedish', setlocale='Swedish_Sweden.1252', getlocale=('Swedish_Sweden', '1252')),
                  Locale(lang='turkish', setlocale='Turkish_Turkey.1254', getlocale=('Turkish_Turkey', '1254'))]}
msg235908 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-13 18:30
This is either related to or effectively a duplicate of issue 10466, which contains a fair amount of discussion of the underlying problems.
msg235916 - Author: albertjan (fomcl@yahoo.com) Date: 2015-02-13 19:48
I agree that the two issues are related, but I don't see how they could be duplicates. But maybe that's because I do not know the underlying code.

Issue 10466 is mostly about getdefaultlocale() and whether it is desirable that its return value is always unix-esque, including on Windows. The failed call to locale.py*) as a script would demonstrate that the getdefaultlocale() return value ought to be platform-specific and ready for consumption by setlocale(). That's how I read that issue. I personally find it useful that getdefaultlocale() returns a nice, harmonized locale string.

With getlocale on Windows, however, the return value is sometimes unix-like, sometimes Windows-specific. Until a couple of days ago I thought getlocale was entirely platform-specific. Why should locale.setlocale(locale.LC_ALL, ".".join(locale.getlocale())) succeed on my Dutch Windows system, but fail on my neighbour's German Windows system?
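
The round trip in question can be checked with something like the sketch below. The helper name roundtrips() is hypothetical; it only uses locale.setlocale(), locale.getlocale() and locale.Error, and it restores the previous locale when done. It assumes the requested locale name is accepted by setlocale() in the first place (otherwise locale.Error propagates to the caller).

```python
import locale


def roundtrips(language):
    """Return True if getlocale()'s result can be fed back into setlocale().

    Hypothetical helper, for illustration only.
    """
    # Querying without a second argument returns the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, language)
        lang, encoding = locale.getlocale()
        if lang is None:
            # Nothing to join, so the round trip cannot be attempted.
            return False
        try:
            locale.setlocale(locale.LC_ALL, "{}.{}".format(lang, encoding))
        except locale.Error:
            return False
        return True
    finally:
        # Put the process back the way it was.
        locale.setlocale(locale.LC_ALL, saved)
```

Calling roundtrips("dutch") and roundtrips("german") on a Windows machine would be one way to compare the two cases described above; the results depend on the platform and the installed locales.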

In my humble opinion:
-setlocale should return nothing. It's a setter
-getlocale should return a platform-specific locale specification, probably what is currently returned by setlocale. The output should be ready for consumption by setlocale.
-getdefaultlocale should ALWAYS return a harmonized/unix-like locale specification. In Unix, but not in Windows, it could be used as an argument for setlocale.

My two cents.

Best wishes,
Albert-Jan

*) which also fails on Python 2.7 and 3.4 on my Dutch Windows 7 64, btw.
msg235918 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-13 20:03
Sorry, when I said "effectively a duplicate" I didn't mean *actually* a duplicate, I meant that fixing one will either result in or require fixing the other (same core cause: the disconnect between the Windows names and the unix names and the need for a *consistent* mapping between them).

But, I didn't fully reread that issue or the docs, so maybe I'm wrong about that.
msg235937 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-02-14 01:06
> -setlocale should return nothing. It's a setter
> -getlocale should return a platform-specific locale specification,
> probably what is currently returned by setlocale. The output 
> should be ready for consumption by setlocale.

These functions are well documented, so it's pointless to talk about major changes to the API. Per the docs, getlocale should return an RFC 1766 language code. If you want the platform result, use something like the following:

    def getrawlocale(category=locale.LC_CTYPE):
        return locale.setlocale(category)

    >>> locale.setlocale(locale.LC_CTYPE, 'eng')   
    'English_United Kingdom.1252'
    >>> getrawlocale()                          
    'English_United Kingdom.1252'

    >>> # the new CRT supports RFC1766
    ... locale.setlocale(locale.LC_CTYPE, 'en-GB')
    'en-GB'
    >>> getrawlocale()                            
    'en-GB'

As I mentioned in issue 20088, the locale_alias dict is based on X11's locale.alias file. It doesn't handle most Windows locale strings of the form language_country.codepage. 
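
To see which Windows-style names the alias table happens to cover, one can look them up directly. This is a minimal sketch: locale.locale_alias is the real table, but alias_hits() is a hypothetical helper and the lookup is a simplification of what locale.normalize() does (it ignores encodings and modifiers).

```python
import locale


def alias_hits(windows_names, table=None):
    """Map each 'Language_Country.codepage' string to its alias entry, or None.

    Hypothetical helper; `table` defaults to locale.locale_alias.
    """
    table = locale.locale_alias if table is None else table
    hits = {}
    for name in windows_names:
        # Drop the code page and lowercase, since the table keys are lowercase.
        langname = name.partition(".")[0].lower()
        hits[name] = table.get(langname)
    return hits


if __name__ == "__main__":
    # Names taken from the setlocale() output earlier in this issue.
    for name, target in alias_hits(["French_France.1252",
                                    "Dutch_Netherlands.1252"]).items():
        print(name, "->", target)
```

Names that come back as None are the ones getlocale() would have no unix-like translation for under this simplified lookup.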

On Windows, the _locale extension module could enumerate the system locales at startup to build a mapping. Here's a rough prototype using ctypes (requires Vista or later for the new locale functions):

    import locale
    from ctypes import *
    from ctypes.wintypes import *

    LOCALE_WINDOWS = 1
    LOCALE_SENGLISHLANGUAGENAME = 0x1001
    LOCALE_SENGLISHCOUNTRYNAME = 0x1002
    LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    LCTYPES = (LOCALE_SENGLISHLANGUAGENAME,
               LOCALE_SENGLISHCOUNTRYNAME,
               LOCALE_IDEFAULTANSICODEPAGE)

    kernel32 = WinDLL('kernel32')
    EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx

    EnumLocalesProcEx = WINFUNCTYPE(BOOL, LPWSTR, DWORD, LPARAM)

    def enum_system_locales():
        alias = {}
        codepage = {}
        info = (WCHAR * 100)()
    
        @EnumLocalesProcEx
        def callback(locale, flags, param):
            if '-' not in locale:
                return True
            parts = []
            for lctype in LCTYPES:
                if not GetLocaleInfoEx(locale, 
                                       lctype, 
                                       info, len(info)):
                    raise WinError()
                parts.append(info.value)
            lang, ctry, code = parts
            if lang and ctry and code != '0':
                locale = locale.replace('-', '_')
                full = '{}_{}'.format(lang, ctry)
                alias[full] = locale
                codepage[locale] = 'cp' + code
            return True
        
        if not EnumSystemLocalesEx(callback, 
                                   LOCALE_WINDOWS, 
                                   None, None):
            raise WinError()
        return alias, codepage


    >>> alias, codepage = enum_system_locales()

    >>> alias["English_United Kingdom"]
    'en_GB'
    >>> codepage['en_GB']              
    'cp1252'
    >>> alias["Spanish_United States"] 
    'es_US'
    >>> codepage['es_US']             
    'cp1252'
    >>> alias["Russian_Russia"]
    'ru_RU'
    >>> codepage['ru_RU']
    'cp1251'
    >>> alias["Chinese (Simplified)_People's Republic of China"]
    'zh_CN'
    >>> codepage['zh_CN']
    'cp936'
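
With two dicts shaped like the ones returned by enum_system_locales(), translating a raw Windows locale string into a unix-like (language, encoding) pair could look roughly like this. windows_to_posix() is a hypothetical name, not part of the locale module, and the sample dicts only repeat values shown in the session above.

```python
def windows_to_posix(locale_string, alias, codepage):
    """Translate 'Language_Country.codepage' to a (lang, encoding) tuple.

    Hypothetical helper. Returns None when the Language_Country part
    is not a key of `alias`.
    """
    name, dot, code = locale_string.rpartition(".")
    if not dot:
        # No explicit code page in the string.
        name, code = locale_string, ""
    posix = alias.get(name)
    if posix is None:
        return None
    # Prefer the explicit code page; otherwise use the locale's ANSI default.
    encoding = "cp" + code if code else codepage.get(posix)
    return posix, encoding


if __name__ == "__main__":
    # Sample entries copied from the output above.
    alias = {"Russian_Russia": "ru_RU"}
    codepage = {"ru_RU": "cp1251"}
    print(windows_to_posix("Russian_Russia.1251", alias, codepage))
```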
msg236129 - Author: albertjan (fomcl@yahoo.com) Date: 2015-02-17 10:35
Hi, 

Thanks for your replies. Eryksun (nice to meet you here too!), your function seems very useful, thank you very much. I had indeed already switched to your 'getrawlocale' approach.

Perhaps off-topic (because I have never seen this happen on Windows), but locale.getlocale() sometimes returns (None, None), *even if* locale.setlocale(locale.LC_ALL, "") has been called at the start of the program. For some reason, LANG, LC_ALL and possibly other variables are sometimes not set correctly (I know this is not Python's fault, but...). Would it be a good idea to have a 'failsafe' parameter in getlocale? Something like:

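import locale
import os
import sys

def safe_getlocale(failsafe=False):
    current_locale = locale.getlocale()
    if failsafe and current_locale[0] is None and not sys.platform.startswith("win"):
        # Fall back to a known locale; assumes en_US.UTF-8 is installed.
        os.environ["LANG"] = "en_US.UTF-8"
        os.environ["LC_ALL"] = "en_US.UTF-8"
        # Re-read the environment; getlocale() alone would not see the change.
        locale.setlocale(locale.LC_ALL, "")
        current_locale = locale.getlocale()
    return current_locale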

(sorry for squeezing this in the current issue!)

Albert-Jan
History
Date                 User             Action  Args
2015-02-17 10:35:52  fomcl@yahoo.com  set     messages: + msg236129
2015-02-14 01:06:23  eryksun          set     nosy: + eryksun; messages: + msg235937
2015-02-13 20:04:00  r.david.murray   set     messages: + msg235918
2015-02-13 19:48:27  fomcl@yahoo.com  set     messages: + msg235916
2015-02-13 18:30:41  r.david.murray   set     nosy: + r.david.murray; messages: + msg235908
2015-02-09 20:55:10  fomcl@yahoo.com  create