Issue23425
Created on 2015-02-09 20:55 by fomcl@yahoo.com, last changed 2022-04-11 14:58 by admin.
Messages (6) | |||
---|---|---|---|
msg235630 - (view) | Author: albertjan (fomcl@yahoo.com) | Date: 2015-02-09 20:55 | |
getlocale() is supposed to (?) return a locale two-tuple in a platform-specific notation. However, in *Windows* 7 64, with Python 3.4, 3.3 and 2.7, a *unix-like*, abbreviated lang_territory notation is used for French, German, Portuguese and Spanish. In other words: in these four cases, the output of setlocale is not equal to ".".join(locale.getlocale()).

    ## Code that demonstrates the differences
    from __future__ import print_function
    import locale
    import collections
    import pprint

    languages = ("chinese czech danish dutch english finnish french german greek "
                 "hungarian icelandic italian japanese korean norwegian polish "
                 "portuguese russian slovak spanish swedish turkish")
    d = collections.defaultdict(list)
    t = collections.namedtuple("Locale", "lang setlocale getlocale")
    for language in languages.split():
        sloc = locale.setlocale(locale.LC_ALL, language)
        gloc = locale.getlocale()
        record = t(language, sloc, gloc)
        if gloc[0][2] == "_":
            d["unix-like"].append(record)
        else:
            d["windows-like"].append(record)
    pprint.pprint(dict(d))

    ## output
    n:\>C:\Miniconda3\python.exe N:\temp\loc.py
    -------------------------------------------------------------
    {'unix-like': [Locale(lang='french', setlocale='French_France.1252', getlocale=('fr_FR', 'cp1252')),
                   Locale(lang='german', setlocale='German_Germany.1252', getlocale=('de_DE', 'cp1252')),
                   Locale(lang='portuguese', setlocale='Portuguese_Brazil.1252', getlocale=('pt_BR', 'cp1252')),
                   Locale(lang='spanish', setlocale='Spanish_Spain.1252', getlocale=('es_ES', 'cp1252'))],
    -------------------------------------------------------------
     'windows-like': [Locale(lang='chinese', setlocale="Chinese (Simplified)_People's Republic of China.936", getlocale=("Chinese (Simplified)_People's Republic of China", '936')),
                      Locale(lang='czech', setlocale='Czech_Czech Republic.1250', getlocale=('Czech_Czech Republic', '1250')),
                      Locale(lang='danish', setlocale='Danish_Denmark.1252', getlocale=('Danish_Denmark', '1252')),
                      Locale(lang='dutch', setlocale='Dutch_Netherlands.1252', getlocale=('Dutch_Netherlands', '1252')),
                      Locale(lang='english', setlocale='English_United States.1252', getlocale=('English_United States', '1252')),
                      Locale(lang='finnish', setlocale='Finnish_Finland.1252', getlocale=('Finnish_Finland', '1252')),
                      Locale(lang='greek', setlocale='Greek_Greece.1253', getlocale=('Greek_Greece', '1253')),
                      Locale(lang='hungarian', setlocale='Hungarian_Hungary.1250', getlocale=('Hungarian_Hungary', '1250')),
                      Locale(lang='icelandic', setlocale='Icelandic_Iceland.1252', getlocale=('Icelandic_Iceland', '1252')),
                      Locale(lang='italian', setlocale='Italian_Italy.1252', getlocale=('Italian_Italy', '1252')),
                      Locale(lang='japanese', setlocale='Japanese_Japan.932', getlocale=('Japanese_Japan', '932')),
                      Locale(lang='korean', setlocale='Korean_Korea.949', getlocale=('Korean_Korea', '949')),
                      Locale(lang='norwegian', setlocale='Norwegian (Bokmål)_Norway.1252', getlocale=('Norwegian (Bokmål)_Norway', '1252')),
                      Locale(lang='polish', setlocale='Polish_Poland.1250', getlocale=('Polish_Poland', '1250')),
                      Locale(lang='russian', setlocale='Russian_Russia.1251', getlocale=('Russian_Russia', '1251')),
                      Locale(lang='slovak', setlocale='Slovak_Slovakia.1250', getlocale=('Slovak_Slovakia', '1250')),
                      Locale(lang='swedish', setlocale='Swedish_Sweden.1252', getlocale=('Swedish_Sweden', '1252')),
                      Locale(lang='turkish', setlocale='Turkish_Turkey.1254', getlocale=('Turkish_Turkey', '1254'))]} |
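The round trip described in the report (feeding the joined getlocale() result back into setlocale) can be checked directly. The following is only a sketch: the helper names `join_locale` and `getlocale_roundtrips` are hypothetical, and mapping a `None` language to `"C"` is an assumption based on getlocale() returning `(None, None)` for the C locale.

```python
import locale


def join_locale(pair):
    """Join a (language, encoding) two-tuple into 'language.encoding'.

    Assumption: a None language stands for the "C" locale.
    """
    lang, enc = pair
    if lang is None:
        return "C"
    return lang if enc is None else "{}.{}".format(lang, enc)


def getlocale_roundtrips(category=locale.LC_CTYPE):
    """Return True if setlocale accepts the joined getlocale() result."""
    saved = locale.setlocale(category)  # query only; does not change the locale
    try:
        locale.setlocale(category, join_locale(locale.getlocale(category)))
        return True
    except (locale.Error, ValueError):
        # locale.Error: the platform rejected the name.
        # ValueError: getlocale() could not parse the current locale name.
        return False
    finally:
        locale.setlocale(category, saved)  # restore whatever was active
```

The default category is LC_CTYPE rather than LC_ALL because getlocale() raises TypeError for LC_ALL when the individual categories differ.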
|||
msg235908 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-02-13 18:30 | |
This is either related to or effectively a duplicate of issue 10466, which contains a fair amount of discussion of the underlying problems. |
|||
msg235916 - (view) | Author: albertjan (fomcl@yahoo.com) | Date: 2015-02-13 19:48 | |
I agree that the two issues are related, but I don't see how they could be duplicates. But maybe that's because I do not know the underlying code.

Issue 10466 is mostly about getdefaultlocale() and whether it's desirable or not that its return value is always unix-esque, including on Windows. The failed call to locale.py*) as a script would demonstrate that the getdefaultlocale() return value ought to be platform-specific and ready for consumption by setlocale(). That's how I read that issue. I personally find it useful to have getdefaultlocale() return a nice, harmonized locale string.

With getlocale in Windows, however, the return value is sometimes unix-like, sometimes Windows-specific. Until a couple of days ago I thought getlocale was entirely platform-specific. Why should locale.setlocale(locale.LC_ALL, ".".join(locale.getlocale())) succeed on my Dutch Windows system, but fail on my neighbour's German Windows system?

In my humble opinion:
- setlocale should return nothing. It's a setter.
- getlocale should return a platform-specific locale specification, probably what is currently returned by setlocale. The output should be ready for consumption by setlocale.
- getdefaultlocale should ALWAYS return a harmonized/unix-like locale specification. In Unix, but not in Windows, it could be used as an argument for setlocale.

My two cents.

Best wishes,
Albert-Jan

*) which also fails on Python 2.7 and 3.4 on my Dutch Windows 7 64, btw. |
|||
msg235918 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-02-13 20:03 | |
Sorry, when I said "effectively a duplicate" I didn't mean *actually* a duplicate, I meant that fixing one will either result in or require fixing the other (same core cause: the disconnect between the Windows names and the unix names and the need for a *consistent* mapping between them). But, I didn't fully reread that issue or the docs, so maybe I'm wrong about that. |
|||
msg235937 - (view) | Author: Eryk Sun (eryksun) * | Date: 2015-02-14 01:06 | |
> -setlocale should return nothing. It's a setter
> -getlocale should return a platform-specific locale specification,
> probably what is currently returned by setlocale. The output
> should be ready for consumption by setlocale.

These functions are well documented, so it's pointless to talk about major changes to the API. Per the docs, getlocale should return an RFC 1766 language code. If you want the platform result, use something like the following:

    import locale

    def getrawlocale(category=locale.LC_CTYPE):
        return locale.setlocale(category)

    >>> locale.setlocale(locale.LC_CTYPE, 'eng')
    'English_United Kingdom.1252'
    >>> getrawlocale()
    'English_United Kingdom.1252'

    >>> # the new CRT supports RFC1766
    ... locale.setlocale(locale.LC_CTYPE, 'en-GB')
    'en-GB'
    >>> getrawlocale()
    'en-GB'

As I mentioned in issue 20088, the locale_alias dict is based on X11's locale.alias file. It doesn't handle most Windows locale strings of the form language_country.codepage. On Windows, the _locale extension module could enumerate the system locales at startup to build a mapping.
Here's a rough prototype using ctypes (requires Vista or later for the new locale functions):

    import locale
    from ctypes import *
    from ctypes.wintypes import *

    LOCALE_WINDOWS = 1
    LOCALE_SENGLISHLANGUAGENAME = 0x1001
    LOCALE_SENGLISHCOUNTRYNAME = 0x1002
    LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    LCTYPES = (LOCALE_SENGLISHLANGUAGENAME,
               LOCALE_SENGLISHCOUNTRYNAME,
               LOCALE_IDEFAULTANSICODEPAGE)

    kernel32 = WinDLL('kernel32')
    EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx

    EnumLocalesProcEx = WINFUNCTYPE(BOOL, LPWSTR, DWORD, LPARAM)

    def enum_system_locales():
        alias = {}
        codepage = {}
        info = (WCHAR * 100)()

        @EnumLocalesProcEx
        def callback(locale, flags, param):
            if '-' not in locale:
                return True
            parts = []
            for lctype in LCTYPES:
                if not GetLocaleInfoEx(locale, lctype, info, len(info)):
                    raise WinError()
                parts.append(info.value)
            lang, ctry, code = parts
            if lang and ctry and code != '0':
                locale = locale.replace('-', '_')
                full = '{}_{}'.format(lang, ctry)
                alias[full] = locale
                codepage[locale] = 'cp' + code
            return True

        if not EnumSystemLocalesEx(callback, LOCALE_WINDOWS, None, None):
            raise WinError()
        return alias, codepage

    >>> alias, codepage = enum_system_locales()

    >>> alias["English_United Kingdom"]
    'en_GB'
    >>> codepage['en_GB']
    'cp1252'

    >>> alias["Spanish_United States"]
    'es_US'
    >>> codepage['es_US']
    'cp1252'

    >>> alias["Russian_Russia"]
    'ru_RU'
    >>> codepage['ru_RU']
    'cp1251'

    >>> alias["Chinese (Simplified)_People's Republic of China"]
    'zh_CN'
    >>> codepage['zh_CN']
    'cp936' |
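To illustrate how such `alias` and `codepage` tables might be consumed, here is a small platform-independent sketch. The function name `normalize_windows_locale` is hypothetical (it is not part of the locale module), the sample tables contain only entries shown in the interpreter session above, and splitting at the first "." assumes the language_country part contains no dot.

```python
def normalize_windows_locale(raw, alias, codepage):
    """Map 'Language_Country.codepage' to an ('xx_YY', 'cpNNNN') pair.

    Returns None when the Language_Country part is not in the alias table.
    """
    name, _, code = raw.partition(".")
    short = alias.get(name)
    if short is None:
        return None
    # Prefer the code page given in the raw string; otherwise fall back
    # to the default ANSI code page recorded for that locale.
    encoding = "cp" + code if code else codepage.get(short)
    return short, encoding


# Sample entries copied from the session output above (not a full table).
SAMPLE_ALIAS = {
    "English_United Kingdom": "en_GB",
    "Russian_Russia": "ru_RU",
    "Chinese (Simplified)_People's Republic of China": "zh_CN",
}
SAMPLE_CODEPAGE = {"en_GB": "cp1252", "ru_RU": "cp1251", "zh_CN": "cp936"}

print(normalize_windows_locale("English_United Kingdom.1252",
                               SAMPLE_ALIAS, SAMPLE_CODEPAGE))
```

With a table built by enumerating the system locales, a lookup like this would give the same two-tuple shape for every Windows locale string, rather than only for the handful that locale_alias happens to know.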
|||
msg236129 - (view) | Author: albertjan (fomcl@yahoo.com) | Date: 2015-02-17 10:35 | |
Hi,

Thanks for your replies. Eryksun (nice to meet you here too!), your function seems very useful, thank you very much. I had indeed already switched to your 'getrawlocale' approach.

Perhaps off-topic (because I have never seen this happen in Windows), but locale.getlocale() sometimes returns (None, None), *even if* locale.setlocale(locale.LC_ALL, "") has been called at the start of the program. For some reason, LANG, LC_ALL and possibly other variables are sometimes not set correctly (I know this is not Python's fault, but...). Would it be a good idea to have a 'failsafe' parameter in getlocale? Something like:

    import locale
    import os
    import sys

    def safe_getlocale(failsafe=False):
        current_locale = locale.getlocale()
        if failsafe and current_locale[0] is None and not sys.platform.startswith("win"):
            os.environ["LANG"] = "en_US.UTF-8"
            os.environ["LC_ALL"] = "en_US.UTF-8"
            # The environment is only consulted when setlocale is called with "".
            locale.setlocale(locale.LC_ALL, "")
            current_locale = locale.getlocale()
        return current_locale

(sorry for squeezing this in the current issue!)

Albert-Jan |
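A variant of the failsafe idea above that leaves os.environ alone, offered only as a sketch. Changing environment variables does not by itself alter the active C locale, since they are read when setlocale is called with an empty string, so this version passes candidate names to setlocale directly. The name `getlocale_with_fallback` and the default fallback list are assumptions, and a given fallback name may not be installed on a particular system.

```python
import locale


def getlocale_with_fallback(category=locale.LC_CTYPE,
                            fallbacks=("en_US.UTF-8", "C")):
    """Return getlocale(category), trying fallbacks if no language is set.

    Note: this changes the process-wide locale when a fallback is applied.
    """
    current = locale.getlocale(category)
    if current[0] is not None:
        return current
    for name in fallbacks:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # this locale is not available here; try the next one
        return locale.getlocale(category)
    return current


print(getlocale_with_fallback())
```

If only the "C" fallback can be applied, the result is still (None, None), so callers need to handle that case either way.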
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:12 | admin | set | github: 67613 |
2021-02-26 16:59:12 | eryksun | set | versions: + Python 3.8 |
2021-02-26 16:57:18 | eryksun | set | nosy: + paul.moore, tim.golden, zach.ware, steve.dower; components: + Windows; versions: + Python 3.9, Python 3.10, - Python 2.7, Python 3.3, Python 3.4 |
2015-02-17 10:35:52 | fomcl@yahoo.com | set | messages: + msg236129 |
2015-02-14 01:06:23 | eryksun | set | nosy: + eryksun; messages: + msg235937 |
2015-02-13 20:04:00 | r.david.murray | set | messages: + msg235918 |
2015-02-13 19:48:27 | fomcl@yahoo.com | set | messages: + msg235916 |
2015-02-13 18:30:41 | r.david.murray | set | nosy: + r.david.murray; messages: + msg235908 |
2015-02-09 20:55:10 | fomcl@yahoo.com | create |