Windows getlocale unix-like with french, german, portuguese, spanish #67613
getlocale() is supposed to (?) return a locale two-tuple in a platform-specific notation. However, on 64-bit *Windows* 7 with Python 3.4, 3.3 and 2.7, a *unix-like*, abbreviated lang_territory notation is used for French, German, Portuguese and Spanish. In other words, in these four cases the output of setlocale() is not equal to ".".join(locale.getlocale()).
## Code that demonstrates the differences
from __future__ import print_function
import locale
import collections
import pprint
languages = ("chinese czech danish dutch english finnish french german greek "
             "hungarian icelandic italian japanese korean norwegian polish "
             "portuguese russian slovak spanish swedish turkish")
d = collections.defaultdict(list)
t = collections.namedtuple("Locale", "lang setlocale getlocale")
for language in languages.split():
    sloc = locale.setlocale(locale.LC_ALL, language)
    gloc = locale.getlocale()
    record = t(language, sloc, gloc)
    if gloc[0][2] == "_":
        d["unix-like"].append(record)
    else:
        d["windows-like"].append(record)
pprint.pprint(dict(d))
## output |
This is either related to or effectively a duplicate of bpo-10466, which contains a fair amount of discussion of the underlying problems. |
I agree that the two issues are related, but I don't see how they could be duplicates. But maybe that's because I do not know the underlying code. bpo-10466 is mostly about getdefaultlocale() and whether or not it's desirable that its return value is always unix-esque, including on Windows. The failed call to locale.py*) as a script would demonstrate that the getdefaultlocale() return value ought to be platform-specific and ready for consumption by setlocale(). That's how I read that issue. I personally find getdefaultlocale() useful because it gives a nice, harmonized locale string. With getlocale() on Windows, however, the return value is sometimes unix-like and sometimes Windows-specific. Until a couple of days ago I thought getlocale() was entirely platform-specific. Why should locale.setlocale(locale.LC_ALL, ".".join(locale.getlocale())) succeed on my Dutch Windows system, but fail on my neighbour's German Windows system? In my humble opinion: My two cents. Best wishes,
*) which also fails on Python 2.7 and 3.4 on my Dutch 64-bit Windows 7, btw. |
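The round-trip question raised above can be checked with a small helper. This is only a sketch: `roundtrips` is a hypothetical name, not part of the locale module, and it works on LC_CTYPE (the category getlocale() reads by default) rather than LC_ALL, on the assumption that a single category is easier to save and restore.

```python
import locale


def roundtrips(name, category=locale.LC_CTYPE):
    """Return True if the getlocale() result can be fed back to setlocale().

    Hypothetical helper for illustration; not part of the locale module.
    """
    saved = locale.setlocale(category)  # query only, changes nothing
    try:
        locale.setlocale(category, name)
        lang, enc = locale.getlocale(category)
        if lang is None or enc is None:
            # Nothing to join, e.g. for the "C" locale.
            return False
        locale.setlocale(category, "{}.{}".format(lang, enc))
        return True
    except (locale.Error, ValueError):
        # locale.Error: the C runtime rejected the name.
        # ValueError: getlocale() could not parse the current setting.
        return False
    finally:
        locale.setlocale(category, saved)  # always restore the old setting
```

Going by the report in this thread, one would expect `roundtrips("dutch")` to succeed and `roundtrips("german")` to fail on the affected Windows systems; that has not been verified here.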
Sorry, when I said "effectively a duplicate" I didn't mean *actually* a duplicate, I meant that fixing one will either result in or require fixing the other (same core cause: the disconnect between the Windows names and the unix names and the need for a *consistent* mapping between them). But, I didn't fully reread that issue or the docs, so maybe I'm wrong about that. |
These functions are well documented, so it's pointless to talk about major changes to the API. Per the docs, getlocale should return an RFC 1766 language code. If you want the platform result, use something like the following:
def getrawlocale(category=locale.LC_CTYPE):
    return locale.setlocale(category)
>>> locale.setlocale(locale.LC_CTYPE, 'eng')
'English_United Kingdom.1252'
>>> getrawlocale()
'English_United Kingdom.1252'
>>> # the new CRT supports RFC1766
... locale.setlocale(locale.LC_CTYPE, 'en-GB')
'en-GB'
>>> getrawlocale()
'en-GB'
As I mentioned in bpo-20088, the locale_alias dict is based on X11's locale.alias file. It doesn't handle most Windows locale strings of the form language_country.codepage. On Windows, the _locale extension module could enumerate the system locales at startup to build a mapping. Here's a rough prototype using ctypes (requires Vista or later for the new locale functions):
import locale
from ctypes import *
from ctypes.wintypes import *
LOCALE_WINDOWS = 1
LOCALE_SENGLISHLANGUAGENAME = 0x1001
LOCALE_SENGLISHCOUNTRYNAME = 0x1002
LOCALE_IDEFAULTANSICODEPAGE = 0x1004
LCTYPES = (LOCALE_SENGLISHLANGUAGENAME,
           LOCALE_SENGLISHCOUNTRYNAME,
           LOCALE_IDEFAULTANSICODEPAGE)
kernel32 = WinDLL('kernel32')
EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
GetLocaleInfoEx = kernel32.GetLocaleInfoEx
EnumLocalesProcEx = WINFUNCTYPE(BOOL, LPWSTR, DWORD, LPARAM)
def enum_system_locales():
    alias = {}
    codepage = {}
    info = (WCHAR * 100)()
    @EnumLocalesProcEx
    def callback(locale, flags, param):
        if '-' not in locale:
            return True
        parts = []
        for lctype in LCTYPES:
            if not GetLocaleInfoEx(locale,
                                   lctype,
                                   info, len(info)):
                raise WinError()
            parts.append(info.value)
        lang, ctry, code = parts
        if lang and ctry and code != '0':
            locale = locale.replace('-', '_')
            full = '{}_{}'.format(lang, ctry)
            alias[full] = locale
            codepage[locale] = 'cp' + code
        return True
    if not EnumSystemLocalesEx(callback,
                               LOCALE_WINDOWS,
                               None, None):
        raise WinError()
    return alias, codepage
>>> alias, codepage = enum_system_locales()
>>> alias["English_United Kingdom"]
'en_GB'
>>> codepage['en_GB']
'cp1252'
>>> alias["Spanish_United States"]
'es_US'
>>> codepage['es_US']
'cp1252'
>>> alias["Russian_Russia"]
'ru_RU'
>>> codepage['ru_RU']
'cp1251'
>>> alias["Chinese (Simplified)_People's Republic of China"]
'zh_CN'
>>> codepage['zh_CN']
'cp936' |
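As a sketch of how the two mappings from the prototype might be consumed, the hypothetical helper below translates a Windows-style language_country.codepage string into a unix-like (language code, encoding) pair. The function name and the fallback behaviour are assumptions made for illustration, not something proposed in the thread. The sample dictionaries only copy entries shown in the session above so that the example runs on any platform.

```python
def windows_to_posix(name, alias, codepage):
    """Translate 'Language_Country.codepage' into (lang_code, encoding).

    Hypothetical helper: `alias` and `codepage` are dicts shaped like the
    ones returned by enum_system_locales() in the prototype.
    """
    base, _, cp = name.partition('.')
    code = alias.get(base)
    if code is None:
        # Unknown Windows name: report nothing rather than guess.
        return None, None
    # Prefer the code page given explicitly in the string; otherwise fall
    # back to the locale's default ANSI code page from the mapping.
    encoding = 'cp' + cp if cp else codepage.get(code)
    return code, encoding


# Sample entries copied from the interactive session above.
alias = {"English_United Kingdom": "en_GB", "Russian_Russia": "ru_RU"}
codepage = {"en_GB": "cp1252", "ru_RU": "cp1251"}

print(windows_to_posix("English_United Kingdom.1252", alias, codepage))
print(windows_to_posix("Russian_Russia", alias, codepage))
print(windows_to_posix("Nowhere_Land.1252", alias, codepage))
```

Returning (None, None) for an unknown name mirrors what getlocale() already does when it cannot determine a locale, which keeps the failure mode familiar to callers.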
Hi,
Thanks for your replies. Eryksun (nice to meet you here too!), your function seems very useful, thank you very much. I had indeed already switched to your 'getrawlocale' approach.
Perhaps off-topic (because I have never seen this happen on Windows), but locale.getlocale() sometimes returns (None, None), *even if* locale.setlocale(locale.LC_ALL, "") has been called at the start of the program. For some reason, LANG, LC_ALL and possibly other variables are sometimes not set correctly (I know this is not Python's fault, but...). Would it be a good idea to have a 'failsafe' parameter in getlocale? Something like:
import locale
import os
import sys
def safe_getlocale(failsafe=False):
    current_locale = locale.getlocale()
    if failsafe and current_locale[0] is None and not sys.platform.startswith("win"):
        os.environ["LANG"] = "en_US.UTF-8"
        os.environ["LC_ALL"] = "en_US.UTF-8"
        # Re-read the environment; getlocale() does not pick up the new variables otherwise.
        locale.setlocale(locale.LC_ALL, "")
        current_locale = locale.getlocale()
    return current_locale
(sorry for squeezing this in the current issue!)
Albert-Jan |