Windows getlocale unix-like with french, german, portuguese, spanish #67613

Open
fomclyahoocom mannequin opened this issue Feb 9, 2015 · 6 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@fomclyahoocom
Mannequin

fomclyahoocom mannequin commented Feb 9, 2015

BPO 23425
Nosy @pfmoore, @tjguk, @bitdancer, @zware, @eryksun, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-02-09.20:55:10.652>
labels = ['type-bug', '3.8', '3.9', '3.10', 'library', 'OS-windows']
title = 'Windows getlocale unix-like with french, german, portuguese, spanish'
updated_at = <Date 2021-02-26.16:59:12.723>
user = 'https://bugs.python.org/fomclyahoocom'

bugs.python.org fields:

activity = <Date 2021-02-26.16:59:12.723>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'Windows']
creation = <Date 2015-02-09.20:55:10.652>
creator = 'fomcl@yahoo.com'
dependencies = []
files = []
hgrepos = []
issue_num = 23425
keywords = []
message_count = 6.0
messages = ['235630', '235908', '235916', '235918', '235937', '236129']
nosy_count = 7.0
nosy_names = ['paul.moore', 'tim.golden', 'r.david.murray', 'zach.ware', 'eryksun', 'steve.dower', 'fomcl@yahoo.com']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23425'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@fomclyahoocom
Mannequin Author

fomclyahoocom mannequin commented Feb 9, 2015

getlocale() is supposed to (?) return a locale two-tuple in a platform-specific notation. However, on 64-bit *Windows* 7, with Python 3.4, 3.3 and 2.7, a *unix-like*, abbreviated lang_territory notation is used for French, German, Portuguese and Spanish. In other words: in these four cases, the output of setlocale is not equal to ".".join(locale.getlocale()).

## Code that demonstrates the differences
from __future__ import print_function
import locale
import collections
import pprint

languages = ("chinese czech danish dutch english finnish french german greek "
             "hungarian icelandic italian japanese korean norwegian polish "
             "portuguese russian slovak spanish swedish turkish")
d = collections.defaultdict(list)
t = collections.namedtuple("Locale", "lang setlocale getlocale")
for language in languages.split():
    sloc = locale.setlocale(locale.LC_ALL, language)
    gloc = locale.getlocale()
    record = t(language, sloc, gloc)
    # gloc[0] can be None when the locale name cannot be parsed, so guard before slicing
    if gloc[0] and gloc[0][2:3] == "_":
        d["unix-like"].append(record)
    else:
        d["windows-like"].append(record)

pprint.pprint(dict(d))

## output
n:\>C:\Miniconda3\python.exe N:\temp\loc.py
-------------------------------------------------------------
{'unix-like': [Locale(lang='french', setlocale='French_France.1252', getlocale=('fr_FR', 'cp1252')),
Locale(lang='german', setlocale='German_Germany.1252', getlocale=('de_DE', 'cp1252')),
Locale(lang='portuguese', setlocale='Portuguese_Brazil.1252', getlocale=('pt_BR', 'cp1252')),
Locale(lang='spanish', setlocale='Spanish_Spain.1252', getlocale=('es_ES', 'cp1252'))],
-------------------------------------------------------------
'windows-like': [Locale(lang='chinese', setlocale="Chinese (Simplified)_People's Republic of China.936", getlocale=("Chinese (Simplified)_People's Republic of China", '936')),
Locale(lang='czech', setlocale='Czech_Czech Republic.1250', getlocale=('Czech_Czech Republic', '1250')),
Locale(lang='danish', setlocale='Danish_Denmark.1252', getlocale=('Danish_Denmark', '1252')),
Locale(lang='dutch', setlocale='Dutch_Netherlands.1252', getlocale=('Dutch_Netherlands', '1252')),
Locale(lang='english', setlocale='English_United States.1252', getlocale=('English_United States', '1252')),
Locale(lang='finnish', setlocale='Finnish_Finland.1252', getlocale=('Finnish_Finland', '1252')),
Locale(lang='greek', setlocale='Greek_Greece.1253', getlocale=('Greek_Greece', '1253')),
Locale(lang='hungarian', setlocale='Hungarian_Hungary.1250', getlocale=('Hungarian_Hungary', '1250')),
Locale(lang='icelandic', setlocale='Icelandic_Iceland.1252', getlocale=('Icelandic_Iceland', '1252')),
Locale(lang='italian', setlocale='Italian_Italy.1252', getlocale=('Italian_Italy', '1252')),
Locale(lang='japanese', setlocale='Japanese_Japan.932', getlocale=('Japanese_Japan', '932')),
Locale(lang='korean', setlocale='Korean_Korea.949', getlocale=('Korean_Korea', '949')),
Locale(lang='norwegian', setlocale='Norwegian (Bokmål)_Norway.1252', getlocale=('Norwegian (Bokmål)_Norway', '1252')),
Locale(lang='polish', setlocale='Polish_Poland.1250', getlocale=('Polish_Poland', '1250')),
Locale(lang='russian', setlocale='Russian_Russia.1251', getlocale=('Russian_Russia', '1251')),
Locale(lang='slovak', setlocale='Slovak_Slovakia.1250', getlocale=('Slovak_Slovakia', '1250')),
Locale(lang='swedish', setlocale='Swedish_Sweden.1252', getlocale=('Swedish_Sweden', '1252')),
Locale(lang='turkish', setlocale='Turkish_Turkey.1254', getlocale=('Turkish_Turkey', '1254'))]}
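
The script above sorts the results by looking only at the third character of the language code. A slightly stricter check could match the whole lang_territory shape instead. This is a minimal sketch, not part of the original report; the helper name `is_posix_style` and the regular expression are assumptions chosen for illustration, and the pattern deliberately ignores modifiers such as `@euro`.

```python
import re

# Hypothetical pattern: 2-3 lowercase letters, underscore, 2 uppercase letters.
_POSIX_STYLE = re.compile(r"^[a-z]{2,3}_[A-Z]{2}$")


def is_posix_style(language_code):
    """Return True if the code looks like 'fr_FR' rather than 'French_France'."""
    # getlocale() may return None as its first element, so treat that as "no".
    return bool(language_code) and _POSIX_STYLE.match(language_code) is not None


# Values taken from the output shown above.
print(is_posix_style("fr_FR"))
print(is_posix_style("Dutch_Netherlands"))
```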

@fomclyahoocom fomclyahoocom mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 9, 2015
@bitdancer
Member

This is either related to or effectively a duplicate of bpo-10466, which contains a fair amount of discussion of the underlying problems.

@fomclyahoocom
Mannequin Author

fomclyahoocom mannequin commented Feb 13, 2015

I agree that the two issues are related, but I don't see how they could be duplicates. But maybe that's because I do not know the underlying code.

bpo-10466 is mostly about getdefaultlocale() and whether it is desirable that its return value is always unix-esque, including on Windows. The failed call to locale.py *) as a script would demonstrate that the getdefaultlocale() return value ought to be platform-specific and ready for consumption by setlocale(). That's how I read that issue. I personally find it useful that getdefaultlocale() returns a nice, harmonized locale string.

With getlocale on Windows, however, the return value is sometimes unix-like, sometimes Windows-specific. Until a couple of days ago I thought getlocale was entirely platform-specific. Why should locale.setlocale(locale.LC_ALL, ".".join(locale.getlocale())) succeed on my Dutch Windows system, but fail on my neighbour's German Windows system?

In my humble opinion:
- setlocale should return nothing. It's a setter.
- getlocale should return a platform-specific locale specification, probably what is currently returned by setlocale. The output should be ready for consumption by setlocale.
- getdefaultlocale should ALWAYS return a harmonized/unix-like locale specification. On Unix, but not on Windows, it could be used as an argument for setlocale.

My two cents.

Best wishes,
Albert-Jan

*) which also fails on Python 2.7 and 3.4 on my Dutch Windows 7 64, btw.

@bitdancer
Member

Sorry, when I said "effectively a duplicate" I didn't mean *actually* a duplicate, I meant that fixing one will either result in or require fixing the other (same core cause: the disconnect between the Windows names and the unix names and the need for a *consistent* mapping between them).

But, I didn't fully reread that issue or the docs, so maybe I'm wrong about that.

@eryksun
Contributor

eryksun commented Feb 14, 2015

-setlocale should return nothing. It's a setter
-getlocale should return a platform-specific locale specification,
probably what is currently returned by setlocale. The output
should be ready for consumption by setlocale.

These functions are well documented, so it's pointless to talk about major changes to the API. Per the docs, getlocale should return an RFC 1766 language code. If you want the platform result, use something like the following:

    import locale

    def getrawlocale(category=locale.LC_CTYPE):
        # With no locale argument, setlocale() only queries the current setting.
        return locale.setlocale(category)

    >>> locale.setlocale(locale.LC_CTYPE, 'eng')
    'English_United Kingdom.1252'
    >>> getrawlocale()                          
    'English_United Kingdom.1252'

    >>> # the new CRT supports RFC1766
    ... locale.setlocale(locale.LC_CTYPE, 'en-GB')
    'en-GB'
    >>> getrawlocale()                            
    'en-GB'
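
Building on the getrawlocale idea, a save-and-restore round trip can avoid getlocale() altogether, since the string returned by setlocale() is in the platform's own notation. The following is only a sketch under that assumption; the context manager name `temporary_locale` is hypothetical and not part of the locale module.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name, category=locale.LC_CTYPE):
    """Switch to `name` for the duration of the block, then restore."""
    # Query the current setting in platform notation (no change is made here).
    saved = locale.setlocale(category)
    try:
        # setlocale() returns the name the C runtime actually selected.
        yield locale.setlocale(category, name)
    finally:
        # The saved string came from setlocale(), so it can be passed straight back.
        locale.setlocale(category, saved)


with temporary_locale("C") as active:
    print(active)
```

Because the saved value is never parsed or normalized, the restore step should not depend on whether the name happens to appear in Python's alias table.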

As I mentioned in bpo-20088, the locale_alias dict is based on X11's locale.alias file. It doesn't handle most Windows locale strings of the form language_country.codepage.
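
To see which Windows-style names the alias table happens to cover, one option is to compare each name with what locale.normalize() returns, since that function hands back its argument unchanged when it finds no alias. This is a rough sketch; the helper name `is_rewritten_by_normalize` is made up for illustration, and which names get rewritten depends on the alias table shipped with a given Python version, so no output is shown.

```python
import locale


def is_rewritten_by_normalize(windows_name):
    """Return True if locale.normalize() maps the name to something else."""
    # normalize() returns the input unchanged when no alias matches.
    return locale.normalize(windows_name) != windows_name


# Names taken from the setlocale output earlier in this issue.
for name in ("French_France.1252", "German_Germany.1252",
             "Dutch_Netherlands.1252", "Russian_Russia.1251"):
    print(name, is_rewritten_by_normalize(name))
```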

On Windows, the _locale extension module could enumerate the system locales at startup to build a mapping. Here's a rough prototype using ctypes (requires Vista or later for the new locale functions):

    import locale
    from ctypes import *
    from ctypes.wintypes import *

    LOCALE_WINDOWS = 1
    LOCALE_SENGLISHLANGUAGENAME = 0x1001
    LOCALE_SENGLISHCOUNTRYNAME = 0x1002
    LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    LCTYPES = (LOCALE_SENGLISHLANGUAGENAME,
               LOCALE_SENGLISHCOUNTRYNAME,
               LOCALE_IDEFAULTANSICODEPAGE)

    kernel32 = WinDLL('kernel32')
    EnumSystemLocalesEx = kernel32.EnumSystemLocalesEx
    GetLocaleInfoEx = kernel32.GetLocaleInfoEx

    EnumLocalesProcEx = WINFUNCTYPE(BOOL, LPWSTR, DWORD, LPARAM)

    def enum_system_locales():
        alias = {}
        codepage = {}
        info = (WCHAR * 100)()
    
        @EnumLocalesProcEx
        def callback(locale, flags, param):
            if '-' not in locale:
                return True
            parts = []
            for lctype in LCTYPES:
                if not GetLocaleInfoEx(locale, 
                                       lctype, 
                                       info, len(info)):
                    raise WinError()
                parts.append(info.value)
            lang, ctry, code = parts
            if lang and ctry and code != '0':
                locale = locale.replace('-', '_')
                full = '{}_{}'.format(lang, ctry)
                alias[full] = locale
                codepage[locale] = 'cp' + code
            return True
        
        if not EnumSystemLocalesEx(callback, 
                                   LOCALE_WINDOWS, 
                                   None, None):
            raise WinError()
        return alias, codepage

    >>> alias, codepage = enum_system_locales()
    >>> alias["English_United Kingdom"]
    'en_GB'
    >>> codepage['en_GB']              
    'cp1252'
    >>> alias["Spanish_United States"] 
    'es_US'
    >>> codepage['es_US']             
    'cp1252'
    >>> alias["Russian_Russia"]
    'ru_RU'
    >>> codepage['ru_RU']
    'cp1251'
    >>> alias["Chinese (Simplified)_People's Republic of China"]
    'zh_CN'
    >>> codepage['zh_CN']
    'cp936'
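
Given two such dictionaries, turning a Windows locale string into a consistent (language code, encoding) pair could look roughly like the sketch below. The function name `parse_windows_locale` is hypothetical, and the sample dictionaries are hard-coded from the session above so the example runs on any platform; on Windows they would come from enum_system_locales().

```python
def parse_windows_locale(name, alias, codepage):
    """Map 'Language_Country.codepage' to ('xx_YY', 'cpNNNN') using the given tables."""
    # Split on the last dot only: some country names contain dots themselves.
    head, sep, tail = name.rpartition(".")
    if sep and tail.isdigit():
        full, code = head, "cp" + tail
    else:
        full, code = name, None
    tag = alias.get(full)
    if tag is None:
        # Unknown name: mirror getlocale()'s (None, None) convention.
        return None, None
    # Prefer the explicit codepage; otherwise use the locale's default ANSI codepage.
    return tag, code or codepage.get(tag)


# Sample entries copied from the interactive session above.
alias = {"English_United Kingdom": "en_GB", "Russian_Russia": "ru_RU"}
codepage = {"en_GB": "cp1252", "ru_RU": "cp1251"}

print(parse_windows_locale("English_United Kingdom.1252", alias, codepage))
print(parse_windows_locale("Russian_Russia", alias, codepage))
```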

@fomclyahoocom
Mannequin Author

fomclyahoocom mannequin commented Feb 17, 2015

Hi,

Thanks for your replies. Eryksun (nice to meet you here too!), your function seems very useful, thank you very much. I had indeed already switched to your 'getrawlocale' approach.

Perhaps off-topic (because I have never seen this happen on Windows), but locale.getlocale() sometimes returns (None, None), *even if* locale.setlocale(locale.LC_ALL, "") has been called at the start of the program. For some reason, LANG, LC_ALL and possibly other variables are sometimes not set correctly (I know this is not Python's fault, but...). Would it be a good idea to have a 'failsafe' parameter in getlocale? Something like:

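import locale
import os
import sys

def safe_getlocale(failsafe=False):
    current_locale = locale.getlocale()
    if failsafe and current_locale[0] is None and not sys.platform.startswith("win"):
        # Fall back to a fixed locale (assumes en_US.UTF-8 is installed).
        os.environ["LANG"] = "en_US.UTF-8"
        os.environ["LC_ALL"] = "en_US.UTF-8"
        # The environment is only consulted when setlocale() is called again.
        locale.setlocale(locale.LC_ALL, "")
        current_locale = locale.getlocale()
    return current_locale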

(sorry for squeezing this in the current issue!)

Albert-Jan

@eryksun eryksun added OS-windows 3.9 only security fixes 3.10 only security fixes 3.8 only security fixes labels Feb 26, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022