classification
Title: locale.getlocale() output fails as setlocale() input
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: ber, iszegedi, lemburg, r.david.murray
Priority: normal Keywords:

Created on 2007-04-13 10:26 by ber, last changed 2010-11-21 18:46 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
test_locale.py ber, 2007-04-13 10:26 test case for the locale module
Messages (9)
msg31770 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2007-04-13 10:26
This problem report about the locale module
consists of three closely related parts
(which is why I have decided to put them in one report):
a) the example in the docs is wrong / misleading
b) under some locale settings Python has a defect
c) a test case for the locale module, showing b),
   but also useful as a general start for a test module.

Details:
        a)
        Section "Example":
                The line
                >>> loc = locale.getlocale(locale.LC_ALL) # get current locale
                contradicts the description of getlocale, which states
                that getlocale should not be called with LC_ALL.
                My suggestion is to change the example to be more useful,
                as getting the locale as the first action is not really useful:
                it should be "C" anyway, which leads to (None, None),
                so the value is already known. It would make more sense to
                first set the default locale to the user preferences:
import locale
locale.setlocale(locale.LC_ALL,'')
loc = locale.getlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC,"C")
# convert a string here
locale.setlocale(locale.LC_NUMERIC, loc)

                _but_ this does not work, see problem b).
                What does work is:

import locale
locale.setlocale(locale.LC_ALL,'')
loc = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC,"C")
# convert a string here
locale.setlocale(locale.LC_NUMERIC, loc)
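
The save-and-restore pattern above can be wrapped in a context manager. This is only a minimal sketch for current Python 3; the name temporary_locale is made up for illustration and is not part of the locale module. It relies solely on setlocale(category), called without a second argument, returning the current setting as a string.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to `name` and restore the old setting afterwards.

    Hypothetical helper: the previous setting is obtained by querying
    setlocale(), not getlocale(), so it can be passed back unchanged.
    """
    saved = locale.setlocale(category)  # query only, changes nothing
    locale.setlocale(category, name)
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)


# Convert a string with "C" number formatting, then restore the old locale.
with temporary_locale(locale.LC_NUMERIC, "C"):
    value = locale.atof("3.14")
print(value)
```

Because the saved value is the exact string the C library handed out, restoring it does not depend on how getlocale() would have parsed it.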

Note that all_loc = locale.setlocale(locale.LC_ALL) might contain
several categories (see attached test_locale.py where I needed to decode
this).
'LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=en_GB.utf8;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=de_DE.UTF-8;LC_ADDRESS=de_DE.UTF-8;LC_TELEPHONE=de_DE.UTF-8;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=de_DE.UTF-8'
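
A string of this shape could be taken apart roughly as follows. This sketch assumes the "NAME=value;NAME=value" layout shown above; other platforms may format the LC_ALL result differently. parse_locale_string is a hypothetical helper and is not the code from test_locale.py.

```python
def parse_locale_string(value):
    """Split the result of setlocale(LC_ALL) into a per-category dict.

    Assumption: a composite result looks like "LC_CTYPE=a;LC_NUMERIC=b".
    A plain name such as "C" means all categories share one setting.
    """
    if "=" not in value:
        return {"LC_ALL": value}
    settings = {}
    for part in value.split(";"):
        key, _, name = part.partition("=")
        settings[key.strip()] = name
    return settings


composite = "LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=en_GB.utf8;LC_TIME=de_DE.UTF-8"
settings = parse_locale_string(composite)
print(settings["LC_NUMERIC"])  # en_GB.utf8
```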


        b)
                The output of getlocale sometimes cannot be used as
                input to setlocale.
                Works with
                * python2.5 and python2.4 on
                  Debian GNU/Linux Etch ppc, de_DE.utf8.

                I had failures with
                * python2.3, python2.4, python2.5
                  on Debian GNU/Linux Sarge ppc, de_DE@euro
                * Windows XP SP2
                        python-2.4.4.msi    German, see:

                >>> import locale
                >>> result = locale.setlocale(locale.LC_NUMERIC,"")
                >>> print result
                German_Germany.1252
                >>> got = locale.getlocale(locale.LC_NUMERIC)
                >>> print got
                ('de_DE', '1252')
                >>> # works
                ... locale.setlocale(locale.LC_NUMERIC, result)
                'German_Germany.1252'
                >>> # fails
                ... locale.setlocale(locale.LC_NUMERIC, got)
                Traceback (most recent call last):
                  File "<stdin>", line 2, in ?
                  File "C:\Python24\lib\locale.py", line 381, in setlocale
                    return _setlocale(category, locale)
                locale.Error: unsupported locale setting
                >>>
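
The check performed by hand in the session above can be written as a small function. This is a sketch for current Python 3, and getlocale_roundtrips is a hypothetical name. It also treats a ValueError from getlocale() as a failed round trip, on the assumption that an unparsable locale name is just another way for the round trip to fail.

```python
import locale


def getlocale_roundtrips(category=locale.LC_NUMERIC):
    """Return True if setlocale() accepts what getlocale() reports.

    The current setting is saved via a setlocale() query and restored at
    the end, so the check leaves the process locale untouched.
    """
    saved = locale.setlocale(category)
    try:
        try:
            got = locale.getlocale(category)
            locale.setlocale(category, got)
        except (locale.Error, ValueError):
            return False
        return True
    finally:
        locale.setlocale(category, saved)


print(getlocale_roundtrips())
```

To test the user's preferred locale rather than the startup default, call locale.setlocale(locale.LC_ALL, "") first.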

msg31771 - (view) Author: Istvan Szegedi (iszegedi) Date: 2007-04-18 10:05
I could reproduce the problem on Fedora Core 5 with Python 2.4.3.

After tracing down the issue, I found the following:

The problem is in locale.py. That file defines a function called normalize, which setlocale invokes if the incoming locale argument is not a string. (In your example this condition is true: locale.getlocale returns a tuple, so the got variable is a tuple.) normalize uses an encoding_alias table, which ends up translating the full locale name into an incorrect one. In my environment the incoming value en_us.utf-8 is converted to en_us.utf, and that is the return value of normalize. The low-level _setlocale function that setlocale then calls raises an exception when it receives en_us.utf, because that is an unsupported locale setting.
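
To see what normalize produces on a given installation, the tuple handling can be approximated with the public locale.normalize function. This is a sketch: name_from_tuple is a hypothetical helper, and the way it joins the tuple is an assumption about what setlocale does internally. The output depends on the Python version and its alias tables, so none is shown here.

```python
import locale


def name_from_tuple(loc):
    """Approximate the string setlocale() derives from a getlocale() tuple.

    Assumption: the (language, encoding) pair is joined with a dot and
    the result is passed through locale.normalize().
    """
    language, encoding = loc
    name = language or "C"
    if encoding:
        name = name + "." + encoding
    return locale.normalize(name)


for candidate in [("en_US", "UTF-8"), ("de_DE", "1252")]:
    print(candidate, "->", name_from_tuple(candidate))
```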


This is the original code snippet from the normalize function in locale.py where the wrong conversion happens:


    # Second try: langname (without encoding)
    code = locale_alias.get(langname, None)
    if code is not None:
        if '.' in code:
            langname, defenc = code.split('.')
        else:
            langname = code
            defenc = ''
        if encoding:
            encoding = encoding_alias.get(encoding, encoding)
        else:
            encoding = defenc
        if encoding:
            return langname + '.' + encoding
        else:
            return langname

    else:
        return localename


To get it fixed, I modified the code in locale.py as follows:


    # Second try: langname (without encoding)
    code = locale_alias.get(langname, None)
    if code is not None:
        if '.' in code:
            langname, defenc = code.split('.')
        else:
            langname = code
            defenc = ''
#        if encoding:
#            encoding = encoding_alias.get(encoding, encoding)
#        else:
#            encoding = defenc
        if encoding is None:
            encoding = defenc
        if encoding:
            return langname + '.' + encoding
        else:
            return langname

    else:
        return localename


So the effect of the encoding_alias table is skipped. Then your test_locale.py returns OK.
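
The effect of the alias lookup can be reproduced in isolation. The following is a toy sketch, not the module's code: the two tables are cut-down, hypothetical stand-ins built from the description above, and second_try with its apply_encoding_alias flag exists only for this illustration.

```python
# Hypothetical, cut-down tables for illustration only; the real tables in
# locale.py are much larger and differ between Python versions.
LOCALE_ALIAS = {"en_us": "en_US.ISO8859-1"}
ENCODING_ALIAS = {"utf-8": "utf"}


def second_try(langname, encoding, apply_encoding_alias=True):
    """Mimic the 'second try' branch of normalize() on the toy tables."""
    code = LOCALE_ALIAS.get(langname)
    if code is None:
        return None
    if "." in code:
        langname, defenc = code.split(".")
    else:
        langname, defenc = code, ""
    if encoding:
        if apply_encoding_alias:
            # This lookup is the step that rewrites the encoding.
            encoding = ENCODING_ALIAS.get(encoding, encoding)
    else:
        encoding = defenc
    return langname + "." + encoding if encoding else langname


print(second_try("en_us", "utf-8"))                              # en_US.utf
print(second_try("en_us", "utf-8", apply_encoding_alias=False))  # en_US.utf-8
print(second_try("en_us", ""))                                   # en_US.ISO8859-1
```

With the alias step skipped, the encoding supplied by the caller survives unchanged, which matches the behaviour the modification is after.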
msg31772 - (view) Author: Istvan Szegedi (iszegedi) Date: 2007-04-19 08:24
msg31773 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2007-04-19 08:55
Istvan,
thanks for looking into this and adding information.
I do not feel competent to judge what the solution would be,
as I do not know the design goals of getlocale().
Given the documentation, the function call would only make sense
if setlocale(getlocale(LC_XYZ)) worked in all cases, especially
after the locale has been set to the user preferences with setlocale(LC_ALL,"").
There is no simple test case that can make sure this is the case.

The workaround for the current code for me is to use setlocale(LC_XYZ) only
to ask for the currently set locale and then decipher the string if the categories have different settings. This workaround can be seen in my
proposed test_locale.py.

I believe next steps could be to get a full overview and check design and
implementation, add some testcases so that more is covered and then fix 
the implementation. We could try to find out who invented getlocale
and ask.
msg31774 - (view) Author: Istvan Szegedi (iszegedi) Date: 2007-04-19 10:18
Hi Bernhard,

I absolutely agree with you, and I cannot really judge my correction either. It was just a quick and dirty solution to see if it would fix the problem. In fact, there are other ways to do it as well, such as modifying the
encoding_alias table so that it does not translate the utf-8 string into utf (and thus does not produce an invalid locale setting for _setlocale).

In the locale.py file I found two names mentioned:

Author: Marc-Andre Lemburg (mal@lemburg.com)
and Fredrik Lundh (fredrik@pythonware.com) as a modifier

so it might be a good idea to drop them a mail and ask for their comments. Do you want to do it or shall I? If you are willing to do it, please keep me in the loop.
msg31775 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2007-04-19 10:23
Feel free to drop them an email, this is a good idea.
Maybe "svn blame" or history inspection produces more names of people who actually wrote
the code and the documentation.
msg31776 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-05-02 12:03
I wrote that code, so let me comment on it:

The setlocale() function returns the setting that was previously active (see the setlocale (3) man-page). 

Unfortunately, there's no clear standard on the way locales are named. The man-page says:

"""
A locale name is typically of the form language[_territory][.codeset][@modifier], where language is an ISO 639 language code, territory is an ISO 3166 country code, and codeset is a character set or encoding identifier like ISO-8859-1 or UTF-8.  For a list of all supported locales, try "locale -a", cf. locale(1).
"""

"Germany_Germany" is clearly not a locale name that fits the above scheme.

If I do "locale -a" on my box, "Germany_Germany" is not mentioned in the resulting list, so it is no surprise that the function call generates an error.

Note that you can set the locale without using setlocale(): all that is needed is an environment variable, and that is, of course, not subject to any checks by setlocale().
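
On POSIX systems, the environment lookup behind setlocale(category, "") follows a fixed precedence: LC_ALL first, then the category's own variable, then LANG. A small sketch of that order follows; effective_env_locale is a hypothetical helper that only inspects variables and does not check whether the named locale is installed.

```python
import os


def effective_env_locale(category_name="LC_NUMERIC", environ=None):
    """Return (variable, value) for the variable that decides a category.

    Follows the POSIX precedence LC_ALL > LC_<category> > LANG.
    Returns (None, None) if none of them is set to a non-empty value.
    """
    env = os.environ if environ is None else environ
    for var in ("LC_ALL", category_name, "LANG"):
        value = env.get(var)
        if value:
            return var, value
    return None, None


example = {"LANG": "de_DE.UTF-8", "LC_NUMERIC": "en_GB.utf8"}
print(effective_env_locale("LC_NUMERIC", example))  # ('LC_NUMERIC', 'en_GB.utf8')
print(effective_env_locale("LC_TIME", example))     # ('LANG', 'de_DE.UTF-8')
```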

I'd suggest closing this bug report as invalid.

Thanks.
msg31777 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2007-05-02 13:25
Marc-Andre,

thanks for your comment!

Note that setlocale() can also be used to query the current locale,
according to my manpage on a Debian GNU/Linux system and also
according to IEEE Std 1003.1, 2004 (POSIX).

This problem report is valid on both points.
You cannot deny a): the example uses code that the description, a few sections earlier, says is not allowed, so the inconsistency is apparent.
b) is also a problem; it occurs on a freshly installed Python
on a freshly installed German Windows version.
The same happens on Debian Sarge, depending on the default locale.
So if I use the functions according to the documentation,
they will just break in the real world, which is not robust.

I know that it is probably hard to make this more robust,
but it is possible, depending on what the interface of this module
should look like.

Note that in your remark you assume that the Germany_Germany
call fails,
but this is the one that succeeds, because it is the locale
that this Python version on Windows has set and that setlocale
has returned. It is getlocale() that returns a value which
cannot be used for setlocale() again, despite the documentation.
This makes getlocale() useless for the purpose of getting a locale
setting that could later be reused as input for setlocale().
As setlocale() can do this as well, getlocale() seems to be superfluous.

Best Regards,
Bernhard
msg121960 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-21 18:46
In investigating issue 10466 I find that getlocale on Windows returns a value that Windows accepts for me.  For example, on my US Windows system, getlocale returns ('English_United States', '1252'), and that appears to work when passed to setlocale.  So I'm closing this bug as "works for me", since I can't reproduce it.  (Tested on 3.2a3 and 2.6.5).

Issue 10466 turns on the fact that getdefaultlocale() does *not* return something that Windows can consume, though.
History
Date                 User            Action  Args
2010-11-21 18:46:59  r.david.murray  set     status: open -> closed; nosy: + r.david.murray; messages: + msg121960; resolution: works for me
2010-08-21 12:53:53  BreamoreBoy     set     versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-03-30 23:38:08  ajaksu2         set     title: locale.getlocale() output fails as setlocale() input -> locale.getlocale() output fails as setlocale() input; stage: needs patch; type: behavior; versions: + Python 2.6, - Python 2.5
2007-04-13 10:26:11  ber             create