classification
Title: locale.getpreferredencoding() dies when setlocale fails
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.0, Python 2.6
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: asmodai, catherinedevlin, georg.brandl, heikki, jminka, loewis, r.david.murray
Priority: low
Keywords: patch

Created on 2006-03-05 13:50 by catherinedevlin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name  Uploaded  Description
patches-2.5.1-Linux.diff  heikki, 2007-11-29 23:22
locale.diff  asmodai, 2009-05-03 09:04  Modules/_localemodule.c patch to fix invalid locale semantics
Messages (23)
msg27684 - (view) Author: Catherine Devlin (catherinedevlin) * Date: 2006-03-05 13:50
I'm on Ubuntu 5.10, with Python 2.4.2-0ubuntu2, and
when I open a terminal window and run python, I get

>>> import locale
>>> locale.getpreferredencoding()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/locale.py", line 417, in getpreferredencoding
    setlocale(LC_CTYPE, "")
  File "/usr/lib/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

However, if I su - root - or even su right back to my
own account (catherine) ! - then everything works.

This is of concern (to me, anyway) because this error
crashes bzr. 

I chose "Esperanto" as my language when setting up
Ubuntu.  (No, I wasn't trying to be funny - I really do
speak Esperanto!)  That may be why I found the problem,
but I don't think this is simply a problem with flawed
Esperanto support in Ubuntu - because the routine works
after su is used, and because
locale.nl_langinfo(CODESET) works fine (please read on).

Anyway, within locale.getpreferredencoding(), line 417
- setlocale(LC_CTYPE, "") - seems to be the problem...

>>> locale.setlocale(locale.LC_CTYPE)
'C'
>>> locale.setlocale(locale.LC_CTYPE, "")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'

This makes me wonder if setlocale(LC_CTYPE, "") is
really necessary.  It seems to be there to prepare
for the nl_langinfo call, but that call appears to
work without it.

>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'

... I get that result whether before or after calling
setlocale, and I get it under any account (including
root, where setlocale does not raise an exception).

Thus, as far as I can tell, it isn't really necessary
to call setlocale(LC_CTYPE, "") or die trying, and
accepting the nl_langinfo result without a
successful setlocale(LC_CTYPE, "") would be preferable
to raising an unhandled exception.  I suggest that
setlocale(LC_CTYPE, "") be enclosed in a try block.

                try:
                    setlocale(LC_CTYPE, "")
                except Error:  # locale.Error; a bare "except:" would also hide unrelated failures
                    pass

Since I don't really understand what it's doing in the
first place, I don't know if this is really a good patch.

Thanks!
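
A minimal sketch of the suggestion above, written as a standalone helper rather than as a change to locale.py. The name `preferred_encoding_tolerant` is hypothetical and not part of the standard library; the sketch assumes that falling back to whatever `nl_langinfo(CODESET)` reports is acceptable when the locale named in the environment cannot be set.

```python
import locale


def preferred_encoding_tolerant():
    """Hypothetical helper: report an encoding even if the environment's
    locale is unsupported."""
    try:
        # Adopt the user's locale from the environment if possible.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The locale named in the environment is unavailable; the process
        # locale is left as it was, so carry on with it.
        pass
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    codeset = getattr(locale, "CODESET", None)
    if nl_langinfo is not None and codeset is not None:
        return nl_langinfo(codeset)
    # Platforms without nl_langinfo (e.g. Windows): ask the stdlib, but
    # without letting it call setlocale() again.
    return locale.getpreferredencoding(False)


print(preferred_encoding_tolerant())
```

Catching only `locale.Error` keeps unrelated failures visible, which a bare `except:` would not.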
msg27685 - (view) Author: jminka (jminka) Date: 2006-03-17 20:27
I've got the same problem with bzr on Gentoo. If LANG or
LC_ALL contains a '/', bzr hits the problem (e.g. en_US is
fine, en_US/ISO8859-1 fails).
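
One way to check a particular value on a given machine is to try it in-process and restore the previous state afterwards. This is only a sketch: `environment_locale_is_usable` is a hypothetical helper, it assumes a POSIX-style system where `setlocale(LC_CTYPE, "")` consults `LC_ALL`, and whether a name such as `en_US/ISO8859-1` is rejected depends on the C library.

```python
import locale
import os


def environment_locale_is_usable(value):
    """Hypothetical helper: return True if setlocale(LC_CTYPE, "")
    succeeds while LC_ALL is set to `value`."""
    saved_env = os.environ.get("LC_ALL")
    saved_locale = locale.setlocale(locale.LC_CTYPE)  # query only
    os.environ["LC_ALL"] = value
    try:
        locale.setlocale(locale.LC_CTYPE, "")
        return True
    except locale.Error:
        return False
    finally:
        # Put the environment variable and the process locale back.
        if saved_env is None:
            del os.environ["LC_ALL"]
        else:
            os.environ["LC_ALL"] = saved_env
        locale.setlocale(locale.LC_CTYPE, saved_locale)


for candidate in ("C", "en_US", "en_US/ISO8859-1"):
    print(candidate, environment_locale_is_usable(candidate))
```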
msg57964 - (view) Author: Heikki Toivonen (heikki) Date: 2007-11-29 23:22
We noticed this too in Chandler. We worked around this issue with the
patch I am attaching. Maybe not a correct fix, though.
msg86856 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-04-30 19:44
Shouldn't the fallback be to setlocale(LC_CTYPE, "C") instead of
silently passing, though?
msg86857 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-04-30 20:22
You don't want to drop the setlocale(LC_CTYPE, "") call completely,
though. The "" selects the native environment: the program's LC_CTYPE
is set from the user's environment variables (LC_ALL, then LC_CTYPE,
then LANG; run `locale` to see the current values and `locale -a` to
list the locales that are installed).

Of course, the environment may name a locale that is well-formed (e.g.
cy_GB.ISO8859-15) but has no matching entry in /usr/share/locale (or
wherever your system keeps its locale data), in which case the call fails.

Reading SUS (The Single Unix Specification) I see that it explicitly says:

"Upon successful completion, setlocale() shall return the string
associated with the specified category for the new locale. Otherwise,
setlocale() shall return a null pointer and the locale of the process is
not changed."

So the patch seems to be correct actually. We try setlocale(LC_CTYPE,
"") to take LC_CTYPE from the environment, but if that fails for
whatever reason we should just pass, since LC_CTYPE is left unchanged.

Mmm, but it seems setlocale() in locale.py is not adhering to the
standard by not allowing the "" case properly. _parse_localename() is
being overly pedantic about this by raising a ValueError.
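
The SUS guarantee quoted above (a failed call leaves the locale unchanged) can be checked from Python with a small probe. This is a sketch, not a test from the patch: `probe_failed_setlocale` and the bogus locale name are made up for illustration, and some C libraries accept almost any name, in which case no exception is raised and the probe simply restores the previous locale.

```python
import locale


def probe_failed_setlocale(bogus="no_SUCH_locale.xyz"):
    """Hypothetical probe: return (raised, before, after) for an attempt
    to set LC_CTYPE to a name that is presumably not installed."""
    before = locale.setlocale(locale.LC_CTYPE)  # query only
    try:
        locale.setlocale(locale.LC_CTYPE, bogus)
        raised = False
    except locale.Error:
        raised = True
    after = locale.setlocale(locale.LC_CTYPE)
    if not raised:
        # The C library accepted the name; undo the change.
        locale.setlocale(locale.LC_CTYPE, before)
    return raised, before, after


raised, before, after = probe_failed_setlocale()
print("raised:", raised, "before:", before, "after:", after)
```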
msg86897 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-01 19:54
The patch looks fine to me.
msg86900 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-01 20:04
OK, then I'll apply it.

But I am curious about your thoughts about the _parse_localename()
method being called from setlocale() raising a ValueError, whereas a
setlocale(LC_CTYPE, "") should not fail at all, which it currently does
if the locale in the environment is not valid.
msg86905 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-01 20:24
> But I am curious about your thoughts about the _parse_localename()
> method being called from setlocale() raising a ValueError, whereas a
> setlocale(LC_CTYPE, "") should not fail at all, which it currently does
> if the locale in the environment is not valid.

I fail to see how this is related to this issue. In the OP's report,
the exception was locale.Error, not ValueError, and _parse_localename
isn't ever being called from setlocale() - why do you think it is being
called? AFAICT, the only callers of _parse_localename are getlocale and
getdefaultlocale (which, IMO, should both be deprecated).
msg86909 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-01 20:58
Sorry, I was actually off by a method last night.

It turns out the problem lies in _localemodule.c.

Let me start with the basic question: is our setlocale() supposed to
mirror POSIX' operations/semantics?
msg86911 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-01 21:11
> Let me start with the basic question: is our setlocale() supposed to
> mirror POSIX' operations/semantics?

Yes, it is.
msg86983 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-02 21:48
I will first point out where our current implementation is broken, in my
opinion of course, after which I propose a small patch.

Both C90 (7.4.1.1) and C99 (7.11.1.1) state:

"A value of "C" for locale specifies the minimal environment for C
translation; a value of "" for locale specifies the locale-specific
native environment. Other implementation-defined strings may be passed
as the second argument to setlocale.

[...]

If a pointer to a string is given for locale and the selection can be
honored, the setlocale function returns a pointer to the string
associated with the specified category for the new locale. If the
selection cannot be honored, the setlocale function returns a null
pointer and the program’s locale is not changed."

Note that neither C nor POSIX defines any errors or sets errno or the
like. It simply returns a null pointer.

In C you would typically start your program with a call like:

#include <locale.h>

int main(int argc, char *argv[]) {
	setlocale(LC_CTYPE, "");

	...
}

This will try to set the locale to what the native environment
specifies, but will not error out if the value, if any, it receives does
not map to a valid locale. It will return a null pointer if it cannot
set the locale. Execution continues and the locale remains the default
"C".

Our current behaviour in Python does not adhere to these semantics. To
illustrate:

# Obvious non-existing locale
>>> from locale import setlocale, LC_CTYPE
>>> setlocale(LC_CTYPE, 'B')
Error: unsupported locale setting

# Valid locale, but not available on my system
>>> from os import getenv
>>> from locale import setlocale, LC_CTYPE
>>> getenv('LANG')
'cy_GB.UTF-8'
>>> setlocale(LC_CTYPE, '')
Error: unsupported locale setting

Neither Perl nor PHP throws an error when setlocale() is passed an
invalid locale. Python is being unnecessarily disruptive by raising an
exception.

As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to
be adjusted. Patch against trunk enclosed. This changes the semantics of
our current implementation to the following:

>>> from locale import setlocale, LC_CTYPE
>>> rv = setlocale(LC_CTYPE, 'B')
>>> type(rv)
<class 'NoneType'>
>>> rv = setlocale(LC_CTYPE, 'C')
>>> type(rv)
<class 'str'>
>>> rv
'C'
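
For code that prefers the C-style "null on failure" convention described above, the same behaviour can be layered on top of the existing API without changing setlocale() itself. This is a sketch; `try_setlocale` is a hypothetical wrapper, not something the attached patch or the standard library provides.

```python
import locale


def try_setlocale(category, value=None):
    """Hypothetical wrapper: like locale.setlocale(), but return None
    instead of raising locale.Error when the locale is unavailable."""
    try:
        return locale.setlocale(category, value)
    except locale.Error:
        return None


rv = try_setlocale(locale.LC_CTYPE, "B")
print(type(rv))
rv = try_setlocale(locale.LC_CTYPE, "C")
print(rv)
```

Callers then have to test the return value themselves, exactly as C callers are expected to do.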
msg86985 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-02 22:07
> If a pointer to a string is given for locale and the selection can be
> honored, the setlocale function returns a pointer to the string
> associated with the specified category for the new locale. If the
> selection cannot be honored, the setlocale function returns a null
> pointer and the program’s locale is not changed."
> 
> Note that neither C nor POSIX defines any errors or sets errno or the
> like. It simply returns a null pointer.

Still, this is considered as an error case.

> #include <locale.h>
> 
> int main(int argc, char *argv[]) {
> 	setlocale(LC_CTYPE, "");
> 
> 	...
> }
> 
> This will try to set the locale to what the native environment
> specifies, but will not error out if the value

Yes, but that's a bug in the C code, which fails to check the
return value of setlocale. The fact that the bug is wide-spread
doesn't make it any better.

> As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to
> be adjusted

-1. Errors should never pass silently. That's the whole point of exceptions.
msg87036 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-05-03 08:49
Interestingly, my setlocale(3p) man page says:

"""
ERRORS
       No errors are defined.
"""

So isn't it debatable if returning the NULL pointer really is an error?
msg87037 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-03 08:55
I asked that as well on the POSIX/SUS list and Don Cragun responded with:

"If you make the last argument to setlocale() be a pointer to
unallocated memory, implementations would be allowed to set errno to
EFAULT and terminate the process with a core dump even when this section
says "No errors are defined."  An implementation could also set errno to
ENOENT (e.g., if the "B" locale wasn't known) or to EINVAL (e.g., if the
"B" locale existed but the LC_CTYPE portion of the locale was not in the
proper format).  That wording just means that the standard doesn't
require implementations to detect errors like these nor to report
specific error values for different possible errors."

On the subject whether or not returning a null pointer should be
considered he said:

"The standard is silent on this issue.
Why does it make any difference to an application?
If setlocale(LC_CTYPE, "B") returns a null pointer, the LC_CTYPE portion
of the locale was not changed.  If setlocale(LC_CTYPE, "B") does not
return a null pointer, the LC_CTYPE portion of the locale was
successfully changed."

I am just wondering why we want to be quite different from how many
other languages are approaching the issue. Sure enough, we can use a
try: construct, but it kind of defeats the principle of least
astonishment by being different from the rest on this issue.
msg87038 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-03 08:56
On the subject whether or not returning a null pointer should be
considered he said:

->

On the subject whether or not returning a null pointer should be
considered an error he said:
msg87039 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-03 09:00
Georg pointed out a mistake I introduced in my patch, updated now.
msg87040 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-03 09:04
Really correct this time.
msg87051 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-03 17:16
> """
> ERRORS
>        No errors are defined.
> """
> 
> So isn't it debatable if returning the NULL pointer really is an error?

As Jeroen reports, this really means two different things:
a) "no errors" really means "no errno codes". Whether or not
   an error may occur is an independent issue.
b) "are defined" really means that POSIX doesn't define any
   standard errno codes; the system may indeed still set errno
   (C99, 7.5p3)
msg87052 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-03 17:20
> I am just wondering why we want to be quite different from how many
> other languages are approaching the issue.

Because we have exceptions, and they don't. Would you also propose
that open() should return None, just because fopen(3) returns NULL?

While it may be debatable whether applications care about the error
when passing "" as the locale, there is also the second case where
applications pass an explicit locale

  setlocale(locale.LC_ALL, "de_DE@euro")

When they do that, they surely want to be told if this actually
worked.

> Sure enough, we can use a
> try: construct, but it kind of defeats the principle of least
> astonishment by being different from the rest on this issue.

There is also the backwards compatibility issue: your change
will break existing code.
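
To illustrate why the exception is useful when an explicit locale is requested, here is a sketch of an application trying several candidates in order. `first_available_locale` is a hypothetical helper, and which names are installed depends entirely on the system.

```python
import locale


def first_available_locale(candidates, category=locale.LC_ALL):
    """Hypothetical helper: set and return the first locale in
    `candidates` that the system supports; raise locale.Error if none is."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            # This candidate is not installed; try the next one.
            continue
    raise locale.Error(
        "none of the requested locales is available: %r" % (list(candidates),)
    )


print(first_available_locale(["de_DE@euro", "de_DE.UTF-8", "C"]))
```

If setlocale() silently returned None instead, every caller like this would need an explicit check after each attempt.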
msg87079 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-05-03 22:21
On Sun, 3 May 2009 at 08:55, Jeroen Ruigrok van der Werven wrote:
> I am just wondering why we want to be quite different from how many
> other languages are approaching the issue. Sure enough, we can use a
> try: construct, but it kind of defeats the principle of least
> astonishment by being different from the rest on this issue.

Only if you imagine that the principle applies to expectations inherited
from other languages.  In a Python context, which is what the principle
actually refers to, it would be astonishing if the error were to be
silently ignored.
msg87308 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-06 05:34
Committed the initial patch in r72375 for trunk and r72376 for py3k.

Any other branches that would need the merge? 3.0?
msg87315 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-06 07:39
It looks like a bug fix to me - so it would apply to all four active
branches.
msg87321 - (view) Author: Jeroen Ruigrok van der Werven (asmodai) * (Python committer) Date: 2009-05-06 08:27
Committed in r72381 and r72395.
History
Date  User  Action  Args
2022-04-11 14:56:15  admin  set  github: 42982
2009-05-06 08:27:57  asmodai  set  status: open -> closed; messages: + msg87321
2009-05-06 07:39:05  loewis  set  status: pending -> open; messages: + msg87315
2009-05-06 05:34:50  asmodai  set  status: open -> pending; resolution: accepted; messages: + msg87308; stage: test needed -> resolved
2009-05-03 22:21:28  r.david.murray  set  nosy: + r.david.murray; messages: + msg87079
2009-05-03 17:20:23  loewis  set  messages: + msg87052
2009-05-03 17:16:09  loewis  set  messages: + msg87051
2009-05-03 09:04:46  asmodai  set  files: + locale.diff; messages: + msg87040
2009-05-03 09:04:25  asmodai  set  files: - locale.diff
2009-05-03 09:00:47  asmodai  set  files: + locale.diff; messages: + msg87039
2009-05-03 09:00:12  asmodai  set  files: - locale.diff
2009-05-03 08:56:44  asmodai  set  messages: + msg87038
2009-05-03 08:55:35  asmodai  set  messages: + msg87037
2009-05-03 08:49:24  georg.brandl  set  nosy: + georg.brandl; messages: + msg87036
2009-05-02 22:07:55  loewis  set  messages: + msg86985
2009-05-02 21:48:33  asmodai  set  files: + locale.diff; messages: + msg86983
2009-05-01 21:11:08  loewis  set  messages: + msg86911
2009-05-01 20:58:51  asmodai  set  messages: + msg86909
2009-05-01 20:24:44  loewis  set  messages: + msg86905
2009-05-01 20:04:29  asmodai  set  messages: + msg86900
2009-05-01 19:54:41  loewis  set  nosy: + loewis; messages: + msg86897
2009-04-30 20:22:20  asmodai  set  messages: + msg86857
2009-04-30 19:44:32  asmodai  set  nosy: + asmodai; messages: + msg86856
2009-04-07 04:05:35  ajaksu2  set  keywords: + patch; stage: test needed; type: behavior; versions: + Python 2.6, Python 3.0, - Python 2.5, Python 2.4
2007-11-29 23:22:28  heikki  set  files: + patches-2.5.1-Linux.diff; nosy: + heikki; messages: + msg57964; versions: + Python 2.5
2006-03-05 13:50:20  catherinedevlin  create