Issue1443504
Created on 2006-03-05 13:50 by catherinedevlin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
patches-2.5.1-Linux.diff | heikki, 2007-11-29 23:22 | |
locale.diff | asmodai, 2009-05-03 09:04 | Modules/_localemodule.c patch to fix invalid locale semantics |
Messages (23) | |||
---|---|---|---|
msg27684 - (view) | Author: Catherine Devlin (catherinedevlin) * | Date: 2006-03-05 13:50 | |
I'm on Ubuntu 5.10, with Python 2.4.2-0ubuntu2, and when I open a terminal window and run python, I get

>>> import locale
>>> locale.getpreferredencoding()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/locale.py", line 417, in getpreferredencoding
    setlocale(LC_CTYPE, "")
  File "/usr/lib/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

However, if I su - root - or even su right back to my own account (catherine)! - then everything works.

This is of concern (to me, anyway) because this error crashes bzr.

I chose "Esperanto" as my language when setting up Ubuntu. (No, I wasn't trying to be funny - I really do speak Esperanto!) That may be why I found the problem, but I don't think this is simply a problem with flawed Esperanto support in Ubuntu - because the routine works after su is used, and because locale.nl_langinfo(CODESET) works fine (please read on).

Anyway, within locale.getpreferredencoding(), line 417 - setlocale(LC_CTYPE, "") - seems to be the problem...

>>> locale.setlocale(locale.LC_CTYPE)
'C'
>>> locale.setlocale(locale.LC_CTYPE, "")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/locale.py", line 381, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'

This makes me wonder if setlocale(LC_CTYPE, "") is really so very necessary. It seems to be there to prep for the nl_langinfo call, but it doesn't actually seem strictly necessary for that call to work.

>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'

... I get that result whether before or after calling setlocale, and I get it under any account (including root, where setlocale does not raise an exception).

Thus, as far as I can tell, it isn't really necessary for setlocale(LC_CTYPE, "") to succeed or die trying, and accepting the nl_langinfo result without a successful setlocale(LC_CTYPE, "") would be preferable to raising an unhandled exception. I suggest that setlocale(LC_CTYPE, "") be enclosed in a try block.

try:
    setlocale(LC_CTYPE, "")
except:
    None

Since I don't really understand what it's doing in the first place, I don't know if this is really a good patch. Thanks! |
|||
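The workaround suggested in the report can be sketched roughly as follows. This is only an illustration, not the patch that was later attached or committed: the helper name `preferred_encoding_tolerant` is hypothetical, and the sketch assumes nothing beyond the standard `locale` module. It catches `locale.Error` specifically rather than using a bare `except`, and restores the previous LC_CTYPE setting afterwards.

```python
import locale


def preferred_encoding_tolerant():
    """Return the codeset name even if the environment's locale is unsupported.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not hasattr(locale, "nl_langinfo"):
        # nl_langinfo is not available on every platform (assumption: fall
        # back to the stdlib call without touching the locale).
        return locale.getpreferredencoding(False)

    old = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        # Adopt the user's environment; raises locale.Error if unsupported.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Keep whatever locale is already active instead of crashing.
        pass
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Put LC_CTYPE back the way we found it.
        locale.setlocale(locale.LC_CTYPE, old)
```

With an unsupported locale in the environment, a call such as `preferred_encoding_tolerant()` would simply report the codeset of the locale that is already active.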
msg27685 - (view) | Author: jminka (jminka) | Date: 2006-03-17 20:27 | |
I've got the same problem with bzr on Gentoo. If LANG or LC_ALL contains a '/', then bzr has the problem (e.g. en_US is ok, en_US/ISO8859-1 is wrong). |
|||
msg57964 - (view) | Author: Heikki Toivonen (heikki) | Date: 2007-11-29 23:22 | |
We noticed this too in Chandler. We worked around this issue with the patch I am attaching. Maybe not a correct fix, though. |
|||
msg86856 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-04-30 19:44 | |
Shouldn't the fallback be to setlocale(LC_CTYPE, "C") instead of silently passing, though? |
|||
msg86857 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-04-30 20:22 | |
You don't want to completely nix the setlocale(LC_CTYPE, "") call, though. The "" means "use the native environment": in other words, take whatever the current user's LC_CTYPE environment variable is set to (see `locale -a`) and set the program's LC_CTYPE to that. Of course, this might be set to something that is valid in itself (e.g. cy_GB.ISO8859-15) but has no matching entry in /usr/share/locale (or wherever your system provides it), and as such it fails. Reading SUS (the Single UNIX Specification) I see that it explicitly says: "Upon successful completion, setlocale() shall return the string associated with the specified category for the new locale. Otherwise, setlocale() shall return a null pointer and the locale of the process is not changed." So the patch seems to be correct, actually. We try setlocale(LC_CTYPE, "") to pick up a locale from the environment for LC_CTYPE, but if that fails for whatever reason, we should just pass, since we should not adjust LC_CTYPE. Mmm, but it seems setlocale() in locale.py is not adhering to the standard by not allowing the "" case properly. _parse_localename() is being overly pedantic about this by raising a ValueError. |
|||
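The SUS wording quoted above (on failure, "the locale of the process is not changed") can be observed from Python with a small experiment. This is a sketch under assumptions: the function name `try_bogus_locale` and the deliberately nonsensical locale name are made up for illustration, and some C libraries may accept unknown names, in which case no exception is raised and the sketch simply undoes the change.

```python
import locale


def try_bogus_locale(name="xx_NOPE.bogus-charset"):
    """Try to set LC_CTYPE to `name`; return (raised, before, after).

    Hypothetical helper; `name` is assumed not to exist on the system.
    """
    before = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    raised = False
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        raised = True
    after = locale.setlocale(locale.LC_CTYPE)  # query again
    if not raised:
        # The C library accepted the name after all; undo the change.
        locale.setlocale(locale.LC_CTYPE, before)
    return raised, before, after


raised, before, after = try_bogus_locale()
if raised:
    # A failed call must leave the active locale untouched.
    print("unchanged after failure:", before == after)
```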
msg86897 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-01 19:54 | |
The patch looks fine to me. |
|||
msg86900 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-01 20:04 | |
OK, then I'll apply it. But I am curious about your thoughts about the _parse_localename() method being called from setlocale() raising a ValueError, whereas a setlocale(LC_CTYPE, "") should not fail at all, which it currently does if the locale in the environment is not valid. |
|||
msg86905 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-01 20:24 | |
> But I am curious about your thoughts about the _parse_localename()
> method being called from setlocale() raising a ValueError, whereas a
> setlocale(LC_CTYPE, "") should not fail at all, which it currently does
> if the locale in the environment is not valid.

I fail to see how this is related to this issue. In the OP's report, the exception was locale.Error, not ValueError, and _parse_localename isn't ever being called from setlocale() - why do you think it is being called?

AFAICT, the only callers of _parse_localename are getlocale and getdefaultlocale (which, IMO, should both be deprecated). |
|||
msg86909 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-01 20:58 | |
Sorry, I was actually off by a method last night. It turns out the problem lies in _localemodule.c. Let me start with the basic question: is our setlocale() supposed to mirror POSIX' operations/semantics? |
|||
msg86911 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-01 21:11 | |
> Let me start with the basic question: is our setlocale() supposed to
> mirror POSIX' operations/semantics?

Yes, it is. |
|||
msg86983 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-02 21:48 | |
I will first point out where our current implementation is broken, in my opinion of course, after which I propose a small patch.

Both C90 (7.4.1.1) and C99 (7.11.1.1) state:

"A value of "C" for locale specifies the minimal environment for C translation; a value of "" for locale specifies the locale-specific native environment. Other implementation-defined strings may be passed as the second argument to setlocale. [...] If a pointer to a string is given for locale and the selection can be honored, the setlocale function returns a pointer to the string associated with the specified category for the new locale. If the selection cannot be honored, the setlocale function returns a null pointer and the program’s locale is not changed."

Note that neither C nor POSIX defines any errors or sets errno or the like. It simply returns a null pointer.

In C you would typically start your program with a call like:

    #include <locale.h>

    int main(int argc, char *argv[]) {
        setlocale(LC_CTYPE, "");
        ...
    }

This will try to set the locale to what the native environment specifies, but will not error out if the value, if any, it receives does not map to a valid locale. It will return a null pointer if it cannot set the locale. Execution continues and the locale is set to the default "C".

Our current behaviour in Python does not adhere to these semantics. To illustrate:

# Obviously non-existent locale
>>> from locale import setlocale, LC_CTYPE
>>> setlocale(LC_CTYPE, 'B')
Error: unsupported locale setting

# Valid locale, but not available on my system
>>> from os import getenv
>>> from locale import setlocale, LC_CTYPE
>>> getenv('LANG')
'cy_GB.UTF-8'
>>> setlocale(LC_CTYPE, '')
Error: unsupported locale setting

Neither Perl nor PHP throws any error when setlocale() is passed an invalid locale. Python is being unnecessarily disruptive by throwing an error.

As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to be adjusted. Patch against trunk enclosed.

This changes the semantics of our current implementation to the following:

>>> from locale import setlocale, LC_CTYPE
>>> rv = setlocale(LC_CTYPE, 'B')
>>> type(rv)
<class 'NoneType'>
>>> rv = setlocale(LC_CTYPE, 'C')
>>> type(rv)
<class 'str'>
>>> rv
'C' |
|||
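For comparison, the return-None behaviour proposed in the message above can be had at the call site without changing `setlocale()` itself. The following is a minimal sketch; `setlocale_or_none` is a hypothetical name and not anything proposed in the patch, which modified `Modules/_localemodule.c` instead.

```python
import locale


def setlocale_or_none(category, name):
    """Return the new locale string, or None if the locale is unsupported.

    Hypothetical wrapper mirroring the C convention of a null return.
    """
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        # Mirror C's "null pointer, locale unchanged" outcome.
        return None


rv = setlocale_or_none(locale.LC_CTYPE, "C")
print(rv)
```

Because the "C" locale is always available, the call above yields the string 'C'; a name the system does not support yields None instead of an exception, and the caller has to remember to check for it.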
msg86985 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-02 22:07 | |
> If a pointer to a string is given for locale and the selection can be
> honored, the setlocale function returns a pointer to the string
> associated with the specified category for the new locale. If the
> selection cannot be honored, the setlocale function returns a null
> pointer and the program’s locale is not changed."
>
> Note that neither C nor POSIX defines any errors or sets errno or the
> like. It simply returns a null pointer.

Still, this is considered an error case.

> #include <locale.h>
>
> int main(int argc, char *argv[]) {
>     setlocale(LC_CTYPE, "");
>
>     ...
> }
>
> This will try to set the locale to what the native environment
> specifies, but will not error out if the value

Yes, but that's a bug in the C code, which fails to check the return value of setlocale. The fact that the bug is widespread doesn't make it any better.

> As such I think PyLocale_setlocale() in Modules/_localemodule.c needs to
> be adjusted

-1. Errors should never pass silently. That's the whole point of exceptions. |
|||
msg87036 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2009-05-03 08:49 | |
Interestingly, my setlocale(3p) man page says: """ ERRORS No errors are defined. """ So isn't it debatable if returning the NULL pointer really is an error? |
|||
msg87037 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-03 08:55 | |
I asked that as well on the POSIX/SUS list and Don Cragun responded with: "If you make the last argument to setlocale() be a pointer to unallocated memory, implementations would be allowed to set errno to EFAULT and terminate the process with a core dump even when this section says "No errors are defined." An implementation could also set errno to ENOENT (e.g., if the "B" locale wasn't known) or to EINVAL (e.g., if the "B" locale existed but the LC_CTYPE portion of the locale was not in the proper format). That wording just means that the standard doesn't require implementations to detect errors like these nor to report specific error values for different possible errors." On the subject whether or not returning a null pointer should be considered he said: "The standard is silent on this issue. Why does it make any difference to an application? If setlocale(LC_CTYPE, "B") returns a null pointer, the LC_CTYPE portion of the locale was not changed. If setlocale(LC_CTYPE, "B") does not return a null pointer, the LC_CTYPE portion of the locale was successfully changed." I am just wondering why we want to be quite different from how many other languages are approaching the issue. Sure enough, we can use a try: construct, but it kind of defeats the principle of least astonishment by being different from the rest on this issue. |
|||
msg87038 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-03 08:56 | |
On the subject whether or not returning a null pointer should be considered he said: -> On the subject whether or not returning a null pointer should be considered an error he said: |
|||
msg87039 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-03 09:00 | |
Georg pointed out a mistake I introduced in my patch, updated now. |
|||
msg87040 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-03 09:04 | |
Really correct this time. |
|||
msg87051 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-03 17:16 | |
> """
> ERRORS
> No errors are defined.
> """
>
> So isn't it debatable if returning the NULL pointer really is an error?

As Jeroen reports, this really means two different things:

a) "no errors" really means "no errno codes". Whether or not an error may occur is an independent issue.

b) "are defined" really means that POSIX doesn't define any standard errno codes; the system may indeed still set errno (C99, 7.5p3) |
|||
msg87052 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-03 17:20 | |
> I am just wondering why we want to be quite different from how many
> other languages are approaching the issue.

Because we have exceptions, and they don't. Would you also propose that open() should return None, just because fopen(3) returns NULL?

While it may be debatable whether applications care about the error when passing "" as the locale, there is also the second case where applications pass an explicit locale

    setlocale(locale.LC_ALL, "de_DE@euro")

When they do that, they surely want to be told if this actually worked.

> Sure enough, we can use a
> try: construct, but it kind of defeats the principle of least
> astonishment by being different from the rest on this issue.

There is also the backwards compatibility issue: your change will break existing code. |
|||
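The explicit-locale case mentioned above is where the exception is most useful to the caller. As a sketch of how an application might react to it, the hypothetical helper below tries several candidate names in order and reports which one was accepted; the function name and the candidate list are assumptions for illustration only.

```python
import locale


def set_first_supported(category, candidates):
    """Set `category` to the first supported name in `candidates`.

    Returns the accepted name; raises locale.Error if none is supported.
    Hypothetical helper, not part of the standard library.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # this name is not available here, try the next one
        return name
    raise locale.Error(
        "none of the requested locales is supported: %r" % (candidates,)
    )


# "de_DE@euro" may or may not exist on a given system; "C" always does.
chosen = set_first_supported(locale.LC_ALL, ["de_DE@euro", "C"])
print("using locale:", chosen)
```

Since the failure is reported as an exception, the application knows exactly which locale it ended up with, which a silently ignored failure would not tell it.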
msg87079 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2009-05-03 22:21 | |
On Sun, 3 May 2009 at 08:55, Jeroen Ruigrok van der Werven wrote:

> I am just wondering why we want to be quite different from how many
> other languages are approaching the issue. Sure enough, we can use a
> try: construct, but it kind of defeats the principle of least
> astonishment by being different from the rest on this issue.

Only if you imagine that the principle applies to expectations inherited from other languages. In a Python context, which is what the principle actually refers to, it would be astonishing if the error were to be silently ignored. |
|||
msg87308 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-06 05:34 | |
Committed the initial patch in r72375 for trunk and r72376 for py3k. Any other branches that would need the merge? 3.0? |
|||
msg87315 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-05-06 07:39 | |
It looks like a bug fix to me - so it would apply to all four active branches. |
|||
msg87321 - (view) | Author: Jeroen Ruigrok van der Werven (asmodai) * | Date: 2009-05-06 08:27 | |
Committed in r72381 and r72395. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:15 | admin | set | github: 42982 |
2009-05-06 08:27:57 | asmodai | set | status: open -> closed messages: + msg87321 |
2009-05-06 07:39:05 | loewis | set | status: pending -> open messages: + msg87315 |
2009-05-06 05:34:50 | asmodai | set | status: open -> pending resolution: accepted messages: + msg87308 stage: test needed -> resolved |
2009-05-03 22:21:28 | r.david.murray | set | nosy: + r.david.murray messages: + msg87079 |
2009-05-03 17:20:23 | loewis | set | messages: + msg87052 |
2009-05-03 17:16:09 | loewis | set | messages: + msg87051 |
2009-05-03 09:04:46 | asmodai | set | files: + locale.diff messages: + msg87040 |
2009-05-03 09:04:25 | asmodai | set | files: - locale.diff |
2009-05-03 09:00:47 | asmodai | set | files: + locale.diff messages: + msg87039 |
2009-05-03 09:00:12 | asmodai | set | files: - locale.diff |
2009-05-03 08:56:44 | asmodai | set | messages: + msg87038 |
2009-05-03 08:55:35 | asmodai | set | messages: + msg87037 |
2009-05-03 08:49:24 | georg.brandl | set | nosy: + georg.brandl messages: + msg87036 |
2009-05-02 22:07:55 | loewis | set | messages: + msg86985 |
2009-05-02 21:48:33 | asmodai | set | files: + locale.diff messages: + msg86983 |
2009-05-01 21:11:08 | loewis | set | messages: + msg86911 |
2009-05-01 20:58:51 | asmodai | set | messages: + msg86909 |
2009-05-01 20:24:44 | loewis | set | messages: + msg86905 |
2009-05-01 20:04:29 | asmodai | set | messages: + msg86900 |
2009-05-01 19:54:41 | loewis | set | nosy: + loewis messages: + msg86897 |
2009-04-30 20:22:20 | asmodai | set | messages: + msg86857 |
2009-04-30 19:44:32 | asmodai | set | nosy: + asmodai messages: + msg86856 |
2009-04-07 04:05:35 | ajaksu2 | set | keywords: + patch stage: test needed type: behavior versions: + Python 2.6, Python 3.0, - Python 2.5, Python 2.4 |
2007-11-29 23:22:28 | heikki | set | files: + patches-2.5.1-Linux.diff nosy: + heikki messages: + msg57964 versions: + Python 2.5 |
2006-03-05 13:50:20 | catherinedevlin | create |