Issue6393
Created on 2009-07-01 10:51 by mark.dickinson, last changed 2009-09-27 15:58 by slavi.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| issue6393-fix.patch | ronaldoussoren, 2009-07-09 08:31 | |
| Messages (19) | |||
|---|---|---|---|
| msg89972 - (view) | Author: Mark Dickinson (mark.dickinson) | Date: 2009-07-01 10:51 | |
There was a report[1] on c.l.p. that python3 from the OS X Python 3.1 dmg download at www.python.org/download/releases/3.1/ crashes on startup. I can reproduce this with the python.org download (using the OS X Terminal) only with a bad locale setting:

    newton:~ dickinsm$ LANG=utf-8 python3
    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    LookupError: unknown encoding:
    Abort trap (core dumped)

The core dump isn't useful: just lots of 'No symbol table info available.' This is on OS X 10.5.7/Intel. I can't reproduce it with either the py3k branch or the release31-maint branch, built from scratch.

I suspect that this has to do with the behaviour of nl_langinfo(CODESET) on OS X: namely, after doing (in C) setlocale(LC_CTYPE, ""), the result of nl_langinfo(CODESET) appears to be "UTF-8" for well-defined utf-8 locales (e.g., 'en_US.UTF-8'), "US-ASCII" for meaningless locales (e.g., 'invalid'), but one just gets "" for locales like 'utf-8' or 'en_US'. This in turn affects Python's locale.getpreferredencoding function.

See also issue 2173, which may be related. Ronald, any ideas?

[1] http://mail.python.org/pipermail/python-list/2009-June/718255.html |
|||
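For readers who want to check what their own environment reports, the C-level sequence described in msg89972 (setlocale(LC_CTYPE, "") followed by nl_langinfo(CODESET)) can be approximated from Python. This is only a sketch: the helper name `codeset_for_environment_locale` is made up for illustration, and it returns None where nl_langinfo is unavailable or the locale cannot be set.

```python
import locale


def codeset_for_environment_locale():
    """Return nl_langinfo(CODESET) for the locale named by the environment.

    Hypothetical helper, not part of the standard library. Returns None if
    nl_langinfo is unavailable (e.g. on Windows) or the locale is rejected.
    """
    if not hasattr(locale, "nl_langinfo"):
        return None
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    old = locale.setlocale(locale.LC_CTYPE)
    try:
        try:
            # Equivalent of the C call setlocale(LC_CTYPE, "").
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            return None
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, old)


if __name__ == "__main__":
    # repr() makes an empty codeset visible as ''.
    print(repr(codeset_for_environment_locale()))
```

Running the script with different LANG values (for example LANG=utf-8 or LANG=en_US.UTF-8) shows what the platform reports; the result depends on the operating system and its installed locales.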
| msg90285 - (view) | Author: Ned Deily (ned.deily) | Date: 2009-07-08 21:58 | |
This is a side effect of the fix for Issue6202. Prior to r73268, locale.getpreferredencoding always returned "mac-roman" regardless of the setting of LANG, so this wasn't a problem in py3k (or 3.0.x builds) up through 3.1rc1. I can reproduce it on current py3k and release31-maint. |
|||
| msg90302 - (view) | Author: Ned Deily (ned.deily) | Date: 2009-07-09 03:55 | |
Note, you can produce the same error on OS X or Linux by setting PYTHONIOENCODING="", which effectively overrides the value returned by nl_langinfo(CODESET). In pythonrun.c, create_stdio passes PYTHONENCODING, if set, on as the "encoding" value to TextIOWrapper. If no encoding was specified, TextIOWrapper uses the value returned by locale.getpreferredencoding(). It then calls PyCodec_IncrementalDecoder and the unknown (or empty) encoding is finally detected. That raises the question of how far python should go in protecting the user. One *could* add a check in pythonrun.c to substitute some suitable default (UTF-8) if nl_langinfo(CODESET) returns an empty value. Or perhaps just abort there with a more meaningful error message. |
|||
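The detection step described in msg90302 can be reproduced in pure Python: the codec registry rejects an empty encoding name with LookupError. The sketch below also illustrates the "substitute a suitable default" option. It is not the pythonrun.c logic; the function names `is_known_encoding` and `choose_stdio_encoding` are hypothetical.

```python
import codecs


def is_known_encoding(name):
    """Return True if the codec registry can resolve the encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def choose_stdio_encoding(candidate, default="utf-8"):
    """Return candidate if it is usable, otherwise fall back to default.

    Hypothetical helper illustrating the substitution idea from the
    discussion; the interpreter does not necessarily behave this way.
    """
    if candidate and is_known_encoding(candidate):
        return candidate
    return default


if __name__ == "__main__":
    print(choose_stdio_encoding(""))
    print(choose_stdio_encoding("latin-1"))
```

The alternative raised in the same message, aborting with a clearer error, would replace the fallback branch with an explicit error naming the offending value.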
| msg90303 - (view) | Author: Ned Deily (ned.deily) | Date: 2009-07-09 03:58 | |
"... create_stdio passes PYTHONIOENCODING ..." |
|||
| msg90308 - (view) | Author: Mark Dickinson (mark.dickinson) | Date: 2009-07-09 07:51 | |
> One *could* add a check in pythonrun.c to substitute some suitable
> default (UTF-8) if nl_langinfo(CODESET) returns an empty value.

While googling for the source of this problem, I found other software projects that take this approach. It doesn't seem totally unreasonable. I just wish I understood *why* nl_langinfo(CODESET) is returning "" in these cases. I've looked for the source at http://www.opensource.apple.com, but can't find it; maybe that part of Darwin isn't open source.

It seems that a lot of people end up with an OS X Terminal setup such that LC_CTYPE is 'UTF-8' (perhaps this is a 10.4 thing---I haven't encountered this myself); I don't think these people should have to deal with a confusing error on startup; defaulting to UTF-8 on OS X seems like a reasonable compromise. |
|||
| msg90310 - (view) | Author: Ronald Oussoren (ronaldoussoren) | Date: 2009-07-09 08:02 | |
The manpage says that nl_langinfo returns an empty string when there is an invalid setting. There is validity in saying that 'LANG=utf-8' is an invalid setting: the LANG variable is supposed to be a locale name, which would be a language setting (possibly combined with a codeset definition). "utf-8" is not a language. I wouldn't mind falling back to utf-8 as the default codeset when nl_langinfo returns an empty string, because utf-8 is the default character set on OSX, and furthermore defaulting to some value is way better than crashing. I do wonder how the user ended up with LANG=utf-8 in the first place. |
|||
| msg90312 - (view) | Author: Mark Dickinson (mark.dickinson) | Date: 2009-07-09 08:11 | |
> There is validity in saying that 'LANG=utf-8' is an invalid setting

Agreed. But that doesn't really explain why e.g. LANG=en_US also produces "", while LANG=invalid produces "US-ASCII".

> I do wonder how the user ended up with LANG=utf-8 in the first place.

Me too. As far as I can gather, it's a result of setting the Terminal preferences (particularly the character encoding and 'Set LANG environment variable on startup' checkbox) in some particular way, on some versions of OS X, for users in some countries, at some particular phases of the moon, etc... |
|||
| msg90314 - (view) | Author: Ronald Oussoren (ronaldoussoren) | Date: 2009-07-09 08:31 | |
The attached patch (issue6393-fix.patch) seems to fix the issue. Could you please test and have a look at the patch? It basically tests if the output of nl_langinfo(CODESET) is the empty string and defaults to 'UTF-8' in that case (but only on OSX). I intend to apply this patch unless someone objects to that. |
|||
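The behaviour described in msg90314 can be sketched roughly as follows. This is not the attached patch or the committed code, only an illustration of the idea under stated assumptions: the function names are hypothetical, the platform check uses sys.platform == "darwin", and the sketch needs a platform that provides nl_langinfo. Both branches compute the result before testing it and return it explicitly.

```python
import locale
import sys


def apply_codeset_fallback(codeset, platform=sys.platform):
    """Default to 'UTF-8' when the codeset is empty, but only on OS X."""
    if not codeset and platform == "darwin":
        return "UTF-8"
    return codeset


def preferred_encoding_with_fallback(do_setlocale=True):
    """Hypothetical variant of locale.getpreferredencoding with the fallback."""
    if do_setlocale:
        oldloc = locale.setlocale(locale.LC_CTYPE)
        try:
            try:
                locale.setlocale(locale.LC_CTYPE, "")
            except locale.Error:
                pass  # keep the current locale if the environment's is invalid
            # Obtain the result first, then apply the fallback to it.
            result = apply_codeset_fallback(locale.nl_langinfo(locale.CODESET))
        finally:
            locale.setlocale(locale.LC_CTYPE, oldloc)
        return result
    else:
        result = apply_codeset_fallback(locale.nl_langinfo(locale.CODESET))
        return result
```

Keeping the fallback in one small function means the two branches cannot drift apart in ordering or in what they return.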
| msg90320 - (view) | Author: Mark Dickinson (mark.dickinson) | Date: 2009-07-09 09:55 | |
Thanks, Ronald! The patch fixes the problem for me. (I directly patched the locale.py file installed from the Python dmg, since I still haven't figured out how to build a python executable that exhibits this problem.)

The patch doesn't look quite right, though: in the else clause, it looks as though you're testing 'result' before it exists. Shouldn't the 'result = nl_langinfo(CODESET)' line come before the 'if not result and ....' line?

On the subject of Terminal and LANG, LC_CTYPE settings, I found an interesting link: http://pastie.textmate.org/111807

Indeed, after setting my region to 'South Africa' in Preferences -> International -> Formats, a newly opened Terminal window gives me:

    newton:~ dickinsm$ locale
    LANG=
    LC_COLLATE="C"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL=

And then python3 crashes on startup as above. This is on a newborn (3-week old) MacBook Pro that's been barely changed from default settings (and no transfer of files and settings from an old Mac, either). |
|||
| msg90323 - (view) | Author: Ronald Oussoren (ronaldoussoren) | Date: 2009-07-09 10:16 | |
Good catch, the code in the else is indeed in the wrong order. |
|||
| msg90373 - (view) | Author: Ned Deily (ned.deily) | Date: 2009-07-10 03:35 | |
Looks good and the "patched" patch also works in a py3k installer build. BTW, Mark, I was curious as to why you were unable to reproduce the problem with your own build. I should have mentioned that my testing was with complete installer (framework) builds. I subsequently experimented with a non-framework build and found that I could not reproduce the problem running from the ./python in the build directory. Stepping through gdb showed that, during the calls from create_stdio, the import of locale fails in textio.c, so it falls back to using "ascii" as the default encoding (~line 899) and avoids the crash. If I do a make install, the unpatched installed bin/python3 does crash in the same way as with the installer python3. |
|||
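The fallback Ned describes in msg90373 (if importing locale fails, "ascii" is used as the default encoding) can be pictured with a small Python sketch. It only mirrors the described behaviour and is not the textio.c code; the function name and the injectable `import_module` parameter are hypothetical and exist so the failure path can be exercised.

```python
import importlib


def default_text_encoding(import_module=importlib.import_module):
    """Return the locale's preferred encoding, or 'ascii' if locale is unavailable.

    Hypothetical illustration of the fallback described in the thread.
    """
    try:
        locale_module = import_module("locale")
    except ImportError:
        # locale could not be imported, e.g. when running from a build
        # directory where the standard library is not fully set up.
        return "ascii"
    return locale_module.getpreferredencoding()
```

This would explain why an uninstalled non-framework build avoided the crash: the code path that returns the empty codeset was never reached.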
| msg90445 - (view) | Author: Antoine Pitrou (pitrou) | Date: 2009-07-12 12:49 | |
Once this patch is checked in, should we do an emergency 3.1.1 release? |
|||
| msg90447 - (view) | Author: Mark Dickinson (mark.dickinson) | Date: 2009-07-12 13:00 | |
I don't know whether this is really worth a 3.1.1, all by itself. There's an easy workaround, which is for affected users to set their locale properly. |
|||
| msg90608 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2009-07-17 07:39 | |
I see this problem on both MacOS X 10.5 and on Windows. This is when using Python embedded inside of Apache/mod_wsgi.

On MacOS X the error is:

    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    ImportError: No module named encodings.utf_8

On Windows the error is:

    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    LookupError: unknown encoding: cp0

The talk about the fix mentioned it only addressing MacOS X. What about the Windows case I am seeing? Will it help with that at all? |
|||
| msg90609 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2009-07-17 07:41 | |
Hmmm, actually my MacOS X error is different, although the Windows one is the same, except that an encoding is listed and isn't empty. |
|||
| msg90610 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2009-07-17 07:49 | |
You can ignore my MacOS X example as that was caused by something else. My question still stands as to whether the fix will address the similar problem I saw on Windows. |
|||
| msg90617 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2009-07-17 10:24 | |
I have created issue6501 for my Windows variant of this problem, given that it appears to be subtly different due to there being an encoding, whereas the MacOS X variant doesn't have one. Seeing that the fix for the MacOS X issue is in Python code, I will, when I have a chance, look at whether I can work out a fix for the Windows variant. I'm not sure I have the right tools to compile Python from C code on Windows, so if it is a C code problem, I'm not sure I can really investigate. |
|||
| msg92322 - (view) | Author: Ronald Oussoren (ronaldoussoren) | Date: 2009-09-06 14:02 | |
I've applied the fixed version of my patch in r74687 (3.x) and r74688 (3.1). |
|||
| msg93174 - (view) | Author: Svetoslav Agafonkin (slavi) | Date: 2009-09-27 15:58 | |
There is an error in the r74687 (3.x) and r74688 (3.1) fixes: in the 'else' clause there should be a 'return result' at the end. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2009-09-27 15:58:14 | slavi | set | status: pending -> open nosy: + slavi messages: + msg93174 |
| 2009-09-06 14:02:36 | ronaldoussoren | set | status: open -> pending resolution: fixed messages: + msg92322 stage: committed/rejected |
| 2009-07-17 10:24:41 | grahamd | set | messages: + msg90617 |
| 2009-07-17 07:49:10 | grahamd | set | messages: + msg90610 |
| 2009-07-17 07:41:40 | grahamd | set | messages: + msg90609 |
| 2009-07-17 07:39:44 | grahamd | set | nosy: + grahamd messages: + msg90608 |
| 2009-07-12 13:01:00 | mark.dickinson | set | messages: + msg90447 |
| 2009-07-12 12:49:47 | pitrou | set | priority: critical versions: + Python 3.2 nosy: + pitrou, benjamin.peterson messages: + msg90445 |
| 2009-07-10 03:35:39 | ned.deily | set | messages: + msg90373 |
| 2009-07-09 10:16:09 | ronaldoussoren | set | messages: + msg90323 |
| 2009-07-09 09:55:02 | mark.dickinson | set | messages: + msg90320 |
| 2009-07-09 08:31:24 | ronaldoussoren | set | keywords: + needs review, patch files: + issue6393-fix.patch messages: + msg90314 |
| 2009-07-09 08:11:50 | mark.dickinson | set | messages: + msg90312 |
| 2009-07-09 08:02:54 | ronaldoussoren | set | messages: + msg90310 |
| 2009-07-09 07:51:47 | mark.dickinson | set | messages: + msg90308 |
| 2009-07-09 03:58:05 | ned.deily | set | messages: + msg90303 |
| 2009-07-09 03:55:40 | ned.deily | set | messages: + msg90302 |
| 2009-07-08 21:58:28 | ned.deily | set | nosy: + ned.deily messages: + msg90285 |
| 2009-07-08 19:50:34 | Phil | set | nosy: + Phil |
| 2009-07-01 10:51:55 | mark.dickinson | create | |