msg89972 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-07-01 10:51 |
There was a report[1] on c.l.p. that python3 from the OS X Python 3.1
dmg download at www.python.org/download/releases/3.1/ crashes on
startup. I can reproduce this with the python.org download (using the
OS X Terminal) only with a bad locale setting:
newton:~ dickinsm$ LANG=utf-8 python3
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding:
Abort trap (core dumped)
The core dump isn't useful: just lots of 'No symbol table info
available.'
This is on OS X 10.5.7/Intel.
I can't reproduce it with either the py3k branch or the release31-maint
branch, built from scratch.
I suspect that this has to do with the behaviour of nl_langinfo(CODESET)
on OS X: namely, after doing (in C) setlocale(LC_CTYPE, ""), the result
of nl_langinfo(CODESET) appears to be "UTF-8" for well-defined utf-8
locales (e.g., 'en_US.UTF-8'), "US-ASCII" for meaningless locales (e.g.,
'invalid'), but one just gets "" for locales like 'utf-8' or 'en_US'.
This in turn affects Python's locale.getpreferredencoding function.
See also issue 2173, which may be related.
Ronald, any ideas?
[1] http://mail.python.org/pipermail/python-list/2009-June/718255.html
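
The C-level observation above can be probed from Python as well. The following is only a minimal sketch: probe_codeset is a hypothetical helper name, not something from the report, and it assumes a platform where locale.nl_langinfo exists (it returns None otherwise). It makes the same pair of calls described above, setlocale(LC_CTYPE, "") followed by nl_langinfo(CODESET), and then restores the previous setting.

```python
import locale


def probe_codeset():
    """Return what nl_langinfo(CODESET) reports for the environment's locale.

    Hypothetical helper. Returns None where nl_langinfo is unavailable
    (for example on Windows).
    """
    if not hasattr(locale, "nl_langinfo"):
        return None
    old = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
    try:
        try:
            # Equivalent of the C call setlocale(LC_CTYPE, "").
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            pass  # unsupported locale name: the previous setting stays active
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, old)  # restore the saved setting


if __name__ == "__main__":
    print(repr(probe_codeset()))
```

Running the script under different LANG or LC_CTYPE values should show which settings yield an empty string on a given system; the result is platform-dependent.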
|
msg90285 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2009-07-08 21:58 |
This is a side effect of the fix for Issue6202. Prior to r73268,
locale.getpreferredencoding always returned "mac-roman" regardless of the
setting of LANG, so this wasn't a problem in py3k (or 3.0.x builds) up
through 3.1rc1. I can reproduce it on current py3k and release31-maint.
|
msg90302 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2009-07-09 03:55 |
Note, you can produce the same error on OS X or Linux by setting
PYTHONIOENCODING="", which effectively overrides the value returned by
nl_langinfo(CODESET). In pythonrun.c, create_stdio passes
PYTHONENCODING, if set, on as the "encoding" value to TextIOWrapper. If
no encoding was specified, TextIOWrapper uses the value returned by
locale.getpreferredencoding(). It then calls PyCodec_IncrementalDecoder
and the unknown (or empty) encoding is finally detected.
That raises the question of how far python should go in protecting the
user. One *could* add a check in pythonrun.c to substitute some
suitable default (UTF-8) if nl_langinfo(CODESET) returns an empty value.
Or perhaps just abort there with a more meaningful error message.
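
As a rough illustration of the first option, here is a sketch in Python rather than in pythonrun.c. The function name resolve_stdio_encoding and its parameters are hypothetical, and the "encoding:errors" form of the environment variable is not handled. It mirrors the selection order described above (environment value first, then the locale's answer) and validates the candidate with codecs.lookup, which raises LookupError for an unknown or empty name.

```python
import codecs


def resolve_stdio_encoding(env_value, locale_value, default="utf-8"):
    """Pick an encoding name for the standard streams (hypothetical helper).

    env_value    -- content of PYTHONIOENCODING, or None when it is unset
    locale_value -- what locale.getpreferredencoding() reported
    default      -- substitute used when the candidate is empty or unknown
    """
    # An environment value, even an empty one, overrides the locale answer.
    candidate = env_value if env_value is not None else locale_value
    if not candidate:
        return default
    try:
        # codecs.lookup raises LookupError for names with no registered codec.
        return codecs.lookup(candidate).name
    except LookupError:
        return default
```

For example, resolve_stdio_encoding("", "UTF-8") and resolve_stdio_encoding(None, "") both fall back to the default instead of failing later inside the codec machinery. The second option, aborting with a clearer message, would replace the fallback with an explicit error at the same point.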
|
msg90303 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2009-07-09 03:58 |
"... create_stdio passes PYTHONIOENCODING ..."
|
msg90308 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-07-09 07:51 |
> One *could* add a check in pythonrun.c to substitute some suitable
> default (UTF-8) if nl_langinfo(CODESET) returns an empty value.
While googling for the source of this problem, I found other software
projects that take this approach. It doesn't seem totally unreasonable.
I just wish I understood *why* nl_langinfo(CODESET) is returning "" in
these cases. I've looked for the source at
http://www.opensource.apple.com, but can't find it; maybe that part of
Darwin isn't open source.
It seems that a lot of people end up with an OS X Terminal setup such that
LC_CTYPE is 'UTF-8' (perhaps this is a 10.4 thing---I haven't encountered
this myself); I don't think these people should have to deal with a
confusing error on startup; defaulting to UTF-8 on OS X seems like a
reasonable compromise.
|
msg90310 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2009-07-09 08:02 |
The manpage says that nl_langinfo returns an empty string when there is
an invalid setting.
There is validity in saying that 'LANG=utf-8' is an invalid setting: the
LANG variable is supposed to be a locale name, which would be a language
setting (possibly combined with a codeset definition). "utf-8" is not a
language.
I wouldn't mind falling back to utf-8 as the default codeset when
nl_langinfo returns an empty string because utf-8 is the default
character set on OSX, and furthermore defaulting to some value is way
better than crashing.
I do wonder how the user ended up with LANG=utf-8 in the first place.
|
msg90312 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-07-09 08:11 |
> There is validity in saying that 'LANG=utf-8' is an invalid setting
Agreed. But that doesn't really explain why e.g. LANG=en_US also
produces "", while LANG=invalid produces "US-ASCII".
> I do wonder how the user ended up with LANG=utf-8 in the first place.
Me too. As far as I can gather, it's a result of setting the Terminal
preferences (particularly the character encoding and 'Set LANG
environment variable on startup' checkbox) in some particular way, on
some versions of OS X, for users in some countries, at some particular
phases of the moon, etc...
|
msg90314 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2009-07-09 08:31 |
The attached patch (issue6393-fix.patch) seems to fix the issue.
Could you please test and have a look at the patch? It basically tests if
the output of nl_langinfo(CODESET) is the empty string and defaults to
'UTF-8' in that case (but only on OSX).
I intend to apply this patch unless someone objects.
|
msg90320 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-07-09 09:55 |
Thanks, Ronald! The patch fixes the problem for me.
(I directly patched the locale.py file installed from
the Python dmg, since I still haven't figured out how
to build a python executable that exhibits this
problem.)
The patch doesn't look quite right, though: in the else clause,
it looks as though you're testing 'result' before it exists.
Shouldn't the 'result = nl_langinfo(CODESET)' line come
before the 'if not result and ....' line?
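
To make the ordering point concrete, here is a minimal sketch of the intended logic. It is not the patch itself: preferred_encoding and its parameters are hypothetical, and query_codeset stands in for nl_langinfo(CODESET) so the logic can be exercised on any platform. The value is fetched first, tested second, and returned explicitly.

```python
import sys


def preferred_encoding(query_codeset, platform=sys.platform):
    """Return the codeset, defaulting to UTF-8 on OS X when it is empty.

    query_codeset -- zero-argument callable standing in for
                     nl_langinfo(CODESET) (hypothetical parameter)
    platform      -- platform identifier, sys.platform by default
    """
    result = query_codeset()  # fetch the value first ...
    if not result and platform == "darwin":  # ... then test it
        result = "UTF-8"
    return result  # every branch must hand the value back


# An empty codeset on OS X is replaced; other platforms are left alone.
print(preferred_encoding(lambda: "", platform="darwin"))
print(repr(preferred_encoding(lambda: "", platform="linux")))
```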
On the subject of Terminal and LANG, LC_CTYPE settings, I found an
interesting link:
http://pastie.textmate.org/111807
Indeed, after setting my region to 'South Africa' in Preferences ->
International -> Formats, a newly opened Terminal window gives me:
newton:~ dickinsm$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
And then python3 crashes on startup as above. This is on a newborn
(3-week-old) MacBook Pro that's been barely changed from default settings
(and no transfer of files and settings from an old Mac, either).
|
msg90323 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2009-07-09 10:16 |
Good catch, the code in the else is indeed in the wrong order.
|
msg90373 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2009-07-10 03:35 |
Looks good and the "patched" patch also works in a py3k installer build.
BTW, Mark, I was curious as to why you were unable to reproduce the
problem with your own build. I should have mentioned that my testing
was with complete installer (framework) builds. I subsequently
experimented with a non-framework build and found that I could not
reproduce the problem running from the ./python in the build directory.
Stepping through gdb showed that, during the calls from create_stdio,
the import of locale fails in textio.c, so it falls back to using
"ascii" as the default encoding (~line 899) and avoids the crash. If I
do a make install, the unpatched installed bin/python3 does crash in the
same way as with the installer python3.
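
The fallback described above lives in C (textio.c), but the idea can be sketched in Python. This is an analogue under stated assumptions, not the actual implementation: default_text_encoding and its module_name parameter are hypothetical, and the parameter exists only so the failing-import path can be triggered deliberately.

```python
import importlib


def default_text_encoding(module_name="locale"):
    """Return the preferred encoding, or "ascii" if the import fails.

    module_name is a hypothetical parameter used to simulate a build
    in which the locale module cannot be imported.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        # Same effect as the non-framework build: no locale module,
        # so the empty codeset is never consulted.
        return "ascii"
    return mod.getpreferredencoding()
```

Calling default_text_encoding("no_such_module") returns "ascii", which is why a build that cannot import locale never reaches the empty-encoding lookup.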
|
msg90445 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-07-12 12:49 |
Once this patch is checked in, should we do an emergency 3.1.1 release?
|
msg90447 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-07-12 13:00 |
I don't know whether this is really worth a 3.1.1, all by itself.
There's an easy workaround, which is for affected users to set their
locale properly.
|
msg90608 - (view) |
Author: Graham Dumpleton (grahamd) |
Date: 2009-07-17 07:39 |
I see this problem on both MacOS X 10.5 and on Windows. This is when using
Python embedded inside of Apache/mod_wsgi.
On MacOS X the error is:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
On Windows the error is:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp0
The discussion of the fix mentioned that it only addresses MacOS X. What
about the Windows case I am seeing? Will it help with that at all?
|
msg90609 - (view) |
Author: Graham Dumpleton (grahamd) |
Date: 2009-07-17 07:41 |
Hmmm, actually my MacOS X error is different, although the Windows one is
the same, except that an encoding is listed and isn't empty.
|
msg90610 - (view) |
Author: Graham Dumpleton (grahamd) |
Date: 2009-07-17 07:49 |
You can ignore my MacOS X example as that was caused by something else.
My question still stands as to whether the fix will address the similar
problem I saw on Windows.
|
msg90617 - (view) |
Author: Graham Dumpleton (grahamd) |
Date: 2009-07-17 10:24 |
I have created issue6501 for my Windows variant of this problem, given that
it appears to be subtly different: there is an encoding, whereas the
MacOS X variant doesn't have one.
Seeing that the fix for the MacOS X issue is in Python code, I will, when I
have a chance, see whether I can work out a fix for the Windows variant.
I am not sure I have the right tools to compile Python from C code on
Windows, so if it is a C code problem, I am not sure I can really investigate.
|
msg92322 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2009-09-06 14:02 |
I've applied the fixed version of my patch in r74687 (3.x) and r74688
(3.1).
|
msg93174 - (view) |
Author: Svetoslav Agafonkin (slavi) |
Date: 2009-09-27 15:58 |
There is an error in r74687 (3.x) and r74688 (3.1) fixes - in the 'else'
clause there should be 'return result' at the end.
|
msg95124 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2009-11-10 18:08 |
The missing return result in the else case has been subsequently fixed in
r75539 (py3k) and r75541 (3.0) so this issue should be re-closed.
|
msg293537 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-12 09:51 |
New changeset 94a3694c3dda97e3bcb51264bf47d948c5424d84 by Victor Stinner in branch '2.7':
bpo-6393: Fix locale.getprerredencoding() on macOS (#1555)
https://github.com/python/cpython/commit/94a3694c3dda97e3bcb51264bf47d948c5424d84
|
Date | User | Action | Args
2022-04-11 14:56:50 | admin | set | github: 50642
2017-05-12 09:51:40 | vstinner | set | nosy: + vstinner; messages: + msg293537
2017-05-12 09:28:12 | vstinner | set | pull_requests: + pull_request1651
2010-02-10 18:14:42 | srid | set | nosy: + srid
2009-11-24 16:37:44 | ronaldoussoren | set | status: open -> closed
2009-11-10 18:08:14 | ned.deily | set | messages: + msg95124
2009-09-27 15:58:14 | slavi | set | status: pending -> open; nosy: + slavi; messages: + msg93174
2009-09-06 14:02:36 | ronaldoussoren | set | status: open -> pending; resolution: fixed; messages: + msg92322; stage: resolved
2009-07-17 10:24:41 | grahamd | set | messages: + msg90617
2009-07-17 07:49:10 | grahamd | set | messages: + msg90610
2009-07-17 07:41:40 | grahamd | set | messages: + msg90609
2009-07-17 07:39:44 | grahamd | set | nosy: + grahamd; messages: + msg90608
2009-07-12 13:01:00 | mark.dickinson | set | messages: + msg90447
2009-07-12 12:49:47 | pitrou | set | priority: critical; versions: + Python 3.2; nosy: + pitrou, benjamin.peterson; messages: + msg90445
2009-07-10 03:35:39 | ned.deily | set | messages: + msg90373
2009-07-09 10:16:09 | ronaldoussoren | set | messages: + msg90323
2009-07-09 09:55:02 | mark.dickinson | set | messages: + msg90320
2009-07-09 08:31:24 | ronaldoussoren | set | keywords: + needs review, patch; files: + issue6393-fix.patch; messages: + msg90314
2009-07-09 08:11:50 | mark.dickinson | set | messages: + msg90312
2009-07-09 08:02:54 | ronaldoussoren | set | messages: + msg90310
2009-07-09 07:51:47 | mark.dickinson | set | messages: + msg90308
2009-07-09 03:58:05 | ned.deily | set | messages: + msg90303
2009-07-09 03:55:40 | ned.deily | set | messages: + msg90302
2009-07-08 21:58:28 | ned.deily | set | nosy: + ned.deily; messages: + msg90285
2009-07-08 19:50:34 | Phil | set | nosy: + Phil
2009-07-01 10:51:55 | mark.dickinson | create |