Title: OS X: python3 from python-3.1.dmg crashes at startup
Type: crash Stage: resolved
Components: Interpreter Core, macOS Versions: Python 3.1, Python 3.2
Status: closed Resolution: fixed
Assigned To: ronaldoussoren Nosy List: Phil, benjamin.peterson, grahamd, mark.dickinson, ned.deily, pitrou, ronaldoussoren, slavi, srid, vstinner
Priority: critical Keywords: needs review, patch

Created on 2009-07-01 10:51 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

issue6393-fix.patch ronaldoussoren, 2009-07-09 08:31
PR 1555 merged vstinner, 2017-05-12 09:28
Messages
msg89972 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-07-01 10:51
There was a report[1] on c.l.p. that python3 from the OS X Python 3.1 
dmg download crashes on startup.  I can reproduce this with the download 
(using the OS X Terminal) only with a bad locale setting:

newton:~ dickinsm$ LANG=utf-8 python3
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: 
Abort trap (core dumped)

The core dump isn't useful:  just lots of 'No symbol table info available'.

This is on OS X 10.5.7/Intel.

I can't reproduce it with either the py3k branch or the release31-maint 
branch, built from scratch.

I suspect that this has to do with the behaviour of nl_langinfo(CODESET) 
on OS X: namely, after doing (in C) setlocale(LC_CTYPE, ""), the result 
of nl_langinfo(CODESET) appears to be "UTF-8" for well-defined utf-8 
locales (e.g., 'en_US.UTF-8'), "US-ASCII" for meaningless locales (e.g., 
'invalid'), but one just gets "" for locales like 'utf-8' or 'en_US'.
This in turn affects Python's locale.getpreferredencoding function. 
See also issue 2173, which may be related.

Ronald, any ideas?
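
The behaviour described above can be observed from Python as well as from C. This is a minimal sketch, assuming a Unix-like system where the locale module exposes nl_langinfo and CODESET; the helper name probe_codeset is made up for illustration. The value returned depends entirely on the platform and the environment, so no expected output is shown.

```python
import locale


def probe_codeset():
    """Return nl_langinfo(CODESET) after adopting the environment's LC_CTYPE.

    Hypothetical helper, not part of the locale module.
    """
    try:
        # Equivalent of the C call setlocale(LC_CTYPE, "").
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The C library rejected the locale name; the previous
        # locale setting stays in effect.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    # repr() makes an empty string visible as ''.
    print(repr(probe_codeset()))
```

Running the script with different LANG values (for example LANG=en_US.UTF-8, LANG=en_US, LANG=utf-8) is one way to compare platforms.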

msg90285 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-07-08 21:58
This is a side effect of the fix for Issue6202.  Prior to r73268, 
locale.getpreferredencoding always returned "mac-roman" regardless of the 
setting of LANG, so this wasn't a problem in py3k (or 3.0.x builds) up 
through 3.1rc1.  I can reproduce it on current py3k and release31-maint.
msg90302 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-07-09 03:55
Note, you can produce the same error on OS X or linux by setting 
PYTHONIOENCODING="", which effectively overrides the value returned by 
nl_langinfo(CODESET).  In pythonrun.c, create_stdio passes 
PYTHONENCODING, if set, on as the "encoding" value to TextIOWrapper.  If 
no encoding was specified, TextIOWrapper uses the value returned by 
locale.getpreferredencoding().  It then calls PyCodec_IncrementalDecoder 
and the unknown (or empty) encoding is finally detected.

That raises the question of how far python should go in protecting the 
user.  One *could* add a check in pythonrun.c to substitute some 
suitable default (UTF-8) if nl_langinfo(CODESET) returns an empty value.  
Or perhaps just abort there with a more meaningful error message.
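
The final step, where the unknown or empty encoding is detected, can be illustrated with the codec registry directly. This is only a sketch: is_known_encoding is a hypothetical helper, and it uses codecs.lookup rather than the exact C-level call made during startup.

```python
import codecs


def is_known_encoding(name):
    """Return True if the codec registry can resolve `name`.

    Hypothetical helper for illustration only.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised for names the registry cannot resolve, including "".
        return False
    return True


if __name__ == "__main__":
    for candidate in ("utf-8", "", "cp0"):
        print(repr(candidate), is_known_encoding(candidate))
```

A check of this kind would let a caller substitute a default, or report a clearer error, before the encoding name reaches TextIOWrapper.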
msg90303 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-07-09 03:58
"... create_stdio passes PYTHONIOENCODING ..."
msg90308 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-07-09 07:51
> One *could* add a check in pythonrun.c to substitute some suitable
> default (UTF-8) if nl_langinfo(CODESET) returns an empty value.

While googling for the source of this problem, I found other software 
projects that take this approach.  It doesn't seem totally unreasonable.

I just wish I understood *why* nl_langinfo(CODESET) is returning "" in 
these cases.  I've looked for the source, but can't find it;  maybe that part of 
Darwin isn't open source.

It seems that a lot of people end up with an OS X Terminal setup such that 
LC_CTYPE is 'UTF-8' (perhaps this is a 10.4 thing---I haven't encountered 
this myself);  I don't think these people should have to deal with a 
confusing error on startup;  defaulting to UTF-8 on OS X seems like a 
reasonable compromise.
msg90310 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-07-09 08:02
The manpage says that nl_langinfo returns an empty string when there is 
an invalid setting.

There is validity in saying that 'LANG=utf-8' is an invalid setting: the 
LANG variable is supposed to be a locale name, which would be a language 
setting (possibly combined with a codeset definition). "utf-8" is not a 
locale name.

I wouldn't mind falling back to utf-8 as the default codeset when 
nl_langinfo returns an empty string because utf-8 is the default 
character set on OSX, and furthermore defaulting to some value is way 
better than crashing.

I do wonder how the user ended up with LANG=utf-8 in the first place.
msg90312 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-07-09 08:11
> There is validity in saying that 'LANG=utf-8' is an invalid setting

Agreed.  But that doesn't really explain why e.g. LANG=en_US also 
produces "", while LANG=invalid produces "US-ASCII".

> I do wonder how the user ended up with LANG=utf-8 in the first place.

Me too.  As far as I can gather, it's a result of setting the Terminal 
preferences (particularly the character encoding and 'Set LANG 
environment variable on startup' checkbox) in some particular way, on 
some versions of OS X, for users in some countries, at some particular 
phases of the moon, etc...
msg90314 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-07-09 08:31
The attached patch (issue6393-fix.patch) seems to fix the issue.

Could you please test and have a look at the patch? It basically tests if 
the output of nl_langinfo(CODESET) is the empty string and defaults to 
'UTF-8' in that case (but only on OSX).

I intend to apply this patch unless someone objects to that.
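
The idea described in this message can be sketched as follows. This is not the attached patch: codeset_with_fallback, its query_codeset parameter and its platform parameter are hypothetical names, added so the logic can be exercised without changing the real locale.

```python
import locale
import sys


def codeset_with_fallback(query_codeset=None, platform=sys.platform):
    """Return the locale codeset, defaulting to UTF-8 on OS X when empty.

    Sketch only; `query_codeset` and `platform` are assumptions that
    exist here to make the function testable.
    """
    if query_codeset is None:
        # Default: ask the C library, as nl_langinfo(CODESET) would.
        def query_codeset():
            return locale.nl_langinfo(locale.CODESET)

    result = query_codeset()
    if not result and platform == "darwin":
        # nl_langinfo returned "" for a locale such as 'utf-8' or 'en_US'.
        result = "UTF-8"
    return result


if __name__ == "__main__":
    print(codeset_with_fallback(lambda: "", platform="darwin"))
```

On other platforms an empty result is passed through unchanged, matching the "only on OSX" restriction.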
msg90320 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-07-09 09:55
Thanks, Ronald!  The patch fixes the problem for me.
(I directly patched the file installed from
the Python dmg, since I still haven't figured out how
to build a python executable that exhibits this
problem.)

The patch doesn't look quite right, though: in the else clause,
it looks as though you're testing 'result' before it exists.
Shouldn't the 'result = nl_langinfo(CODESET)' line come
before the 'if not result and ....' line?

On the subject of Terminal and LANG, LC_CTYPE settings, I found an 
interesting link:

Indeed, after setting my region to 'South Africa' in Preferences -> 
International -> Formats, a newly opened Terminal window gives me:

newton:~ dickinsm$ locale

And then python3 crashes on startup as above.  This is on a newborn 
(3-week-old) MacBook Pro that's been barely changed from default settings 
(and no transfer of files and settings from an old Mac, either).
msg90323 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-07-09 10:16
Good catch, the code in the else is indeed in the wrong order.
msg90373 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-07-10 03:35
Looks good and the "patched" patch also works in a py3k installer build.

BTW, Mark, I was curious as to why you were unable to reproduce the 
problem with your own build.  I should have mentioned that my testing 
was with complete installer (framework) builds.  I subsequently 
experimented with a non-framework build and found that I could not 
reproduce the problem running from the ./python in the build directory.  
Stepping through gdb showed that, during the calls from create_stdio, 
the import of locale fails in textio.c, so it falls back to using 
"ascii" as the default encoding (~line 899) and avoids the crash.  If I 
do a make install, the unpatched installed bin/python3 does crash in the 
same way as with the installer python3.
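
The fallback described here, where a failed import of locale leads to "ascii", can be paraphrased in Python. This is a rough sketch of the behaviour as reported above, not a transcription of textio.c; default_stream_encoding and its import_locale parameter are hypothetical.

```python
def default_stream_encoding(import_locale):
    """Choose an encoding when none was given explicitly.

    `import_locale` is a hypothetical hook: a callable that returns
    the locale module or raises ImportError.
    """
    try:
        locale_module = import_locale()
    except ImportError:
        # locale cannot be imported (as in the uninstalled build):
        # fall back to plain ASCII, which sidesteps the crash.
        return "ascii"
    return locale_module.getpreferredencoding()


def real_import():
    """Import and return the real locale module."""
    import locale
    return locale


if __name__ == "__main__":
    print(default_stream_encoding(real_import))
```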
msg90445 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-12 12:49
Once this patch is checked in, should we do an emergency 3.1.1 release?
msg90447 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-07-12 13:00
I don't know whether this is really worth a 3.1.1, all by itself.  
There's an easy workaround, which is for affected users to set their 
locale properly.
msg90608 - (view) Author: Graham Dumpleton (grahamd) Date: 2009-07-17 07:39
I see this problem on both MacOS X 10.5 and on Windows. This is when using 
Python embedded inside of Apache/mod_wsgi.

On MacOS X the error is:

Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8

On Windows the error is:

Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp0

The talk about the fix mentioned it only addressing MacOS X. What about 
the Windows case I am seeing? Will it help with that at all?
msg90609 - (view) Author: Graham Dumpleton (grahamd) Date: 2009-07-17 07:41
Hmmm, actually my MacOS X error is different, although the Windows one is 
the same, except that an encoding is listed and isn't empty.
msg90610 - (view) Author: Graham Dumpleton (grahamd) Date: 2009-07-17 07:49
You can ignore my MacOS X example as that was caused by something else.

My question still stands as to whether the fix will address the similar 
problem I saw on Windows.
msg90617 - (view) Author: Graham Dumpleton (grahamd) Date: 2009-07-17 10:24
I have created issue6501 for my Windows variant of this problem, given that 
it appears to be subtly different due to there being an encoding, whereas 
the MacOS X variant doesn't have one.

Seeing that the fix for the MacOS X issue is in Python code, I will, when I 
have a chance, look at whether I can work out a fix for the Windows 
variant. I'm not sure I have the right tools to compile Python from C code on 
Windows, so if it is a C code problem, I'm not sure I can really investigate.
msg92322 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-09-06 14:02
I've applied the fixed version of my patch in r74687 (3.x) and r74688 
(3.1).
msg93174 - (view) Author: Svetoslav Agafonkin (slavi) Date: 2009-09-27 15:58
There is an error in r74687 (3.x) and r74688 (3.1) fixes - in the 'else' 
clause there should be 'return result' at the end.
msg95124 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-11-10 18:08
The missing return result in the else case has been subsequently fixed in 
r75539 (py3k) and r75541 (3.1), so this issue should be re-closed.
msg293537 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-12 09:51
New changeset 94a3694c3dda97e3bcb51264bf47d948c5424d84 by Victor Stinner in branch '2.7':
bpo-6393: Fix locale.getprerredencoding() on macOS (#1555)
