classification
Title: Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined
Type: crash
Stage:
Components: Interpreter Core
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Roman.Evstifeev, WanderingLogic, lemburg, loewis, pitrou, python-dev, skrah, vstinner, xdegaye
Priority: normal
Keywords: patch

Created on 2014-10-27 21:30 by WanderingLogic, last changed 2016-04-27 10:32 by Roman.Evstifeev.

Files
File name | Uploaded | Description
no_langinfo_during_init.patch | WanderingLogic, 2014-10-27 21:30 |
locale.patch | xdegaye, 2016-04-25 08:03 |
Messages (10)
msg230106 - Author: Matt Frank (WanderingLogic) * Date: 2014-10-27 21:30
On systems where configure cannot find langinfo.h (or where nl_langinfo() is not defined), configure leaves HAVE_LANGINFO_H undefined in pyconfig.h. In pythonrun.c:get_locale_encoding() the call to nl_langinfo() is wrapped in an #ifdef, but the #else branch calls PyErr_SetNone(PyExc_NotImplementedError) and returns NULL. That makes initfsencoding() fail with the message "Py_Initialize: Unable to get the locale encoding", and the interpreter aborts.

I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8" (I'm not sure which).  But maybe that was for a different part of the interpreter?

In any case there are 4 choices here, all of which are preferable to what we are doing now.

1. Fail during configure.  If we can't even start the interpreter, why waste the user's time with the build?
2. Fail during compilation.  The #else path could contain #error "Python only works on systems where nl_langinfo() is correctly implemented."  Again, this would be far preferable to failing only once the user has finished the install and tries to get the interpreter prompt.
3. Implement our own python_nl_langinfo() that we fall back on when the system one doesn't exist.  (It could, for example, return "ASCII" (or "ANSI_X3.4-1968") to start with, and "UTF-8" after we see a call to setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").)
4. Just return the string "ASCII".
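
Option (3) could look roughly like the C sketch below. The names python_setlocale() and python_nl_langinfo_codeset() are hypothetical, only the codeset is tracked, and it assumes that a request for the user's locale (setlocale(..., "")) means UTF-8, as on Android.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback state for a libc without nl_langinfo().
   The codeset starts as "ASCII", matching the "C" locale. */
static const char *fallback_codeset = "ASCII";

/* Nonzero if `category` affects the character encoding. */
static int affects_ctype(int category)
{
    return category == LC_CTYPE || category == LC_ALL;
}

/* Hypothetical setlocale() wrapper: tracks the codeset itself and still
   forwards the call to the system setlocale(), which may do nothing. */
static const char *python_setlocale(int category, const char *locale)
{
    if (locale != NULL && affects_ctype(category)) {
        if (locale[0] == '\0')
            fallback_codeset = "UTF-8";   /* assumption: user locale means UTF-8 */
        else if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
            fallback_codeset = "ASCII";
        /* Other explicit locale names leave the codeset unchanged here. */
    }
    return setlocale(category, locale);
}

/* Hypothetical stand-in for nl_langinfo(CODESET). */
static const char *python_nl_langinfo_codeset(void)
{
    return fallback_codeset;
}
```

A caller starts with "ASCII"; after python_setlocale(LC_ALL, "") the reported codeset is "UTF-8", and python_setlocale(LC_CTYPE, "C") switches it back.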

The attached patch does the last.  I'm willing to try to write the patch for choice (3) if that's what you'd prefer.  (I have an implementation that does (3) for systems that also don't have setlocale() implemented, but I don't yet know how to do it if nl_langinfo() doesn't exist but setlocale() does.)
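
Option (4) amounts to replacing the NotImplementedError branch with a constant. A minimal sketch with a hypothetical function name (this is not the contents of no_langinfo_during_init.patch); treating an empty nl_langinfo() result as ASCII is an extra assumption:

```c
#include <stddef.h>
#ifdef HAVE_LANGINFO_H   /* set by configure in pyconfig.h */
#include <langinfo.h>
#endif

/* Hypothetical: report the locale encoding without ever failing. */
static const char *locale_encoding_or_ascii(void)
{
#if defined(HAVE_LANGINFO_H) && defined(CODESET)
    const char *codeset = nl_langinfo(CODESET);
    if (codeset != NULL && codeset[0] != '\0')
        return codeset;
#endif
    /* No usable nl_langinfo(): instead of raising NotImplementedError
       (and aborting Py_Initialize), fall back to plain ASCII. */
    return "ASCII";
}
```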
msg230111 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-27 23:20
> I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8"

It's very important that Py_DecodeLocale() and Py_EncodeLocale() use the same encoding as sys.getfilesystemencoding().
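
To make the consistency requirement concrete: a filename decoded with one codec and encoded back with another does not round-trip unless it is pure ASCII. The C sketch below uses hypothetical helpers (not CPython's Py_DecodeLocale()/Py_EncodeLocale()) and assumes a Latin-1 decoder paired with a UTF-8 encoder.

```c
#include <stddef.h>
#include <string.h>

/* Latin-1 decoding: every byte maps to the code point of the same value. */
static size_t decode_latin1(const unsigned char *in, size_t n, unsigned long *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* UTF-8 encoding restricted to code points below 0x800 (one or two
   bytes), which covers every Latin-1 code point. */
static size_t encode_utf8_small(const unsigned long *cp, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (cp[i] < 0x80) {
            out[j++] = (unsigned char)cp[i];
        } else {
            out[j++] = (unsigned char)(0xC0 | (cp[i] >> 6));
            out[j++] = (unsigned char)(0x80 | (cp[i] & 0x3F));
        }
    }
    return j;
}

/* Returns 1 if `name` survives decode(Latin-1) + encode(UTF-8) unchanged,
   0 if the bytes change (or the name is too long for this demo). */
static int roundtrip_latin1_then_utf8(const char *name)
{
    unsigned long cps[256];
    unsigned char back[512];
    size_t n = strlen(name);
    if (n > 256)
        return 0;
    size_t k = decode_latin1((const unsigned char *)name, n, cps);
    size_t m = encode_utf8_small(cps, k, back);
    return m == n && memcmp(back, name, n) == 0;
}
```

With these helpers "readme.txt" round-trips, while the UTF-8 bytes of "café" ("caf\xc3\xa9", five bytes) come back as seven bytes.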

What is your platform? Which encoding is used by these functions?
msg230385 - Author: Matt Frank (WanderingLogic) * Date: 2014-10-31 20:36
My platform is the Android command-line shell.  Essentially it is like an embedded Linux platform with a very quirky, partially implemented libc (not glibc).  It has no langinfo.h, and while it has locale.h, its setlocale() and localeconv() do nothing (and return NULL).  The wcstombs() and mbstowcs() functions are both mapped to strncpy().

Because the Linux kernel (and most supported file systems) store filenames as null-terminated byte strings, UTF-8 encoded filenames "work" with software that assumes the encoding is UTF-8, which was the original intent of UTF-8.  Examples are the xterm I'm using to ssh into the machine and the Dalvik JVM that runs user apps.

My intent with this tracker issue is to make it slightly easier to get something reasonable to compile and run on systems whose libc, like Android's, has completely broken locale support and really only supports 8-bit "ascii", while not breaking the supported platforms.

If you look at what Kivy and Py4A have done, they have patched the main interpreter all over, and once those patches are applied the interpreter no longer works on any supported platform.  I'm trying to avoid that approach.  Two possibilities for this particular part of the interpreter are to implement option (3) or option (4) above.  Option (3) is preferable in the long run, but option (4) is a much smaller change (as long as it is consistent with the decision in issue 8610).
msg230391 - Author: Stefan Krah (skrah) * (Python committer) Date: 2014-10-31 21:29
Has anyone made an effort to get this fixed in Android?  I find it strange that hundreds of projects now work around Android bugs instead of putting (friendly) pressure on the Android maintainers.

Minimal langinfo.h and locale.h support should be trivial to implement.
msg230393 - Author: Matt Frank (WanderingLogic) * Date: 2014-10-31 21:57
I am working on using my resources at Intel to put some pressure on Google to fix some of the (many) problems in the Bionic libc.

I have a sort of "polyfill" library that implements locale.h and langinfo.h, as well as the structure definitions for wchar.h, and borrows the UTF-8 mbs*towcs() and wcs*tombs() implementations from FreeBSD.  Its setlocale() and nl_langinfo() start in the "C" locale and behave as though the user's environment variables are set to "C.UTF-8" (so calling setlocale(LC_ALL, "") switches the encoding to UTF-8).
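
The heart of such a polyfill is the UTF-8 conversion. Below is a minimal, hypothetical decoder (not the FreeBSD code mentioned above) sketching what a UTF-8-only mbstowcs() has to check: lead-byte classes, continuation bytes, overlong forms, surrogates and the U+10FFFF limit. It writes uint32_t code points rather than wchar_t, an assumption made to keep the sketch independent of the platform's wchar_t width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-8 decoder. Decodes the NUL-terminated string `s` into
   `out` (at most `max` code points) and returns the count, or (size_t)-1
   on malformed input. Unlike mbstowcs(), a full buffer is an error here. */
static size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p != '\0') {
        uint32_t cp, min;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;        /* stray continuation or invalid lead byte */
        p++;
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)   /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;         /* overlong, out of range, or surrogate */
        if (n == max)
            return (size_t)-1;         /* output buffer too small */
        out[n++] = cp;
    }
    return n;
}
```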

But Bionic has been broken for many years, and it will most likely take many more before I (or somebody) can arrange the right set of things to get it fixed.  It is not really in Google's interest to have people writing non-JVM code, so they seem to support it only grudgingly.  Their JVM APIs are the "walled garden" that keeps apps tied to their platform while allowing them to switch quickly to new processor architectures if they need to.

But all of that is not really germane to this bug.  The fact is that CPython, when compiled for a system with no langinfo.h, creates an executable that does nothing but crash.

What other systems (other than Android) have no langinfo.h?  (Alternatively, why has this feature-test been in configure.ac for many years?)  If the solution for Android is "it's android's bug and they should fix it" then shouldn't we remove all the #ifdef HAVE_LANGINFO_H tests from the code and just let compilation fail on systems that don't have langinfo.h?  That is option (1) or (2) that I suggested above.
msg230394 - Author: Stefan Krah (skrah) * (Python committer) Date: 2014-10-31 21:57
To expand a little, here ...

   https://code.google.com/p/android/issues/list

... I cannot find either a localeconv() or an nl_langinfo() issue.

Perhaps the maintainers would be willing to add minimal versions?
msg230407 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-31 22:39
If the platform doesn't provide anything, we can maybe adopt the same approach as Mac OS X: force the encoding to UTF-8 and just don't use the C library.
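
A sketch of that idea: the encoding is fixed at compile time instead of being queried from the C library. choose_fs_encoding() is hypothetical and the __ANDROID__ check is an assumption; HAVE_LANGINFO_H and CODESET are the macros discussed in this issue.

```c
#include <stddef.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif

/* Hypothetical: pick the filesystem encoding without failing. */
static const char *choose_fs_encoding(void)
{
#if defined(__APPLE__) || defined(__ANDROID__)
    return "utf-8";                  /* fixed, independent of the locale */
#elif defined(HAVE_LANGINFO_H) && defined(CODESET)
    return nl_langinfo(CODESET);     /* ask the C library */
#else
    return "utf-8";                  /* nothing to ask: force UTF-8 */
#endif
}
```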
msg264160 - Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-04-25 08:03
Android default system encoding is UTF-8 as specified at http://developer.android.com/reference/java/nio/charset/Charset.html

<quote>The platform's default charset is UTF-8. (This is in contrast to some older implementations, where the default charset depended on the user's locale.) </quote>

> If the platform doesn't provide anything, we can maybe adopt the same
> approach as Mac OS X: force the encoding to UTF-8 and just don't use
> the C library.

The attached patch does the same thing as Victor proposed, but makes explicit that Android defines neither HAVE_LANGINFO_H nor CODESET. The fact that HAVE_LANGINFO_H and CODESET are not defined causes other problems as well (possibly on Mac OS X too). In that case PyCursesWindow_New() in _cursesmodule.c falls back nicely to "utf-8", but _Py_device_encoding() in fileutils.c instead does a Py_RETURN_NONE. This seems to affect _io_TextIOWrapper___init___impl() in textio.c and os_device_encoding_impl() in posixmodule.c, and indeed os.device_encoding(0) returns None on Android.
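
One possible shape of a fix for the _Py_device_encoding() case, as a C sketch. device_encoding_sketch() is hypothetical: NULL stands in for Python's None, NULL is returned only when the descriptor is not a terminal, and otherwise the function falls back to "utf-8" when langinfo.h is unavailable, mirroring the PyCursesWindow_New() fallback described above.

```c
#include <stddef.h>
#include <unistd.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif

/* Hypothetical: encoding of the terminal behind `fd`, or NULL ("None")
   if `fd` is not a terminal. */
static const char *device_encoding_sketch(int fd)
{
    if (!isatty(fd))
        return NULL;                 /* not a terminal: None is correct */
#if defined(HAVE_LANGINFO_H) && defined(CODESET)
    return nl_langinfo(CODESET);
#else
    return "utf-8";                  /* no langinfo.h: assume UTF-8, not None */
#endif
}
```

The design choice is that "no langinfo.h" no longer means "unknown encoding": None is reserved for descriptors that are not terminals.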
msg264202 - Author: Roundup Robot (python-dev) Date: 2016-04-25 23:57
New changeset ad6be34ce8c9 by Stefan Krah in branch 'default':
Issue #22747: Workaround for systems without langinfo.h.
https://hg.python.org/cpython/rev/ad6be34ce8c9
msg264203 - Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-26 00:00
We don't support Android officially yet, but I think until #8610
is resolved something must be done here.
History
Date | User | Action | Args
2017-01-05 16:03:13 | xdegaye | unlink | issue26865 dependencies
2016-05-22 12:45:21 | xdegaye | link | issue26865 dependencies
2016-04-27 10:32:22 | Roman.Evstifeev | set | nosy: + Roman.Evstifeev
2016-04-26 00:00:35 | skrah | set | messages: + msg264203
2016-04-25 23:57:11 | python-dev | set | nosy: + python-dev; messages: + msg264202
2016-04-25 19:17:11 | skrah | link | issue17905 superseder
2016-04-25 08:03:06 | xdegaye | set | files: + locale.patch; nosy: + xdegaye; messages: + msg264160
2014-10-31 22:39:13 | vstinner | set | messages: + msg230407
2014-10-31 21:57:36 | skrah | set | messages: + msg230394
2014-10-31 21:57:16 | WanderingLogic | set | messages: + msg230393
2014-10-31 21:29:31 | skrah | set | nosy: + skrah; messages: + msg230391
2014-10-31 20:36:42 | WanderingLogic | set | messages: + msg230385
2014-10-27 23:20:25 | vstinner | set | messages: + msg230111
2014-10-27 21:30:08 | WanderingLogic | create |