Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined #66936

WanderingLogic · 2014-10-27T21:30:08Z

BPO	22747
Nosy	@malemburg, @loewis, @pitrou, @vstinner, @skrah, @xdegaye, @Fak3
Files	no_langinfo_during_init.patch locale.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}


GitHub fields:

assignee = None
closed_at = <Date 2019-05-15.02:56:56.877>
created_at = <Date 2014-10-27.21:30:08.131>
labels = ['interpreter-core', 'type-crash']
title = 'Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined'
updated_at = <Date 2019-05-15.02:56:56.876>
user = 'https://bugs.python.org/WanderingLogic'

bugs.python.org fields:

activity = <Date 2019-05-15.02:56:56.876>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2019-05-15.02:56:56.877>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2014-10-27.21:30:08.131>
creator = 'WanderingLogic'
dependencies = []
files = ['37046', '42585']
hgrepos = []
issue_num = 22747
keywords = ['patch']
message_count = 11.0
messages = ['230106', '230111', '230385', '230391', '230393', '230394', '230407', '264160', '264202', '264203', '342542']
nosy_count = 10.0
nosy_names = ['lemburg', 'loewis', 'pitrou', 'vstinner', 'Arfrever', 'skrah', 'xdegaye', 'python-dev', 'Roman.Evstifeev', 'WanderingLogic']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue22747'
versions = ['Python 3.4']

WanderingLogic · 2014-10-27T21:30:08Z

On systems where configure cannot find langinfo.h (or where nl_langinfo() is not defined), configure leaves HAVE_LANGINFO_H undefined in pyconfig.h. In pythonrun.c, get_locale_encoding() wraps its call to nl_langinfo() in an #ifdef, but the #else branch calls PyErr_SetNone(PyExc_NotImplementedError) and returns NULL. That makes initfsencoding() fail with the message "Py_Initialize: Unable to get the locale encoding", and the interpreter aborts.
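To make the failure chain concrete, here is a hypothetical, heavily simplified model of the two functions named above. It is not CPython source. The names `get_locale_encoding_model()` and `initfsencoding_model()` are invented for illustration, and a NULL return stands in for "an exception is set". When this is compiled standalone, without HAVE_LANGINFO_H defined, the #else branch is the one that gets built, so initialization always fails:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif

/* Hypothetical, simplified model of the failing path; NOT CPython source.
 * A NULL return stands in for "an exception has been set". */
static const char *get_locale_encoding_model(void)
{
#ifdef HAVE_LANGINFO_H
    const char *codeset = nl_langinfo(CODESET);
    if (codeset == NULL || codeset[0] == '\0')
        return NULL;            /* no usable codeset */
    return codeset;
#else
    /* Models: PyErr_SetNone(PyExc_NotImplementedError); return NULL; */
    return NULL;
#endif
}

/* Hypothetical model of initfsencoding(): a NULL encoding is fatal. */
static int initfsencoding_model(void)
{
    const char *encoding = get_locale_encoding_model();
    if (encoding == NULL) {
        /* Per the report, the real interpreter aborts at this point. */
        fprintf(stderr, "Py_Initialize: Unable to get the locale encoding\n");
        return -1;
    }
    return 0;
}
```

Because no branch supplies a default encoding, a missing langinfo.h is enough on its own to make startup fail.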

I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8" (I'm not sure which). But maybe that was for a different part of the interpreter?

In any case, there are 4 choices here, all of which are preferable to what we are doing now.

1. Fail during configure. If we can't even start the interpreter, then why waste the user's time with the build?
2. Fail during compilation. The #else path could contain #error "Python only works on systems where nl_langinfo() is correctly implemented." Again, this would be far preferable to failing only once the user has finished the install and tries to get the interpreter prompt.
3. Implement our own python_nl_langinfo() that we fall back on when the system one doesn't exist. (It could, for example, return "ASCII" (or "ANSI_X3.4-1968") to start with, and "UTF-8" after we see a call to setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").)
4. Just return the string "ASCII".
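Option (3) could be sketched roughly as follows. This assumes the platform's own setlocale() is a no-op, so a small wrapper has to track which locale was requested. Both `fallback_setlocale()` and `python_nl_langinfo_codeset()` are hypothetical names: the latter is a CODESET-only stand-in for the python_nl_langinfo() proposed above, because nl_item and CODESET come from the very langinfo.h that is missing. Only LC_CTYPE and LC_ALL are modeled.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback state: the locale most recently selected for LC_CTYPE. */
static const char *fallback_locale_name = "C";

/* Hypothetical wrapper for a platform whose setlocale() does nothing.
 * Selecting the environment's locale ("") behaves as if the environment
 * said "C.UTF-8"; "C" and "POSIX" revert to plain ASCII. */
static const char *fallback_setlocale(int category, const char *locale)
{
    if (category != LC_CTYPE && category != LC_ALL)
        return fallback_locale_name;    /* other categories are not modeled */
    if (locale == NULL)
        return fallback_locale_name;    /* query only, no change */
    if (locale[0] == '\0' || strcmp(locale, "C.UTF-8") == 0)
        fallback_locale_name = "C.UTF-8";
    else if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        fallback_locale_name = "C";
    else
        return NULL;                    /* unsupported locale name */
    return fallback_locale_name;
}

/* Hypothetical CODESET-only stand-in for the proposed python_nl_langinfo(). */
static const char *python_nl_langinfo_codeset(void)
{
    return strcmp(fallback_locale_name, "C.UTF-8") == 0 ? "UTF-8"
                                                        : "ANSI_X3.4-1968";
}
```

With this sketch the reported codeset starts as "ANSI_X3.4-1968", switches to "UTF-8" after fallback_setlocale(LC_CTYPE, ""), and reverts after fallback_setlocale(LC_ALL, "C").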

The attached patch does the last. I'm willing to try to write the patch for choice (3) if that's what you'd prefer. (I have an implementation that does (3) for systems that also don't have setlocale() implemented, but I don't yet know how to do it if nl_langinfo() doesn't exist but setlocale() does.)

vstinner · 2014-10-27T23:20:26Z

> I'm confused because http://bugs.python.org/issue8610 (from 2010) seems
> to have come down on the side of deciding that nl_langinfo() failures
> should be treated as implicitly returning either "ASCII" or "UTF-8"

It's very important that Py_DecodeLocale and Py_EncodeLocale use the same
encoding as sys.getfilesystemencoding().

What is your platform? Which encoding is used by these functions?

WanderingLogic · 2014-10-31T20:36:42Z

My platform is the Android command-line shell. Essentially it is like an embedded Linux platform with a very quirky, partially implemented libc (not glibc). It has no langinfo.h, and while it has locale.h, the implementations of setlocale() and localeconv() do nothing (and return NULL). The wcstombs() and mbstowcs() functions are both mapped to strncpy().

As was the original intent of UTF-8: since the Linux kernel (and most supported file systems) store filenames as null-terminated byte strings, UTF-8-encoded file names "work" with software that assumes the encoding is UTF-8 (for example, the xterm program I'm using to ssh into the machine, or the Dalvik JVM that runs user apps).

My intent with this issue is to make it slightly easier for people whose libc is like Android's (locale support completely broken, with really only 8-bit "ascii" supported) to get something reasonable to compile and run, without breaking the supported platforms.

If you look at what Kivy and Py4A have done, they have patches all over the main interpreter that, once applied, break the interpreter on every supported platform. I'm trying to avoid that approach. For this particular part of the interpreter, two possibilities are to implement option (3) above or option (4) above. Option (3) is preferable in the long run, but option (4) is a much smaller change (as long as it is consistent with the decision in issue 8610).

skrah · 2014-10-31T21:29:32Z

Has anyone made an effort to get this fixed in Android? I find it strange that hundreds of projects now work around Android bugs instead of putting (friendly) pressure on the Android maintainers.

Minimal langinfo.h and locale.h support should be trivial to implement.

WanderingLogic · 2014-10-31T21:57:16Z

I am working on using my resources at Intel to put some pressure on Google to fix some of the (many) problems in the Bionic libc.

I have a sort of "polyfill" library that implements locale.h and langinfo.h, as well as the structure definitions for wchar.h, and borrows the UTF-8 mbs*towcs() and wcs*tombs() implementations from FreeBSD. Its setlocale() and nl_langinfo() start in the "C" locale and pretend that the user's environment variables are set to "C.UTF-8", so calling setlocale(LC_ALL, "") switches the encoding to UTF-8.

But Bionic has been broken for many years, and it will most likely take many more years before I (or somebody else) can arrange the right set of things to get it fixed. It is not really in Google's interest to have people writing non-JVM code, so they seem to support it only grudgingly: their JVM APIs are the "walled garden" that keeps apps sticky to their platform while allowing them to switch quickly to new processor architectures if they need to.

But none of that is really germane to this bug. The fact is that CPython, when compiled for a system without langinfo.h, produces an executable that does nothing but crash.

What other systems (besides Android) lack langinfo.h? (Alternatively, why has this feature test been in configure.ac for many years?) If the answer for Android is "it's Android's bug and they should fix it", then shouldn't we remove all the #ifdef HAVE_LANGINFO_H tests from the code and just let compilation fail on systems that don't have langinfo.h? That is option (1) or (2) suggested above.

skrah · 2014-10-31T21:57:37Z

To expand a little, here ...

https://code.google.com/p/android/issues/list

... I cannot find either a localeconv() or an nl_langinfo() issue.

Perhaps the maintainers would be willing to add minimal versions?

vstinner · 2014-10-31T22:39:13Z

If the platform doesn't provide anything, we can maybe adopt the same
approach as Mac OS X: force the encoding to UTF-8 and just don't use
the C library.
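A minimal sketch of that suggestion, assuming the decision is made at compile time. `fs_encoding_sketch()` is a hypothetical helper, not CPython's API. Treating __ANDROID__ like __APPLE__ is an assumption of this sketch, and so is the `defined(CODESET)` check, which mirrors the point that lacking CODESET is as bad as lacking langinfo.h. On those platforms the C library is never consulted:

```c
#include <assert.h>
#include <string.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif

/* Hypothetical helper (not CPython's API): choose the filesystem encoding. */
static const char *fs_encoding_sketch(void)
{
#if defined(__APPLE__) || defined(__ANDROID__) || \
    !defined(HAVE_LANGINFO_H) || !defined(CODESET)
    /* No usable locale information, or a platform assumed to be UTF-8:
     * hard-code UTF-8 and never call into the C library's locale code. */
    return "UTF-8";
#else
    const char *codeset = nl_langinfo(CODESET);
    /* Treat a missing or empty answer as UTF-8 instead of failing startup. */
    if (codeset == NULL || codeset[0] == '\0')
        return "UTF-8";
    return codeset;
#endif
}
```

The trade-off is that a hard-coded encoding can disagree with a real locale. That is acceptable only where, as on Mac OS X and (per the next comment) Android, the platform itself defines UTF-8 as its default.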

xdegaye · 2016-04-25T08:03:06Z

Android's default system encoding is UTF-8, as specified at http://developer.android.com/reference/java/nio/charset/Charset.html

<quote>The platform's default charset is UTF-8. (This is in contrast to some older implementations, where the default charset depended on the user's locale.) </quote>

> If the platform doesn't provide anything, we can maybe adopt the same
> approach as Mac OS X: force the encoding to UTF-8 and just don't use
> the C library.

The attached patch does the same thing as proposed by Victor, but emphasizes that Android defines neither HAVE_LANGINFO_H nor CODESET. The absence of HAVE_LANGINFO_H and CODESET causes other problems too (possibly on Mac OS X as well). In that case, PyCursesWindow_New() in _cursesmodule.c falls back nicely to "utf-8", but _Py_device_encoding() in fileutils.c does a Py_RETURN_NONE instead. This seems to affect _io_TextIOWrapper___init___impl() in textio.c and os_device_encoding_impl() in posixmodule.c, and indeed, os.device_encoding(0) returns None on Android.
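The contrast between the two fallbacks can be sketched as below. `device_encoding_sketch()` is a hypothetical, POSIX-only model, not the code in fileutils.c. NULL plays the role of None, and falling back to UTF-8 for terminals is the curses-style behavior described above, not what _Py_device_encoding() did at the time:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif

/* Hypothetical model of a device-encoding lookup; NULL stands in for None. */
static const char *device_encoding_sketch(int fd)
{
    if (!isatty(fd))
        return NULL;            /* not a terminal: no device encoding */
#if defined(HAVE_LANGINFO_H) && defined(CODESET)
    {
        const char *codeset = nl_langinfo(CODESET);
        if (codeset != NULL && codeset[0] != '\0')
            return codeset;
    }
#endif
    /* No langinfo.h/CODESET: fall back to UTF-8 (as the curses module does)
     * instead of reporting "no encoding" for a real terminal. */
    return "UTF-8";
}
```

A non-terminal descriptor, such as either end of a pipe, still yields NULL. Only real terminals get the UTF-8 default.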

python-dev · 2016-04-25T23:57:12Z

New changeset ad6be34ce8c9 by Stefan Krah in branch 'default':
Issue bpo-22747: Workaround for systems without langinfo.h.
https://hg.python.org/cpython/rev/ad6be34ce8c9

skrah · 2016-04-26T00:00:36Z

We don't support Android officially yet, but I think something must be
done here until bpo-8610 is resolved.

vstinner · 2019-05-15T02:56:57Z

Python 3 (I don't recall exactly which version) has been fixed to always use UTF-8 on Android for the filesystem encoding, and even for the locale encoding in most places. Closing the issue.

WanderingLogic mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-crash (a hard crash of the interpreter, possibly with a core dump) labels Oct 27, 2014

vstinner closed this as completed May 15, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022