classification
Title: Compile fails when sizeof(wchar_t) == 1
Type: compile error
Stage: needs patch
Components: Build
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: dcoles, lemburg, loewis, pitrou, vstinner
Priority: normal
Keywords: patch

Created on 2011-05-05 20:45 by dcoles, last changed 2019-03-15 22:44 by BreamoreBoy.

Files
File name: wchar_check.patch
Uploaded: dcoles, 2011-05-17 05:40
Description: Patch to check wchar_t is at least 16 bits and wchar.h is present
Messages (14)
msg135242 - (view) Author: David Coles (dcoles) * Date: 2011-05-05 20:45
On Android platforms bionic defines wchar_t as char. This causes compilation of unicodeobject.c to fail with '#error "unsupported wchar_t and PyUNICODE sizes, see issue #8670"'.

The unusual thing is that the configure script does detect if wchar_t is usable (HAVE_USABLE_WCHAR_T) but the wide code support block in unicodeobject.c does not check this (only an #ifdef HAVE_WCHAR_T).

Possibly the quick solution is to change this #ifdef to '#if defined(HAVE_WCHAR_T) && defined(HAVE_USABLE_WCHAR_T)'. The header unicodeobject.h does check for HAVE_USABLE_WCHAR_T but will only define HAVE_WCHAR_T if it is not already defined.
msg135247 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-05 21:02
Do you want to propose a patch? I'm not sure any of us can test on an Android setup.
msg135250 - (view) Author: David Coles (dcoles) * Date: 2011-05-05 21:29
Sure. There are a few other build issues (mainly missing types/structs in Bionic) that are preventing a complete build. I'll put together a patch once I have a working build and can verify that HAVE_USABLE_WCHAR_T is correctly set on a normal Linux build.
msg135360 - (view) Author: David Coles (dcoles) * Date: 2011-05-06 19:12
After doing some more investigation it appears that Android's wchar_t support before android-9 is totally broken (see http://android.git.kernel.org/?p=platform/ndk.git;a=blob_plain;f=docs/STANDALONE-TOOLCHAIN.html;hb=HEAD). With android-9 you get 4 byte wchar_t and working wide character functions.

Possibly of more interest for Python is that it's no longer buildable without wchar_t support. While unicodeobject is pretty good at checking HAVE_WCHAR_H, a number of modules and even pythonrun.c directly use wchar_t or functions like PyUnicode_FromWideChar without providing a fallback. Does Python 3 now require wchar_t or are these bugs? (either option seems sensible).

A few other notes:
HAVE_USABLE_WCHAR_T looks like it was a check for unsigned/>16 bits wchar_t that would allow them to be directly memcpy'd. The code in unicodeobject.c seems not to really use this anymore (it's happy with signed or unsigned) and it looks like the check is only used for Windows now.

To properly support wchar_t of size 1 you would basically implement multibyte character storage either with UTF-8 or just packing two wchar_t's with UTF-16. At least in Android the distinction doesn't seem to matter as Android's internationalization/localization policy seems to be "use Java".
msg135363 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-06 19:55
> Possibly of more interest for Python is that it's no longer buildable
> without wchar_t support. While unicodeobject is pretty good at
> checking HAVE_WCHAR_H, a number of modules and even pythonrun.c
> directly use wchar_t or functions like PyUnicode_FromWideChar without
> providing a fallback. Does Python 3 now require wchar_t or are these
> bugs? (either option seems sensible).

It's pretty much required since we rely on mbstowcs and friends to
convert some 8-bit strings (such as environment variables, command-line
args...) to unicode.
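
As a rough illustration of that dependency, here is a minimal sketch of decoding a locale-encoded byte string (an environment variable value, say) with mbstowcs. The helper name decode_locale_string is hypothetical and this is not CPython's actual code; it only shows why a working wchar_t and working wide-char functions are needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython function): decode a locale-encoded
   byte string into a newly allocated wide string.
   Returns NULL on allocation failure or on an invalid multibyte sequence.
   The caller releases the result with free(). */
static wchar_t *decode_locale_string(const char *bytes)
{
    assert(bytes != NULL);

    /* A multibyte string never decodes to more wide characters than it
       has bytes, so strlen(bytes) + 1 elements are always enough. */
    size_t capacity = strlen(bytes) + 1;
    wchar_t *wide = malloc(capacity * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    size_t count = mbstowcs(wide, bytes, capacity);
    if (count == (size_t)-1) {
        /* Invalid sequence for the current locale's encoding. */
        free(wide);
        return NULL;
    }
    if (count >= capacity)
        count = capacity - 1;   /* defensive; should not happen */
    wide[count] = L'\0';
    return wide;
}
```

On a platform where wchar_t is a single byte and mbstowcs does not work, a helper like this has nothing sensible to return, which is the situation described for Android before android-9.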

> At least in Android the distinction doesn't seem to matter as
> Android's internationalization/localization policy seems to be "use
> Java".

Ha :-)
msg135366 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-06 20:31
David Coles wrote:
> 
> David Coles <coles.david@gmail.com> added the comment:
> 
> After doing some more investigation it appears that Android's wchar_t support before android-9 is totally broken (see http://android.git.kernel.org/?p=platform/ndk.git;a=blob_plain;f=docs/STANDALONE-TOOLCHAIN.html;hb=HEAD). With android-9 you get 4 byte wchar_t and working wide character functions.
>
> Possibly of more interest for Python is that it's no longer buildable without wchar_t support. While unicodeobject is pretty good at checking HAVE_WCHAR_H, a number of modules and even pythonrun.c directly use wchar_t or functions like PyUnicode_FromWideChar without providing a fallback. Does Python 3 now require wchar_t or are these bugs? (either option seems sensible).

wchar_t should be fairly portable these days. I think the main
problem is that we never assumed sizeof(wchar_t) == 1 to be a
possibility. On Windows, wchar_t was 16 bit and the glibc started
out with 32 bits.

> A few other notes:
> HAVE_USABLE_WCHAR_T looks like it was a check for unsigned/>16 bits wchar_t that would allow them to be directly memcpy'd. The code in unicodeobject.c seems not to really use this anymore (it's happy with signed or unsigned) and it looks like the check is only used for Windows now.

Note that HAVE_USABLE_WCHAR_T is only used to check whether
Python can use wchar_t as alias for Py_UNICODE. Python's Unicode
implementation needs Py_UNICODE to be an unsigned type with
either 2 bytes or 4 bytes. If wchar_t does not provide these
sizes or is a signed type, Python cannot use it for Py_UNICODE
and must instead use "unsigned short".

If the configure script does not detect this case, then a patch
would be helpful.

The other wchar_t C lib functions should still remain usable,
though.
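
The rule described above (unsigned, and either 2 or 4 bytes) can be sketched as a small runtime check. This is only an illustration under those stated constraints: the function names are made up for the example, and the real decision is made by the configure script and pyconfig.h, not by code like this.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Nonzero when wchar_t is an unsigned type on this platform. */
static int wchar_is_unsigned(void)
{
    return (wchar_t)-1 > (wchar_t)0;
}

/* Nonzero when wchar_t meets the constraints named in the message:
   unsigned, and exactly 2 or 4 bytes wide. (Hypothetical name.) */
static int wchar_usable_as_py_unicode(void)
{
    size_t size = sizeof(wchar_t);
    return wchar_is_unsigned() && (size == 2 || size == 4);
}

/* Size in bytes of the type this sketch would pick for Py_UNICODE:
   wchar_t when it is usable, otherwise "unsigned short". */
static size_t chosen_py_unicode_size(void)
{
    size_t size = wchar_usable_as_py_unicode() ? sizeof(wchar_t)
                                               : sizeof(unsigned short);
    /* Either choice must be able to hold at least 16 bits. */
    assert(size * CHAR_BIT >= 16);
    return size;
}
```

With a 1-byte wchar_t the check fails on size alone, so the fallback to "unsigned short" applies regardless of signedness.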

> To properly support wchar_t of size 1 you would basically implement multibyte character storage either with UTF-8 or just packing two wchar_t's with UTF-16. At least in Android the distinction doesn't seem to matter as Android's internationalization/localization policy seems to be "use Java".

Python should not use wchar_t for Py_UNICODE on such platforms
and instead go with "unsigned short".

I would assume that the wchar_t C lib routines work based on UTF-8
with sizeof(wchar_t) == 1, so the PyUnicode_*WideChar*() APIs would
need to be adjusted to work more or less like the UTF-8 codecs.
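
To make "work more or less like the UTF-8 codecs" concrete, here is a minimal sketch of storing one code point in 1-byte units using standard UTF-8 encoding. The helper encode_utf8 is hypothetical and is not part of the PyUnicode_*WideChar*() APIs; whether Android's C library actually treats a 1-byte wchar_t this way is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns the number of bytes
   written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC (the euro sign) becomes the three bytes E2 82 AC, so a single character no longer maps to a single storage unit.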
msg135367 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-06 20:37
From the document you posted:

"""
As documented, the Android platform did not really support wchar_t until
Android 2.3. What this means in practical terms is that:

  - If you target platform android-9 or higher, the size of wchar_t is
    4 bytes, and most wide-char functions are available in the C library
    (with the exception of multi-byte encoding/decoding functions and
     wsprintf/wsscanf).

  - If you target any prior API level, the size of wchar_t will be 1 byte
    and none of the wide-char functions will work anyway.

We recommend any developer to get rid of any dependencies on the wchar_t type
and switch to better representations. The support provided in Android is only
there to help you migrate existing code.
"""

With none of the wide-char functions working in Android <2.3, I don't
think you have a good chance of getting Python 3.x working, unless
you remove all their uses in the code and replace them with standard
char* functions.

The last paragraph doesn't sound very promising either. I wonder
what they mean with "better representation". The C standard doesn't
have any better representation for Unicode at the moment.
msg135369 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-05-06 21:24
I think what they mean is a better representation from an Android API, such as UChar32 from utils/AndroidUnicode.h.

I agree that it's not worthwhile trying to port Python to those Android versions that have a single-byte wchar_t definition. 

David, I think you are misunderstanding the purpose of HAVE_USABLE_WCHAR_T: It does *not* specify whether wchar_t can be used. Instead, it specifies whether wchar_t can be used as the datatype for Py_UNICODE. Calling it HAVE_A_WCHAR_T_DEFINITION_THAT_IS_SUITABLE_FOR_USE_AS_BASE_TYPE_OF_PY_UNICODE was just a little too tedious :-)
msg135370 - (view) Author: David Coles (dcoles) * Date: 2011-05-06 21:34
On Fri, May 6, 2011 at 1:31 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> wchar_t should be fairly portable these days. I think the main
> problem is that we never assumed sizeof(wchar_t) == 1 to be a
> possibility. On Windows, wchar_t was 16 bit and the glibc started
> out with 32 bits.

Well a 1 byte wchar_t is a bit "ass backwards". I think it's very much
an edge case. :)

> Note that HAVE_USABLE_WCHAR_T is only used to check whether
> Python can use wchar_t as alias for Py_UNICODE. Python's Unicode
> implementation needs Py_UNICODE to be an unsigned type with
> either 2 bytes or 4 bytes. If wchar_t does not provide these
> sizes or is a signed type, Python cannot use it for Py_UNICODE
> and must instead use "unsigned short".

Right. That makes sense. In that case it's probably sensible to keep around.

> If the configure script does not detect this case, then a patch
> would be helpful.

Yup. I'll put something together that causes configure to bail out if
you're either missing HAVE_WCHAR_H or if SIZEOF_WCHAR_T is less than
16 bits.

> Python should not use wchar_t for Py_UNICODE on such platforms
> and instead go with "unsigned short".
>
> I would assume that the wchar_t C lib routines work based on UTF-8
> with sizeof(wchar_t) == 1, so the PyUnicode_*WideChar*() APIs would
> need to be adjusted to work more or less like the UTF-8 codecs.

Yes. Using UTF-8 would be the sensible solution. Sadly it looks like
all the wide character functions <2.3 are undefined, so in this case
Android saying it has wchar_t support is worse than useless.

On Fri, May 6, 2011 at 1:37 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> With none of the wide-char functions working in Android <2.3, I don't
> think you have a good chance of getting Python 3.x working, unless
> you remove all their uses in the code and replace them with standard
> char* functions.

I agree. In my case I should be able to bump the required version
number without too much fuss. It seems a bit silly to write in support
for a platform that no longer supports said feature.

> The last paragraph doesn't sound very promising either. I wonder
> what they mean with "better representation". The C standard doesn't
> have any better representation for Unicode at the moment.

In C I guess the only sensible alternative would be UTF-8 char strings
(or maybe using uint32_t), but in Python's case it really depends on
how the underlying OS represents internationalized characters. Perhaps
in other projects you would use an external library like ICU, but
that's out the scope of my experience. :)
msg135372 - (view) Author: David Coles (dcoles) * Date: 2011-05-06 21:42
On Fri, May 6, 2011 at 2:24 PM, Martin v. Löwis <report@bugs.python.org> wrote:
>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> I think what they mean is a better representation from an Android API, such as UChar32 from utils/AndroidUnicode.h.

Ah. Sadly I don't think that's exposed in the NDK yet.

> I agree that it's not worthwhile trying to port Python to those Android versions that have a single-byte wchar_t definition.

Yup. Will be using Android 2.3+. If I'm forced to use an earlier
version of Android I think it would be more sensible to use the 2.x
series of Python.

> David, I think you are misunderstanding the purpose of HAVE_USABLE_WCHAR_T: It does *not* specify whether wchar_t can be used. Instead, it specifies whether wchar_t can be used as the datatype for Py_UNICODE. Calling it HAVE_A_WCHAR_T_DEFINITION_THAT_IS_SUITABLE_FOR_USE_AS_BASE_TYPE_OF_PY_UNICODE was just a little too tedious :-)

Haha :). Yes. My initial reading of the pyconfig.h was wrong. Got a
bit suspicious when my Linux box was not defining it. Then I saw the
memcpy and it made sense.
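
For readers following along, the distinction can be sketched like this: when the wide type and the internal character type have the same size, a block copy is enough; otherwise each element has to be converted individually. The names py_unicode_t and copy_to_wide are stand-ins invented for this example, not CPython identifiers, and the sketch ignores surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE on a build that falls back to
   "unsigned short" instead of using wchar_t directly. */
typedef unsigned short py_unicode_t;

/* Copy n characters from the internal buffer into a wchar_t buffer.
   Surrogate pairs are deliberately not handled in this sketch. */
static void copy_to_wide(wchar_t *dst, const py_unicode_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    if (sizeof(wchar_t) == sizeof(py_unicode_t)) {
        /* Same width: the buffers are layout-compatible, so a single
           block copy does the job. */
        memcpy(dst, src, n * sizeof(wchar_t));
    } else {
        /* Different widths: convert element by element. */
        for (size_t i = 0; i < n; i++)
            dst[i] = (wchar_t)src[i];
    }
}
```

On a typical Linux desktop wchar_t is 4 bytes, so the loop branch runs there.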
msg135375 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-06 22:17
David Coles wrote:
> 
>> I agree that it's not worthwhile trying to port Python to those Android versions that have a single-byte wchar_t definition.
> 
> Yup. Will be using Android 2.3+. If I'm forced to use an earlier
> version of Android I think it would be more sensible to use the 2.x
> series of Python.

Since you sound like you want to get Python running on Android,
are you aware of this project ?

  http://code.google.com/p/android-scripting/
msg135379 - (view) Author: David Coles (dcoles) * Date: 2011-05-06 22:49
On Fri, May 6, 2011 at 3:17 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> Since you sound like you want to get Python running on Android,
> are you aware of this project ?
>
>  http://code.google.com/p/android-scripting/

Yes. It's excellent. I've actually been using it as a porting
reference for Python 3.

The problem is work decided to be very future proof with its clients
and decided to use Python 3 as the embedded scripting language.
Because of differences in the C API, it looked like it might be easier
to do a port of Python 3 to Android (we already cross compile an ARM
version for another Linux platform) than to have the Android client
with a 2.6 version.
msg136142 - (view) Author: David Coles (dcoles) * Date: 2011-05-17 05:40
Attached is a patch that updates configure.in to make sure that wchar.h is present and that wchar_t is at least 16 bits wide.

On android-8 this patch causes the configure step to fail since SIZEOF_WCHAR_T == 1. On android-9 and my Linux desktop the build continues as normal, passing the build tests. If wchar.h is removed from the system then the build also fails with an error as expected.

The patch does not contain the configure diff since it also contained some other changes and I wasn't sure of the correct autoreconf process for Python.
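
The two conditions described in this message can also be expressed directly in C, shown here only as an illustrative sketch. This is not the content of wchar_check.patch, which changes configure.in; the typedef name and the wchar_bits helper are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <wchar.h>   /* compilation stops here if wchar.h is missing */

/* Compile-time guard: the array size becomes -1, which is a compile
   error, when wchar_t is narrower than 16 bits. */
typedef char wchar_t_is_at_least_16_bits[
    (sizeof(wchar_t) * CHAR_BIT >= 16) ? 1 : -1];

/* Width of wchar_t in bits, for reporting at runtime. */
static unsigned wchar_bits(void)
{
    unsigned bits = (unsigned)(sizeof(wchar_t) * CHAR_BIT);
    assert(bits >= 16);  /* already guaranteed by the typedef above */
    return bits;
}
```

With a 1-byte wchar_t, as reported for android-8, a translation unit containing this typedef would be rejected by the compiler instead of failing later inside unicodeobject.c.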
msg236547 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-02-24 21:26
Is this effectively superseded by work being done on #23496?
History
Date                 User         Action  Args
2019-03-15 22:44:40  BreamoreBoy  set     nosy: - BreamoreBoy
2015-02-24 21:26:05  BreamoreBoy  set     nosy: + BreamoreBoy; messages: + msg236547
2011-05-17 05:40:12  dcoles       set     files: + wchar_check.patch; keywords: + patch; messages: + msg136142
2011-05-06 22:49:57  dcoles       set     messages: + msg135379
2011-05-06 22:17:13  lemburg      set     messages: + msg135375
2011-05-06 21:42:10  dcoles       set     messages: + msg135372
2011-05-06 21:34:52  dcoles       set     messages: + msg135370
2011-05-06 21:24:55  loewis       set     nosy: + loewis; messages: + msg135369
2011-05-06 20:37:46  lemburg      set     messages: + msg135367
2011-05-06 20:31:02  lemburg      set     nosy: + lemburg; messages: + msg135366
2011-05-06 19:55:46  pitrou       set     messages: + msg135363
2011-05-06 19:12:48  dcoles       set     messages: + msg135360
2011-05-05 21:29:26  dcoles       set     messages: + msg135250
2011-05-05 21:07:37  vstinner     set     nosy: + vstinner
2011-05-05 21:02:55  pitrou       set     versions: + Python 3.1, Python 2.7, Python 3.2, Python 3.3; nosy: + pitrou; messages: + msg135247; stage: needs patch
2011-05-05 20:45:29  dcoles       create