msg74398 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:27 |
Revision 63955 removed a block from configure.in (and effectively from
pyconfig.h.in) having to do with endianness that results in an incorrect
setting for "WORDS_BIGENDIAN" in Universal builds on Mac OS X.
The removed part was this:
> AH_VERBATIM([WORDS_BIGENDIAN],
> [
> /* Define to 1 if your processor stores words with the most significant byte
> first (like Motorola and SPARC, unlike Intel and VAX).
>
> The block below does compile-time checking for endianness on platforms
> that use GCC and therefore allows compiling fat binaries on OSX by using
> '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
> such that the configure-result is used on systems that don't use GCC.
> */
> #ifdef __BIG_ENDIAN__
> #define WORDS_BIGENDIAN 1
> #else
> #ifndef __LITTLE_ENDIAN__
> #undef WORDS_BIGENDIAN
> #endif
> #endif])
This used to allow "WORDS_BIGENDIAN" to be correct for all parts of a
universal Python build done via `gcc -arch i386 -arch ppc ...`.
This was originally added for issue 1471883 (see msg50040 for a
discussion of this particular bit).
The result of this bug is that Python extensions using either of the
following to get native byte ordering for UTF-16 decoding:
PyUnicode_DecodeUTF16(..., byteorder=0);
PyUnicode_DecodeUTF16Stateful(..., byteorder=0, ...);
on Mac OS X/PowerPC with a universal build built on Intel hardware (most
such builds) will get the wrong byte-ordering.
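As a rough illustration of what "wrong byte-ordering" means here, the following Python sketch (not taken from the patch; the variable names are made up for the example) decodes the same BOM-less UTF-16 bytes with the correct and with the swapped byte order. It assumes the input carries no BOM, which is the case where a decoder asked for "native" order has to rely on its compiled-in notion of endianness.

```python
# Sketch: identical UTF-16 bytes interpreted with the right and the wrong byte order.
text = "hi"
data_le = text.encode("utf-16-le")  # little-endian code units, no BOM

decoded_ok = data_le.decode("utf-16-le")       # byte order matches the data
decoded_swapped = data_le.decode("utf-16-be")  # byte order is the opposite

print(repr(decoded_ok))
print(repr(decoded_swapped))
```

The swapped result is still a valid string, just not the intended one, which is why this kind of bug tends to show up as garbled data rather than as a decoding error.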
The fix is to restore that section to configure.in and re-run autoconf
and autoheader.
Ronald,
Was there a particular reason that this block was removed from
configure.in (and pyconfig.h.in)?
I'd like to hear comments from either Ronald or Martin, and then I can
commit the fix.
|
msg74399 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:31 |
This also shows up in the byte ordering that Python uses to encode utf-16:
$ uname -a
Darwin sphinx 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc
$ python2.6 -c "import codecs; codecs.open('26.txt', 'w', 'utf-16').write('hi')"
$ od -cx 26.txt
0000000 377 376 h \0 i \0
fffe 6800 6900
0000006
$ /usr/bin/python -c "import codecs; codecs.open('system.txt', 'w', 'utf-16').write('hi')"
$ od -cx system.txt
0000000 376 377 \0 h \0 i
feff 0068 0069
0000006
The BOM here ensures, of course, that this is still valid UTF-16
content, but the difference in behaviour here between Python versions might
not be intended.
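The two `od` dumps above can be reproduced byte for byte in a small Python sketch (an illustration only; the names `little` and `big` are invented for it). It shows that both files decode to the same text through the generic "utf-16" codec, because the BOM tells the decoder which byte order was used.

```python
import codecs

# Bytes as written by the python2.6 build above (little-endian, BOM ff fe).
little = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
# Bytes as written by the system Python above (big-endian, BOM fe ff).
big = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

# The generic "utf-16" codec reads the BOM and picks the matching byte order.
print(little.decode("utf-16"), big.decode("utf-16"))

# When encoding, the generic codec writes a BOM in the interpreter's native order.
native = "hi".encode("utf-16")
print(native[:2] == codecs.BOM_UTF16)
```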
|
msg74400 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:40 |
Does this also affect sys.byteorder and the struct module ?
I think those would be more important to get right than the UTF-16
codec, since this only uses the native byte ordering for increased
performance and compatibility with other OS tools. Since UTF-16 is not
wide-spread on Mac OS X, it's not so much an issue... it would be on
Windows.
|
msg74402 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:47 |
BTW: Does this simplified approach really work for Python on Mac OS X:
On 2008-10-07 00:27, Trent Mick wrote:
>> The block below does compile-time checking for endianness on platforms
>> that use GCC and therefore allows compiling fat binaries on OSX by using
>> '-arch ppc -arch i386' as the compile flags. The phrasing was choosen
>> such that the configure-result is used on systems that don't use GCC.
For most other tools that require configure tests regarding endianness
on Mac OS X, the process of building a universal binary goes something
like this:
http://developer.apple.com/opensource/buildingopensourceuniversal.html
i.e. you run the whole process twice and then combine the results using
lipo.
|
msg74404 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:49 |
> Does this also affect sys.byteorder and the struct module ?
Doesn't seem to affect sys.byteorder:
$ /usr/bin/python -c "import sys; print sys.byteorder"
big
$ python2.6 -c "import sys; print sys.byteorder"
big
> I think those would be more important to get right than the UTF-16
> codec, since this only uses the native byte ordering for increased
> performance and compatibility with other OS tools. Since UTF-16 is not
> wide-spread on Mac OS X, it's not so much an issue...
It is an issue for Python extensions that use that API. For example, it
is the cause of recent Komodo builds not starting on Mac OS X/PowerPC
(http://bugs.activestate.com/show_bug.cgi?id=79366) because the PyXPCOM
extension and embedded Python 2.6 build were getting UTF-16 data mixed up
when talking with Mozilla APIs.
> it would be on Windows.
|
msg74406 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-06 22:52 |
> BTW: Does this simplified approach really work for Python on Mac OS X
It works for Python 2.5:
http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299
search for "BIGENDIAN".
|
msg74407 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-06 22:59 |
On 2008-10-07 00:52, Trent Mick wrote:
> Trent Mick <trentm@gmail.com> added the comment:
>
>> BTW: Does this simplified approach really work for Python on Mac OS X
>
> It works for Python 2.5:
>
> http://svn.python.org/view/*checkout*/python/branches/release25-maint/configure.in?rev=66299
>
> search for "BIGENDIAN".
Thanks... didn't see that the setting enables a compile-time check.
|
msg74424 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 06:13 |
The issue was introduced while moving universal-binary specific trickery
from pyconfig.h.in to a separate header file. Obviously I must have been
drunk at the time, because I didn't move the WORDS_BIGENDIAN bits
correctly.
The attached patch in "pymacconfig.h.patch" adds detection of
WORDS_BIGENDIAN to pymacconfig.h, the header that already holds the other
pyconfig.h overrides for universal builds.
Background: this work was done while adding support for 4-way universal
builds, that is x86, x86_64, ppc and ppc64. This required many more
updates to pyconfig.h, most of which couldn't be done in a clean
platform-independent way. That's why I (tried to) move the setting of
pyconfig.h values that are affected by the current architecture to
Include/pymacconfig.h.
NOTE: I haven't tested my patch yet, I'll do a full test round later
today.
|
msg74425 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 07:06 |
> Added file: http://bugs.python.org/file11723/pymacconfig.h.patch
I'll test that on my end tomorrow -- though it looks like it will work fine.
Thanks.
|
msg74442 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 12:31 |
Annoyingly enough, my patch isn't good enough: it turns out that ctypes
has introduced a SIZEOF__BOOL definition in configure.in, and that needs
special-casing as well.
pymacconfig.h.patch2 fixes that issue as well. Do you have access to a
PPC G5 system? I've determined the correct value of SIZEOF__BOOL for
that platform by reading the assembly code for a small test program and
hence am not 100% sure that sizeof(_Bool) actually is 1 on that
architecture.
One other annoying issue cropped up: regrtest.py consistently hangs in
test_signal (with 100% CPU usage) when I run it in Rosetta (the PPC
emulator). I'll test this on an actual PPC machine as well; this might
well be an issue with the PPC emulator.
|
msg74448 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-10-07 13:37 |
I agree with Trent that this is a bug, and I agree with the second patch
(pymacconfig.h.patch2).
Marc-Andre, sys.byteorder is not affected because it detects the byte order
at run-time, not at compile-time. Likewise, in the struct module,
several code paths rely on dynamic determination of the endianness, such
as _PyLong_FromByteArray, the float packing, and the whichtable function.
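To make the run-time versus compile-time distinction concrete, here is a minimal Python sketch of a run-time endianness probe. It is not the code CPython uses internally; `runtime_byteorder` is a hypothetical helper that simply inspects how the running process lays out a 16-bit integer.

```python
import struct
import sys


def runtime_byteorder():
    """Return 'little' or 'big' by inspecting a natively packed 16-bit integer."""
    # "=H" packs an unsigned 16-bit value in the native byte order of the
    # process that is actually running, whatever the build machine was.
    first_byte = struct.pack("=H", 1)[0]
    return "little" if first_byte == 1 else "big"


print(runtime_byteorder(), sys.byteorder)
```

A probe like this gives the right answer for each architecture slice of a universal binary, whereas a value frozen into pyconfig.h at configure time reflects only the machine that ran configure.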
|
msg74459 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2008-10-07 15:23 |
On 2008-10-07 14:33, Ronald Oussoren wrote:
> Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
>
> Annoyingly enough my patch isn't good enough, it turns out that ctypes
> has introduced a SIZEOF__BOOL definition in configure.in and that needs
> special caseing as well.
>
> pymacconfig.h.patch2 fixes that issue as well. Do you have access to a
> PPC G5 system? I've determined the correct value of SIZEOF__BOOL for
> that platform by reading the assembly code for a small test program and
> hence am not 100% sure that sizeof(_Bool) actually is 1 on that
> architecture.
Using this helper:
#include <stdio.h>
int main(void) {
    /* sizeof yields a size_t, so print it with %zu */
    printf("sizeof(_Bool)=%zu bytes\n", sizeof(_Bool));
    return 0;
}
I get:
sizeof(_Bool)=4 bytes
on a G4 PPC.
Seems strange to me, but reasonable since it is defined like this
in stdbool.h:
#if __STDC_VERSION__ < 199901L && __GNUC__ < 3
typedef int _Bool;
#endif
|
msg74463 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 16:29 |
> I get:
>
> sizeof(_Bool)=4 bytes
>
> on a G4 PPC.
Same thing on a G5 PPC:
$ cat main.c
#include <stdio.h>
int main(void) {
    /* cast because sizeof yields a size_t, not an int */
    printf("sizeof(_Bool) is %d\n", (int)sizeof(_Bool));
    return 0;
}
$ gcc main.c
$ ./a.out
sizeof(_Bool) is 4
|
msg74474 - (view) |
Author: Ronald Oussoren (ronaldoussoren) *  |
Date: 2008-10-07 19:54 |
On 7 Oct, 2008, at 18:29, Trent Mick wrote:
>
> Trent Mick <trentm@gmail.com> added the comment:
>
>> I get:
>>
>> sizeof(_Bool)=4 bytes
>>
>> on a G4 PPC.
>
> Same thing on a G5 PPC:
>
> $ cat main.c
> #include <stdio.h>
>
> int main(void) {
> printf("sizeof(_Bool) is %d\n", sizeof(_Bool));
> }
> $ gcc main.c
What if you compile using 'gcc -arch ppc64 main.c'?
Ronald
|
msg74494 - (view) |
Author: Trent Mick (trentm) *  |
Date: 2008-10-07 23:06 |
> What if you compile using 'gcc -arch ppc64 main.c'?
$ gcc -arch ppc64 main.c
$ ./a.out
sizeof(_Bool) is 1
As you figured out.
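For anyone who wants to repeat this check from Python rather than from C, a small sketch using ctypes follows. It reports the size of the C `_Bool` type as seen by the interpreter that runs it, so on a universal build the answer depends on which architecture slice is executing. The variable name `bool_size` is just for illustration.

```python
import ctypes

# ctypes.c_bool corresponds to the C99 _Bool type, so its size mirrors
# what sizeof(_Bool) reports for the running architecture.
bool_size = ctypes.sizeof(ctypes.c_bool)
print("sizeof(_Bool) is", bool_size)
```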
|
msg78412 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-12-28 15:38 |
Applied the patch in r67982.
|
Date | User | Action | Args |
2022-04-11 14:56:40 | admin | set | github: 48310 |
2008-12-28 15:38:00 | benjamin.peterson | set | status: open -> closed; nosy: + benjamin.peterson; resolution: fixed; messages: + msg78412 |
2008-10-07 23:06:10 | trentm | set | messages: + msg74494 |
2008-10-07 19:54:17 | ronaldoussoren | set | messages: + msg74474 |
2008-10-07 16:29:01 | trentm | set | messages: + msg74463; title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC |
2008-10-07 15:23:49 | lemburg | set | messages: + msg74459; title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC |
2008-10-07 13:37:39 | loewis | set | messages: + msg74448 |
2008-10-07 12:31:54 | ronaldoussoren | set | files: + pymacconfig.h.patch2; messages: + msg74442 |
2008-10-07 07:06:41 | trentm | set | messages: + msg74425; title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC |
2008-10-07 06:13:04 | ronaldoussoren | set | files: + pymacconfig.h.patch; messages: + msg74424 |
2008-10-06 22:59:05 | lemburg | set | messages: + msg74407; title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC |
2008-10-06 22:53:22 | trentm | set | keywords: + patch; files: + issue4060_macosx_endian.patch |
2008-10-06 22:52:49 | trentm | set | messages: + msg74406 |
2008-10-06 22:49:39 | trentm | set | messages: + msg74404 |
2008-10-06 22:47:14 | lemburg | set | messages: + msg74402; title: PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC -> PyUnicode_DecodeUTF16(..., byteorder=0) gets it wrong on Mac OS X/PowerPC |
2008-10-06 22:40:34 | lemburg | set | nosy: + lemburg; messages: + msg74400 |
2008-10-06 22:31:28 | trentm | set | messages: + msg74399 |
2008-10-06 22:27:09 | trentm | create | |