This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: struct.unpack('?', '\x02') returns (False,) on Mac OSX
Type: behavior Stage: patch review
Components: Build, Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, mark.dickinson, ned.deily, ronaldoussoren, wayedt
Priority: normal Keywords: needs review, patch

Created on 2014-07-19 18:30 by wayedt, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
issue-22012.txt ronaldoussoren, 2014-07-21 08:09 review
Messages (13)
msg223470 - (view) Author: Tyler Wade (wayedt) Date: 2014-07-19 18:30
On Mac OSX, struct.unpack incorrectly handles bools.

Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.unpack('????', b'\x00\x01\x02\x03')
(False, True, False, True)
msg223472 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-07-19 20:11
Doing a quick look, the results vary.  Using current python.org 2.7.8 and 3.4.1 installers, the results are correct.  These interpreters are built with Apple gcc-4.2 (non-LLVM) from Xcode 3.2.6.  Other 2.7 and 3.4.x builds I have available at the moment, including the Apple-supplied Python 2.7, were built with clang LLVM and they all give the incorrect result.
  
$ /usr/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
((False, True, False, True), '2.7.5 (default, Mar  9 2014, 22:15:05) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]')

$ arch -i386 /usr/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
((False, True, False, True), '2.7.5 (default, Mar  9 2014, 22:15:05) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]')

$ /usr/local/bin/python2.7 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
((False, True, True, True), '2.7.8 (v2.7.8:ee879c0ffa11, Jun 29 2014, 21:07:35) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]')

$ /usr/local/bin/python2.7-32 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
((False, True, True, True), '2.7.8 (v2.7.8:ee879c0ffa11, Jun 29 2014, 21:07:35) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]')

$ /usr/local/bin/python3.4 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
(False, True, True, True) 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

$ /usr/local/bin/python3.4-32 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
(False, True, True, True) 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

$ ./bin/python3.4 -c "import sys,struct;print(struct.unpack('????', b'\x00\x01\x02\x03'), sys.version)"
(False, True, False, True) 3.4.1+ (3.4:d6b71971b228, Jul 16 2014, 16:30:56)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
msg223480 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-07-19 21:22
The relevant piece of code in the struct module looks like this:

static PyObject *
nu_bool(const char *p, const formatdef *f)
{
    BOOL_TYPE x;
    memcpy((char *)&x, p, sizeof x);
    return PyBool_FromLong(x != 0);
}

Is it possible that BOOL_TYPE is a bitfield of length 1, and that clang is somehow making use of that fact?

One thing I don't understand is that this shouldn't affect *standard* packing and unpacking (as opposed to native), but I still see the problem for a format of "<????".  However, it's fine for ">????".  Some debugging shows that we're calling 'nu_bool' even for "<????", which is a bit odd.
msg223481 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-07-19 21:31
> Some debugging shows that we're calling 'nu_bool' even for "<????", which is a bit odd.

Ah, I see.  There's an optimisation that uses the native table functions instead of the big/little-endian ones if the size and byte order match.
msg223487 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-07-19 23:25
FTR, the problem also shows up with the current clang-3.4 (3.4.2) and clang3.5 (svn213451-1) packages in Debian testing (on i386), building --with-pydebug and without, so the issue is not confined to OS X.
msg223501 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-07-20 07:06
> On 19 jul. 2014, at 23:22, Mark Dickinson <report@bugs.python.org> wrote:
> 
> 
> Mark Dickinson added the comment:
> 
> The relevant piece of code in the struct module looks like this:
> 
> static PyObject *
> nu_bool(const char *p, const formatdef *f)
> {
>    BOOL_TYPE x;
>    memcpy((char *)&x, p, sizeof x);
>    return PyBool_FromLong(x != 0);
> }
> 
> Is it possible that BOOL_TYPE is a bitfield of length 1, and that clang is somehow making use of that fact?

I haven't found a definitive source yet, but it seems that the only valid values for _Bool, which BOOL_TYPE expands to, are 0 and 1.  Clang might make use of that restriction. 

Ronald
msg223502 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-07-20 08:25
For native struct packing / unpacking, it's not even clear to me that this should be considered a bug (though for standard unpacking it definitely is):  the primary use-case for native unpacking is unpacking a collection of bytes that was written (from Python, or C, or ...) on the same machine, and in that case we're only going to be unpacking 0s and 1s.

FWIW, the docs say: "Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.", so for native packing either the docs or the code need to be fixed.

The simplest solution would be to replace `nu_bool` with something identical to `bu_bool`.  In theory that would break on any platform where "sizeof(_Bool)" is greater than 1.  In practice, I doubt that Python's going to meet such platforms in a hurry.
msg223503 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-07-20 08:32
> In practice, I doubt that Python's going to meet such platforms in a hurry.

Hrm; looks like that's not a particularly safe assumption, especially with older compilers that might do a "typedef int _Bool".

From http://msdn.microsoft.com/en-us/library/tf4dy80a.aspx:

"That means that for Visual C++ 4.2, a call of sizeof(bool) yields 4, while in Visual C++ 5.0 and later, the same call yields 1."
msg223511 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-07-20 14:46
> On 20 jul. 2014, at 10:32, Mark Dickinson <report@bugs.python.org> wrote:
> 
> 
> Mark Dickinson added the comment:
> 
>> In practice, I doubt that Python's going to meet such platforms in a hurry.
> 
> Hrm; looks like that's not a particularly safe assumption, especially with older compilers that might do a "typedef int _Bool"

I'm not sure about the actual definition, but at least for OSX on PPC sizeof(bool) is not 1. I ran into this with pyobjc when I tried to treat BOOL and bool the same.

All of this is with Objective-C, but the same should be true for plain C.
msg223565 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-07-21 08:09
I just confirmed that clang only uses the LSB for _Bool values by looking at the assembly generated for the following code:

<quote>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	_Bool x;
	*(unsigned char*)&x = 42;
	printf("%d\n", (int)x);
	return 0;
}
</quote>

The attached patch fixes the issue for me. The new testcase fails without the changes to _struct.c and passes with those changes.
msg223566 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-07-21 09:13
The last draf of ISO C '11: <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf>.

This says that _Bool is large enough to store 0 and 1, and that conversion of any integer data to _Bool results in 0 or 1.  If I interpret the document correctly there is no conforming way to create a value of _Bool that has a value other than 0 or 1, and that should mean clang's behavior is not a bug.

BTW. I haven't tested my patch on a PPC system yet (where sizeof(bool) != 1), and won't be able to do so until I'm back home (I'm currently at EuroPython)
msg224030 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-07-26 08:56
Does anyone have feedback for my proposed patch (other the bug in test code when sizeof(bool) != 1,  the test values for big and little endian are in the wrong order)?
msg224710 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-08-04 11:55
BTW. There is also an argument to be made against my patch and for a documentation update: it is unclear to me if clang ever creates _Bool false values where the bit pattern isn't exactly the same as the zero value of an integer of the same size (for example due to generated code only updating the least significant bit and starting with a uninitialised value that has some of the other bits set).  If it does so depends on what's more efficient to do in machine code, and my knowledge of assembly is too rusty to have anything useful to say about that (although I'd suspect that overwriting the complete _Bool value would be more efficient than loading the current value, updating the LSB and storing it again).

If it can do so the current behavior of struct.unpack would be more correct than the version you get after applying my patch.
History
Date User Action Args
2022-04-11 14:58:06adminsetgithub: 66211
2016-10-07 17:12:39berker.peksaglinkissue28377 superseder
2015-12-12 00:36:30gregory.p.smithsetversions: + Python 3.6
2015-12-12 00:36:04gregory.p.smithsetcomponents: + Library (Lib)
2015-12-12 00:35:39gregory.p.smithsetnosy: + gregory.p.smith
2014-08-04 11:55:02ronaldoussorensetmessages: + msg224710
2014-07-26 08:56:07ronaldoussorensetmessages: + msg224030
2014-07-21 09:13:17ronaldoussorensetkeywords: + patch, needs review

messages: + msg223566
stage: patch review
2014-07-21 08:09:15ronaldoussorensetfiles: + issue-22012.txt

messages: + msg223565
2014-07-20 14:46:00ronaldoussorensetmessages: + msg223511
title: struct.unpack('?', '\x02') returns (False,) when Python is built with clang -> struct.unpack('?', '\x02') returns (False,) on Mac OSX
2014-07-20 08:57:06ned.deilysettitle: struct.unpack('?', '\x02') returns (False,) on Mac OSX -> struct.unpack('?', '\x02') returns (False,) when Python is built with clang
2014-07-20 08:32:06mark.dickinsonsetmessages: + msg223503
2014-07-20 08:25:18mark.dickinsonsetmessages: + msg223502
2014-07-20 07:06:21ronaldoussorensetmessages: + msg223501
title: struct.unpack('?', '\x02') returns (False,) when Python is built with clang -> struct.unpack('?', '\x02') returns (False,) on Mac OSX
2014-07-19 23:26:57ned.deilysettitle: struct.unpack('?', '\x02') returns (False,) on Mac OSX -> struct.unpack('?', '\x02') returns (False,) when Python is built with clang
2014-07-19 23:25:58ned.deilysetcomponents: + Build, - macOS
2014-07-19 23:25:06ned.deilysetassignee: ronaldoussoren ->
messages: + msg223487
2014-07-19 21:31:03mark.dickinsonsetmessages: + msg223481
2014-07-19 21:22:45mark.dickinsonsetnosy: + mark.dickinson
messages: + msg223480
2014-07-19 20:11:29ned.deilysetnosy: + ned.deily

messages: + msg223472
versions: - Python 3.1, Python 3.2, Python 3.3
2014-07-19 18:30:27wayedtcreate