classification
Title: Python ctypes BigEndianStructure bitfield assignment misbehavior in Linux
Type: behavior
Stage: resolved
Components: ctypes
Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Alan.Ning, Brian Trotter, amaury.forgeotdarc, belopolsky, mrolle, rgaddi, xiang.zhang
Priority: normal
Keywords:

Created on 2014-02-14 21:43 by Alan.Ning, last changed 2016-11-09 07:17 by xiang.zhang. This issue is now closed.

Messages (6)
msg211241 - Author: Alan Ning (Alan.Ning) Date: 2014-02-14 21:43
I am seeing a strange issue with bitfields and BigEndianStructure under Ubuntu 12.04 x64, Python 2.7.3.

This bug only occurs if I define my bitfields using c_uint. If I switch to c_ushort, it goes away.

Below is a short script that highlights the problem. It defines two unions, BitField1U and BitField2U, each overlaying a 4-byte array on a bitfield structure.

Under Linux, simply setting fields.a = 1 twice changes the underlying byte array on the second assignment, even though the assigned value is the same. This does not happen on Windows.

Output: Ubuntu 12.04 x64 Python 2.7.3
20000000
20000020 <- problem
20000000
20000000

Output: Windows 7 x64 Python 2.7.3
20000000
20000000
20000000
20000000

This bug was originally reported as a question on Stack Overflow.
http://stackoverflow.com/questions/21785874/python-ctypes-bitfield-windows-vs-linux


Source code:

import ctypes
import binascii

class BitField1(ctypes.BigEndianStructure):
    _pack_ = 1
    _fields_ = [
    ('a', ctypes.c_uint, 3),
    ('b', ctypes.c_uint, 1),
    ]

class BitField1U(ctypes.Union):
    _pack_ = 1
    _fields_ = [("fields", BitField1), 
        ("raw_bytes", ctypes.c_ubyte * 4)]

class BitField2(ctypes.BigEndianStructure):
    _pack_ = 1
    _fields_ = [
    ('a', ctypes.c_ushort, 3),
    ('b', ctypes.c_ushort, 1),
    ]

class BitField2U(ctypes.Union):
    _pack_ = 1
    _fields_ = [("fields", BitField2), 
        ("raw_bytes", ctypes.c_ubyte * 4)]

def printBytes(raw_bytes) :
    ba = bytearray(raw_bytes)
    print(binascii.hexlify(ba))
    
def printFields(fields):
    # Print both bitfield values on one line (works on Python 2 and 3).
    print("{0} {1}".format(fields.a, fields.b))

b1 = BitField1U()
b2 = BitField2U()

# Simply set fields.a = 1 twice, and notice how the raw_bytes changes.

b1.fields.a = 1
printBytes(b1.raw_bytes)
b1.fields.a = 1
printBytes(b1.raw_bytes)

b2.fields.a = 1
printBytes(b2.raw_bytes)
b2.fields.a = 1
printBytes(b2.raw_bytes)
msg211648 - Author: Rob Gaddi (rgaddi) Date: 2014-02-19 20:10
I was just working on similar things, and found the same problem.  I can confirm failure on both Python 2.7.4 and Python 3.3.1 running on 64-bit Linux, and that the Windows builds do not have this problem.

My code:

from __future__ import print_function
from ctypes import *
from itertools import product

bases = (BigEndianStructure, LittleEndianStructure)
packs = (True, False)
basetypes = ( (c_uint,16), (c_ushort,16), (c_uint,32) )

print("Base                     Basetype  pack  high  low   size  bytes")
for basetype, base, pack in product(basetypes, bases, packs):
    fields = [
        ('high', basetype[0], basetype[1]),
        ('low', basetype[0], basetype[1]),
    ]
    cls = type('', (base,), {'_pack_' : pack, '_fields_' : fields})
    
    x = cls(high = 0x1234, low = 0x5678)
    
    bacls = c_uint8 * sizeof(x)
    ba = bacls.from_buffer(x)
    s = ''.join('{0:02X}'.format(b) for b in ba)
    
    k = '*' if (x.high != 0x1234 or x.low != 0x5678) else ''
    
    report = "{name:25s}{basetype:10s}{pack:4d}  {high:04X}  {low:04X}  {size:4d}  {s}{k}".format(
        name = base.__name__,
        high = x.high,
        low = x.low,
        size = sizeof(x),
        pack = pack,
        basetype = basetype[0].__name__,
        s = s,
        k = k
    )
    print(report)
        
My results:
Base                     Basetype  pack  high  low   size  bytes
BigEndianStructure       c_uint       1  0000  5678     4  00005678*
BigEndianStructure       c_uint       0  0000  5678     4  00005678*
Structure                c_uint       1  1234  5678     4  34127856
Structure                c_uint       0  1234  5678     4  34127856
BigEndianStructure       c_ushort     1  1234  5678     4  12345678
BigEndianStructure       c_ushort     0  1234  5678     4  12345678
Structure                c_ushort     1  1234  5678     4  34127856
Structure                c_ushort     0  1234  5678     4  34127856
BigEndianStructure       c_uint       1  1234  5678     8  0000123400005678
BigEndianStructure       c_uint       0  1234  5678     8  0000123400005678
Structure                c_uint       1  1234  5678     8  3412000078560000
Structure                c_uint       0  1234  5678     8  3412000078560000

On Python 3, BigEndianStructure sets either the high or the low field, seemingly at random from one run to the next, but it always loses one of the two.  On Python 2 I have always seen high = 0, low = 0x5678.
msg250491 - Author: Brian Trotter (Brian Trotter) Date: 2015-09-11 19:18
I am experiencing the same bug with c_uint32 bitfields inside BigEndianStructure in Python 3.4.0 on Ubuntu 14.04.3 x64. No problem in Windows 7 x64. As shown in the example below, the fourth byte is the only one that is written correctly. This is a rather significant error.


Source:

import ctypes

class BitFieldsBE(ctypes.BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ('a', ctypes.c_uint32, 8),
        ('b', ctypes.c_uint32, 8),
        ('c', ctypes.c_uint32, 8),
        ('d', ctypes.c_uint32, 8)]

class BitFieldsLE(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [
        ('a', ctypes.c_uint32, 8),
        ('b', ctypes.c_uint32, 8),
        ('c', ctypes.c_uint32, 8),
        ('d', ctypes.c_uint32, 8)]

be = BitFieldsBE()
le = BitFieldsLE()

def prints(arg):
    print(arg)
    print('be',bytes(be))
    print('le',bytes(le))

prints('00000000')
be.a = 0xba; be.b = 0xbe; be.c = 0xfa; be.d = 0xce
le.a = 0xba; le.b = 0xbe; le.c = 0xfa; le.d = 0xce
prints('babeface')
be.a = 0xde; be.b = 0xad; be.c = 0xbe; be.d = 0xef
le.a = 0xde; le.b = 0xad; le.c = 0xbe; le.d = 0xef
prints('deadbeef')


Output:

00000000
be b'\x00\x00\x00\x00'
le b'\x00\x00\x00\x00'
babeface
be b'\x00\xfa\x00\xce'
le b'\xba\xbe\xfa\xce'
deadbeef
be b'\x00\xbe\x00\xef'
le b'\xde\xad\xbe\xef'
msg280371 - Author: Michael Rolle (mrolle) Date: 2016-11-09 02:28
I see a similar problem with Python 2.7.8 on Cygwin.

My example is:

Python 2.7.8 (default, Jul 25 2014, 14:04:36)
[GCC 4.8.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import *
>>> class C (BigEndianStructure): _fields_ = (('rva', c_uint, 31), ('fl', c_uint, 1))
...
>>> x = C()
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x00\x00'
0L
0L
>>> x.rva = 256
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x02\x00'
256L
0L
>>> x.rva = 256
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x02\x00'
256L
0L
>>> x.fl = 1
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x02\x00\x01'
65536L
1L
>>> x.fl = 1
>>> buffer(x)[:]; x.rva; x.fl
'\x01\x00\x02\x01'
8388864L
1L
>>> x.fl = 1
>>> buffer(x)[:]; x.rva; x.fl
'\x01\x02\x00\x01'
8454144L
1L
>>> x.fl = 1
>>> buffer(x)[:]; x.rva; x.fl
'\x01\x00\x02\x01'
8388864L
1L
>>> x.rva = 256
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x02\x01'
256L
1L
>>> x.rva = 256
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x02\x00'
256L
0L
>>> x.rva = 256
>>> buffer(x)[:]; x.rva; x.fl
'\x00\x00\x02\x00'
256L
0L

I'm disappointed that this bug hasn't been fixed after two years!

I understand that ctypes might not be portable across different
platforms.  However, the above behavior is clearly wrong.

BTW, I also have Python 2.7.8 (default, Jun 30 2014, 16:08:48) [MSC v.1500 64 bit (AMD64)] on win32, and this version works fine.
msg280374 - Author: Michael Rolle (mrolle) Date: 2016-11-09 03:43
As a separate issue, I'd like to find an appropriate package,
other than ctypes, for interpreting data bytes in a consistently
defined manner, independent of the platform I'm running on.
The struct package is perfect where there are no bitfields
involved, i.e., where each item occupies whole bytes.  But
it doesn't support packing/unpacking bitfields.
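
For illustration, here is a minimal sketch of how the whole-byte support in struct can be combined with manual shifting and masking to read and write bitfields portably. It assumes the 31-bit 'rva' plus 1-bit 'fl' layout from the Cygwin example earlier in this thread, allocated from the most significant bit down inside one big-endian 32-bit word. The helper names unpack_rva_fl and pack_rva_fl are hypothetical and not part of any library.

```python
import struct

# Assumed layout (taken from the 'rva'/'fl' example in this thread):
# a 31-bit field followed by a 1-bit field in one big-endian 32-bit word,
# with the first field occupying the most significant bits.
RVA_BITS = 31
FL_BITS = 1


def unpack_rva_fl(data):
    """Split 4 big-endian bytes into the pair (rva, fl)."""
    (word,) = struct.unpack(">I", data)
    rva = word >> FL_BITS              # upper 31 bits
    fl = word & ((1 << FL_BITS) - 1)   # lowest bit
    return rva, fl


def pack_rva_fl(rva, fl):
    """Inverse of unpack_rva_fl: build the 4 big-endian bytes."""
    if not 0 <= rva < (1 << RVA_BITS):
        raise ValueError("rva does not fit in 31 bits")
    if not 0 <= fl < (1 << FL_BITS):
        raise ValueError("fl does not fit in 1 bit")
    return struct.pack(">I", (rva << FL_BITS) | fl)
```

Because the byte order is spelled out in the format string and the bit positions are computed by hand, the result does not depend on the platform or on the compiler's bitfield rules.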

Actually, ctypes could fit the bill if it specified that bitfields
are allocated from MSB to LSB for BigEndianStructure, and from LSB
to MSB for LittleEndianStructure.  This way, for instance, it wouldn't
matter whether a sequence of 4-bit fields were based on c_ubyte
or c_ushort, etc.  Each pair of fields would be allocated to the
next consecutive byte.  And if the platform's native compiler for some
strange reason follows neither of these rules (i.e., it differs from
both Big and Little), then Structure would follow the platform compiler.
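
The allocation rule proposed above can be sketched in pure Python, without ctypes. This is only an illustration of the proposal, not how ctypes behaves or is implemented; the function name pack_bitfields and its parameters are hypothetical, and it assumes Python 3 for int.to_bytes.

```python
def pack_bitfields(fields, total_bits, msb_first):
    """Pack (value, width) pairs into bytes.

    msb_first=True:  the first field takes the highest bits and the
                     result is serialized big-endian.
    msb_first=False: the first field takes the lowest bits and the
                     result is serialized little-endian.
    """
    if total_bits % 8:
        raise ValueError("total_bits must be a multiple of 8")
    word = 0
    used = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in its field")
        if used + width > total_bits:
            raise ValueError("fields exceed total_bits")
        # Position of this field's least significant bit inside the word.
        shift = (total_bits - used - width) if msb_first else used
        word |= value << shift
        used += width
    return word.to_bytes(total_bits // 8, "big" if msb_first else "little")


nibbles = [(0x1, 4), (0x2, 4), (0x3, 4), (0x4, 4)]
print(pack_bitfields(nibbles, 16, msb_first=True).hex())
print(pack_bitfields(nibbles, 16, msb_first=False).hex())
```

Under this rule each pair of 4-bit fields lands in the next consecutive byte in both orders, and the width of the underlying storage unit never enters the calculation.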
msg280377 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-09 07:17
The bug is fixed in #23319. More recent Python 2.7 and Python 3.4+ releases should no longer have it.
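
To check whether a given interpreter still shows the behavior reported here, a small self-test along the following lines may help. It is a sketch modeled on the original report: the class name Flags and the function name bitfield_assignment_is_stable are hypothetical, and it only verifies that assigning the same value twice leaves the raw bytes unchanged and that the fields read back correctly.

```python
import ctypes


class Flags(ctypes.BigEndianStructure):
    # Same field shape as the original report: bitfields based on c_uint.
    _fields_ = [
        ("a", ctypes.c_uint, 3),
        ("b", ctypes.c_uint, 1),
    ]


def bitfield_assignment_is_stable():
    """Return True if repeating an assignment leaves the bytes unchanged."""
    f = Flags()
    f.a = 1
    first = bytes(bytearray(f))    # snapshot of the raw storage
    f.a = 1
    second = bytes(bytearray(f))
    return first == second and f.a == 1 and f.b == 0


print(bitfield_assignment_is_stable())
```

On an affected build the second assignment changes the raw bytes, so the function would return False.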
History
Date                 User           Action  Args
2016-11-09 07:17:33  xiang.zhang    set     status: open -> closed; nosy: + xiang.zhang; messages: + msg280377; resolution: fixed; stage: resolved
2016-11-09 03:43:18  mrolle         set     messages: + msg280374
2016-11-09 02:28:40  mrolle         set     nosy: + mrolle; messages: + msg280371
2015-09-11 19:18:43  Brian Trotter  set     versions: + Python 3.4; nosy: + Brian Trotter; messages: + msg250491; type: behavior
2014-02-19 20:10:02  rgaddi         set     nosy: + rgaddi; messages: + msg211648; versions: + Python 3.3
2014-02-14 21:45:41  yselivanov     set     nosy: + amaury.forgeotdarc, belopolsky
2014-02-14 21:43:22  Alan.Ning      create