classification
Title: struct.pack(): trailing padding bytes on x64
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Allan Haldane, Eric.Wieser, mark.dickinson, martin.panter, meador.inge, skrah
Priority: normal Keywords:

Created on 2016-04-13 08:38 by skrah, last changed 2019-04-21 07:32 by skrah.

Messages (6)
msg263315 - Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-13 08:38
On the x64 architecture gcc adds trailing padding bytes after the last
struct member.  NumPy does the same:

>>> import numpy as np
>>> 
>>> t = np.dtype([('x', 'u1'), ('y', 'u8'), ('z', 'u1')], align=True)
>>> x = np.array([(1, 2, 3)], dtype=t)
>>> x.tobytes()
b'\x01\xf7\xba\xab\x03\x7f\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'


The struct module in native mode does not:

>>> import struct
>>> struct.pack("BQB", 1, 2, 3)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03'


I'm not sure if this is intended -- or if full compatibility with
native compilers is even achievable in the general case.
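The compiler's layout can also be reproduced from the standard library without NumPy. ctypes.Structure lays out fields according to the platform's C ABI, so the sketch below uses it as a stand-in for gcc (an assumption that holds for ordinary, non-bitfield structs on mainstream platforms) and compares it with the struct module in native mode. The struct name `S` is only for illustration.

```python
import ctypes
import struct


# C equivalent of the aligned NumPy dtype above:
#     struct { uint8_t x; uint64_t y; uint8_t z; };
class S(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_uint8),
        ("y", ctypes.c_uint64),
        ("z", ctypes.c_uint8),
    ]


c_size = ctypes.sizeof(S)             # C layout, trailing padding included
struct_size = struct.calcsize("BQB")  # native struct layout, ends right after z

print("offset of z:    ", S.z.offset)   # 16 on x86-64
print("ctypes.sizeof:  ", c_size)       # 24 on x86-64
print("struct.calcsize:", struct_size)  # 17 on x86-64
print("missing trailing padding:", c_size - struct_size)
```

The two layouts agree on every member offset; they differ only in whether the total size is rounded up to a multiple of the struct's alignment.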
msg263322 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-13 10:21
This behaviour seems to be documented, although it is not very explicit, and a bit surprising to me. See the third note at the end of <https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment>: “align the end . . . with a repeat count of zero”, and the example

>>> pack('llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'
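The output above corresponds to a big-endian machine with 4-byte longs, which is the setting the struct documentation's examples assume. Applied to the "BQB" format from the first message, the zero-count trick looks like this (native mode only; the sizes in the comments are for x86-64):

```python
import struct

fmt = "BQB"
padded = fmt + "0Q"  # zero-count 'Q': packs no value, only aligns to a Q boundary

print(struct.calcsize(fmt))     # 17 on x86-64
print(struct.calcsize(padded))  # 24 on x86-64

data = struct.pack(padded, 1, 2, 3)  # still three values; pad bytes are zeros
print(data)
print(struct.unpack(padded, data))   # (1, 2, 3)
```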
msg263328 - Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-13 11:37
Thank you.  So technically, in the above NumPy example the format
string generated by NumPy would be considered incomplete if we assume
struct syntax:

>>> m = memoryview(x)
>>> m.format
'T{B:x:xxxxxxxL:y:B:z:}'


I find this "0L" thing a very odd notation. Taking care of this
manually requires a) knowledge of what the compiler does and b)
searching for the struct member with the largest alignment requirement.
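Step b) can be automated for simple native-mode formats. The sketch below uses a hypothetical helper, `add_trailing_padding()`, that derives each code's native alignment from struct.calcsize itself (the offset at which the code lands after a single 'B' byte) and appends a zero-count item of the most strictly aligned code. It only handles flat formats made of single-character codes with optional repeat counts, and it still relies on step a): the assumption that the compiler rounds the struct size up to its strictest member alignment.

```python
import re
import struct

# Flat native-mode items: optional repeat count followed by a format code.
_ITEM = re.compile(r"\s*(\d*)([xcbB?hHiIlLqQnNefdspP])\s*")


def native_alignment(code: str) -> int:
    """Alignment of a single format code in native ('@') mode."""
    # After one 'B' byte, the next item starts at its alignment boundary.
    return struct.calcsize("B" + code) - struct.calcsize(code)


def add_trailing_padding(fmt: str) -> str:
    """Hypothetical helper: append '0<code>' for the strictest-aligned code."""
    body = fmt[1:] if fmt.startswith("@") else fmt
    codes = []
    pos = 0
    while pos < len(body):
        m = _ITEM.match(body, pos)
        if m is None:
            # Byte-order prefixes, T{...} and other syntax are not handled.
            raise ValueError(f"unsupported format at position {pos}: {body!r}")
        codes.append(m.group(2))
        pos = m.end()
    if not codes:
        return fmt
    strictest = max(codes, key=native_alignment)
    if native_alignment(strictest) == 1:
        return fmt  # byte-aligned members never need trailing padding
    return fmt + "0" + strictest


for f in ("BQB", "llh", "BBB"):
    print(f, "->", add_trailing_padding(f), struct.calcsize(add_trailing_padding(f)))
```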
msg265213 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2016-05-10 00:56
I'm not too crazy about the trailing padding syntax either.  The behavior is documented all the way back to Python 2.6.  So, I would be hesitant to change it now.

If the new 'T{...}' struct syntax from issue3132 ever gets added, then maybe we could address this there?

FWIW, internal and trailing padding is left implementation-defined by the C standard.  That being said, most compilers I have worked with add the trailing padding.
msg269887 - Author: Allan Haldane (Allan Haldane) Date: 2016-07-06 16:00
Hello,

Over at numpy I have a proposed fix for the bug you discovered, namely that numpy drops trailing padding in the PEP 3118 format string. My strategy is to make numpy interpret format strings exactly the same way as the struct module; let me know if you disagree.

See https://github.com/numpy/numpy/pull/7798
msg309085 - Author: Stefan Krah (skrah) * (Python committer) Date: 2017-12-27 14:27
I have just worked on PEP-3118 ==> Datashape translation and I have
encountered many issues similar to the ones in the PR referenced by
Allan.

It seems to me that we should simplify the PEP-3118 struct syntax as much 
as possible in order to remove any ambiguities.


I think generally that numpy's approach is the best for data interchange, so I
would propose this modified struct syntax for PEP-3118:


1) Padding is always explicit and exact, also for natively aligned types.

2) Padding is only allowed in struct fields.

3) Trailing padding is explicit.

4) If no padding is present in a struct, it is assumed to be packed with
alignment 1 for the entire struct.

5) The tuple syntax "bxL" is not supported, only the T{} syntax with
explicit field names.

6) Repetition "10s" is only allowed for bytes. "10f" is a tuple (not
supported); an array of 10 floats would be (10)f.

7) Modifiers (@, =, <, >, !) are only given for primitive data types,
not for entire structs or types.

8) Implementations are free to reject any padding that would not arise
naturally by specifying alignment or packing constraints (like gcc does 
with attributes).
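Rules 1 and 3 can be illustrated with a small generator. This is a hypothetical sketch, not the grammar linked below: `explicit_format()` takes (struct code, field name) pairs, lays them out with the struct module's native sizes and alignments, and spells every padding byte out as 'x', including the trailing padding just before the closing '}'. Placing bare 'x' codes at the end of the T{...} body is an assumption about how rule 3 would be written; it mirrors the way NumPy already spells internal padding. It also uses 'Q' where NumPy's export used 'L' for u8.

```python
import struct


def native_alignment(code: str) -> int:
    # Offset at which `code` lands after a single byte in native mode.
    return struct.calcsize("B" + code) - struct.calcsize(code)


def explicit_format(fields):
    """Hypothetical: build a T{...} string with all padding spelled out.

    `fields` is a list of (struct_code, name) pairs for primitive members.
    Returns (format_string, itemsize).
    """
    parts = []
    offset = 0
    struct_align = 1
    for code, name in fields:
        size = struct.calcsize(code)
        align = native_alignment(code)
        struct_align = max(struct_align, align)
        pad = -offset % align  # rule 1: explicit padding before the member
        parts.append("x" * pad + f"{code}:{name}:")
        offset += pad + size
    trailing = -offset % struct_align  # rule 3: trailing padding is explicit
    parts.append("x" * trailing)
    return "T{" + "".join(parts) + "}", offset + trailing


fmt, itemsize = explicit_format([("B", "x"), ("Q", "y"), ("B", "z")])
print(fmt, itemsize)  # T{B:x:xxxxxxxQ:y:B:z:xxxxxxx} 24 on x86-64
```

Compared with the format NumPy exported in msg263328, the only differences are the 'Q' code and the seven trailing 'x' bytes that make the item size match the C layout.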


Here is my implementation with a grammar:

  https://github.com/plures/ndtypes/blob/master/libndtypes/compat/bpgrammar.y

Some tests against numpy:

  https://github.com/plures/xnd/blob/master/python/test_xnd.py#L1509


I think the best way forward would be to tweak the above grammar so that
it covers everything that numpy can export.
History
Date                 User           Action  Args
2019-04-21 07:32:27  skrah          set     nosy: + Eric.Wieser
2017-12-27 14:27:23  skrah          set     versions: + Python 3.7, - Python 3.6
2017-12-27 14:27:08  skrah          set     messages: + msg309085
2016-07-06 16:00:47  Allan Haldane  set     nosy: + Allan Haldane
                                            messages: + msg269887
2016-05-10 00:56:25  meador.inge    set     nosy: + meador.inge
                                            messages: + msg265213
2016-04-13 11:37:19  skrah          set     messages: + msg263328
2016-04-13 10:21:19  martin.panter  set     nosy: + martin.panter
                                            messages: + msg263322
2016-04-13 08:38:49  skrah          create