Issue 26746: struct.pack(): trailing padding bytes on x64

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70933

classification

Title:	struct.pack(): trailing padding bytes on x64
Type:	behavior	Stage:
Components:	Extension Modules	Versions:	Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Allan Haldane, Eric.Wieser, mark.dickinson, martin.panter, meador.inge, skrah
Priority:	normal	Keywords:

Created on 2016-04-13 08:38 by skrah, last changed 2022-04-11 14:58 by admin.

Messages (6)
msg263315 - (view)	Author: Stefan Krah (skrah) *	Date: 2016-04-13 08:38
On the x64 architecture gcc adds trailing padding bytes after the last struct member. NumPy does the same:

>>> import numpy as np
>>>
>>> t = np.dtype([('x', 'u1'), ('y', 'u8'), ('z', 'u1')], align=True)
>>> x = np.array([(1, 2, 3)], dtype=t)
>>> x.tostring()
b'\x01\xf7\xba\xab\x03\x7f\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'

The struct module in native mode does not:

>>> struct.pack("BQB", 1, 2, 3)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03'

I'm not sure if this is intended -- or if full compatibility with native compilers is even achievable in the general case.
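The size difference described above can be checked without NumPy. The following is only a sketch: it assumes that ctypes follows the platform's C struct layout, and the class name `Record` is made up for illustration. On a typical x86-64 build the three values would be expected to be 17, 24 and 8, matching the 17-byte and 24-byte outputs shown above, but the numbers are platform dependent.

```python
import ctypes
import struct


class Record(ctypes.Structure):
    # Hypothetical struct with the same members as the "BQB" format:
    # uint8, uint64, uint8.
    _fields_ = [
        ("x", ctypes.c_uint8),
        ("y", ctypes.c_uint64),
        ("z", ctypes.c_uint8),
    ]


# Native mode ("@") aligns each member but adds nothing after the last one.
struct_size = struct.calcsize("@BQB")

# ctypes reports the size including any trailing padding.
native_size = ctypes.sizeof(Record)
native_align = ctypes.alignment(Record)

print(struct_size, native_size, native_align)
```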
msg263322 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-13 10:21
This behaviour seems to be documented, although it is not very explicit, and a bit surprising to me. See the third note at the end of <https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment>: “align the end . . . with a repeat count of zero”, and the example

>>> pack('llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'
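Applied to the "BQB" format from the first message, the documented repeat-count-of-zero notation might be used as sketched below. The variable names are illustrative only, and the resulting sizes depend on the platform's alignment of `Q`.

```python
import struct

# Format from the original report, in native mode.
unpadded = struct.pack("@BQB", 1, 2, 3)

# "0Q" consumes no argument; it only aligns the end of the data to the
# alignment requirement of an unsigned long long.
padded = struct.pack("@BQB0Q", 1, 2, 3)

# Number of padding bytes that the zero-count code appended.
extra = len(padded) - len(unpadded)

print(len(unpadded), len(padded), extra)
```

The padded result starts with the same bytes as the unpadded one, and the struct module fills the added padding with zero bytes.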
msg263328 - (view)	Author: Stefan Krah (skrah) *	Date: 2016-04-13 11:37
Thank you. So technically, in the above NumPy example the format string generated by NumPy would be considered incomplete if we assume struct syntax:

>>> m = memoryview(x)
>>> m.format
'T{B:x:xxxxxxxL:y:B:z:}'

I find this "0L" thing a very odd notation. Taking care of this manually requires a) knowledge of what the compiler does and b) searching for the largest struct member.
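The manual step of searching for the largest struct member could be automated roughly as follows. This is a sketch, not part of the struct module: `alignment_of` and `with_trailing_padding` are hypothetical helpers, they handle only flat native-mode formats made of single-character codes with optional repeat counts, and they assume the compiler pads the struct to the alignment of its most strictly aligned member.

```python
import re
import struct


def alignment_of(code):
    # Hypothetical helper: derive the native alignment of a format code
    # from the padding struct inserts after a leading one-byte member.
    return struct.calcsize("@b" + code) - struct.calcsize("@" + code)


def with_trailing_padding(fmt):
    # Hypothetical helper: append a zero-count code for the member with
    # the strictest alignment, so the packed size includes trailing padding.
    if fmt[:1] in ("=", "<", ">", "!"):
        raise ValueError("standard modes never align, so there is nothing to pad")
    body = fmt[1:] if fmt.startswith("@") else fmt
    codes = re.findall(r"\d*([A-Za-z?])", body)
    if not codes:
        return fmt
    widest = max(codes, key=alignment_of)
    return "@" + body + "0" + widest


padded_format = with_trailing_padding("BQB")
print(padded_format, struct.calcsize(padded_format))
```

For "BQB" the helper returns "@BQB0Q", the same format one would write by hand using the notation from the documentation.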
msg265213 - (view)	Author: Meador Inge (meador.inge) *	Date: 2016-05-10 00:56
I'm not too crazy about the trailing padding syntax either. The behavior is documented all the way back to Python 2.6. So, I would be hesitant to change it now. If the new 'T{...}' struct syntax from issue3132 ever gets added, then maybe we could address this there? FWIW, internal and trailing padding is implementation defined by the C standard. That being said, most compilers I have worked with add the trailing padding.
msg269887 - (view)	Author: Allan Haldane (Allan Haldane)	Date: 2016-07-06 16:00
Hello, over at numpy I have a proposed fix for the bug you discovered, that numpy drops trailing padding in the 3118 format string. My strategy is to make numpy interpret format strings exactly the same way as the struct module; let me know if you disagree. See https://github.com/numpy/numpy/pull/7798
msg309085 - (view)	Author: Stefan Krah (skrah) *	Date: 2017-12-27 14:27
I have just worked on PEP-3118 ==> Datashape translation and I have encountered many issues similar to the ones in the PR referenced by Allan. It seems to me that we should simplify the PEP-3118 struct syntax as much as possible in order to remove any ambiguities. I think generally that numpy's approach is the best for data interchange, so I would propose this modified struct syntax for PEP-3118:

1) Padding is always explicit and exact, also for natively aligned types.
2) Padding is only allowed in struct fields.
3) Trailing padding is explicit.
4) If no padding is present in a struct, it is assumed to be packed with alignment 1 for the entire struct.
5) The tuple syntax "bxL" is not supported, only the T{} syntax with explicit field names.
6) Repetition "10s" is only allowed for bytes. "10f" is a tuple (not supported), an array of 10 floats would be (10)f.
7) Modifiers (@, =, <, >, !) are only given for primitive data types, not for entire structs or types.
8) Implementations are free to reject any padding that would not arise naturally by specifying alignment or packing constraints (like gcc does with attributes).

Here is my implementation with a grammar: https://github.com/plures/ndtypes/blob/master/libndtypes/compat/bpgrammar.y

Some tests against numpy: https://github.com/plures/xnd/blob/master/python/test_xnd.py#L1509

I think the best way forward would be to tweak the above grammar so that it covers everything that numpy can export.
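The idea of explicit and exact padding in points 1) and 3) can be illustrated with the struct module's existing syntax. This is only a sketch and does not use the proposed T{} grammar: a standard-size format with explicit "x" pad bytes describes the 24-byte layout from the original report without depending on the platform's alignment rules. The choice of little-endian ("<") is an assumption made for the example.

```python
import struct

# One byte, seven explicit pad bytes, eight bytes, one byte, and seven
# explicit trailing pad bytes. "<" disables implicit alignment, so the
# layout is the same on every platform.
explicit = "<B7xQB7x"

size = struct.calcsize(explicit)
packed = struct.pack(explicit, 1, 2, 3)
fields = struct.unpack(explicit, packed)

print(size, fields)  # 24 (1, 2, 3)
```

Because nothing is left to the compiler or to native mode, the consumer of such a format string does not need to search for the largest member to find the total size.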

History
Date	User	Action	Args
2022-04-11 14:58:29	admin	set	github: 70933
2019-04-21 07:32:27	skrah	set	nosy: + Eric.Wieser
2017-12-27 14:27:23	skrah	set	versions: + Python 3.7, - Python 3.6
2017-12-27 14:27:08	skrah	set	messages: + msg309085
2016-07-06 16:00:47	Allan Haldane	set	nosy: + Allan Haldane messages: + msg269887
2016-05-10 00:56:25	meador.inge	set	nosy: + meador.inge messages: + msg265213
2016-04-13 11:37:19	skrah	set	messages: + msg263328
2016-04-13 10:21:19	martin.panter	set	nosy: + martin.panter messages: + msg263322
2016-04-13 08:38:49	skrah	create