This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: implement PEP 3118 struct changes
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, MrJean1, ajaksu2, barry, belopolsky, benjamin.peterson, inducer, mark.dickinson, martin.panter, meador.inge, ncoghlan, noufal, paulehoffman, pitrou, pv, skrah, teoliphant
Priority: high Keywords: patch

Created on 2008-06-17 22:30 by benjamin.peterson, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description Edit
pep-3118.patch meador.inge, 2010-02-17 03:30
struct-string.py3k.patch meador.inge, 2010-05-18 04:07 Patch for 'T{}' syntax and multiple byte order specifiers.
struct-string.py3k.2.patch meador.inge, 2010-05-20 13:50 Patch with fixed assertions
struct-string.py3k.3.patch meador.inge, 2011-01-07 03:59 Patch for 'T{}' against py3k r87813 review
grammar.y skrah, 2016-04-13 10:20
Messages (58)
msg68347 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-17 22:30
It seems the new modifiers to the struct.unpack/pack module that were
proposed in PEP 3118 haven't been implemented yet.
msg68507 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-06-21 15:59
If the struct changes are made, also add two format codes for the C types
ssize_t and size_t, perhaps 'z' and 'Z' respectively.  In particular since
on some platforms sizeof(size_t) != sizeof(long).
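For context, native-mode codes along these lines were eventually added in Python 3.3 as 'n' (ssize_t) and 'N' (size_t); a minimal sketch of how they track the platform's actual sizes:

```python
import struct

# 'n' packs a platform ssize_t in native mode (Python 3.3+), so its size
# follows the ABI rather than assuming sizeof(size_t) == sizeof(long).
packed = struct.pack('n', -42)
assert len(packed) == struct.calcsize('n')
assert struct.unpack('n', packed)[0] == -42
```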
msg71313 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-08-18 03:00
It's looking unlikely that this is going to make it by beta 3.  If it
can't get in by then, it's too late.
msg71316 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-18 09:36
Let's retarget it to 3.1 then. It's a new feature, not a behaviour
change or a deprecation, so adding it to 3.0 isn't a necessity.
msg71338 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-08-18 15:02
Actually, this may be a requirement of #2394; PEP 3118 states that
memoryview.tolist would use the struct module to do the unpacking.
msg71342 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-18 15:51
> Actually, this may be a requirement of #2394; PEP 3118 states that
> memoryview.tolist would use the struct module to do the unpacking.

:-(
However, we don't have any examples of the buffer API / memoryview
object working with something else than 1-dimensional contiguous char
arrays (e.g. bytearray). Therefore, I suggest that Python 3.0 provide
official support only for 1-dimensional contiguous char arrays. Then
tolist() will be easy to implement even without using the struct module
(just a list of integers, if I understand the functionality).
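For a 1-dimensional contiguous bytes-like object, tolist() is indeed just a list of integers, which needs no struct machinery:

```python
# memoryview.tolist() on a 1-D contiguous char buffer returns the byte
# values as plain Python integers.
view = memoryview(bytearray(b'abc'))
assert view.tolist() == [97, 98, 99]
```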
msg71882 - (view) Author: Travis Oliphant (teoliphant) * (Python committer) Date: 2008-08-24 21:38
This can be re-targeted to 3.1 as described.
msg87921 - (view) Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-05-16 20:33
Travis,
Do you think you can contribute to this so it actually lands in 3.2?  Having
a critical issue slipping from 3.0 to 3.3 would be bad...

Does this supersede issue 2395, or is this a subset of that one?
msg99296 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-13 01:28
Is anyone working on implementing these new struct modifiers?  If not, then I would love to take a shot at it.
msg99297 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-02-13 01:35
2010/2/12 Meador Inge <report@bugs.python.org>:
>
> Meador Inge <meadori@gmail.com> added the comment:
>
> Is anyone working on implementing these new struct modifiers?  If not, then I would love to take a shot at it.

Not to my knowledge.
msg99309 - (view) Author: Travis Oliphant (teoliphant) * (Python committer) Date: 2010-02-13 06:06
On Feb 12, 2010, at 7:29 PM, Meador Inge wrote:

>
> Meador Inge <meadori@gmail.com> added the comment:
>
> Is anyone working on implementing these new struct modifiers?  If  
> not, then I would love to take a shot at it.

That would be great.

-Travis
msg99312 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-13 11:07
Some of the proposed struct module additions look far from straightforward;  I find that section of the PEP significantly lacking in details and motivation.

"Unpacking a long-double will return a decimal object or a ctypes long-double."

Returning a Decimal object here doesn't make a lot of sense, since Decimal objects aren't generally compatible with floats.  And ctypes long double objects don't seem to exist, as far as I can tell.  It might be better not to add this code.

Another bit that's not clear to me:  how is unpacking an object pointer expected to work, and how would it typically be used?  What if the unpacked pointer no longer points to a valid Python object?  How would this work in other Python implementations?

For the 'X{}' format (pointer to a function), is this supposed to mean a Python function or a C function?

What's a 'specific pointer'?
msg99313 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-13 12:01
Whoops.  ctypes does have long double, of course.  Apologies.
msg99460 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-17 03:30
Hi All,

On Sat, Feb 13, 2010 at 5:07 AM, Mark Dickinson <report@bugs.python.org> wrote:

>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
> Some of the proposed struct module additions look far from straightforward;
>  I find that section of the PEP significantly lacking in details and
> motivation.
>

I agree.

> "Unpacking a long-double will return a decimal object or a ctypes
> long-double."
>
> Returning a Decimal object here doesn't make a lot of sense, since Decimal
> objects aren't generally compatible with floats.  And ctypes long double
> objects don't seem to exist, as far as I can tell.  It might be better not
> to add this code.

And under what conditions would a ctypes long double be used vs. a Decimal
object?

> Another bit that's not clear to me:  how is unpacking an object pointer
> expected to work, and how would it typically be used?  What if the unpacked
> pointer no longer points to a valid Python object?  How would this work in
> other Python implementations?

I guess if an object associated with the packed address does not exist, then
you would unpack None (?).  This is especially a problem if the struct string
is being sent over the wire to another machine.

> For the 'X{}' format (pointer to a function), is this supposed to mean a
> Python function or a C function?
>

I read that as a Python function.  However, I am not completely sure how the
prototype would be enforced when unpacking.  I am also wondering how
the signatures of pointers-to-functions are specified.  Are
the arguments and return type full struct strings as well?

> What's a 'specific pointer'?

I think this means a pointer to a specific type, e.g. '&d' is a pointer to a
double. If this is the case, though, the use cases are not completely clear
to me.

I also have the following questions:

* Can pointers be nested, '&&d' ?
* What nesting level can structures have? Arbitrary?
* The new array syntax claims "multi-dimensional array of whatever follows".

  Truly whatever? Arrays of structures? Arrays of pointers?
* "complex (whatever the next specifier is)".  Not really 'whatever'.  You
  cannot have a 'complex bool' or 'complex int'.  What other types of
  complex are there besides complex double?
* How do array specifiers and pointer specifiers mix?  For example, would
  '(2, 2)&d' be a two-by-two array of pointers to doubles?  What about
  '&(2, 2)d'?  Is this a pointer to an two-by-two array of doubles?

The new features of the struct-string syntax are so different that I think we
need to specify a grammar.  I think it will clarify some of the open
questions.

In addition, I was thinking that a reasonable implementation strategy would
be to keep the current struct-string syntax mostly in place within the C
module implementation.  The C implementation would just provide an interface
to pack/unpack sequences of primitive data elements.  Then we could write a
layer in the Python 'struct' module that took care of the higher-order
concepts like nested structures, arrays, named values, and pointers to
functions.  The higher-order concepts would be mapped to the appropriate
primitive sequence strings.

I think this will simplify the implementation and will provide a way to
phase it.  We can implement the primitive type extensions in C first,
followed by the higher-level Python stuff.  The result of each phase is
immediately usable.

I have attached a patch against the PEP containing my current thoughts on
fleshing out the grammar and some of the current open questions.  This still
needs work, but I wanted to share it to see if I am on the right track.
Please advise on how to proceed.
msg99472 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-17 15:12
> And under what conditions would a ctype long double be used vs. a
> Decimal object.

Well, I'm guessing that this was really just an open question for the PEP, and that the PEP authors hadn't decided which of these two options was more appropriate.  If all long doubles were converted to Decimal, then we need to determine what precision is appropriate to use for the conversion: any long double *can* be represented exactly as a Decimal, but getting an exact representation can require thousands of digits in some cases, so it's probably better to always round to some fixed number of significant digits.  36 significant digits is a reasonable choice here: it's the minimum number of digits that's guaranteed to distinguish two distinct long doubles, for the case where a long double has 113 bits of precision (i.e., IEEE 754 binary128 format); other common long double formats have smaller precision than this (usually 53 (normal double), 64 (x87 extended doubles), or 106 (double-double)).  There would probably also need to be some way to 'repack' the Decimal instance.

The 'platform long double -> Decimal' conversion itself would also be nontrivial to implement;  I can lend a hand here if you want it.
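The exact-but-long conversion described above can be sketched with today's decimal module: a binary double converts to Decimal exactly, and a Context rounds it to a fixed number of significant digits.

```python
from decimal import Decimal, Context

# Decimal(float) is exact, but the exact form of a binary double can be
# very long; rounding to 36 significant digits (enough to distinguish
# any two IEEE 754 binary128 values) keeps it manageable.
exact = Decimal(0.1)                     # exact value of the double nearest 0.1
rounded = Context(prec=36).plus(exact)   # round to 36 significant digits
assert len(exact.as_tuple().digits) > 36
assert len(rounded.as_tuple().digits) <= 36
```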

Using ctypes makes more sense to me, since it doesn't involve trying to mix decimal and binary, except that I don't know whether it's acceptable for other standard library modules to have dependencies on ctypes.  I'm not sure whether ctypes is available on all platforms that Python runs on. It's also a bit ugly that, depending on the platform, sometimes a long double will unpack to an instance of ctypes.long_double, and sometimes (when long double == double) to a regular Python float.

Anyway, this particular case (long double) isn't a big deal:  it can be overcome, one way or another.  I'm more worried about some of the other aspects of the changes.

[About unpacking with the 'O' format.]
> I guess if an object associated with the packed address does not
> exist, then you would unpack None (?).  This is especially a problem 
> if the struct string is being sent over the wire to another machine.

And how do you determine whether an address gives a valid object or not?  I can only assume that packing and unpacking with the 'O' format is only supposed to be used in certain restricted circumstances, but it's not clear to me what those circumstances are.

> I also have the following questions: [...]

I think a lot of this discussion needs to go back to python-dev;  with luck, we can get some advice and clarifications from the PEP authors there.  I'm not sure whether it's appropriate to modify the original PEP (especially since it's already accepted), or whether it would be better to produce a separate document describing the proposed changes in detail.
msg99474 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-17 15:18
I'm looking for previous discussions of this PEP.  There's a python-dev thread in April 2007:

http://mail.python.org/pipermail/python-dev/2007-April/072537.html

Are there other discussions that I'm missing?
msg99551 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-19 01:02
Mark,

> I think a lot of this discussion needs to go back to python-dev;  with 
> luck, we can get some advice and clarifications from the PEP authors 
> there.

So the next step is to kick off a thread on python-dev summarizing the questions/problems we have come up with?  I can get that started.

> Are there other discussions that I'm missing?

I did a quick search and came up with the same.
msg99655 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-21 13:06
Closed issue 2395 as a duplicate of this one.
msg99656 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-21 13:09
[Meador Inge]
> So the next step is to kick off a thread on python-dev summarizing the
> questions/problems we have come up with?  I can get that started.

Sounds good.  I'd really like to see some examples of how these struct-module additions would be used in real life.
msg99677 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-21 19:23
About long doubles again:  I just encountered someone on the #python IRC channel who wanted to know whether struct.pack and struct.unpack supported reading and writing of x87 80-bit long doubles (padded to 12 bytes each in the input).  A few quotes from him/her, with permission (responses from others, including me, edited out;  I can supply a fuller transcript if necessary, but I hope what's below isn't misleading).

[18:39] bdesk: Hi, is struct.pack able to handle 80-bit x86 extended floats?
[18:40] bdesk: How can I read and write these 80-bit floats, in binary, using python?
[18:44] bdesk: dickinsm: I have a C program that uses binary files as input and output, and I want to deal with these files using python if possible.
[18:49] bdesk: I don't need to do arithmetic with the full 80 bits of precision within the python program, although It would be better if I could.
[18:50] bdesk: I would need to use the float in a more semantically useful manner than treating it as a black box of 12 bytes.
[18:55] bdesk: Until python gets higher precision floats, my preferred interface would be to lose some precision when unpacking the floats.

The main thing that I realized from this is that unpacking as a ctypes long double isn't all that useful for someone who wants to be able to do arithmetic on the unpacked result.  And if you don't want to do arithmetic on the unpacked result, then you're probably just shuffling the bytes around without caring about their meaning, so there's no need to unpack as anything other than a sequence of 12 bytes.

On the other hand, I suppose it's enough to be able to unpack as a ctypes c_longdouble and then convert to a Python float (losing precision) for the arithmetic.  Alternatively, we might consider simply unpacking a long double directly into a Python float (and accepting the loss of precision);  that seems to be what would be most useful for the use-case above.
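The round-trip concern can be made concrete with today's struct module: for the existing 'd' code, unpacking to a Python float is exact, so bytes always survive a round trip; a hypothetical long-double code ('g') that unpacked to a Python float could not make that guarantee.

```python
import struct

# Unpacking a 'd' to a Python float is exact, so the byte-level round
# trip always holds:
b = struct.pack('<d', 0.1)
assert struct.pack('<d', struct.unpack('<d', b)[0]) == b

# A hypothetical 'g' (80-bit long double) code that unpacked to a
# 64-bit Python float would discard bits, breaking this property.
```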
msg99711 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-22 04:13
> The main thing that I realized from this is that unpacking as a ctypes long 
> double isn't all that useful for someone who wants to be able to do arithmetic
> on the unpacked result.  

I agree.  Especially since ctypes 'long double' maps to a Python float and
'.value' would have to be referenced on the ctype 'long double' instance
for doing arithmetic.

> And if you don't want to do arithmetic on the unpacked result, then you're 
> probably just shuffling the bytes around without caring about their meaning,
> so there's no need to unpack as anything other than a sequence of 12 bytes.

One benefit of having a type code for 'long double' (assuming you are mapping
the value to the platform's 'long double') is that you don't have to know
how many bytes are in the underlying representation.  As you know, it isn't
always just 12 bytes.  It depends on the architecture and ABI being used.
From a quick sample, I see it can be anywhere from 8 to 16 bytes:

===========================================
| Compiler  | Arch     | Bytes            |
===========================================
| VC++ 8.0  | x86      | 8                |
| VC++ 9.0  | x86      | 8                |
| GCC 4.2.4 | x86      | 12 (default), 16 |
| GCC 4.2.4 | x86-64   | 12, 16 (default) |  
| GCC 4.2.4 | PPC IBM  | 16               |
| GCC 4.2.4 | PPC IEEE | 16               |
===========================================
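The platform's long double size can be queried directly with ctypes, which is what such a type code would delegate to; the sampled range above (8 to 16 bytes) covers the mainstream ABIs.

```python
import ctypes

# sizeof(long double) varies by compiler and ABI; on the platforms
# sampled above it ranges from 8 to 16 bytes.
size = ctypes.sizeof(ctypes.c_longdouble)
assert 8 <= size <= 16
```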

> On the other hand, I suppose it's enough to be able to unpack as a ctypes 
> c_longdouble and then convert to a Python float (losing precision) for the 
> arithmetic.  Alternatively, we might consider simply unpacking a long double 
> directly into a Python float (and accepting the loss of precision);
 
I guess that would be acceptable.  The only thing that I don't like is that
since the transformation is lossy, you can't round trip:

   # this will not hold
   pack('g', unpack('g', byte_str)[0]) == byte_str

> that seems to be what would be most useful for the use-case above.

Which use case?  From the given IRC trace it seems that 'bdesk' was mainly
concerned with (1) pushing bytes around, but (2) thought "it would be better"
to be able to do arithmetic and that it would be more useful if it were
not a "black box of 12 bytes".  For use case (1) the loss of precision would
probably not be acceptable, due to the round-trip issue mentioned above.

So using ctypes 'long double' is easier to implement, but is lossy and clunky 
for arithmetic.  Using Python 'float' is easy to implement and easy for 
arithmetic, but is lossy.  Using Decimal is non-lossy and easy for arithmetic, 
but the implementation would be non-trivial and architecture specific 
(unless we just picked a fixed number of bytes regardless of the architecture).
msg99771 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-02-22 16:14
> One benefit of having a type code for 'long double' (assuming you are
> mapping the value to the platform's 'long double') is that you
> don't have to know how many bytes are in the underlying representation.

Agreed:  it's nice to have struct.pack already know your machine.

Actually, this brings up (yet) another open question:  native packing/unpacking of a long double would presumably return something corresponding to the platform long double, as above;  but non-native packing/unpacking should do something standard, instead, for the sake of interoperability between platforms.  Currently, I believe that packing a Python float always---even in native mode---packs in IEEE 754 format, even when the platform doubles aren't IEEE 754.

For native packing/unpacking, I'm slowly becoming convinced that unpacking as a ctypes long double is the only thing that makes any sense, so that we keep round-tripping, as you point out.  The user can easily enough extract the Python float for numerical work.  I still don't like having the struct module depend on ctypes, though.
msg105952 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-05-18 04:07
Attached is a patch that implements part of the additions.  More specifically, the 'T{}' syntax and the ability to place byte-order specifiers ('<', '>', '@', '^', '!', '=') anywhere in the struct string.

The changes dictated by the PEP are so big that it is better to split things up into multiple patches.  These two features will lay some ground work and are probably less controversial than the others.

Surely some more tweaks will be needed, but I think what I have now is at least good enough for review.  I tested on OS X 10.6 and Ubuntu 10.4.  I also used valgrind and 'regrtest.py -R:' to check for memory and 
reference leaks, respectively.
msg105955 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-18 07:12
Thanks for this.

Any chance you could upload the patch to Rietveld (http://codereview.appspot.com/) for ease of review?
msg105970 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-05-18 12:22
Sure - http://codereview.appspot.com/1258041
msg106087 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-19 19:27
Thanks for the Rietveld upload.  I haven't had a chance to review this properly yet, but hope to do so within the next few days.

One question: the production list you added to the docs says:

  format_string: (`byte_order_specifier`? `type_string`)*

This suggests that format strings like '<' and '<>b' are invalid;  is that correct, or should the production list be something like:

  format_string: (`byte_order_specifier` | `type_string`)*

?  Whether these cases are valid or not (personally, I think they should be), we should add some tests for them.  '<' *is* currently valid, I believe.
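For reference, what unpatched CPython does today: a bare specifier is accepted (an empty struct), while a specifier appearing after the first position is rejected.

```python
import struct

# A lone byte-order specifier is valid and denotes an empty struct.
assert struct.calcsize('<') == 0
assert struct.pack('<') == b''

# A specifier after position 0 (as in '<>b') is a format error in the
# current implementation.
raised = False
try:
    struct.calcsize('<>b')
except struct.error:
    raised = True
assert raised
```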


The possibility of mixing native size/alignment with standard size/alignment in a single format string makes me a bit uneasy, but I can't see any actual problems that might arise from it (equally, I can't imagine why anyone would want to do it).  I wondered briefly whether padding has clear semantics when a '@' appears in the middle of a format string, but I can't see why it wouldn't have.
msg106088 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-19 19:38
Travis, this issue is still assigned to you.  Do you plan to work on this at some stage, or may I unassign you?
msg106089 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-19 19:46
Hmm.  Something's not quite right: the _struct module fails to compile for me with this patch.  I get:

/Users/dickinsm/python/svn/py3k/Modules/_struct.c: In function ‘s_unpack’:
/Users/dickinsm/python/svn/py3k/Modules/_struct.c:1730: error: ‘PyStructObject’ has no member named ‘s_codes’
/Users/dickinsm/python/svn/py3k/Modules/_struct.c: In function ‘s_unpack_from’:
/Users/dickinsm/python/svn/py3k/Modules/_struct.c:1765: error: ‘PyStructObject’ has no member named ‘s_codes’

The offending lines both look like:

    assert(soself->s_codes != NULL);

presumably that should be:

    assert(soself->s_tree->s_codes != NULL);

After making that change, and successfully rebuilding, this assert triggers:

test_705836 (__main__.StructTest) ... Assertion failed: (soself->s_tree->s_codes != NULL), function s_unpack, file /Users/dickinsm/python/svn/py3k/Modules/_struct.c, line 1730.
msg106090 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-19 19:51
Ah, it should have been:

assert(soself->s_tree != NULL);

Got it now.  All tests pass. :)
msg106091 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-19 19:56
One more design question.  For something like: '<HT{>H}H', what endianness should be used when packing/unpacking the last 'H'?  Should the switch to '>' within the embedded struct be regarded as local to the struct?

With your patch, I get:

>>> pack('<HT{>H}H', 1, (2,), 3)
b'\x01\x00\x00\x02\x00\x03'
msg106153 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-05-20 13:50
> is that correct, or should the production list be something like:

Yup, you are right.  I will change the grammar.

> Whether these cases are valid or not (personally, I think they should 
> be), we should add some tests for them.  '<' *is* currently valid, I 
> believe.

I agree, they should be valid.  I will add more test cases.

> The possibility of mixing native size/alignment with standard 
> size/alignment in a single format string makes me a bit uneasy

I agree.  It is hard for me to see how this might be used.  In any case,
the relevant part of the PEP that I was following is:

"Endian-specification ('!', '@','=','>','<', '^') is also allowed inside the string so that it can change if needed. The previously-specified endian string is in force until changed. The default endian is '@' which means native data-types and alignment. If un-aligned, native data-types are requested, then the endian specification is '^'."

However, I am not quite sure how to interpret the last sentence.

> Should the switch to '>' within the embedded struct be regarded as 
> local to the struct?

No, there is no notion of scope here.  A given specifier is active until the next one is found.

> Ah, it should have been:
> 
> assert(soself->s_tree != NULL);

D'oh!  I missed that when I merged over to py3k -- I started this work on trunk.  Thanks.
msg106155 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-20 14:17
> > The possibility of mixing native size/alignment with standard 
> > size/alignment in a single format string makes me a bit uneasy
> 
> I agree.  It is hard for me to see how this might be used.

Without having anything more constructive to add, I also agree with this
gut feeling. Perhaps not all of the PEP needs implementing; we can just
add what is genuinely useful.
msg106157 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 14:26
Thanks for the new patch.

> "... If un-aligned, native data-types are requested, then the
> endian specification is '^'."
>
> However, I am not quite sure how to interpret the last sentence.

Hmm.  Seems like the PEP authors are proposing a new byteorder/alignment/size specifier here:  '^' = native byte-order + native size + no alignment.  I missed this before.
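The closest existing analog of the proposed '^' is today's '=' prefix: native byte order with standard sizes and no alignment padding. A quick comparison against native '@' mode:

```python
import struct

# '=' suppresses alignment padding; '@' (native) may insert padding
# between the 'B' and the 'I' to satisfy int alignment.
assert struct.calcsize('=BI') == 5                       # 1 + 4, no padding
assert struct.calcsize('@BI') >= struct.calcsize('=BI')  # padding possible
```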

>> Should the switch to '>' within the embedded struct be regarded as 
>> local to the struct?

> No, there is no notion of scope here.  A given specifier is active
> until the next one is found.

Okay.  I wonder whether that's the most useful thing to do, though.

As a separate issue, I notice that the new 'T{}' code doesn't respect multiplicities, e.g., as in 'H3T{HHL}'.  Is that intentional/desirable?

>>> struct.pack('H3T{HHL}', 1, (2, 3, 4))
b'\x01\x00\x02\x00\x03\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00'

If we don't allow multiplicities, this should produce an exception, I think.  If we do allow multiplicities (and I don't immediately see why we shouldn't), then we're going to have to be clear about how endianness behaves in something like:

'>H3T{H<H}'

So the first inner struct here would be treated as '{>H<H}'.  Would the next two be identical to this, or would they be as though the whole thing were '>HT{H<H}T{H<H}T{H<H}', in which case the 2nd and 3rd substructs are both effectively '<H<H', while the first is '>H<H'.
msg106164 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 16:24
After a bit more thought (and after soliciting a couple of opinions on #python-dev), I'm convinced that endianness changes within a substruct should be local to that substruct:

- it makes the meaning of '>2T{H<H}' both unsurprising and easy to understand:  i.e., it would be interpreted exactly as '>T{H<H}T{H<H}', and  both substructs would behave like '>H<H'.

- I suspect it's the behaviour that people expect

- it may make dynamic creation of struct format strings easier/less bug-prone.
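Since 'T{}' does not exist yet, the locally-scoped reading of '>T{H<H}T{H<H}' can be sketched by composing sub-format strings with today's struct module; the helper below is purely illustrative.

```python
import struct

# Hypothetical local scoping: each substruct of '>T{H<H}T{H<H}' behaves
# like '>H<H', regardless of specifiers in neighboring substructs.
def pack_substruct(values):
    # assumed layout: one big-endian H followed by one little-endian H
    return struct.pack('>H', values[0]) + struct.pack('<H', values[1])

packed = pack_substruct((1, 2)) + pack_substruct((3, 4))
assert packed == b'\x00\x01\x02\x00\x00\x03\x04\x00'
```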


But now I've got a new open issue:  how much padding should be inserted/expected (for pack/unpack respectively) between the 'B' and the 'T{...}' in a struct format string of the form 'BT{...}'?

For this, I don't think we can hope to exactly comply with the platform ABI in all cases.  But I think there's a simple rule that would work 99% of the time, and that does match the ABI on common platforms (though I'll check this), namely that the alignment requirement of the 'T{...}' struct should be the least common multiple of the alignment requirements of any of its elements.  (Which will usually translate to the largest of the alignment requirements, since those alignments are usually all going to be powers of 2.)

And *this* is where things get tricky if the alignment/byteorder/size specifier is changed midstream, since then it doesn't seem clear what alignments would contribute to the lcm above.  I'm tempted to suggest that for native mode, changing the specifier be disallowed entirely.

Travis, any comments on any of this?
msg106168 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-05-20 17:17
> As a separate issue, I notice that the new 'T{}' code doesn't respect 
> multiplicities, e.g., as in 'H3T{HHL}'.  Is that 
> intentional/desirable?

That could have been an oversight on my part.  I don't see any immediate reason why we wouldn't allow it.

> But now I've got a new open issue:  how much padding should be 
> inserted/expected (for pack/unpack respectively) between the 'B' and 
> the 'T{...}' in a struct format string of the form 'BT{...}'?

Doesn't that depend on what is in the '...'?  For example, I would expect the same padding for 'BT{I}' and 'BI'.  In general, I would expect the padding to be the same for 'x+T{y+}' and 'x+y+'.  The 'T{...}'s are merely organizational, right?

> I'm tempted to suggest that for native mode, changing the specifier be 
> disallowed entirely.

I am tempted to suggest that we just go back to having one specifier at the beginning of the string :).  Things seem to be getting complicated without any clear benefits.
msg106173 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 17:30
> For example, I would expect the same padding for 'BT{I}' and 'BI'.

Granted, yes.  But I wouldn't expect the same padding for 'BT{BI}' and 'BBI'.  'BT{BI}' should match a C struct which itself has an embedded struct.  For C, I get the following results on my machine:

#include <stdio.h>

/* corresponds to 'T{BI}' */
typedef struct {
  char y;
  int z;
} A;

/* corresponds to 'BT{BI}' */
typedef struct {
  char x;
  A yz;
} B;

/* corresponds to 'BBI' */
typedef struct {
  char x;
  char y;
  int z;
} C;

int main(void) {
  printf("sizeof(A) = %zu\n", sizeof(A));
  printf("sizeof(B) = %zu\n", sizeof(B));
  printf("sizeof(C) = %zu\n", sizeof(C));
  return 0;
}

/*                                                                               
Results on a (64-bit) OS X 10.6 machine:                                         
                                                                                 
sizeof(A) = 8                                                                    
sizeof(B) = 12                                                                   
sizeof(C) = 8                                                                    
*/
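The same comparison can be made with today's struct module for the layouts it can express, assuming a typical ABI with 4-byte, 4-aligned int; the nested case is exactly the one that needs the new 'T{}' syntax.

```python
import struct

# Analog of sizeof(C): 'BBI' packs char, char, pad, int.
assert struct.calcsize('BBI') == 8
# Analog of sizeof(A): 'BI' packs char, 3 pad bytes, int.
assert struct.calcsize('BI') == 8
# sizeof(B) == 12 has no current equivalent: 'BT{BI}' needs the new syntax.
```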
msg106175 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 17:33
(whoops -- used unsigned struct codes to correspond to signed C types there;  but it shouldn't make any difference to the sizes and padding).

> I am tempted to suggest that we just go back to having one specifier at the beginning of the string :).  Things seem to be getting complicated without any clear benefits.

Agreed.  Though if anyone following this issue wants to make the case that there are benefits to being able to change the endianness midway through a string, please do so!
msg106177 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-05-20 18:13
> Granted, yes.  But I wouldn't expect the same padding for 'BT{BI}' and 
> 'BBI'.  'BT{BI}' should match a C struct which itself has an embedded 
> struct.  For C, I get the following results on my machine:

I wasn't sure.  The C99 standard does not specify what the behavior should be.  It is implementation defined.  I guess most implementations just set the alignment of the struct to the alignment of its most demanding member.

I need to change how the alignment for nested structures is computed.  Right now alignments are being computed as if the 'T{...}' codes were not there.  I will hold off until we decide what that rule should be, but I think the most demanding element rule seems reasonable.
msg106180 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 18:50
> The C99 standard does not specify what the behavior should be.

Right;  it's down to the platform ABI.

I think the least common multiple of the alignment requirements of the struct members is the way to go, though.  It's difficult to imagine an ABI for which this lcm isn't the same thing as the largest struct member alignment, but I don't want to categorically say that such ABIs don't exist.

Here's a snippet from the gcc manual [1]:

"Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question."

I'm not sure I could identify the precise pieces of the standard that imply that requirement, though.

[1] http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Type-Attributes.html#Type-Attributes
msg106181 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 19:01
Another snippet, from the latest public draft of the System V x86-64 ABI [1]:

"""Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object's alignment."""

I'd be fine with using the largest alignment, as above, instead of computing an lcm;  I can't believe it'll ever make a difference in practice.  For an empty struct (not allowed in C99, but allowed as a gcc extension, and allowed by the struct module), the alignment would be 1, of course.

[1] http://www.x86-64.org/documentation/abi.pdf
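The "most strictly aligned component" rule quoted above can be sketched in a few lines. This is an illustrative model (the function and its interface are mine, not from any patch in this issue), computing offsets, total size, and alignment for a struct given (size, alignment) pairs for its members:

```python
def layout(members):
    """Compute offsets, size and alignment of a struct.

    members: list of (size, align) pairs, one per field.
    Implements the System V-style rule: each field goes at the next
    offset that is a multiple of its alignment; the struct's own
    alignment is the max of its members'; total size is rounded up
    to a multiple of that alignment.
    """
    offset = 0
    offsets = []
    align = 1
    for size, a in members:
        offset = (offset + a - 1) // a * a    # pad up to field alignment
        offsets.append(offset)
        offset += size
        align = max(align, a)
    size = (offset + align - 1) // align * align   # trailing padding
    return offsets, size, align

# struct { unsigned char; unsigned int; } -> offsets [0, 4], size 8, align 4
off, size, align = layout([(1, 1), (4, 4)])
print(off, size, align)

# struct { unsigned char; struct { unsigned char; unsigned int; }; }:
# the substruct keeps its own (size, align), so the outer struct is
# 12 bytes -- matching sizeof(B) in the C program above.
print(layout([(1, 1), (size, align)]))
```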
msg106188 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-20 20:49
One more reference:

http://msdn.microsoft.com/en-us/library/9dbwhz68(v=VS.80).aspx

gives essentially the same rules for MSVC. "The alignment of the beginning of a structure or a union is the maximum alignment of any individual member. Each member within the structure or union must be placed at its proper alignment as defined in the previous table, which may require implicit internal padding, depending on the previous member."
msg106416 - (view) Author: Travis Oliphant (teoliphant) * (Python committer) Date: 2010-05-25 04:26
On May 19, 2010, at 2:38 PM, Mark Dickinson wrote:

> 
> Mark Dickinson <dickinsm@gmail.com> added the comment:
> 
> Travis, this issue is still assigned to you.  Do you plan to work on this at some stage, or may I unassign you?
> 

You may unassign it from me.   Unfortunately, I don't have time anymore to work on it and I don't see that changing in the coming months. 

Thanks,

-Travis
msg123093 - (view) Author: Pauli Virtanen (pv) * Date: 2010-12-02 18:14
For reference, Numpy's PEP 3118 implementation is here:

http://github.com/numpy/numpy/blob/master/numpy/core/_internal.py#L357

http://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/buffer.c#L76

It would be a good idea to ensure that the numpy and struct implementations are in agreement about details of the format strings.
(I wouldn't take the Numpy implementation as the definitive one, though.)

- The "sub-structs" in Numpy arrays (in align=True mode) are aligned
  according to the maximum alignment of the fields.

- I assumed the 'O' format in the PEP is supposed to be similar to Numpy
  object arrays. This implies some reference counting semantics. The
  Numpy PEP 3118 implementation assumes the memory contains borrowed
  references, valid at least until the buffer is released.
  Unpacking 'O' should probably INCREF whatever PyObject* pointer is
  there.

- I assumed the alignment specifiers were unscoped. I'm not sure
  however whether this is the best thing to do.

- The function pointers and pointers to pointers were not implemented.
  (Numpy cannot represent those as data types.)
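For readers unfamiliar with where these format strings surface, PEP 3118 format strings are what buffer exporters report through memoryview.format; a minimal check with a stdlib exporter:

```python
import array
import struct

# array.array exports a buffer whose format is a struct-style code.
buf = memoryview(array.array('i', [1, 2, 3]))
print(buf.format)     # 'i' -- a PEP 3118 / struct-style format code
print(buf.itemsize == struct.calcsize(buf.format))   # True
```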
msg123204 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-12-03 08:36
> For reference, Numpy's PEP 3118 implementation is here:

Thanks for that, and the other information you give;  that's helpful.

It sounds like we're on the same page with respect to alignment of substructs.  (Bar the mostly academic question of max versus lcm.)

I still like the idea of scoped endianness markers in the substructs, but  if we have to abandon that for compatibility with NumPy that's okay.

> - I assumed the 'O' format in the PEP is supposed to be similar to Numpy
>   object arrays. This implies some reference counting semantics. The
>   Numpy PEP 3118 implementation assumes the memory contains borrowed
>   references, valid at least until the buffer is released.
>   Unpacking 'O' should probably INCREF whatever PyObject* pointer is
>   there.

I'm still confused about how this could work:  when unpacking, how do you know whether the PyObject* pointer points to a valid object or not?  You can ensure that the pointer will always point to a valid object by having the *pack* operation increment reference counts, but then you need a way to automatically decref when the packed string goes out of scope.  So the object returned by 'pack' would somehow have to be something other than a plain string, so that it can deal with automatically doing the DECREF of the held PyObject* pointers when it goes out of scope.

What's the need to have the 'O' format in the struct module?  Is it really necessary there?  Can we get away with not implementing it?
msg123205 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-12-03 08:38
> For reference, Numpy's PEP 3118 implementation is here

BTW, does this already exist in a released version of NumPy?  If not, when is it likely to appear in the wild?
msg123226 - (view) Author: Pauli Virtanen (pv) * Date: 2010-12-03 10:33
> I still like the idea of scoped endianness markers in the substructs,
> but  if we have to abandon that for compatibility with NumPy that's
> okay.

That, or change the Numpy implementation. I don't believe there's yet much code in the wild that changes the alignment specifier on the fly.

[clip: 'O' format code]
> So the object returned by 'pack' would somehow
> have to be something other than a plain string, so that it can deal
> with automatically doing the DECREF of the held PyObject* pointers
> when it goes out of scope.

Yes, the packed object would need to own the references, and it would be the responsibility of the provider of the buffer to ensure that the pointers are valid.

It seems that it's not possible for the `struct` module to correctly implement packing for the 'O' format. Unpacking could be possible, though (but then if you don't have packing, how do you write tests for it?).

Another possibility is to implement the 'O' format unsafely and leave managing the reference counting to whoever uses the `struct` module's capabilities. (And maybe return ctypes pointers on unpacking.)

[clip]
> What's the need to have the 'O' format in the struct module?  Is it
> really necessary there?  Can we get away with not implementing it?

Numpy arrays, when containing Python objects, function as per the 'O' format.

However, for the struct module, I don't see what would be the use case for the 'O' format.

> BTW, does this already exist in a released version of NumPy?  If not,
> when is it likely to appear in the wild?

It's included since the 1.5.0 release which came out last July.

    ***

I think after the implementation is done, the PEP probably needs to be amended with clarifications (and possibly cutting out what is not really needed).
msg123366 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-12-04 16:02
> Another possibility is to implement the 'O' format unsafely [...]

Hmm.  I don't much like that idea.  Historically, it's supposed to be very difficult to segfault the Python interpreter with pure Python code (well except if you're using ctypes, I guess).
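The concern above can be made concrete. A pointer-sized value already round-trips through struct's existing 'P' code, and turning the bare address back into an object (here via ctypes, a CPython-specific trick) is exactly the unsafe operation an 'O' code would have to perform, with nothing in the packed bytes owning a reference:

```python
import ctypes
import struct

x = ["some", "object"]             # must stay alive for this to be safe
packed = struct.pack("P", id(x))   # 'P': native void*; in CPython, id() is the address
addr, = struct.unpack("P", packed)

# CPython-specific and unsafe: `packed` holds no reference to x, so if
# x were garbage-collected this cast would dereference freed memory.
y = ctypes.cast(addr, ctypes.py_object).value
print(y is x)                      # True -- but only because x is still alive
```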
msg125617 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-01-07 03:59
Attached is the latest version of the struct string patch.  I tested on OS X 10.6.5 (64-bit) and Ubuntu 10.04 (32-bit).  I also scanned for memory problems with Valgrind.  There is one test failing on 32-bit systems ('test_crasher').  This is due to the fact that 'struct.pack("357913941b", ...)' no longer tries to allocate 357913941 format codes.  This implementation just allocates *one* code and assigns a count of 357913941, which is utilized later when packing/unpacking.  Some work could be done to add better large memory consumption checks, though.

Previous feedback has been incorporated:

   1. Multiplicities allowed on struct specifiers.
   2. Maximum alignment rule.
   3. Struct nesting depth limited (64 levels).
   4. The old behavior of allowing only one byte order specifier
      is kept.  However, the code is written in a way such that the
      scoped behavior would be easy to add.

As before, there will surely be more iterations, but this is good enough for general review to see if things are headed in the right direction.  

This is a difficult one for review because the diffs are really large.  I placed a review on Rietveld here: http://codereview.appspot.com/3863042/.  If anyone has any ideas on how to reduce the number of diffs (perhaps a way to do multiple smaller patches), then that would be cool.  I don't see an obvious way to do this at this point.
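The "one code with a count" representation described above amounts to parsing the format string into (code, repeat) pairs instead of expanding repeats. A hypothetical sketch (not the patch's actual data structures):

```python
import re

def parse_codes(fmt):
    """Split a struct format string into (code, count) pairs.

    '357913941b' becomes a single ('b', 357913941) entry instead of
    357913941 separate one-byte codes, so parsing cost no longer
    scales with the repeat count.
    """
    pairs = []
    for count, code in re.findall(r"(\d*)([a-zA-Z?])", fmt):
        pairs.append((code, int(count) if count else 1))
    return pairs

print(parse_codes("357913941b"))   # [('b', 357913941)]
print(parse_codes("2h3s"))         # [('h', 2), ('s', 3)]
```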
msg130694 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-03-12 19:33
Is there still any interest in this work?
msg130695 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-03-12 19:41
Yes, there's interest (at least here).  I've just been really short on Python-time recently, so haven't found time to review your patch.
msg130696 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-03-12 19:44
I'm going to unassign for now;  I still hope to look at this at some point, but can't see a time in the near future when it's going to happen.
msg143505 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-05 01:44
Is this work something that might be suitable for the features/pep-3118 repo (http://hg.python.org/features/pep-3118/) ?
msg143509 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-09-05 07:53
Yes, definitely. I'm going to push a new memoryview implementation
(complete for all 1D/native format cases) in a couple of days.

Once that is done, perhaps we could create a memoryview-struct
branch on top of that.
msg167963 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-11 14:56
Following up here after rejecting #15622 as invalid

The "unicode" codes in PEP 3118 need to be seriously rethought before any related changes are made in the struct module.

1. The 'c' and 's' codes are currently used for raw bytes data (represented as bytes objects at the Python layer). This means the 'c' code cannot be used as described in PEP 3118 in a world with strict binary/text separation.

2. Any format codes for UCS1, UCS2 and UCS4 are more usefully modelled on 's' than they are on 'c' (so that repeat counts create longer strings rather than lists of strings that each contain a single code point)

3. Given some of the other proposals in PEP 3118, it seems more useful to define an embedded text format as "S{<encoding>}".

UCS1 would then be "S{latin-1}", UCS2 would be approximated as "S{utf-16}" and UCS4 would be "S{utf-32}" and arbitrary encodings would also be supported. struct packing would implicitly encode from text to bytes while unpacking would implicitly decode bytes to text. As with 's' a length mismatch in the encoded form would mean an error.
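The proposed "S{<encoding>}" semantics could look roughly like this. This is entirely hypothetical -- no such code exists in the struct module, and the helper names are mine -- but it captures the encode-on-pack, decode-on-unpack, exact-length behavior described above:

```python
def pack_S(text, encoding, size):
    """Hypothetical 'NS{encoding}' packing: encode the text, then
    enforce the declared byte length exactly, as 's' does."""
    data = text.encode(encoding)
    if len(data) != size:
        raise ValueError("encoded length %d != declared %d" % (len(data), size))
    return data

def unpack_S(data, encoding):
    """Hypothetical 'S{encoding}' unpacking: decode bytes to text."""
    return data.decode(encoding)

print(pack_S("café", "latin-1", 4))      # b'caf\xe9'
print(unpack_S(b"caf\xe9", "latin-1"))   # café
```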
msg187583 - (view) Author: Paul Hoffman (paulehoffman) * Date: 2013-04-22 19:35
Following up on http://mail.python.org/pipermail/python-ideas/2011-March/009656.html, I would like to request that struct also handle half-precision floats directly. It's a short change, and half-precision floats are becoming much more popular in applications.

Adding this to struct would also maybe need to change math.isinf and math.isnan, but maybe not.
msg187589 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-04-22 20:14
Paul: there's already an open issue for adding float16 to the struct module: see issue 11734.
msg187591 - (view) Author: Paul Hoffman (paulehoffman) * Date: 2013-04-22 20:18
Whoops, never mind. Thanks for the pointer to 11734.
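For the record, the half-precision code requested above did land via issue 11734: from Python 3.6 on, struct supports 'e' (IEEE 754 binary16). A quick check:

```python
import struct

# '<e' packs a half-precision (16-bit) IEEE 754 float, little-endian.
# 1.0 encodes as 0x3C00.
print(struct.pack("<e", 1.0))              # b'\x00<'
print(struct.unpack("<e", b"\x00<")[0])    # 1.0

# Values outside the binary16 range (max ~65504) fail to pack:
# struct.pack("<e", 1e6) raises OverflowError.
```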
msg263321 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-13 10:20
Here's a grammar that roughly describes the subset that NumPy supports.

As for implementing this in the struct module: There is a new data
description language on the horizon:

  http://datashape.readthedocs.org/en/latest/


It does not have all the low-level capabilities (e.g. changing alignment
on the fly), but it is far more readable. Example:

PEP-3118:  "(2,3)10f0fZdT{10B:x:(2,3)d:y:Q:z:}B"
Datashape: "2 * 3 * (10 * float32, 0 * float32, complex128, {x: 10 * uint8, y: 2 * 3 * float64, z: int64}, uint8)"


There are a lot of open questions still. Should "10f" be viewed as an
array[10] of float, i.e. equivalent to (10)f?

In the context of PEP-3118, I think so.
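The current struct module already treats "10f" as ten consecutive floats with no extra structure, which is consistent with reading it as an array[10] of float:

```python
import struct

# "10f" is ten 4-byte floats packed back to back: 40 bytes total.
print(struct.calcsize("10f"))    # 40

# Unpacking yields a flat tuple of ten values, not a nested array.
vals = struct.unpack("10f", bytes(40))
print(len(vals), vals[0])        # 10 0.0
```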
History
Date User Action Args
2022-04-11 14:56:35  admin  set  github: 47382
2016-04-13 10:20:21  skrah  set  versions: + Python 3.6, - Python 3.3
2016-04-13 10:20:08  skrah  set  files: + grammar.y
nosy: + skrah
messages: + msg263321

2014-10-14 16:49:42  skrah  set  nosy: - skrah
2013-11-29 11:07:39  skrah  link  issue19803 superseder
2013-04-22 20:18:40  paulehoffman  set  nosy: barry, teoliphant, mark.dickinson, ncoghlan, belopolsky, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, Arfrever, noufal, skrah, meador.inge, martin.panter, paulehoffman
messages: + msg187591
2013-04-22 20:14:27  mark.dickinson  set  messages: + msg187589
2013-04-22 19:35:25  paulehoffman  set  nosy: + paulehoffman
messages: + msg187583
2012-10-27 03:46:16  martin.panter  set  nosy: + martin.panter
2012-08-21 03:14:59  belopolsky  set  nosy: + belopolsky, - Alexander.Belopolsky
2012-08-11 14:56:58  ncoghlan  set  nosy: + ncoghlan
messages: + msg167963
2012-08-11 14:40:31  ncoghlan  unlink  issue15622 dependencies
2012-08-11 10:15:40  Arfrever  set  nosy: + Arfrever
2012-08-11 10:08:26  skrah  link  issue15622 dependencies
2011-09-05 07:53:48  skrah  set  nosy: + skrah
messages: + msg143509
2011-09-05 01:44:51  meador.inge  set  messages: + msg143505
2011-03-12 19:44:04  mark.dickinson  set  assignee: mark.dickinson ->
messages: + msg130696
nosy: barry, teoliphant, mark.dickinson, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, noufal, meador.inge, Alexander.Belopolsky
2011-03-12 19:41:51  mark.dickinson  set  nosy: barry, teoliphant, mark.dickinson, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, noufal, meador.inge, Alexander.Belopolsky
messages: + msg130695
2011-03-12 19:33:21  meador.inge  set  nosy: barry, teoliphant, mark.dickinson, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, noufal, meador.inge, Alexander.Belopolsky
messages: + msg130694
2011-01-08 13:39:03  pitrou  set  assignee: meador.inge -> mark.dickinson
stage: needs patch -> patch review
nosy: barry, teoliphant, mark.dickinson, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, noufal, meador.inge, Alexander.Belopolsky
versions: + Python 3.3, - Python 3.2
2011-01-07 03:59:10  meador.inge  set  files: + struct-string.py3k.3.patch
nosy: barry, teoliphant, mark.dickinson, pitrou, inducer, ajaksu2, MrJean1, benjamin.peterson, pv, noufal, meador.inge, Alexander.Belopolsky
messages: + msg125617
2010-12-04 16:02:40  mark.dickinson  set  messages: + msg123366
2010-12-03 10:33:46  pv  set  messages: + msg123226
2010-12-03 08:38:53  mark.dickinson  set  messages: + msg123205
2010-12-03 08:36:31  mark.dickinson  set  messages: + msg123204
2010-12-02 18:14:17  pv  set  nosy: + pv
messages: + msg123093
2010-08-14 20:13:18  meador.inge  set  priority: critical -> high
assignee: teoliphant -> meador.inge
stage: test needed -> needs patch
2010-05-25 04:26:31  teoliphant  set  messages: + msg106416
2010-05-20 20:49:08  mark.dickinson  set  messages: + msg106188
2010-05-20 19:01:14  mark.dickinson  set  messages: + msg106181
2010-05-20 18:50:03  mark.dickinson  set  messages: + msg106180
2010-05-20 18:13:23  meador.inge  set  messages: + msg106177
2010-05-20 17:33:57  mark.dickinson  set  messages: + msg106175
2010-05-20 17:30:35  mark.dickinson  set  messages: + msg106173
2010-05-20 17:17:40  meador.inge  set  messages: + msg106168
2010-05-20 16:24:48  mark.dickinson  set  messages: + msg106164
2010-05-20 14:26:13  mark.dickinson  set  messages: + msg106157
2010-05-20 14:17:42  pitrou  set  messages: + msg106155
2010-05-20 13:50:43  meador.inge  set  files: + struct-string.py3k.2.patch

messages: + msg106153
2010-05-19 19:56:16  mark.dickinson  set  messages: + msg106091
2010-05-19 19:51:30  mark.dickinson  set  messages: + msg106090
2010-05-19 19:46:19  mark.dickinson  set  messages: + msg106089
2010-05-19 19:38:54  mark.dickinson  set  messages: + msg106088
2010-05-19 19:27:14  mark.dickinson  set  messages: + msg106087
2010-05-18 12:22:25  meador.inge  set  messages: + msg105970
2010-05-18 07:12:57  mark.dickinson  set  messages: + msg105955
2010-05-18 04:08:01  meador.inge  set  files: + struct-string.py3k.patch

messages: + msg105952
2010-04-20 18:26:40  noufal  set  nosy: + noufal
2010-03-01 16:24:10  inducer  set  nosy: + inducer
2010-02-26 05:37:20  Alexander.Belopolsky  set  nosy: + Alexander.Belopolsky
2010-02-22 16:14:46  mark.dickinson  set  messages: + msg99771
2010-02-22 04:13:52  meador.inge  set  messages: + msg99711
2010-02-21 19:23:08  mark.dickinson  set  messages: + msg99677
2010-02-21 13:09:18  mark.dickinson  set  messages: + msg99656
2010-02-21 13:06:18  mark.dickinson  set  messages: + msg99655
2010-02-21 13:05:24  mark.dickinson  link  issue2395 superseder
2010-02-21 13:05:24  mark.dickinson  unlink  issue2395 dependencies
2010-02-19 01:02:23  meador.inge  set  messages: + msg99551
2010-02-17 15:18:06  mark.dickinson  set  messages: + msg99474
2010-02-17 15:12:23  mark.dickinson  set  messages: + msg99472
2010-02-17 03:30:54  meador.inge  set  files: - unnamed
2010-02-17 03:30:15  meador.inge  set  files: + unnamed, pep-3118.patch
keywords: + patch
messages: + msg99460
2010-02-13 12:01:36  mark.dickinson  set  messages: + msg99313
2010-02-13 11:07:31  mark.dickinson  set  messages: + msg99312
2010-02-13 06:06:32  teoliphant  set  messages: + msg99309
2010-02-13 01:35:13  benjamin.peterson  set  messages: + msg99297
2010-02-13 01:29:00  meador.inge  set  nosy: + meador.inge
messages: + msg99296
2009-12-24 23:14:07  mark.dickinson  set  nosy: + mark.dickinson
2009-05-16 20:33:49  ajaksu2  link  issue2395 dependencies
2009-05-16 20:33:40  ajaksu2  set  versions: + Python 3.2, - Python 3.1
nosy: + ajaksu2

messages: + msg87921

stage: test needed
2008-08-24 21:38:02  teoliphant  set  messages: + msg71882
2008-08-18 15:51:22  pitrou  set  messages: + msg71342
2008-08-18 15:02:32  benjamin.peterson  set  messages: + msg71338
2008-08-18 09:36:17  pitrou  set  priority: release blocker -> critical
nosy: + pitrou
messages: + msg71316
components: + Library (Lib)
versions: + Python 3.1, - Python 3.0
2008-08-18 03:00:57  barry  set  nosy: + barry
messages: + msg71313
2008-07-31 02:17:13  benjamin.peterson  set  priority: critical -> release blocker
2008-06-21 15:59:21  MrJean1  set  nosy: + MrJean1
messages: + msg68507
2008-06-17 22:30:31  benjamin.peterson  create