classification
Title: Document whether it's safe to use bytes for struct format string
Type: Stage:
Components: Extension Modules Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: open Resolution:
Dependencies: 21071 Superseder:
Assigned To: Nosy List: Arfrever, christian.heimes, mark.dickinson, martin.panter, meador.inge, rhettinger, serhiy.storchaka, takluyver, terry.reedy
Priority: normal Keywords: patch

Created on 2012-10-28 12:43 by takluyver, last changed 2017-04-29 02:46 by martin.panter.

Files
File name Uploaded
format-bytes.patch martin.panter, 2014-12-18 05:55
Messages (13)
msg174042 - (view) Author: Thomas Kluyver (takluyver) * Date: 2012-10-28 12:43
At least in CPython, format strings can be given as bytes, as an alternative to str. E.g.

>>> struct.unpack(b'>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)

Looking at the source code [1], this appears to be consciously accounted for. But it doesn't seem to be mentioned in the documentation. I think the docs should either say it's a possibility, or warn that it's an implementation detail.

[1] http://hg.python.org/cpython/file/cde4b66699fe/Modules/_struct.c#l1340
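For reference, a minimal sketch of the behaviour described above, as observed in CPython. Whether it is guaranteed by the language is exactly the open question here, so treat it as an illustration rather than a documented contract:

```python
import struct

data = b'\x00\x01\x00\x02\x00\x00\x00\x03'

# The same big-endian format, given once as str and once as bytes.
from_str = struct.unpack('>hhl', data)
from_bytes = struct.unpack(b'>hhl', data)

print(from_str)    # (1, 2, 3)
print(from_bytes)  # (1, 2, 3)
```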
msg174083 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2012-10-28 22:36
Also it would be nice to clarify if struct.Struct.format is meant to be a byte string. Reading the documentation and examples I expected a character string. It was an issue for me when embedding one structure within another:

HSF_VOL_DESC = Struct("< B 5s B")

# Python 3.2.3's "Struct.format" is actually a byte string
NSR_DESC = Struct(HSF_VOL_DESC.format.decode() + "B")
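A hedged sketch of a workaround that does not depend on which type the attribute has. The helper name format_as_text is hypothetical, not part of the struct module; it assumes format strings are ASCII:

```python
from struct import Struct


def format_as_text(s):
    """Return s.format as str, whether the attribute is bytes or str.

    Hypothetical helper, not part of the struct module.
    """
    fmt = s.format
    if isinstance(fmt, bytes):
        # Format characters are ASCII, so this decode is assumed lossless.
        fmt = fmt.decode('ascii')
    return fmt


HSF_VOL_DESC = Struct("< B 5s B")

# Embed the first structure's format in a larger one.
NSR_DESC = Struct(format_as_text(HSF_VOL_DESC) + "B")
```

With this, the embedding code no longer needs an unconditional .decode() call, which would fail if the attribute were a str.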
msg174584 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-02 21:28
For 3.3, I verified that adding a b prefix to the first three doc examples gives the same output as without, but also discovered that example outputs are wrong, at least on Windows, because of byte-ordering issues.

>>> pack('hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x03\x00\x00\x00'
>>> pack(b'hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x03\x00\x00\x00'
>>> unpack(b'hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(256, 512, 50331648)
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(256, 512, 50331648)
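The difference can be reproduced on any machine by spelling out the byte order instead of relying on the native one. A small sketch, using the standard '<' (little-endian) and '>' (big-endian) prefixes:

```python
import struct

data = b'\x00\x01\x00\x02\x00\x00\x00\x03'

# Interpreting the same eight bytes with each byte order.
little = struct.unpack('<hhl', data)
big = struct.unpack('>hhl', data)

print(little)  # (256, 512, 50331648)
print(big)     # (1, 2, 3)
```

The little-endian result matches the output shown above, and the big-endian result matches the documented example.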
msg174680 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-03 19:35
> but also discovered that example outputs are wrong

That's documented to some extent:  there's a line in the docs that says:
"All examples assume a native byte order, size, and alignment with a big-endian machine".

Given that little-endian machines are much more common than big-endian ones these days, it may be worth rewriting the examples for little-endian machines.
msg174682 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-11-03 19:40
> Also it would be nice to clarify if struct.Struct.format is meant to be a byte string.

Hmm.  That seems wrong to me.  After all, the format string is supposed to be a piece of human-readable text rather than a collection of bytes.  I think it's borderline acceptable to allow a bytes instance to be passed in for the format (practicality beats purity and all that), but I'd say that the output format should definitely be unicode.
msg174711 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-03 22:13
I think the examples should be switched *and* the formats should specify the endianness so the examples work on all systems.
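A sketch of what such platform-independent examples might look like (an illustration only, not the wording of any actual docs patch). With an explicit prefix, struct uses standard sizes and no alignment padding, so the results do not depend on the machine:

```python
import struct

# Explicit byte order: same result on every platform.
little = struct.pack('<hhl', 1, 2, 3)
big = struct.pack('>hhl', 1, 2, 3)

print(little)  # b'\x01\x00\x02\x00\x03\x00\x00\x00'
print(big)     # b'\x00\x01\x00\x02\x00\x00\x00\x03'

# Standard sizes: h is 2 bytes and l is 4 bytes, regardless of the C compiler.
size = struct.calcsize('<hhl')
print(size)    # 8
```

Without a prefix, 'hhl' uses the native size and alignment of a C long, which is 8 bytes on some platforms, so even the length of the packed data can vary.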
msg176681 - (view) Author: Thomas Kluyver (takluyver) * Date: 2012-11-30 11:04
I'm happy to put together a docs patch, but I don't have any indication of the right answer (is it a safe feature to use, or an implementation detail?) Is there another venue where I should raise the question?
msg176701 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-30 18:55
Python 2 supports only str. Support for unicode objects was added in r59687 (merged with other unrelated changes in changeset 13aabc23cf2e). Maybe Raymond can explain why bytes, not unicode, was chosen as the type of Struct.format.
msg176702 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-30 19:05
No, this is not r59687. I can't find which revision in 59680-59695 it came from.
msg216656 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-04-17 05:11
The issue of Struct.format being a byte string has been raised separately in Issue 21071.
msg232767 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-16 22:34
Actually the “struct” module doc string seems to already hint that format strings can be byte strings:

“Python bytes objects are used to hold the data representing the C struct and also as format strings . . .”
msg232858 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-18 05:55
Assuming it is intended to support byte strings, here is a patch that documents them being allowed, and adds a test case.
msg292554 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-04-29 02:46
I think the direction to take for this depends on the outcome of Issue 21071. First we have to decide if the “format” attribute is blessed as a byte string (and not deprecated), or whether it is deprecated or changed to a text string.

Serhiy pointed out that it is not entirely “safe” because mixing equivalent byte and text formats can generate BytesWarning.
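A minimal sketch of the underlying comparison, not of the struct module's internals: CPython issues BytesWarning when bytes and str are compared and the interpreter was started with -b, and turns it into an error with -bb. The child program and the run_child helper below are hypothetical names used only for this illustration:

```python
import subprocess
import sys

# Child program: compare an equivalent bytes format and str format.
CHILD = "a = b'>hhl'; b = '>hhl'; print(a == b)"


def run_child(*flags):
    """Run CHILD in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, '-c', CHILD],
        capture_output=True,
        text=True,
    )


plain = run_child()        # default: the comparison is silently False
strict = run_child('-bb')  # -bb: the comparison raises BytesWarning

print(plain.stdout.strip())
print(strict.returncode)
print(strict.stderr.strip().splitlines()[-1])
```

Any code that ends up comparing the two spellings of a format, for example as dictionary keys, could hit this when run with -b or -bb.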
History
Date  User  Action  Args
2017-09-14 03:25:33  xiang.zhang  unlink  issue19985 dependencies
2017-04-29 02:46:51  martin.panter  set  dependencies: + struct.Struct.format is bytes, but should be str, - Document whether it's safe to use bytes for struct format string
2017-04-29 02:46:51  martin.panter  unlink  issue16349 dependencies
2017-04-29 02:46:14  martin.panter  set  dependencies: + Document whether it's safe to use bytes for struct format string; messages: + msg292554
2017-04-29 02:46:14  martin.panter  link  issue16349 dependencies
2016-04-15 04:00:33  martin.panter  link  issue19985 dependencies
2014-12-19 00:17:39  Arfrever  set  nosy: + Arfrever
2014-12-18 05:55:28  martin.panter  set  files: + format-bytes.patch; keywords: + patch; messages: + msg232858
2014-12-16 22:34:55  martin.panter  set  messages: + msg232767
2014-04-17 05:11:19  martin.panter  set  messages: + msg216656
2012-11-30 19:05:18  serhiy.storchaka  set  nosy: + christian.heimes; messages: + msg176702
2012-11-30 18:55:08  serhiy.storchaka  set  versions: + Python 3.2, Python 3.3, Python 3.4; nosy: + rhettinger, serhiy.storchaka; messages: + msg176701; components: + Extension Modules, - Library (Lib)
2012-11-30 11:04:26  takluyver  set  messages: + msg176681
2012-11-03 22:13:58  terry.reedy  set  messages: + msg174711
2012-11-03 19:40:04  mark.dickinson  set  messages: + msg174682
2012-11-03 19:35:49  mark.dickinson  set  messages: + msg174680
2012-11-02 21:28:35  terry.reedy  set  nosy: + mark.dickinson, meador.inge, terry.reedy; messages: + msg174584
2012-10-28 22:36:28  martin.panter  set  nosy: + martin.panter; messages: + msg174083
2012-10-28 12:43:42  takluyver  create