Title: Document that the null character '\0' terminates a struct format spec
Type: behavior Stage:
Components: Documentation, Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: bup, docs@python, serhiy.storchaka, steven.daprano
Priority: normal Keywords:

Created on 2019-01-10 22:42 by bup, last changed 2019-01-11 07:28 by serhiy.storchaka.

Messages (3)
msg333424 - Author: Dan Snider (bup) * Date: 2019-01-10 22:42
    >>> from struct import calcsize
    >>> calcsize('\144\u0064\000xf\U00000031000\60d\121\U00000051')

I'm sure some people think it's obvious, or even expect the null character to signal EOF, but it probably isn't obvious at all to those without experience in lower-level languages. It actually seems like Python goes out of its way to make sure everything treats the null character as no more special than the letter "H", which is good.
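For reference, the escapes in the format string above can be decoded without relying on how struct treats the null character. The sketch below (variable names are illustrative) shows that the literal is just `dd`, a null character, and a tail of ordinary characters. Under the behavior reported here, only the leading `dd` would be parsed, so the example sizes that prefix on its own.

```python
from struct import calcsize

# The literal from the report: octal, \u, \U and plain characters mixed together.
fmt = '\144\u0064\000xf\U00000031000\60d\121\U00000051'

# The same string written with a single, visible escape for the null character.
decoded = 'dd\x00xf10000dQQ'
same_string = (fmt == decoded)

# Position of the null character and the part of the spec that precedes it.
nul_index = fmt.index('\0')
prefix = fmt[:nul_index]

# Size of the prefix alone: two native doubles.
prefix_size = calcsize(prefix)

print(ascii(fmt), nul_index, prefix, prefix_size)
```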

At first glance I'd think something like this was just another trivial quirk of the language and not bring it up, but because the documentation doesn't mention it, I actually got stuck on something related for half an hour while unit testing some dynamically generated format specs.

Without going into unnecessary detail, what happened was that a typo in another tangentially related part of the test was enabling the generation of a rogue null byte. I'm bad at those "find the face in the crowd" puzzles and this was hardly different, the null byte being literally camouflaged within a 300-character format spec containing a random mixture of escaped and non-escaped source characters in the forms: \Uffffffff, \uffff, \777, \xff, \x00, + latin/ascii.
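A stray null character like the one described above is easier to locate programmatically than by eye. This is a minimal sketch; `find_nulls` is a hypothetical helper, not part of the struct module, and the sample spec is made up for illustration.

```python
def find_nulls(fmt):
    """Return the index of every null character in a format spec (str or bytes)."""
    # Iterating over bytes yields ints, iterating over str yields 1-char strings.
    nul = '\0' if isinstance(fmt, str) else 0
    return [i for i, ch in enumerate(fmt) if ch == nul]


# A long generated spec with one null character hidden in the middle.
spec = 'I' * 150 + '\x00' + 'H' * 149
positions = find_nulls(spec)
print(len(spec), positions)
```

Printing `ascii(spec)` is another quick check, since it renders the null character as a visible `\x00`.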

If I'm not the only one who sees this as a slightly bigger deal than poor documentation, the fix is trivial with an extra call to PyBytes_GET_SIZE when null is found. But just because I can't think of a use case for allowing the null character to precede other characters in the format string doesn't mean there isn't one, which is why only documentation is currently selected.
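The idea behind that length check can be modeled in pure Python. This is only a sketch of the comparison, not the C code in the struct module, and both function names are hypothetical: a scan that stops at the first null sees a shorter string than the object's real size whenever a null is embedded.

```python
def c_string_length(buf: bytes) -> int:
    """Length as a null-terminated scan would see it (what strlen reports in C)."""
    end = buf.find(b'\0')
    return len(buf) if end == -1 else end


def has_embedded_null(fmt: bytes) -> bool:
    """True when a scan that stops at the first null ends before the real end."""
    return c_string_length(fmt) != len(fmt)


print(has_embedded_null(b'dd\0xf'), has_embedded_null(b'dd'))
```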
msg333430 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-01-11 01:03
I'm not sure whether having NULLs terminate a struct format string is a feature or a bug.

Given that nearly every other string in Python treats NULLs as ordinary characters, I'm inclined to say this is a bug. Or at least an unnecessary restriction that ought to be lifted.
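To illustrate the point about ordinary strings, here is a short sketch using only built-in str operations. The sample string is arbitrary.

```python
s = 'ab\0cd'

# The null character counts toward the length like any other character.
length = len(s)

# It can be searched for, split on, and carried through transformations.
parts = s.split('\0')
upper = s.upper()

# It survives encoding to bytes and works inside a dictionary key.
encoded = s.encode('ascii')
lookup = {s: 'found'}

print(length, parts, ascii(upper), encoded, lookup[s])
```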
msg333441 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 07:28
I think the null character is an illegal character in the format string, and struct functions should raise a struct.error for it.
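Until the library itself rejects such input, the same effect can be approximated with a wrapper on the caller's side. This is a minimal sketch: `checked_calcsize` and its error message are hypothetical, and it assumes the caller wants a struct.error for a null character anywhere in the format.

```python
import struct


def checked_calcsize(fmt):
    """Like struct.calcsize(), but reject a null character anywhere in fmt."""
    nul = '\0' if isinstance(fmt, str) else b'\0'
    if nul in fmt:
        # Hypothetical message; the point is the exception type.
        raise struct.error('embedded null character in format string')
    return struct.calcsize(fmt)


try:
    checked_calcsize('dd\0xf')
    rejected = False
except struct.error:
    rejected = True

print(checked_calcsize('dd'), rejected)
```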
Date User Action Args
2019-01-11 07:28:15 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg333441
2019-01-11 01:03:27 steven.daprano set type: behavior; messages: + msg333430; nosy: + steven.daprano
2019-01-10 22:42:10 bup create