Author bup
Recipients bup, docs@python
Date 2019-01-10.22:42:10
Message-id <1547160130.27.0.885629506261.issue35714@roundup.psfhosted.org>
Content
I.e., a null character silently terminates a struct format spec:
    >>> from struct import calcsize
    >>> calcsize('\144\u0064\000xf\U00000031000\60d\121\U00000051')
    16
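The escapes in that literal are easy to misread, so here is a short check written for this note (not part of the original report). It decodes the literal and shows that the reported 16 is exactly the size of the two `d` codes in front of the null. It assumes a typical platform where a native C `double` is 8 bytes, and it deliberately only calls `calcsize` on the prefix, not on the full literal.

```python
from struct import calcsize

# The format literal from the report, copied verbatim.
fmt = '\144\u0064\000xf\U00000031000\60d\121\U00000051'

# Every escape decodes to an ordinary format character except \000.
assert fmt == 'dd\x00xf10000dQQ'

# Split at the first null: the prefix alone accounts for the reported 16.
prefix, _, ignored = fmt.partition('\0')
print(repr(prefix), calcsize(prefix))   # 'dd' 16
print(repr(ignored))                    # 'xf10000dQQ'
```

Everything after the null (`xf10000dQQ`, i.e. a pad byte, a float, 10000 doubles and two unsigned long longs) is what went uncounted.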

I'm sure some people think this is obvious, or even expect the null character to mark the end of the string, but it probably isn't obvious at all to those without experience in lower-level languages. In fact, Python seems to go out of its way to treat the null character as no more special than the letter "H", which is good.
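For contrast, a quick illustration (not from the report) that ordinary `str` and `bytes` operations count and handle an embedded null exactly like any other character:

```python
# Outside of struct, an embedded null is just another character.
s = 'H\0H'
b = b'H\0H'

print(len(s), len(b))            # 3 3
print(s.split('\0'))             # ['H', 'H']
print(s.replace('\0', 'H'))      # HHH
print(s == 'H' + chr(0) + 'H')   # True
```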

Ordinarily I'd dismiss something like this as just another trivial quirk of the language and not bring it up, but because the documentation doesn't mention it, I got stuck on a related problem for half an hour while unit testing some dynamically generated format specs.

Without going into unnecessary detail: a typo in another, tangentially related part of the test allowed a rogue null byte to be generated. I'm bad at those "find the face in the crowd" puzzles, and this was hardly different: the null was camouflaged within a 300-character format spec containing a random mix of escaped and unescaped source characters of the forms \Uffffffff, \uffff, \777, \xff, \x00, plus Latin/ASCII.
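When a null is hidden among hundreds of escapes, scanning for it programmatically beats reading the source. A minimal sketch of a test helper for that; `find_nulls` and `describe_nulls` are hypothetical names, not part of `struct` or any other library:

```python
def find_nulls(fmt):
    """Return the index of every null character in a format string."""
    return [i for i, ch in enumerate(fmt) if ch == '\0']


def describe_nulls(fmt, width=5):
    """Show each null with a little surrounding context, using repr()."""
    return [
        f"index {i}: {fmt[max(0, i - width):i + width + 1]!r}"
        for i in find_nulls(fmt)
    ]


# A generated spec with a null hidden among harmless escapes
# ('\x69', '\u0069' and '\U00000069' all decode to 'i').
spec = '\x69\u0069\U00000069' * 20 + '\000' + 'i' * 40
print(find_nulls(spec))   # [60]
for line in describe_nulls(spec):
    print(line)
```

`repr()` renders the null as `\x00`, so it stands out in the context snippet even though it is invisible in most terminals.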

If I'm not the only one who sees this as a slightly bigger deal than poor documentation, the fix is trivial: when a null is found, an extra call to PyBytes_GET_SIZE can tell whether it is really the end of the format string or whether characters follow it. But just because I can't think of a use case for allowing the null character to precede other characters in the format string doesn't mean there isn't one, which is why only the documentation component is currently selected.
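The same idea can be sketched at the Python level as a wrapper for one's own tests. This is an assumption-laden illustration, not the proposed C patch: `strict_calcsize` is a hypothetical name, and because a Python `str` or `bytes` object carries its own length, any null it contains is necessarily followed by or embedded among other characters, so the wrapper simply rejects every null before delegating to `struct.calcsize`.

```python
import struct


def strict_calcsize(fmt):
    """calcsize() that refuses format strings containing a null character.

    Hypothetical wrapper: any null in fmt would cause the rest of the
    spec to be ignored, so it is reported instead of being passed on.
    """
    if isinstance(fmt, str):
        pos = fmt.find('\0')
    else:
        pos = bytes(fmt).find(b'\0')
    if pos != -1:
        raise struct.error(
            f"null character at index {pos} of a {len(fmt)}-character format")
    return struct.calcsize(fmt)


print(strict_calcsize('<dd'))   # 16
try:
    strict_calcsize('dd\0xf')
except struct.error as exc:
    print(exc)   # null character at index 2 of a 5-character format
```

Raising `struct.error` keeps the wrapper a drop-in replacement for code that already catches errors from `struct`.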
History
Date                 User  Action  Args
2019-01-10 22:42:14  bup   set     recipients: + bup, docs@python
2019-01-10 22:42:10  bup   set     messageid: <1547160130.27.0.885629506261.issue35714@roundup.psfhosted.org>
2019-01-10 22:42:10  bup   link    issue35714 messages
2019-01-10 22:42:10  bup   create