classification
Title: Document that the null character '\0' terminates a struct format spec
Type: behavior Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: ZackerySpytz, bup, docs@python, mark.dickinson, miss-islington, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2019-01-10 22:42 by bup, last changed 2020-05-26 09:16 by miss-islington.

Pull Requests
URL Status Linked Edit
PR 16928 merged ZackerySpytz, 2019-10-26 05:48
PR 20373 merged miss-islington, 2020-05-25 07:55
PR 20374 closed miss-islington, 2020-05-25 07:55
PR 20375 closed miss-islington, 2020-05-25 07:55
PR 20376 closed miss-islington, 2020-05-25 07:55
PR 20419 merged ZackerySpytz, 2020-05-26 08:32
PR 20420 merged miss-islington, 2020-05-26 08:57
Messages (10)
msg333424 - (view) Author: Dan Snider (bup) * Date: 2019-01-10 22:42
For example:
    >>> from struct import calcsize
    >>> calcsize('\144\u0064\000xf\U00000031000\60d\121\U00000051')
    16
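
To make the report easier to follow, here is a small sketch that resolves the escapes in the format spec above and isolates the part before the first null character. It only calls `calcsize` on that prefix, so it does not depend on how a given interpreter treats the null itself; the names `fmt` and `prefix` are illustrative and not part of the original report.

```python
import struct

# The format spec from the report, written with the same escapes.
fmt = '\144\u0064\000xf\U00000031000\60d\121\U00000051'

# With the escapes resolved, the null character is easy to spot.
print(repr(fmt))        # 'dd\x00xf10000dQQ'

# Everything up to (not including) the first null character.
prefix = fmt.partition('\0')[0]
print(repr(prefix))     # 'dd'

# Two native doubles, which matches the size reported above.
print(struct.calcsize(prefix))
```

On the interpreters discussed in this report, only that `'dd'` prefix contributed to the result of 16.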

I'm sure some people think it's obvious, or even expect the null character to signal EOF, but it probably isn't obvious at all to those without experience in lower-level languages. It actually seems like Python goes out of its way to make sure everything treats the null character as no more special than the letter "H", which is good.

At first glance I'd think something like this was just another trivial quirk of the language and not bring it up, but because the documentation doesn't mention it, I actually got stuck on something related for half an hour while unit testing some dynamically generated format specs.

Without going into unnecessary detail, what happened was that a typo in another, tangentially related part of the test allowed a rogue null byte to be generated. I'm bad at those "find the face in the crowd" puzzles and this was hardly different: the null was literally camouflaged within a 300-character format spec containing a random mixture of escaped and non-escaped source characters in the forms \Uffffffff, \uffff, \777, \xff, \x00, plus Latin/ASCII.

If I'm not the only one who sees this as a slightly bigger deal than poor documentation, the fix is trivial with an extra call to PyBytes_GET_SIZE when a null is found. But just because I can't think of a use case for allowing the null character to precede other characters in the format string doesn't mean there isn't one, which is why only documentation is currently selected.
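
For code that builds format specs dynamically, a user-side guard is one way to catch a stray null regardless of interpreter version. The sketch below is not the C-level fix mentioned above; `checked_calcsize` is a hypothetical helper name, and raising `ValueError` is an arbitrary choice for illustration.

```python
import struct

def checked_calcsize(fmt):
    """Hypothetical wrapper: refuse format specs containing a null character."""
    # struct accepts both str and bytes format specs, so search with a matching type.
    null = b'\0' if isinstance(fmt, bytes) else '\0'
    index = fmt.find(null)
    if index != -1:
        raise ValueError(f'null character at index {index} in format spec')
    return struct.calcsize(fmt)

print(checked_calcsize('dd'))
try:
    checked_calcsize('dd\0xf')
except ValueError as exc:
    print(exc)
```

Reporting the index makes the offending character easier to locate in a long generated spec.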
msg333430 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-01-11 01:03
I'm not sure whether having NULLs terminate a struct format string is a feature or a bug.

Given that nearly every other string in Python treats NULLs as ordinary characters, I'm inclined to say this is a bug. Or at least an unnecessary restriction that ought to be lifted.
msg333441 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 07:28
I think the null character is an illegal character in the format string, and struct functions should raise a struct.error for it.
msg355407 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-10-26 08:39
I agree with Serhiy. Any other unrecognised character would raise an error. The null character should do the same.
msg355410 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2019-10-26 09:50
I've created a patch to reject null characters in the format string.
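
A small probe can show how a given interpreter behaves. This is only a sketch: `probe` is a hypothetical helper, and the outcome for the null case depends on whether the running Python includes the patch.

```python
import struct

def probe(fmt):
    """Report whether this interpreter accepts or rejects a format spec."""
    try:
        return ('accepted', struct.calcsize(fmt))
    except struct.error as exc:
        return ('rejected', str(exc))

# '$' is not a struct format character, so this is rejected.
print(probe('d$'))

# With the patch applied this should be rejected as well; without it,
# only the 'd' before the null would be sized.
print(probe('d\0d'))
```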
msg369859 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-25 07:55
New changeset 3f59b55316f4c6ab451997902579aa69020b537c by Zackery Spytz in branch 'master':
bpo-35714: Reject null characters in struct format strings (GH-16928)
https://github.com/python/cpython/commit/3f59b55316f4c6ab451997902579aa69020b537c
msg369950 - (view) Author: miss-islington (miss-islington) Date: 2020-05-26 07:05
New changeset 5221a10dde4a3853fe7ace316d95767648055109 by Miss Islington (bot) in branch '3.9':
bpo-35714: Reject null characters in struct format strings (GH-16928)
https://github.com/python/cpython/commit/5221a10dde4a3853fe7ace316d95767648055109
msg369951 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-26 07:10
Zackery, would you mind creating a backport to 3.8?
msg369959 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-26 08:57
New changeset 5ff5edfef63b3dbc1abb004b3fa4b3db87e79ff9 by Zackery Spytz in branch '3.8':
[3.8] bpo-35714: Reject null characters in struct format strings (GH-16928) (GH-20419)
https://github.com/python/cpython/commit/5ff5edfef63b3dbc1abb004b3fa4b3db87e79ff9
msg369961 - (view) Author: miss-islington (miss-islington) Date: 2020-05-26 09:16
New changeset 4ea802868460fad54e40cb99eb0ca283b3b293f0 by Miss Islington (bot) in branch '3.7':
[3.8] bpo-35714: Reject null characters in struct format strings (GH-16928) (GH-20419)
https://github.com/python/cpython/commit/4ea802868460fad54e40cb99eb0ca283b3b293f0
History
Date  User  Action  Args
2020-05-26 09:16:42  miss-islington  set  messages: + msg369961
2020-05-26 08:57:22  miss-islington  set  pull_requests: + pull_request19679
2020-05-26 08:57:18  serhiy.storchaka  set  messages: + msg369959
2020-05-26 08:32:45  ZackerySpytz  set  pull_requests: + pull_request19678
2020-05-26 07:10:11  serhiy.storchaka  set  messages: + msg369951
2020-05-26 07:05:02  miss-islington  set  messages: + msg369950
2020-05-25 07:55:52  miss-islington  set  pull_requests: + pull_request19639
2020-05-25 07:55:43  miss-islington  set  pull_requests: + pull_request19638
2020-05-25 07:55:35  miss-islington  set  pull_requests: + pull_request19637
2020-05-25 07:55:25  miss-islington  set  nosy: + miss-islington; pull_requests: + pull_request19636
2020-05-25 07:55:13  serhiy.storchaka  set  messages: + msg369859
2019-10-26 09:50:50  ZackerySpytz  set  nosy: + ZackerySpytz; messages: + msg355410
2019-10-26 08:39:22  mark.dickinson  set  nosy: + mark.dickinson; messages: + msg355407
2019-10-26 05:48:41  ZackerySpytz  set  keywords: + patch; stage: patch review; pull_requests: + pull_request16457
2019-01-11 07:28:15  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg333441
2019-01-11 01:03:27  steven.daprano  set  type: behavior; messages: + msg333430; nosy: + steven.daprano
2019-01-10 22:42:10  bup  create