Issue 8949: PyArg_Parse*(): "z" should not accept bytes

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53195

classification

Title:	PyArg_Parse*(): "z" should not accept bytes
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 3.2

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	lemburg, loewis, vstinner
Priority:	normal	Keywords:	patch

Created on 2010-06-08 21:23 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
pyarg_z_no_bytes-2.patch	vstinner, 2010-06-13 20:20

Messages (12)
msg107351 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-08 21:23
"z" is supposed to be the same format than "s" but accepts None.... Except that "z" does also accept bytes. I suppose that "s" was "fixed" between Python2 and Python3, but "z" is not fixed yet. I think that it can be fixed in Python 3.2 without any deprecation process.
msg107357 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-08 22:12
Hum, the patch doesn't update the documentation (at least Porting section of the What's new in Python 3.2).
msg107384 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-06-09 11:06
"z" should not accept bytes... why not ? "z" is the same as "s" with the addition that passing None as parameter will result in the pointer to get set to NULL. "s" accepts bytes via the buffer interface, so why should "z" behave differently ? If a function should only accept Unicode or None, then "Z" should be used instead.
msg107388 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-09 12:02
Le mercredi 09 juin 2010 13:06:53, vous avez écrit : > "s" accepts bytes via the buffer interface No. "s*" and "s#" do accept any buffer compatible object (including bytes and bytearray), but "s" doesn't. I fixed recently the documentation about that.
msg107393 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-06-09 12:23
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Le mercredi 09 juin 2010 13:06:53, vous avez écrit : >> "s" accepts bytes via the buffer interface > > No. "s" and "s#" do accept any buffer compatible object (including bytes and > bytearray), but "s" doesn't. > > I fixed recently the documentation about that. Hmm, that sounds like an oversight to me. Why should "s" not accept bytes when "s#" and "s" do ? The only difference between "s" and "s#" is that you work with NUL-terminated strings as opposed to binary or text data with embedded NULs. Even Unicode objects can contain embedded NULs, so from that perspective there's no difference.
msg107579 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-11 20:12
Some examples of functions using "s" format: * str.encode(encoding, errors), bytes.decode(encoding, errors): both arguments have to be unicode strings * compile(source, filename, mode, ...): filename and mode have to be unicode strings * crypt.crypt(word, salt): both arguments have to be unicode strings I think that crypt() should also accept bytes, but not str.encode() nor bytes.decode(). Some examples of functions using "z" format: * _locale.bindtextdomain(domain, dirname): dirname uses "z" format and so accepts str, bytes or buffer compatible object. It should use PyUnicode_FSConverter() instead. But I agree that bytes is welcomed here. * readline.(write_history_file\|read_init_file\|read_history_file) functions do use "z" to parse a filename. PyUnicode_FSConverter() would also be better, but in this case "z" is better than "s" :-) I don't know why "s" and "z" are different about bytes, but it will be difficult to change it without changing a lot ot code (all functions using these formats). I tried to reject types different than str for "z": most tests of the test suite fail. I tried to accept bytes for "s" format: "unicode".encode(b'abc') does segfault.
msg107583 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-11 20:47
> I tried to reject types different than str for "z": > most tests of the test suite fail Wait, what? No. I modified the wrong line of code :-) The whole test suite pass without any error if "z" doesn't accept bytes anymore.
msg107600 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-11 23:00
I patched readline (r81918) and locale (r81914) modules to use PyUnicode_FSConverter(). They don't use "z" format anymore to parse a filename. I checked last functions using "z" format (not "z#" or "z", only "z") and I think that it's ok ("z" can be patched to reject bytes). You can get the full list using: grep -n 'PyArg_Parse[^(]([^"]"[^";:]z[^#]' /.c -- About the difference between s/z and s/s#/z/z#: I don't understand why they are different, but I think that it's too late to change that. Too much functions (and third party modules?) suppose that s/s#/z*/z# supports buffer compatible objects.
msg107748 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-13 20:20
Update the patch to update also the new tests that I added in r81973 (to fix #8592).
msg108152 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-18 23:46
@lemburg: So what is your opinion on this issue?
msg108311 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-06-21 19:57
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > @lemburg: So what is your opinion on this issue? You're probably right: it's too late to change "s" to accept buffer interface compatible objects as well. In that case, symmetry is more important, so +1 on rejecting bytes for "z" as well. Please add a note to the NEWS file for this change.
msg108554 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-06-24 22:11
Patch commited to 3.2 (r82200), blocked in 3.1 (r82201).

History
Date	User	Action	Args
2022-04-11 14:57:02	admin	set	github: 53195
2010-06-24 22:11:00	vstinner	set	status: open -> closed resolution: fixed messages: + msg108554
2010-06-21 19:57:37	lemburg	set	messages: + msg108311
2010-06-18 23:46:33	vstinner	set	messages: + msg108152
2010-06-13 20:20:19	vstinner	set	files: - pyarg_z_no_bytes.patch
2010-06-13 20:20:14	vstinner	set	files: + pyarg_z_no_bytes-2.patch messages: + msg107748
2010-06-11 23:39:25	vstinner	link	issue8951 dependencies
2010-06-11 23:00:10	vstinner	set	messages: + msg107600
2010-06-11 20:47:20	vstinner	set	messages: + msg107583
2010-06-11 20:12:28	vstinner	set	messages: + msg107579
2010-06-09 17:25:34	pitrou	set	nosy: + loewis
2010-06-09 12:23:57	lemburg	set	messages: + msg107393
2010-06-09 12:02:04	vstinner	set	messages: + msg107388
2010-06-09 11:06:51	lemburg	set	messages: + msg107384
2010-06-09 10:37:17	vstinner	set	nosy: + lemburg
2010-06-08 22:12:03	vstinner	set	messages: + msg107357
2010-06-08 21:24:15	vstinner	set	files: + pyarg_z_no_bytes.patch keywords: + patch
2010-06-08 21:23:21	vstinner	create