classification
Title: PyArg_Parse*(): "z" should not accept bytes
Type: Stage:
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: lemburg, loewis, vstinner
Priority: normal Keywords: patch

Created on 2010-06-08 21:23 by vstinner, last changed 2010-06-24 22:11 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
pyarg_z_no_bytes-2.patch vstinner, 2010-06-13 20:20
Messages (12)
msg107351 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-08 21:23
"z" is supposed to be the same format than "s" but accepts None.... Except that "z" does also accept bytes. I suppose that "s" was "fixed" between Python2 and Python3, but "z" is not fixed yet.

I think that it can be fixed in Python 3.2 without any deprecation process.
msg107357 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-08 22:12
Hum, the patch doesn't update the documentation (at least Porting section of the What's new in Python 3.2).
msg107384 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-09 11:06
"z" should not accept bytes... why not ?

"z" is the same as "s" with the addition that passing None
as parameter will result in the pointer to get set to NULL.

"s" accepts bytes via the buffer interface, so why should
"z" behave differently ?

If a function should only accept Unicode or None, then "Z"
should be used instead.
msg107388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-09 12:02
Le mercredi 09 juin 2010 13:06:53, vous avez écrit :
> "s" accepts bytes via the buffer interface

No. "s*" and "s#" do accept any buffer compatible object (including bytes and 
bytearray), but "s" doesn't.

I fixed recently the documentation about that.
msg107393 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-09 12:23
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Le mercredi 09 juin 2010 13:06:53, vous avez écrit :
>> "s" accepts bytes via the buffer interface
> 
> No. "s*" and "s#" do accept any buffer compatible object (including bytes and 
> bytearray), but "s" doesn't.
> 
> I fixed recently the documentation about that.

Hmm, that sounds like an oversight to me. Why should "s" not accept
bytes when "s#" and "s*" do ?

The only difference between "s" and "s#" is that you work with
NUL-terminated strings as opposed to binary or text data with embedded
NULs. Even Unicode objects can contain embedded NULs, so from that
perspective there's no difference.
msg107579 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-11 20:12
Some examples of functions using "s" format:
 * str.encode(encoding, errors), bytes.decode(encoding, errors): both arguments have to be unicode strings
 * compile(source, filename, mode, ...): filename and mode have to be unicode strings
 * crypt.crypt(word, salt): both arguments have to be unicode strings

I think that crypt() should also accept bytes, but not str.encode() nor bytes.decode().

Some examples of functions using "z" format:
 * _locale.bindtextdomain(domain, dirname): dirname uses "z" format and so accepts str, bytes or buffer compatible object. It should use PyUnicode_FSConverter() instead. But I agree that bytes is welcomed here.
 * readline.(write_history_file|read_init_file|read_history_file) functions do use "z" to parse a filename. PyUnicode_FSConverter() would also be better, but in this case "z" is better than "s" :-)

I don't know why "s" and "z" are different about bytes, but it will be difficult to change it without changing a lot ot code (all functions using these formats). I tried to reject types different than str for "z": most tests of the test suite fail. I tried to accept bytes for "s" format: "unicode".encode(b'abc') does segfault.
msg107583 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-11 20:47
> I tried to reject types different than str for "z": 
> most tests of the test suite fail

Wait, what? No. I modified the wrong line of code :-) The whole test suite pass without any error if "z" doesn't accept bytes anymore.
msg107600 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-11 23:00
I patched readline (r81918) and locale (r81914) modules to use PyUnicode_FSConverter(). They don't use "z" format anymore to parse a filename.

I checked last functions using "z" format (not "z#" or "z*", only "z") and I think that it's ok ("z" can be patched to reject bytes). You can get the full list using:

grep -n 'PyArg_Parse[^(]*([^"]*"[^";:]*z[^#*]' */*.c

--

About the difference between s/z and s*/s#/z*/z#: I don't understand why they are different, but I think that it's too late to change that. Too much functions (and third party modules?) suppose that s*/s#/z*/z# supports buffer compatible objects.
msg107748 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-13 20:20
Update the patch to update also the new tests that I added in r81973 (to fix #8592).
msg108152 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-18 23:46
@lemburg: So what is your opinion on this issue?
msg108311 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-21 19:57
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> @lemburg: So what is your opinion on this issue?

You're probably right: it's too late to change "s" to accept buffer
interface compatible objects as well.

In that case, symmetry is more important, so +1 on rejecting bytes
for "z" as well.

Please add a note to the NEWS file for this change.
msg108554 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-24 22:11
Patch commited to 3.2 (r82200), blocked in 3.1 (r82201).
History
Date User Action Args
2010-06-24 22:11:00vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg108554
2010-06-21 19:57:37lemburgsetmessages: + msg108311
2010-06-18 23:46:33vstinnersetmessages: + msg108152
2010-06-13 20:20:19vstinnersetfiles: - pyarg_z_no_bytes.patch
2010-06-13 20:20:14vstinnersetfiles: + pyarg_z_no_bytes-2.patch

messages: + msg107748
2010-06-11 23:39:25vstinnerlinkissue8951 dependencies
2010-06-11 23:00:10vstinnersetmessages: + msg107600
2010-06-11 20:47:20vstinnersetmessages: + msg107583
2010-06-11 20:12:28vstinnersetmessages: + msg107579
2010-06-09 17:25:34pitrousetnosy: + loewis
2010-06-09 12:23:57lemburgsetmessages: + msg107393
2010-06-09 12:02:04vstinnersetmessages: + msg107388
2010-06-09 11:06:51lemburgsetmessages: + msg107384
2010-06-09 10:37:17vstinnersetnosy: + lemburg
2010-06-08 22:12:03vstinnersetmessages: + msg107357
2010-06-08 21:24:15vstinnersetfiles: + pyarg_z_no_bytes.patch
keywords: + patch
2010-06-08 21:23:21vstinnercreate