classification
Title: 'y' does not check for embedded NUL bytes
Type: behavior
Stage: needs patch
Components: Interpreter Core
Versions: Python 3.1, Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: loewis
Nosy List: loewis, pitrou, vstinner
Priority: normal
Keywords: patch

Created on 2010-05-01 18:04 by pitrou, last changed 2010-06-13 20:06 by vstinner. This issue is now closed.

Files
File name Uploaded Description
y_format.patch vstinner, 2010-06-12 00:38
Messages (6)
msg104734 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-05-01 18:04
The documentation for the 'y' format (PyArg_ParseTuple and friends) states that:

« The bytes object must not contain embedded NUL bytes; if it does, a TypeError exception is raised. »

But, reading Python/getargs.c, the strlen() check is actually missing in the code for 'y'.
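For illustration only, the documented behaviour can be modelled in a few lines of Python. This is a sketch, not the code in Python/getargs.c: the names c_strlen and parse_y are hypothetical, and the error messages are made up for the example. The point is the comparison between the C-string length and the real length of the bytes object.

```python
def c_strlen(buf: bytes) -> int:
    """Model of C strlen(): number of bytes before the first NUL."""
    nul = buf.find(b"\0")
    return len(buf) if nul == -1 else nul


def parse_y(arg) -> bytes:
    """Hypothetical model of the 'y' converter as the documentation describes it."""
    if not isinstance(arg, bytes):
        raise TypeError("must be bytes, not %s" % type(arg).__name__)
    # The check reported missing: the C-string view must cover the whole
    # object, otherwise the caller would silently see a truncated value.
    if c_strlen(arg) != len(arg):
        raise TypeError("must be bytes without null bytes")
    return arg
```

With this model, parse_y(b"abc") returns its argument unchanged, while parse_y(b"ab\0cd") raises TypeError, which is what the quoted documentation promises.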
msg104991 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-05-05 01:08
Same issue for y#:

y# (...) This variant on s# doesn’t accept Unicode objects, only bytes-like objects.

s# (...) The string may contain embedded null bytes.

--

y* might mention that it accepts embedded null bytes.

--

grep 'PyArg_Parse[^"]\+"[^:;)"]*y[^*]' */*.c finds only uses of y# (no use of the plain y format):

 - mmap_gfind(), mmap_write_method()
 - oss_write(), oss_writeall()
 - in getsockaddrarg() with s->sock_family==AF_PACKET
 - in sock_setsockopt() if the option name is a string
 - socket_inet_ntoa(), socket_inet_ntop()

These functions have to support embedded null bytes. So I think that y# should specify explicitly that embedded null bytes are accepted.
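A small example of why these callers need embedded null bytes, using only the public Python-level functions (socket.inet_ntoa() and mmap.find()). It assumes a platform where anonymous mmap is available; the variable names are just for the example.

```python
import mmap
import socket

# A packed IPv4 address routinely contains zero bytes: 127.0.0.1
packed = b"\x7f\x00\x00\x01"
dotted = socket.inet_ntoa(packed)

# Searching a mapping for a pattern that starts with a NUL byte.
with mmap.mmap(-1, 16) as m:      # anonymous 16-byte mapping, zero-filled
    m.write(b"ab\x00cd")
    # find() starts at the current position by default, so pass start=0.
    pos = m.find(b"\x00c", 0)
```

Here dotted is '127.0.0.1' and pos is 2. If y# rejected embedded null bytes, neither call could work.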
msg104992 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-05-05 01:09
See also #8215.
msg107368 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-09 00:04
See also #8850: Remove "w" format of PyArg_ParseTuple().

--

About "y": the parser HAS TO check for embedded NUL bytes, because the caller receives only a char* pointer and doesn't know the size of the buffer (and strlen() would give the wrong size).
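The truncation can be observed from Python with ctypes, which exposes both views of the same buffer. This is only a demonstration of the C-string semantics, not of getargs.c itself; the variable names are arbitrary.

```python
import ctypes

data = b"ab\x00cd"

# create_string_buffer() copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

as_c_string = buf.value   # what a NUL-terminated reader sees
full = buf.raw            # the whole buffer, including the terminator
```

as_c_string is b'ab' (2 bytes) although the original object holds 5 bytes: a caller that only gets the pointer has no way to recover the other 3.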
msg107616 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-12 00:38
The attached patch fixes the initial problem: it raises an error if the byte string embeds a NUL byte.
msg107747 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-13 20:06
I committed a bigger patch: r81973 fixes not only the "y" format, but also "u" and "Z". It also adds a lot of tests to test_getargs2.py for many string formats (not all of them, e.g. "es" is not tested).

Even though I consider this a bugfix, I don't want to backport it to 3.1 because it might break programs which rely on this strange behaviour.
History
Date User Action Args
2010-06-13 20:06:19 vstinner set status: open -> closed
resolution: fixed
messages: + msg107747
2010-06-12 00:38:59 vstinner set files: + y_format.patch
keywords: + patch
messages: + msg107616
2010-06-09 00:04:33 vstinner set messages: + msg107368
2010-05-05 01:09:27 vstinner set messages: + msg104992
2010-05-05 01:08:25 vstinner set nosy: + vstinner
messages: + msg104991
2010-05-01 18:04:33 pitrou create