Message 67729 - Python tracker


Author	lemburg
Recipients	alexandre.vassalotti, bhy, lemburg, loewis
Date	2008-06-05.21:06:57
Message-id	<4848556E.5010207@egenix.com>
In-reply-to	<4848517E.4060701@v.loewis.de>

Content
On 2008-06-05 22:50, Martin v. Löwis wrote:
>> Note that the function *must* check the UTF-8 buffer for embedded
>> NUL bytes and then raise an exception if it finds one. Otherwise,
>> the API would silently cause truncations.
> 
> PyString_AsString doesn't check for null bytes, either, and will also
> silently truncate. This has never been a problem, so I fail to see why
> it is a problem for Unicode strings.

The fact that a bug hasn't surfaced yet doesn't make it a non-issue.

The problem is also somewhat different for Unicode:

Unlike PyString_AsString(), a Unicode API such as PyUnicode_UTF8() would
not provide easy access to the length of the returned char* buffer.

And there is no counterpart to PyString_GET_SIZE() you could use to
quickly verify that there are no embedded NULs.

Which is why using PyUnicode_AsStringAndSize() is the overall better
and safer solution.
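
To illustrate why the length matters, here is a minimal sketch in plain C.
It assumes the caller has both the buffer and its byte length, as an
AsStringAndSize-style API would provide. has_embedded_nul() is a
hypothetical helper written for this example, not part of the Python C API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Python C API function): returns 1 if the
 * first `size` bytes of `buf` contain a NUL byte, 0 otherwise.
 * Without `size`, a caller can only use strlen(), which stops at the
 * first NUL and so cannot tell truncated data from complete data. */
static int has_embedded_nul(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

With a 5-byte buffer "ab\0cd", has_embedded_nul(buf, 5) reports the NUL,
whereas strlen(buf) just returns 2 and the trailing "cd" is silently lost.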

History
Date	User	Action	Args
2008-06-05 21:07:00	lemburg	set	spambayes_score: 0.00953321 -> 0.009533206 recipients: + lemburg, loewis, alexandre.vassalotti, bhy
2008-06-05 21:06:58	lemburg	link	issue2799 messages
2008-06-05 21:06:57	lemburg	create