Message 98656 - Python tracker


Author	lemburg
Recipients	Rhamphoryncus, ajaksu2, amaury.forgeotdarc, benjamin.peterson, collinwinter, eric.smith, ezio.melotti, ferringb, gvanrossum, jafo, jimjjewett, lemburg, mark.dickinson, orivej, pitrou, rhettinger, terry.reedy
Date	2010-02-01.10:39:16
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<4B66AF53.3090605@egenix.com>
In-reply-to	<1263164005.3610.11.camel@localhost>

Content

Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> I find that the null termination for 8-bit strings makes low-level
>> parsing operations (e.g., parsing a numeric string) safer and easier:
> 
> Not to mention faster. The new IO library makes use of it (for newline
> detection), on both bytestrings and unicode strings.

I'd consider that a bug. The IO lib in particular should be 8-bit clean
in the sense that it doesn't attach any special meaning to NUL
characters or code points.

Besides, using a for-loop with a counter is both safer and faster
than checking each and every character for NUL.
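
As a rough illustration of the counted-loop approach (a minimal sketch, not
the IO library's actual code; find_newline is a hypothetical name), a newline
search driven only by an explicit length treats NUL as ordinary data:

```c
#include <stddef.h>

/* Hypothetical helper, not part of any Python API.
 * Returns the index of the first '\n' in buf[0..len), or -1 if none.
 * The loop is bounded by len alone, so an embedded NUL byte is just
 * another byte and a missing terminator cannot cause an overrun. */
ptrdiff_t find_newline(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A NUL-terminated search such as strchr() would stop at the first NUL byte
and miss a newline that follows it; the counted loop above does not.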

Just think of what can happen if you have buggy code that overwrites
the NUL byte in some corner case and other code then relies on that
NUL byte being present as terminator - a classic buffer overrun.

If you're lucky, you get a segfault. If not, you end up with
data corruption or manipulation of data which could lead to
unwanted code execution.

The Python Unicode API deliberately tries to always use the combination
of a Py_UNICODE* pointer and a length integer to avoid such issues.
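
The same pointer-plus-length convention can be sketched for the numeric
parsing case mentioned in the quoted text. This is only an illustration on
plain char data; parse_uint is a hypothetical helper, not a function from the
Python C API:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Python API.
 * Parses the decimal digits in s[0..len) into *out.
 * Returns 0 on success, -1 on empty input, a non-digit, or overflow.
 * The buffer does not need a NUL terminator: only len bounds the scan. */
int parse_uint(const char *s, size_t len, unsigned long *out)
{
    unsigned long value = 0;

    if (len == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        unsigned long digit = (unsigned long)(s[i] - '0');
        /* Reject input whose value would exceed ULONG_MAX. */
        if (value > (ULONG_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
    }
    *out = value;
    return 0;
}
```

Because the caller states how many bytes are valid, the parser never reads
past that count, whatever follows the digits in memory.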

History
Date	User	Action	Args
2010-02-01 10:39:19	lemburg	set	recipients: + lemburg, gvanrossum, collinwinter, rhettinger, terry.reedy, jafo, jimjjewett, amaury.forgeotdarc, mark.dickinson, Rhamphoryncus, pitrou, eric.smith, ferringb, ajaksu2, benjamin.peterson, orivej, ezio.melotti
2010-02-01 10:39:17	lemburg	link	issue1943 messages
2010-02-01 10:39:16	lemburg	create