This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author lemburg
Recipients Rhamphoryncus, ajaksu2, amaury.forgeotdarc, benjamin.peterson, collinwinter, eric.smith, ezio.melotti, ferringb, gvanrossum, jafo, jimjjewett, lemburg, mark.dickinson, orivej, pitrou, rhettinger, terry.reedy
Date 2010-02-01.10:39:16
SpamBayes Score 0.0
Marked as misclassified No
Message-id <4B66AF53.3090605@egenix.com>
In-reply-to <1263164005.3610.11.camel@localhost>
Content
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> I find that the null termination for 8-bit strings makes low-level
>> parsing operations (e.g., parsing a numeric string) safer and easier:
> 
> Not to mention faster. The new IO library makes use of it (for newline
> detection), on both bytestrings and unicode strings.

I'd consider that a bug. Esp. the IO lib should be 8-bit clean
in the sense that it doesn't add any special meaning to NUL
characters or code points.

Besides, using a for-loop with a counter is both safer and faster
than checking each an every character for NUL.

Just think of what can happen if you have buggy code that overwrites
the NUL byte in some corner case situation and then use the assumption
of having the NUL byte as terminator - a classical buffer overrun.

If you're lucky, you get a segfault. If not, you end up with
data corruption or manipulation of data which could lead to
unwanted code execution.

The Python Unicode API deliberately tries to always use the combination
of a Py_UNICODE* pointer and a length integer to avoid such issues.
History
Date User Action Args
2010-02-01 10:39:19lemburgsetrecipients: + lemburg, gvanrossum, collinwinter, rhettinger, terry.reedy, jafo, jimjjewett, amaury.forgeotdarc, mark.dickinson, Rhamphoryncus, pitrou, eric.smith, ferringb, ajaksu2, benjamin.peterson, orivej, ezio.melotti
2010-02-01 10:39:17lemburglinkissue1943 messages
2010-02-01 10:39:16lemburgcreate