Message 341363 - Python tracker



Author	mbiggs
Recipients	docs@python, mbiggs
Date	2019-05-04.00:00:17
Message-id	<1556928017.33.0.648089706151.issue36789@roundup.psfhosted.org>
In-reply-to

Content

The Unicode HOWTO (http://docs.python.org/3.3/howto/unicode.html) says the following:

"UTF-8 has several convenient properties:
(...)
2. A Unicode string is turned into a sequence of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes."

This is not right. UTF-8 encodes the Unicode code point U+0000 (the ASCII NUL character) as a single zero byte. U+0000 is a valid character in UTF-8, and Python's UTF-8 encoding and decoding handle it just fine.
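
A quick check in a Python 3 interpreter illustrates the point. This is only a minimal sketch: the helper name `has_zero_byte` and the sample strings are made up for illustration and do not come from the HOWTO.

```python
def has_zero_byte(s: str) -> bool:
    """Return True if the UTF-8 encoding of s contains a zero byte."""
    return 0 in s.encode("utf-8")


# A string containing U+0000 encodes to bytes with an embedded zero byte,
# and decoding those bytes gives back the original string.
text = "a\x00b"
encoded = text.encode("utf-8")
round_tripped = encoded.decode("utf-8")

print(encoded)                      # the middle byte is 0x00
print(round_tripped == text)        # the NUL character survives the round trip
print(has_zero_byte(text))          # zero byte present
print(has_zero_byte("caf\u00e9"))   # no U+0000 in the string, no zero byte
```

So the HOWTO's statement seems to hold only for strings that do not themselves contain U+0000.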

History
Date	User	Action	Args
2019-05-04 00:00:17	mbiggs	set	recipients: + mbiggs, docs@python
2019-05-04 00:00:17	mbiggs	set	messageid: <1556928017.33.0.648089706151.issue36789@roundup.psfhosted.org>
2019-05-04 00:00:17	mbiggs	link	issue36789 messages
2019-05-04 00:00:17	mbiggs	create