Message 213741 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	pitrou
Recipients	akuchling, benjamin.peterson, docs@python, eric.araujo, ezio.melotti, gwideman, lemburg, pitrou, tshepang, vstinner
Date	2014-03-16.17:35:53
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1394991353.94.0.269076567873.issue20906@psf.upfronthosting.co.za>
In-reply-to

Content

Do you want to provide a patch?

> In a narrative such as the current article, a code point value is usually written in hexadecimal.

I find the use of the word "narrative" intimidating in the context of technical documentation.

In general, I find it disappointing that the Unicode HOWTO only gives hexadecimal representations of non-ASCII characters and (almost) never represents them in their true form. This makes things more abstract than necessary.
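For illustration only (this is not from the HOWTO itself), the snippet below puts both forms side by side. It writes the same character as a hexadecimal escape and in its true form, and formats a code point the way the HOWTO does. It assumes the source file is saved as UTF-8, which is Python 3's default.

```python
# The same character written as a hex escape and in its true form.
e_acute_escaped = "\u00e9"   # code point U+00E9 written in hexadecimal
e_acute_literal = "é"        # the same character written directly

print(e_acute_escaped == e_acute_literal)  # True

# Going from the character to its "U+XXXX" notation and back.
print(f"U+{ord('é'):04X}")   # U+00E9
print(chr(0x20AC))           # €
```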

> This is a vague claim. Probably what was intended was: "Many Internet standards define protocols in which the data must contain no zero bytes, or zero bytes have special meaning."  Is this actually true? Are there "many" such standards?

I think it actually means that Internet protocols assume an ASCII-compatible encoding (which UTF-8 is, but not UTF-16 or UTF-32 - nor EBCDIC :-)).
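A quick sketch of what "ASCII-compatible" means in practice. The request line is a hypothetical example and `ascii_compatibility` is an illustrative helper, not a stdlib function. Pure-ASCII text encoded as UTF-8 yields exactly its ASCII bytes. UTF-16 and UTF-32 interleave zero bytes. EBCDIC (here code page 037, available in Python as `cp037`) maps even ASCII letters to different byte values.

```python
def ascii_compatibility(text, codec):
    """Encode pure-ASCII text and report (same bytes as ASCII?, number of zero bytes)."""
    data = text.encode(codec)
    return data == text.encode("ascii"), data.count(0)

request_line = "GET /index.html HTTP/1.1"  # hypothetical protocol line

for codec in ("utf-8", "utf-16-le", "utf-32-le", "cp037"):
    same_bytes, zeros = ascii_compatibility(request_line, codec)
    print(f"{codec:10} same bytes as ASCII: {same_bytes!s:5}  zero bytes: {zeros}")

# In UTF-8, every byte of a multi-byte sequence is >= 0x80, so ASCII
# delimiters (and the zero byte) never appear inside non-ASCII characters.
print(all(b >= 0x80 for b in "é€".encode("utf-8")))  # True
```

The "-le" codec variants are used so that no byte order mark is prepended, keeping the byte counts easy to compare.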

> --> "Non-Unicode code systems usually don't handle all of the characters to be found in Unicode."

The term *encoding* is used pervasively when dealing with the transformation of Unicode to and from bytes, so I find it confusing to introduce another term here ("code systems"). I prefer the original sentence.
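The snippet below illustrates the original sentence using the usual *encoding* vocabulary. `try_encode` is a hypothetical helper written for this example. UTF-8 can encode any string, while a non-Unicode encoding such as Latin-1 raises `UnicodeEncodeError` for characters it cannot represent (here the euro sign).

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or the first run of characters the encoding rejects."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start/exc.end delimit the offending characters in exc.object
        return exc.object[exc.start:exc.end]

text = "naïve café €5"

print(try_encode(text, "utf-8").decode("utf-8") == text)  # True: UTF-8 round-trips
print(try_encode(text, "latin-1"))                         # €: not in Latin-1

# An error handler can substitute a placeholder instead of raising.
print(text.encode("latin-1", errors="replace"))            # b'na\xefve caf\xe9 ?5'
```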

History
Date	User	Action	Args
2014-03-16 17:35:54	pitrou	set	recipients: + pitrou, lemburg, akuchling, vstinner, benjamin.peterson, ezio.melotti, eric.araujo, docs@python, tshepang, gwideman
2014-03-16 17:35:53	pitrou	set	messageid: <1394991353.94.0.269076567873.issue20906@psf.upfronthosting.co.za>
2014-03-16 17:35:53	pitrou	link	issue20906 messages
2014-03-16 17:35:53	pitrou	create