Message 323976 - Python tracker


Author	mark.dickinson
Recipients	docs@python, mark.dickinson
Date	2018-08-23.21:14:39
Message-id	<1535058879.5.0.56676864532.issue34484@psf.upfronthosting.co.za>

Content

The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.
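
For anyone who wants to check this, here is a quick sketch using only the standard library. It assumes the General_Category reported by `unicodedata` is a fair proxy for "is this a Private Use code point" (`Co` is private use, `Cs` is surrogate); the variable names are just for illustration.

```python
import unicodedata

# Every byte that is not valid ASCII (0x80..0xFF), decoded with surrogateescape.
raw = bytes(range(0x80, 0x100))
decoded = raw.decode("ascii", errors="surrogateescape")

# Where do the escaped bytes land, and what category do they have?
code_points = [ord(ch) for ch in decoded]
categories = {unicodedata.category(ch) for ch in decoded}
print(hex(code_points[0]), hex(code_points[-1]))
print(categories)

# For comparison: a BMP Private Use Area code point and a plane 15 one.
print(unicodedata.category("\ue000"), unicodedata.category("\U000f0000"))

# The first plane 15 code point uses high surrogate 0xDB80 in UTF-16.
utf16 = "\U000f0000".encode("utf-16-be")
high_surrogate = int.from_bytes(utf16[:2], "big")
print(hex(high_surrogate))

# Encoding with the same handler restores the original bytes.
round_tripped = decoded.encode("ascii", errors="surrogateescape")
print(round_tripped == raw)
```

If I'm reading the results right, the escaped bytes occupy U+DC80 to U+DCFF with category `Cs`, while genuine private use code points report `Co`, so the round trip works as documented but the "Private Use Area" wording does not match.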


[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding
