Author mark.dickinson
Recipients docs@python, mark.dickinson
Date 2018-08-23.21:14:39
Message-id <1535058879.5.0.56676864532.issue34484@psf.upfronthosting.co.za>
In-reply-to
Content
The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.
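
For anyone wanting to check this, here is a rough sketch using only the standard library. It assumes the ASCII codec for illustration (the HOWTO example does too, as far as I can tell); the variable names are my own. It decodes every byte from 0x80 to 0xFF with surrogateescape, looks at the resulting code points and their general category, and then shows where the U+DB80 to U+DBFF high surrogates come from.

```python
import unicodedata

# Every non-ASCII byte value, decoded with the surrogateescape handler.
raw = bytes(range(0x80, 0x100))
decoded = raw.decode("ascii", errors="surrogateescape")

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
code_points = [ord(ch) for ch in decoded]
print(hex(min(code_points)), hex(max(code_points)))

# General category: "Cs" is Surrogate, "Co" is Private Use.
categories = {unicodedata.category(ch) for ch in decoded}
print(categories)
print(unicodedata.category("\ue000"))  # a genuine BMP Private Use code point

# Encoding with the same handler restores the original bytes.
round_tripped = decoded.encode("ascii", errors="surrogateescape")
print(round_tripped == raw)

# The high surrogates U+DB80..U+DBFF appear only in the UTF-16 encoding
# of planes 15 and 16 (U+F0000..U+10FFFF).
first_high = int.from_bytes("\U000F0000".encode("utf-16-be")[:2], "big")
last_high = int.from_bytes("\U0010FFFF".encode("utf-16-be")[:2], "big")
print(hex(first_high), hex(last_high))
```

The escaped bytes land in U+DC80 to U+DCFF with category Cs rather than Co, which is the mismatch with the HOWTO wording described above.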


[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding