Message323976
The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:
> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.
Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.
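For anyone who wants to check this locally, here is a minimal sketch using only the standard library. The names `raw`, `decoded`, and `high_surrogate` are just illustrative, and the choice of the `ascii` codec is an assumption made to force every byte through the error handler; it is not taken from the HOWTO.

```python
import unicodedata

# Every byte from 0x80 to 0xFF is undecodable as ASCII, so each one is
# routed through the surrogateescape error handler.
raw = bytes(range(0x80, 0x100))
decoded = raw.decode("ascii", errors="surrogateescape")

# The handler maps byte 0xNN to the lone surrogate U+DCNN.
code_points = [ord(ch) for ch in decoded]
assert code_points[0] == 0xDC80 and code_points[-1] == 0xDCFF

# General category "Cs" means surrogate; Private Use code points are "Co".
categories = {unicodedata.category(ch) for ch in decoded}
print(categories)

# Encoding with the same handler restores the original bytes.
round_tripped = decoded.encode("ascii", errors="surrogateescape")
assert round_tripped == raw


def high_surrogate(ch: str) -> int:
    """Return the high surrogate used to encode a non-BMP character in UTF-16."""
    return int.from_bytes(ch.encode("utf-16-be")[:2], "big")


# Planes 15 and 16 (U+F0000 to U+10FFFF) are the ones whose UTF-16 form
# uses high surrogates in the range U+DB80 to U+DBFF.
print(hex(high_surrogate("\U000F0000")), hex(high_surrogate("\U0010FFFF")))
```

The category check is the relevant part: the escaped code points report as surrogates, not as private use characters.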
[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding

Date | User | Action | Args
2018-08-23 21:14:39 | mark.dickinson | set | recipients: + mark.dickinson, docs@python
2018-08-23 21:14:39 | mark.dickinson | set | messageid: <1535058879.5.0.56676864532.issue34484@psf.upfronthosting.co.za>
2018-08-23 21:14:39 | mark.dickinson | link | issue34484 messages
2018-08-23 21:14:39 | mark.dickinson | create |