Title: Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape
Components: Documentation Versions: Python 3.7, Python 3.6
Created on 2018-08-23 21:14 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

PR 12155 merged miss-islington, 2019-03-04 04:10
msg323976 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:14
The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.
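
For illustration, here is a minimal sketch of how the claim could be checked from the interpreter. It is not part of the HOWTO, and the variable names are made up for this example. It decodes two non-ASCII bytes with the surrogateescape handler, then looks up the Unicode general category of the resulting code points: "Cs" means surrogate, while a real Private Use Area code point such as U+E000 reports "Co".

```python
import unicodedata

# Two bytes that are not valid ASCII (the lowest and highest such values).
raw = bytes([0x80, 0xFF])

# surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN.
decoded = raw.decode("ascii", errors="surrogateescape")
code_points = [hex(ord(ch)) for ch in decoded]

# General category of each escaped character: "Cs" is "Surrogate".
categories = [unicodedata.category(ch) for ch in decoded]

# For comparison, U+E000 is the first code point of the BMP Private Use Area
# and has category "Co" ("Private Use").
pua_category = unicodedata.category("\ue000")

# Encoding with the same handler turns the surrogates back into the original bytes.
round_trip = decoded.encode("ascii", errors="surrogateescape")

print(code_points, categories, pua_category, round_trip == raw)
```

The escaped characters fall in U+DC80 to U+DCFF and are categorised as surrogates, not as private-use code points, which is the distinction the HOWTO text gets wrong.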

msg323977 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:24
For history, this text was introduced as a result of issue #4163.
msg323978 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:25
Whoops. Sorry, that should be #4153.
msg325432 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2018-09-15 13:46
Corrected in the unicode-howto-update branch being developed for issue #20906.
msg337069 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-03-04 04:10
New changeset 97c288df614dd7856f5a0336925f56a7a2a5bc74 by Andrew Kuchling in branch 'master':
bpo-20906: Various revisions to the Unicode howto (#8394)
msg337105 - (view) Author: miss-islington (miss-islington) Date: 2019-03-04 13:01
New changeset 84fa6b9e5932af981cb299c0c5ac80b9cc37c3fa by Miss Islington (bot) in branch '3.7':
bpo-20906: Various revisions to the Unicode howto (GH-8394)
msg337222 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-03-05 16:32
Thanks for the fix. @akuchling: safe to close this issue?
msg342499 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-05-14 18:36
Yes, I think this issue can now be closed.
