classification
Title: Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape
Type: Stage: resolved
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, docs@python, mark.dickinson, miss-islington
Priority: normal Keywords: patch

Created on 2018-08-23 21:14 by mark.dickinson, last changed 2019-05-14 18:36 by akuchling. This issue is now closed.

Pull Requests
URL Status Linked
PR 12155 merged miss-islington, 2019-03-04 04:10
Messages (8)
msg323976 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:14
The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.
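The behaviour described in the quoted passage can be checked directly. The following is a minimal sketch, not taken from the HOWTO; the byte string is an arbitrary example of data that is not valid UTF-8. Strictly, surrogateescape only escapes bytes that fail to decode, so valid multi-byte UTF-8 sequences are decoded normally. Each undecodable byte 0xNN becomes the lone code point U+DCNN, which `unicodedata.category` reports as `Cs` (surrogate), whereas genuine private-use code points report `Co`.

```python
import unicodedata

# Arbitrary example bytes that are not valid UTF-8: 0xE9 starts a 3-byte
# sequence but is followed by an ASCII space, and 0xFF never occurs in UTF-8.
raw = b"caf\xe9 \xff"

# surrogateescape maps each undecodable byte 0xNN to the lone code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'caf\udce9 \udcff'

for ch in text:
    if ord(ch) > 0x7F:
        # 'Cs' is the Unicode "surrogate" category; private-use code points are 'Co'.
        print(f"U+{ord(ch):04X}", unicodedata.category(ch))
# U+DCE9 Cs
# U+DCFF Cs

print(unicodedata.category("\ue000"))  # Co (U+E000 is in the BMP Private Use Area)

# Encoding with the same error handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```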

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.
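The point about the high surrogates U+DB80 to U+DBFF can be verified with the standard UTF-16 arithmetic: the lead surrogate of a supplementary code point cp is 0xD800 + ((cp - 0x10000) >> 10). The sketch below assumes nothing beyond the standard library; `high_surrogate` is a hypothetical helper written for illustration, cross-checked against Python's own UTF-16 encoder. It shows that planes 15 and 16 are exactly the code points whose lead surrogate falls in U+DB80 to U+DBFF, and that nearly all of them are private use.

```python
import struct
import unicodedata

def high_surrogate(cp: int) -> int:
    """Hypothetical helper: the UTF-16 high (lead) surrogate for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    return 0xD800 + ((cp - 0x10000) >> 10)

# Cross-check the arithmetic against Python's own UTF-16 encoder.
for cp in (0xEFFFF, 0xF0000, 0x10FFFF):
    lead, _trail = struct.unpack(">HH", chr(cp).encode("utf-16-be"))
    assert lead == high_surrogate(cp)

# Planes 15 and 16 (U+F0000..U+10FFFF) use exactly the high surrogates U+DB80..U+DBFF.
print(hex(high_surrogate(0xF0000)), hex(high_surrogate(0x10FFFF)))  # 0xdb80 0xdbff
print(hex(high_surrogate(0xEFFFF)))  # 0xdb7f, so plane 14 stays below the range

# Almost every code point in those two planes is private use ('Co'); the
# exceptions are the two noncharacters at the end of each plane.
planes_15_16 = range(0xF0000, 0x110000)
private = sum(unicodedata.category(chr(cp)) == "Co" for cp in planes_15_16)
print(private, "of", len(planes_15_16))  # 131068 of 131072
```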


[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding
msg323977 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:24
For history, this text was introduced as a result of issue #4163.
msg323978 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-08-23 21:25
Whoops. Sorry, that should be #4153.
msg325432 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2018-09-15 13:46
Corrected in the unicode-howto-update branch being developed for issue #20906.
msg337069 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-03-04 04:10
New changeset 97c288df614dd7856f5a0336925f56a7a2a5bc74 by Andrew Kuchling in branch 'master':
bpo-20906: Various revisions to the Unicode howto  (#8394)
https://github.com/python/cpython/commit/97c288df614dd7856f5a0336925f56a7a2a5bc74
msg337105 - Author: miss-islington (miss-islington) Date: 2019-03-04 13:01
New changeset 84fa6b9e5932af981cb299c0c5ac80b9cc37c3fa by Miss Islington (bot) in branch '3.7':
bpo-20906: Various revisions to the Unicode howto  (GH-8394)
https://github.com/python/cpython/commit/84fa6b9e5932af981cb299c0c5ac80b9cc37c3fa
msg337222 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-03-05 16:32
Thanks for the fix. @akuchling: safe to close this issue?
msg342499 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-05-14 18:36
Yes, I think this issue can now be closed.
History
Date                 User            Action  Args
2019-05-14 18:36:01  akuchling       set     status: open -> closed
                                             messages: + msg342499
                                             stage: patch review -> resolved
2019-03-05 16:32:35  mark.dickinson  set     messages: + msg337222
2019-03-04 13:01:50  miss-islington  set     nosy: + miss-islington
                                             messages: + msg337105
2019-03-04 04:10:40  miss-islington  set     keywords: + patch
                                             pull_requests: + pull_request12154
2019-03-04 04:10:38  akuchling       set     messages: + msg337069
2019-03-02 21:50:48  akuchling       set     stage: patch review
2018-09-15 13:46:48  akuchling       set     assignee: docs@python -> akuchling
                                             messages: + msg325432
                                             nosy: + akuchling
2018-08-23 21:25:15  mark.dickinson  set     messages: + msg323978
2018-08-23 21:24:39  mark.dickinson  set     messages: + msg323977
2018-08-23 21:14:39  mark.dickinson  create