Title: Errors in documentation of standard codec error handlers
Type: enhancement Stage:
Components: Documentation Versions: Python 3.3, Python 3.4, Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: RalfM, docs@python, ezio.melotti, martin.panter, mrabarnett, ncoghlan
Priority: normal Keywords:

Created on 2014-01-27 20:41 by RalfM, last changed 2015-02-07 00:01 by ncoghlan.

Messages (5)
msg209477 - Author: (RalfM) Date: 2014-01-27 20:41
The standard library documentation lists the standard codec error handlers in three places:

(a) 2. Built-in Functions, section open()
(b) 7.2 codecs - Codec registry and base classes
(c) 7.2.1 Codec Base Classes

As far as I can judge these lists, (c) looks ok, but (a) and (b) contain two errors:
1. 'surrogatepass' is not mentioned.
2. 'surrogateescape' is described as: 
   'on decoding, replace with code points in the Unicode Private
   Use Area ranging from U+DC80 to U+DCFF. These private code points
   will ...' 
   This is incorrect insofar as U+DC80 to U+DCFF are not private-use
   code points but (low) surrogate code points. This is correctly
   explained in (c), in PEP 383 and, of course, in the Unicode
   standard, chapter 16.
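To illustrate point 2, here is a minimal sketch (assuming Python 3.3 or later and the UTF-8 codec) showing that 'surrogateescape' produces code points whose Unicode general category is 'Cs' (surrogate), not 'Co' (private use):

```python
import unicodedata

# 0xff can never appear in valid UTF-8, so 'strict' decoding would raise.
raw = b"abc\xff"

# 'surrogateescape' maps an undecodable byte 0xXX to the code point U+DCXX.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = text[-1]
print(hex(ord(escaped)))                 # 0xdcff
print(unicodedata.category(escaped))     # Cs  (surrogate)

# For comparison, U+E000 really is in the Private Use Area.
print(unicodedata.category("\ue000"))    # Co  (private use)

# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```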

I suggest correcting (a) and (b) by:
* adding 'surrogatepass' with the description given in (c),
* changing the description of 'surrogateescape' to something like: 
  'on decoding, replace with surrogate code points ranging from 
  U+DC80 to U+DCFF. These surrogate code points will ...'.
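For reference, a short sketch of what the missing 'surrogatepass' handler does (again assuming the UTF-8 codec): it lets a lone surrogate be encoded as the corresponding 3-byte sequence that 'strict' would reject, and decodes that sequence back:

```python
# A lone surrogate cannot be encoded with the default 'strict' handler.
lone = "\ud800"
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True

# 'surrogatepass' encodes it as the (normally invalid) UTF-8 sequence.
data = lone.encode("utf-8", errors="surrogatepass")
print(data)  # b'\xed\xa0\x80'

# Decoding with 'surrogatepass' completes the round trip.
assert data.decode("utf-8", errors="surrogatepass") == lone
```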

These errors are present in the documentation (more precisely, the .chm files) of at least 
- Python 3.3.3
- Python 3.3.4rc1
- Python 3.4.0b3.
msg209502 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-01-28 05:43
I plan to take a look at the codec docs in general in the next week or so, I'll tackle this as well.
msg235496 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2015-02-06 20:40
The docs for Python 3.5.0a0 still say "Unicode Private Use Area".
msg235500 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-02-06 21:44
I changed “code point in the Unicode Private Use Area” to “individual surrogate code” in the “codecs” module documentation for Issue 19548. So perhaps (a) still needs addressing, but (b) and (c) are hopefully already fixed.
msg235506 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2015-02-07 00:01
Ah, February 2014, many of my plans went in rather different directions than expected that month, and this was one of them :)

As Martin noted, he already fixed (b) and (c), but we missed that the list of error handlers was also duplicated in the builtin open() docs.

That duplication is likely worthwhile from a docs usability perspective, but we should:

1. Bring it in line with Martin's recent fixes to the codecs module docs
2. Add a comment in the error handler docs noting that the open() docs may need to be updated to reflect changes to error handler semantics

Date                 User           Action  Args
2015-02-07 00:01:07  ncoghlan       set     assignee: ncoghlan ->
                                            messages: + msg235506
2015-02-06 21:44:08  martin.panter  set     nosy: + martin.panter
                                            messages: + msg235500
2015-02-06 20:40:15  mrabarnett     set     nosy: + mrabarnett
                                            messages: + msg235496
                                            versions: + Python 3.5
2014-02-15 15:48:31  ezio.melotti   set     nosy: + ezio.melotti
2014-01-28 05:43:55  ncoghlan       set     assignee: docs@python -> ncoghlan
                                            messages: + msg209502
                                            nosy: + ncoghlan
2014-01-27 20:41:06  RalfM          create