Message 233940 - Python tracker


Author	zuo
Recipients	docs@python, zuo
Date	2015-01-13.15:04:26
Message-id	<1421161467.08.0.0207328104239.issue23232@psf.upfronthosting.co.za>

Content

To some extent, this issue is a follow-up to Issue 20132. It concerns parts of the functionality and documentation of the 'codecs' module related to registering custom codecs, especially non-string ones (i.e., codecs that encode/decode between arbitrary types, not necessarily the str and bytes types).

A few fragments of the documented behaviour and/or of the documentation itself bother me:

0. Ad "7.2.1. Codec Base Classes"

"Each codec has to define four interfaces to make it usable as codec in Python: stateless encoder, stateless decoder, stream reader and stream writer. The stream reader and writers typically reuse the stateless encoder/decoder to implement the file protocols. Codec authors also need to define how the codec will handle encoding and decoding errors."

IMHO it is still unclear:

a) What is the relation between codecs in this sense and CodecInfo objects? (In particular, CodecInfo contains information about six interfaces, not four.)

b) How do codec authors define "how the codec will handle encoding and decoding errors"? What is the relation between this and the error handling schemes documented below (which are defined as generic, not per-codec)?
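
To illustrate point 0a, the "four vs. six" mismatch can be observed directly. This is only a minimal sketch, assuming CPython's built-in 'utf-8' codec; the variable names are mine and purely illustrative.

```python
import codecs

info = codecs.lookup("utf-8")

# As a tuple, CodecInfo holds the four "classic" interfaces:
# encode, decode, streamreader, streamwriter.
print(len(info))

# As an object, it exposes six interfaces as attributes.
six_interfaces = (
    "encode",
    "decode",
    "incrementalencoder",
    "incrementaldecoder",
    "streamreader",
    "streamwriter",
)
for attr in six_interfaces:
    print(attr, "->", getattr(info, attr))
```

So the incremental encoder/decoder are part of what a lookup returns, although the quoted paragraph does not mention them.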

1. Ad "7.2.1.1. Error Handlers" and "codecs.strict_errors(exception)"

"'strict' Raise UnicodeError (or a subclass); this is the default. Implemented in strict_errors()."

"codecs.strict_errors(exception)
Implements the 'strict' error handling: each encoding or decoding error raises a UnicodeError."

Is it true that it is always a UnicodeError or a subclass of it, and not just a ValueError or a subclass of it (as described in other fragments of the module documentation)?

Please note that 'strict' is documented as a universal (and not, e.g., text-encoding-only) error handling scheme. So, what about non-string codecs?
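
A small sketch of why I am asking, assuming the bytes-to-bytes 'hex' codec shipped with CPython (the behaviour of other implementations or versions may differ; the variable names are illustrative only):

```python
import codecs

# A non-string (bytes-to-bytes) codec used with the default errors='strict'.
try:
    codecs.decode(b"zz", "hex")
except ValueError as exc:
    caught = exc
else:
    caught = None

# The error is a ValueError subclass, but not a UnicodeError.
print(type(caught).__name__, isinstance(caught, UnicodeError))

# strict_errors() itself just re-raises the exception it is given.
text_error = UnicodeEncodeError("ascii", "\u0105", 0, 1, "illustrative reason")
try:
    codecs.strict_errors(text_error)
except UnicodeError as exc:
    reraised = exc

print(reraised is text_error)
```

In other words, in 'strict' mode a non-string codec can raise an error that is not a UnicodeError at all, which does not match the quoted wording.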

2. Ad "codecs.register_error(name, error_handler)"

"For encoding, error_handler will be called with a UnicodeEncodeError instance..." "Decoding and translating works similarly, except UnicodeDecodeError or UnicodeTranslateError will be passed..."

Again: what about non-string codecs? UnicodeError subclasses do not seem appropriate for them.

3. It would be nice to address Zoinkity's concerns from Issue 20132 (partially related to the above points):
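
For reference, the documented text-oriented case looks roughly like the sketch below. The handler name 'question_mark' and the function question_mark_errors are hypothetical, chosen only for this example.

```python
import codecs


def question_mark_errors(exc):
    """Hypothetical handler: replace each unencodable run with '?'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement and the position at which to resume.
        return ("?", exc.end)
    # Decoding/translating errors are not handled in this sketch.
    raise exc


codecs.register_error("question_mark", question_mark_errors)

encoded = "h\u00e9llo".encode("ascii", errors="question_mark")
print(encoded)
```

The handler depends on UnicodeEncodeError attributes such as `end`; it is unclear what a handler should receive and return when the codec does not operate on str/bytes at all.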

"""
One glaring omission is any information about multibyte codecs--the class, its methods, and how to even define one.

Also, the primary use for codecs.register would be to append a single codec to the lookup registry. Simple usage of the method only provides lookup for the provided codecs and will not include regularly-accessible ones such as "utf-8". It would be enormously helpful to provide an example of proper, safe usage.
"""
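
Regarding the second concern, a minimal sketch of what such an example might look like is given below. The codec name 'reverse_text' and all the function names are hypothetical; the sketch only assumes that a search function returns None for names it does not know, so that the lookup of other codecs is unaffected. It does not cover the multibyte codecs question.

```python
import codecs


def reverse_encode(text, errors="strict"):
    # Stateless encoder: returns (output, length of input consumed).
    return text[::-1], len(text)


def reverse_decode(text, errors="strict"):
    # Stateless decoder: reversing again restores the original.
    return text[::-1], len(text)


def search_reverse_text(name):
    """Hypothetical search function providing exactly one codec."""
    if name == "reverse_text":
        return codecs.CodecInfo(reverse_encode, reverse_decode, name="reverse_text")
    # Unknown name: let the lookup fall through to other search functions.
    return None


codecs.register(search_reverse_text)

print(codecs.encode("abc", "reverse_text"))
print(codecs.decode("cba", "reverse_text"))
print("abc".encode("utf-8"))  # the standard codecs are still found
```

Note that this codec maps str to str, so it has to be used through codecs.encode()/codecs.decode() rather than str.encode().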

History
Date	User	Action	Args
2015-01-13 15:04:27	zuo	set	recipients: + zuo, docs@python
2015-01-13 15:04:27	zuo	set	messageid: <1421161467.08.0.0207328104239.issue23232@psf.upfronthosting.co.za>
2015-01-13 15:04:27	zuo	link	issue23232 messages
2015-01-13 15:04:26	zuo	create