Message 202593 - Python tracker


Author	zuo
Recipients	docs@python, zuo
Date	2013-11-11.01:29:12
Message-id	<1384133353.82.0.173341188397.issue19548@psf.upfronthosting.co.za>
In-reply-to

Content

When learning about the 'codecs' module, I encountered several places in the module's docs that, I believe, could be made clearer and easier for codecs beginners:

1. Ad `codecs.encode` and `codecs.decode` descriptions: I believe it would be worth mentioning that, unlike str.encode()/bytes.decode(), these functions (and all their counterparts in the classes the module contains) support not only "traditional" str/bytes encodings, but also bytes-to-bytes and str-to-str encodings.
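
To illustrate point 1, here is a minimal sketch of the difference. It assumes a Python 3 interpreter; the exact exception raised by str.encode() has varied between versions (TypeError in some 3.x releases, LookupError in others), so the example catches both.

```python
import codecs

# str-to-str encoding: works through the codecs module functions.
rot = codecs.encode("Hello", "rot_13")

# bytes-to-bytes encoding: also works through the codecs module functions.
hexed = codecs.encode(b"abc", "hex_codec")
unhexed = codecs.decode(hexed, "hex_codec")

# str.encode() is restricted to str-to-bytes ("text") encodings,
# so the same codec name is rejected here.
try:
    "Hello".encode("rot_13")
    rejected = False
except (LookupError, TypeError):
    rejected = True

print(rot, hexed, unhexed, rejected)
```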

2. Ad `codecs.register`: in two places the following text appears: `These have to be factory functions providing the following interface: factory([...] errors='strict')` -- `errors='strict'` may be confusing (at first sight it may suggest that the only valid value is 'strict'; maybe `factory(errors=<error handler label>)` with an appropriate description below would be better?).
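
To illustrate point 2: `'strict'` is only the default, and any registered error handler name can be passed. A small sketch, using the incremental encoder factory of the standard `ascii` codec as the example factory:

```python
import codecs

# The factory is the incremental encoder class of the 'ascii' codec.
factory = codecs.getincrementalencoder("ascii")

strict_encoder = factory()                    # errors defaults to 'strict'
lenient_encoder = factory(errors="replace")   # any handler name is accepted

# 'replace' substitutes '?' for characters the codec cannot encode.
replaced = lenient_encoder.encode("café")

# 'strict' raises instead.
try:
    strict_encoder.encode("café")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(replaced, strict_failed)
```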

3. Ad `codecs.open`: I believe there should be a reference to the built-in open() as an alternative that is better in most cases.
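
For point 3, the kind of cross-reference I have in mind could be accompanied by something like the following sketch, which uses only the built-in open() with its `encoding` argument. The file name and the sample text are made up for the example.

```python
import os
import tempfile

text = "zażółć gęślą jaźń\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp, "example.txt")

    # The built-in open() handles text encoding directly.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print(read_back == text)
```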

4. Ad `codecs.BOM*`: `These constants define various encodings of the Unicode byte order mark (BOM).` -- the word `encodings` seems confusing here; maybe `These constants define various byte sequences that are Unicode byte order marks (BOMs) for several encodings. They are used...` would be better?
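
To show what I mean in point 4: the constants are plain bytes objects, each being the U+FEFF character as serialized by a particular encoding. A short sketch:

```python
import codecs

# Each constant is a bytes object, not an "encoding".
bom_utf8 = codecs.BOM_UTF8
bom_utf16_le = codecs.BOM_UTF16_LE
bom_utf16_be = codecs.BOM_UTF16_BE

# They match U+FEFF encoded with the corresponding encoding.
same_utf8 = "\ufeff".encode("utf-8") == bom_utf8
same_le = "\ufeff".encode("utf-16-le") == bom_utf16_le
same_be = "\ufeff".encode("utf-16-be") == bom_utf16_be

print(bom_utf8, bom_utf16_le, bom_utf16_be)
```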

5. Ad `7.2.1. Codec Base Classes` + `codecs.IncrementalEncoder`/`codecs.IncrementalDecoder`:
* `Each codec has to define four interfaces to make it usable as codec in Python: stateless encoder, stateless decoder, stream reader and stream writer` -- only four? Not six? What about the incremental encoder/decoder?
* Comparing the fragments (and tables) about error handling methods (Codec Base Classes, IncrementalEncoder, IncrementalDecoder) with the similar fragment in the `codecs.register` description and with the `codecs.register_error` description, I was confused: is implementing a particular way of error handling a matter for the particular codec implementation or for a registered error handler? I believe it would be worth describing clearly the relations between these elements of the API. A more detailed description of the differences between error handling for encoding, decoding and translation would also be a good thing.
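
To make point 5 more concrete, here is a sketch of the two things I was comparing. It first looks at the six callables exposed by the `CodecInfo` object for a standard codec, then registers a custom error handler via `codecs.register_error`. The handler name `example.underscore` and the function `replace_with_underscore` are made up for this illustration.

```python
import codecs

# A CodecInfo object exposes six interfaces, not four.
info = codecs.lookup("utf-8")
interfaces = [
    info.encode, info.decode,
    info.incrementalencoder, info.incrementaldecoder,
    info.streamreader, info.streamwriter,
]

def replace_with_underscore(exc):
    """Hypothetical handler: substitute '_' for each unencodable character."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement and the position at which to resume.
        return ("_" * (exc.end - exc.start), exc.end)
    raise exc

# Register the handler under a made-up name.
codecs.register_error("example.underscore", replace_with_underscore)

# The codec calls back into the registered handler when it hits an error.
encoded = "café".encode("ascii", errors="example.underscore")

print(len(interfaces), encoded)
```

In this sketch the codec detects the error and the registered handler decides what to do about it; this is the relation I would like the docs to state explicitly.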

6. Ad `7.2.1.6. StreamReaderWriter Objects` and `7.2.1.7. StreamRecoder Objects`: It would be worth saying explicitly that, in contrast to the previously described abstract classes (IncrementalEncoder/Decoder, StreamReader/Writer), these classes are *concrete* ones (if I understand correctly).
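
As far as I can tell, point 6 can be demonstrated by instantiating `StreamReaderWriter` directly, without subclassing. A minimal sketch, wrapping an in-memory byte stream with the stream reader/writer classes of the `utf-8` codec:

```python
import codecs
import io

raw = io.BytesIO()
info = codecs.lookup("utf-8")

# Instantiated directly; the codec-specific parts come from the
# reader and writer factories passed in.
stream = codecs.StreamReaderWriter(raw, info.streamreader, info.streamwriter)

stream.write("zażółć")
stream.seek(0)
text = stream.read()

print(text, raw.getvalue())
```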

7. Ad `7.2.4. Python Specific Encodings`:
* `raw_unicode_escape` -- see ticket #19539.
* `unicode_escape` -- `Produce a string that is suitable as Unicode literal in Python source code` but it is *not* a string; it's a *bytes* object (which could be used in source code using an `ascii`-compatible encoding).
* `bytes-to-bytes` and `str-to-str` encodings -- maybe it would be nice to mention that these encodings cannot be used with the str.encode()/bytes.decode() methods (and to mention again that they *can* be used with the functions/methods provided by the `codecs` module).
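
A short sketch of the last two bullets of point 7, assuming a Python 3 interpreter. As with str.encode(), the exception type raised by bytes.decode() for a non-text codec has differed between versions, so both TypeError and LookupError are caught.

```python
import codecs

# Encoding with 'unicode_escape' yields bytes, not str.
escaped = "café".encode("unicode_escape")
round_trip = escaped.decode("unicode_escape")

# bytes.decode() refuses a bytes-to-bytes codec...
try:
    b"616263".decode("hex_codec")
    rejected = False
except (LookupError, TypeError):
    rejected = True

# ...while codecs.decode() accepts it.
decoded = codecs.decode(b"616263", "hex_codec")

print(type(escaped).__name__, escaped, rejected, decoded)
```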

History
Date	User	Action	Args
2013-11-11 01:29:13	zuo	set	recipients: + zuo, docs@python
2013-11-11 01:29:13	zuo	set	messageid: <1384133353.82.0.173341188397.issue19548@psf.upfronthosting.co.za>
2013-11-11 01:29:13	zuo	link	issue19548 messages
2013-11-11 01:29:12	zuo	create