'codecs' module docs improvements #63747
When learning about the 'codecs' module I encountered several places in the module's docs that, I believe, could be made clearer and easier for codecs beginners:
s/world/word (sorry, it's late at night here)
A few more:
Another big one: the encodings module API is not documented in the prose docs, nor is the interface between the default search function and the individual encoding definitions. There's some decent info in help(encodings) though. The interaction with the import system could also be documented better: you can actually blacklist codecs by manipulating sys.modules and the encodings namespace, and you can search additional locations for codec modules by manipulating encodings.__path__ (even without it being declared as a namespace package).
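For readers following along, here is a minimal sketch of the interface being discussed, as it looks from the outside in CPython. It assumes the current layout of the encodings package (one module per encoding, each exposing getregentry()); as noted later in the thread, that layout is an implementation detail and may change.

```python
import codecs
import encodings
import encodings.utf_8

# Public API: codecs.lookup() normalizes the name, consults the registered
# search functions, and returns a CodecInfo object.
info = codecs.lookup("UTF8")

# CPython implementation detail (assumption: may change without notice):
# each encoding module exposes getregentry(), which the default search
# function calls to build the CodecInfo it returns.
entry = encodings.utf_8.getregentry()

# The package path that the default search function imports codec modules from.
search_path = list(encodings.__path__)

print(info.name, entry.name, search_path)
```

Only codecs.lookup() is part of the documented API here; the getregentry() call and encodings.__path__ are shown purely to illustrate what the comment above refers to.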
On 16.11.2013 14:25, Nick Coghlan wrote:
Those were not documented on purpose, since they are an implementation detail. If you document them now, you'll set the implementation in stone, ...
Could they be documented with a massive warning in red "CPython implementation detail - subject to change without notice"? Or documented in a place that is only accessible to developers and not users? Or...???
On 16.11.2013 15:03, Mark Lawrence wrote:
The API is documented in encodings/__init__.py for developers.
On 16 November 2013 23:33, Marc-Andre Lemburg <report@bugs.python.org> wrote:
Yes, that was what got me thinking along those lines, but to make that ...
Addition to the list of improvements:
One glaring omission is any information about multibyte codecs--the class, its methods, and how to even define one. Also, the primary use for codecs.register would be to append a single codec to the lookup registry. Simple usage of the method only provides lookup for the provided codecs and will not include regularly-accessible ones such as "utf-8". It would be enormously helpful to provide an example of proper, safe usage.
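A minimal sketch of the kind of example being asked for, under stated assumptions: the codec name "demoreverse" and the helper functions are hypothetical, invented only for illustration. The search function returns None for every name it does not recognize, so lookups of other codecs fall through to the remaining registered search functions.

```python
import codecs

CODEC_NAME = "demoreverse"  # hypothetical codec name, for illustration only


def _encode(text, errors="strict"):
    # A codec's encode function returns (encoded bytes, input length consumed).
    return text[::-1].encode("utf-8", errors), len(text)


def _decode(data, errors="strict"):
    # A codec's decode function returns (decoded str, input length consumed).
    data = bytes(data)
    return data.decode("utf-8", errors)[::-1], len(data)


def demo_search(name):
    # Handle only our own codec; return None for anything else so that the
    # other registered search functions still get a chance to answer.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(demo_search)

encoded = "abc".encode(CODEC_NAME)
decoded = encoded.decode(CODEC_NAME)
utf8_name = codecs.lookup("utf-8").name  # standard codecs remain available
print(encoded, decoded, utf8_name)
```

The name deliberately avoids hyphens and spaces, since the lookup machinery normalizes names before calling the search function and the exact normalization has varied between Python versions.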
Here is a patch addressing many of the points raised. Please have a look and give any feedback. Beware I am not very familiar with the reStructuredText markup and haven’t tried compiling it.
## Jan’s points not yet addressed: ##
## Numbering Nick’s points: ##
[13. Registration not reversible: Added in patch]
[14. Added CodecInfo class, pulling out some existing details from register().]
## My (Martin’s) point: ##
[17. IncrementalEncoder.reset(): done]
## Zoinkity’s points, not addressed: ##
## Some new points of my own that need fixing: ##
Adding patch v2 after learning how to compile the docs and fixing my errors. I also simplified the descriptions of the CodecInfo attributes by deferring the constructor signatures to where they are fully defined under “Codec base classes”, and merged the list of error handlers there as well. A side effect of merging error handler lists is that “surrogatepass” is now defined for codecs in general, not just Codec.encode() and decode(). Also I noticed that “unicode_escape” actually does Latin-1 decoding.
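The two behaviours mentioned above can be checked with a short sketch (illustrative only, not taken from the patch): "surrogatepass" lets a lone surrogate through the UTF-8 codec, and "unicode_escape" treats non-escape bytes as Latin-1.

```python
lone = "\ud800"  # a lone surrogate code point

# With the default "strict" handler, the UTF-8 codec refuses lone surrogates.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "surrogatepass" encodes the surrogate, and decodes it back again.
passed = lone.encode("utf-8", "surrogatepass")
round_trip = passed.decode("utf-8", "surrogatepass")

# "unicode_escape" decodes bytes outside escape sequences as Latin-1 ...
latin1_like = b"\xe9".decode("unicode_escape")
# ... and interprets backslash escapes such as \u20ac.
escape_seq = b"\\u20ac".decode("unicode_escape")

print(strict_failed, passed, ascii(round_trip), latin1_like, escape_seq)
```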
Thanks for those drafts, Martin - they look like a strong improvement to me. While I still had plenty of comments/questions on v2, I think that's more a reflection on how long it has been since we gave these docs a thorough overall review than a reflection on the proposed changes. Victor - I added you to the nosy list for this one, as I'd specifically like your comments on the StreamReader/Writer docs updates. I'd like to make it clear that these are distinct from the "text encoding only" APIs in the io module, while still accurately describing the behaviour of the standard codecs.
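To illustrate the distinction, here is a minimal sketch (not from the patch) that reads the same bytes through a codecs StreamReader and through io.TextIOWrapper, and then uses codecs.encode() with a bytes-to-bytes codec, which the text-only APIs do not accept.

```python
import codecs
import io

raw = "h\u00e9llo\n".encode("utf-8")

# codecs.getreader() returns the StreamReader class for a codec; it wraps
# any binary stream that has a read() method.
reader = codecs.getreader("utf-8")(io.BytesIO(raw))
via_codecs = reader.read()

# The io module's text layer handles text encodings only.
via_io = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="").read()

# StreamWriter is the mirror image: it encodes str on the way out.
sink = io.BytesIO()
writer = codecs.getwriter("utf-8")(sink)
writer.write("h\u00e9llo\n")
written = sink.getvalue()

# Bytes-to-bytes codecs such as "hex_codec" are reached through
# codecs.encode()/codecs.decode() rather than str.encode().
hex_bytes = codecs.encode(b"abc", "hex_codec")

print(via_codecs == via_io, written == raw, hex_bytes)
```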
New patch version addressing many of the comments; thanks for reviewing! Also adds and extends some unit tests to confirm some of the corner cases I am documenting.
I started making a few edits based on Zuo and Walter's comments while getting this patch ready for merging, and decided the end result could benefit from an additional round of feedback before committing it. This particular patch is also aimed at the Python 3.4 maintenance branch rather than at trunk - the introduction of the new namereplace error handler in 3.5 means that the previous patch didn't apply cleanly to the maintenance branch. While Zoinkity's feedback is also valid (i.e. multibyte codecs aren't documented properly, and custom codec registration is both harder than it really should be and not well documented), I think those points are better filed and handled as separate issues, rather than trying to handle them here as part of the general "bring the current content of the codec module documentation up to date with the current state of Python 3".
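For context, a small sketch of what the namereplace error handler does, compared with backslashreplace. It assumes Python 3.5 or later, where namereplace was introduced.

```python
text = "caf\u00e9"

# namereplace substitutes unencodable characters with \N{...} escapes.
replaced = text.encode("ascii", "namereplace")

# backslashreplace, available earlier, uses numeric escapes instead.
escaped = text.encode("ascii", "backslashreplace")

print(replaced, escaped)
```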
Adding patch v5, for the 3.4 branch. There is at least one reference that still needs fixing in the default branch that is not applicable to the 3.4 branch. Main changes from Nick’s patch:
New changeset 0646eee8296a by Nick Coghlan in branch '3.4':
New changeset 4d00d0109147 by Nick Coghlan in branch 'default':
Thanks for the work on this, folks: Jan for the feedback, Martin for the writing, and everyone else for their comments. I don't believe we addressed all of Jan's comments, but I'd like to request that any further comments be filed as separate issues, now that the larger restructure of the content is out of the way.
Thanks Nick. Here is a small followup patch for the default (3.5) branch to keep things consistent.
New changeset 20a5a56ce090 by Nick Coghlan in branch 'default':
Thanks for the follow-up patch Martin - I missed those when I did the merge forward from 3.4.