This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Coverage of codecs.py
Type: Stage:
Components: Library (Lib), Tests Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: doerwalter, lemburg, martin.panter, ncoghlan, serhiy.storchaka, tleeuwenburg, vstinner
Priority: normal Keywords: patch

Created on 2011-08-22 03:22 by tleeuwenburg, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
mywork.patch tleeuwenburg, 2011-08-22 03:22 review
issue12808_codecs_coverage.diff ncoghlan, 2011-08-22 05:30 Updated patch, ready for commit review
codecs.diff tleeuwenburg, 2011-08-22 10:12 review
codecs.diff tleeuwenburg, 2011-08-22 23:31 review
codecs.diff tleeuwenburg, 2011-08-26 10:14 review
Messages (13)
msg142661 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg) Date: 2011-08-22 03:22
Increases the coverage of codecs.py by about 3 lines...
msg142662 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg) Date: 2011-08-22 03:23
Also some pep8 compliance outside of the scope of the added lines of code.
msg142676 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2011-08-22 05:30
Tennessee, there were a few issues in the original patch - see the attached update (mostly the use of assert statements instead of methods and the overbroad scope of the context manager when checking for an expected exception).

However, in getting ready to commit this, I noticed something interesting. The docs for the IncrementalEncoder.getstate() API [1] state that the value returned *must* be an integer.

BufferedIncrementalEncoder doesn't obey that limitation - when data is buffered, it returns the raw buffered string instead of encoding it as some kind of integer.

As a separate, but related point, IncrementalDecoder.getstate() includes an explanation on how to save arbitrary state as an integer, but no such explanation (not even a reference to the IncrementalDecoder version) is present in the IncrementalEncoder.getstate() docs.

Adding MAL, since I'd like an expert opinion. Is the API less stringent than the docs state, or should BufferedIncrementalEncoder be fixed to always return the state as an integer?

[1] http://docs.python.org/dev/library/codecs#codecs.IncrementalEncoder.getstate
msg142710 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-08-22 09:11
Nick Coghlan wrote:
> 
> As a separate, but related point, IncrementalDecoder.getstate() includes an explanation on how to save arbitrary state as an integer, but no such explanation (not even a reference to the IncrementalDecoder version) is present in the IncrementalEncoder.getstate() docs.
> 
> Adding MAL, since I'd like an expert opinion. Is the API less stringent than the docs state, or should BufferedIncrementalEncoder be fixed to always return the state as an integer?
> 
> [1] http://docs.python.org/dev/library/codecs#codecs.IncrementalEncoder.getstate

I'm not sure how that description got into the docs. It must
have been added in the 3.x branch, since the 2.7 version of the
docs doesn't document those method at all.

FWIW: The .getstate() and .setstate() don't restrict the type of data
used for storing the state. The only requirement is that the data
returned by .getstate() can be passed to .setstate() in order to
recreate the internal state used by the codec:

    def getstate(self):
        """
        Return the current state of the encoder.
        """
        return 0

    def setstate(self, state):
        """
        Set the current state of the encoder. state must have been
        returned by getstate().
        """
For practical reasons, the used data should be pickleable.

The interface is very similar to the __getstate__/__setstate__
interface used by pickle.
msg142716 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg) Date: 2011-08-22 10:12
Some more tests, updated initial state of BufferedIncrementalEncoder to be the correct type, updated rst file. Bit tired, hope I got it right!

Thanks for the feedback everyone, helps me to get it done, even if it's more work for you...
msg142722 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-08-22 12:36
Tennessee Leeuwenburg wrote:
> 
> Tennessee Leeuwenburg <tleeuwenburg@gmail.com> added the comment:
> 
> Some more tests, updated initial state of BufferedIncrementalEncoder to be the correct type, updated rst file. Bit tired, hope I got it right!
> 
> Thanks for the feedback everyone, helps me to get it done, even if it's more work for you...

I think you should simply drop this whole part:

"""
The implementation should make sure that ``0`` is the most common state. (States
that are more complicated than integers can be converted into an integer by
marshaling/pickling the state and encoding the bytes of the resulting string
into an integer).
"""

or, better, rephrase it so that it becomes clear that the codec
implementation may use any type of pickleable object to represent
the internal state. The only requirement is that .setstate()
has to be able to read back the state returned by .getstate().

We have codecs in the stdlib that return integers, bytes/string
and even tuples (see the io module and the UTF-16 codec as example),
so the documentation is clearly wrong.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                43 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
msg142774 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg) Date: 2011-08-22 23:31
Thanks for the review. Here is a patch incorporating the two comments being to move some comments.
msg142793 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-08-23 08:01
Tennessee Leeuwenburg wrote:
> 
> Tennessee Leeuwenburg <tleeuwenburg@gmail.com> added the comment:
> 
> Thanks for the review. Here is a patch incorporating the two comments being to move some comments.

Hmm, the documentation patch doesn't appear to have changed. Did you
upload the right patch ?
msg143003 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg) Date: 2011-08-26 10:14
Here is a stab at updated documentation. I would suggest that if further changes are recommended to the documentation, that a core committer go ahead and make them. I'm absolutely more than happy to keep taking "stabs" at it, but ultimately I probably don't understand the purpose of these classes as well as some of the rest of you, and I don't feel best placed to decide exactly how this should read....
msg221765 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-28 01:55
Would someone who's commented previously do a commit review please?
msg221778 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2014-06-28 09:58
The requirement that getstate() returns a (buffer, int) tuple has to do with the fact that for text streams seek() and tell() somehow have to take the state of the codec into account. See _pyio.TextIOWrapper.(seek|tell|_pack_cookie|_unpack_cookie).

However I can't remember the exact history of the specification.
msg221815 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-06-28 19:31
The patch would have to be updated to 3.5 (part of it no longer applies), but other than that I think it's fine.

It may make sense to readd the comment for .getstate() to keep the state as simple as possible ("The implementation should make sure that ``0`` is the most common state."), but without requiring a specific number, type, etc.
msg222474 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-07 16:48
See also issue20420.
History
Date User Action Args
2022-04-11 14:57:20adminsetgithub: 57017
2019-03-15 22:11:31BreamoreBoysetnosy: - BreamoreBoy
2015-04-08 02:50:14martin.pantersetnosy: + martin.panter
2014-07-07 16:48:37serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg222474
2014-06-28 19:31:47lemburgsetmessages: + msg221815
2014-06-28 09:58:33doerwaltersetnosy: + doerwalter
messages: + msg221778
2014-06-28 01:55:31BreamoreBoysetnosy: + BreamoreBoy
messages: + msg221765
2011-08-28 18:06:32terry.reedysetcomponents: + Library (Lib), Tests
versions: + Python 3.3, - Python 3.4
2011-08-26 10:15:03tleeuwenburgsetfiles: + codecs.diff

messages: + msg143003
2011-08-23 08:02:21vstinnersetnosy: + vstinner
2011-08-23 08:01:40lemburgsetmessages: + msg142793
2011-08-22 23:31:18tleeuwenburgsetfiles: + codecs.diff

messages: + msg142774
2011-08-22 12:36:20lemburgsetmessages: + msg142722
2011-08-22 10:12:32tleeuwenburgsetfiles: + codecs.diff

messages: + msg142716
2011-08-22 09:11:28lemburgsetmessages: + msg142710
2011-08-22 05:30:25ncoghlansetfiles: + issue12808_codecs_coverage.diff
nosy: + lemburg
messages: + msg142676

2011-08-22 03:23:34tleeuwenburgsetmessages: + msg142662
2011-08-22 03:22:51tleeuwenburgcreate