classification
Title: Stream encoder for zlib_codec doesn't use the incremental encoder
Type: behavior Stage:
Components: IO Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amcnabb, haypo, jcea
Priority: normal Keywords:

Created on 2012-01-26 19:40 by amcnabb, last changed 2012-02-14 21:48 by haypo.

Messages (4)
msg152029 - Author: Andrew McNabb (amcnabb) Date: 2012-01-26 19:40
The stream encoder for the zlib_codec doesn't use the incremental encoder, so it has limited usefulness in practice. This is easiest to show with an example.

Here is the behavior with the stream encoder:

>>> import codecs, io
>>> filelike = io.BytesIO()
>>> wrapped = codecs.getwriter('zlib_codec')(filelike)
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00y'
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00yx\x9c\xab\x00\x00\x00y\x00y'
>>>

However, this is the behavior of the incremental encoder:

>>> ienc = codecs.getincrementalencoder('zlib_codec')()
>>> ienc.encode(b'x')
b'x\x9c'
>>> ienc.encode(b'x', final=True)
b'\xab\xa8\x00\x00\x01j\x00\xf1'
>>>

The stream encoder apparently compresses each write as a separate, complete zlib stream (note the repeated b'x\x9c' header in its output). The incremental encoder, by contrast, keeps one compressor across calls and buffers input until it has enough to compress meaningfully.
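To make the difference concrete outside the codecs machinery, here is a minimal sketch that uses the zlib module directly. The helper names `compress_each_write` and `compress_as_one_stream` are hypothetical. They assume two things the outputs above suggest: the stream writer calls `zlib.compress` once per write, and the incremental encoder shares a single `zlib.compressobj`.

```python
import zlib


def compress_each_write(chunks):
    """Hypothetical model of the stream writer: compress every write alone."""
    return b"".join(zlib.compress(chunk) for chunk in chunks)


def compress_as_one_stream(chunks):
    """Hypothetical model of the incremental encoder: one shared compressor."""
    comp = zlib.compressobj()
    out = [comp.compress(chunk) for chunk in chunks]
    out.append(comp.flush())  # the final=True step: finish the stream
    return b"".join(out)


chunks = [b"x", b"x"]
separate = compress_each_write(chunks)
joined = compress_as_one_stream(chunks)

# Two complete streams back to back: a decompressor stops after the first
# and leaves the second one untouched in unused_data.
d = zlib.decompressobj()
print(d.decompress(separate))                # b'x'
print(d.unused_data == zlib.compress(b"x"))  # True

# One stream that covers all of the input.
print(zlib.decompress(joined))               # b'xx'
```

A plain `zlib.decompress` on the stream writer's output only ever yields the first write. Recovering the rest requires walking `unused_data` stream by stream.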

Fixing this may require addressing a separate issue with stream encoders. Unlike GzipFile (from the gzip module), a stream encoder closes the underlying file when it is itself closed. If that file is a BytesIO, closing it discards its buffer, so the completed data can no longer be retrieved.
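Here is a quick check of the closing behaviour, sketched against a current CPython 3 (details on 3.3 may differ). `codecs.StreamWriter` defines no `close()` of its own and forwards unknown attributes to the wrapped stream. `gzip.GzipFile` only closes a file object it opened itself.

```python
import codecs
import gzip
import io

# codecs.StreamWriter has no close(); attribute lookup falls through to the
# wrapped stream, so closing the writer closes the BytesIO as well.
zbuf = io.BytesIO()
writer = codecs.getwriter("zlib_codec")(zbuf)
writer.write(b"hello")
writer.close()
print(zbuf.closed)  # True: zbuf.getvalue() would now raise ValueError

# GzipFile writes its trailer on close() but leaves a caller-supplied
# fileobj open, so the finished data can still be read back.
gzbuf = io.BytesIO()
with gzip.GzipFile(fileobj=gzbuf, mode="wb") as gz:
    gz.write(b"hello")
print(gzbuf.closed)                       # False
print(gzip.decompress(gzbuf.getvalue()))  # b'hello'
```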
msg153142 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-02-11 23:34
Andrew, could you possibly write a patch and a test for 3.3?
msg153358 - Author: Andrew McNabb (amcnabb) Date: 2012-02-14 18:43
It looks like encodings/zlib_codec.py defines a custom IncrementalEncoder and IncrementalDecoder, but its StreamWriter and StreamReader rely on the standard implementation of codecs.StreamWriter and codecs.StreamReader.

One solution might be to have zlib_codec.StreamWriter inherit from zlib_codec.IncrementalEncoder instead of from zlib_codec.Codec. I'm not familiar enough with the codecs library to know whether this is the best approach.
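Here is one possible shape for such a fix, sketched under assumptions. The class name `IncrementalZlibWriter` is hypothetical, and it holds a `zlib.compressobj` by composition rather than inheriting from `zlib_codec.IncrementalEncoder`. Plain inheritance would need care. `codecs.StreamWriter.write` calls `self.encode(obj, self.errors)` and unpacks a `(data, consumed)` tuple, while `IncrementalEncoder.encode(input, final=False)` returns bytes, so `write` would have to be overridden either way. In this sketch, `reset()` finishes the zlib stream without closing the underlying file.

```python
import codecs
import io
import zlib


class IncrementalZlibWriter(codecs.StreamWriter):
    """Hypothetical zlib StreamWriter that shares one compressor across writes."""

    def __init__(self, stream, errors="strict"):
        super().__init__(stream, errors)
        self._compressor = zlib.compressobj()

    def write(self, data):
        # The compressor may return b"" here while it buffers input.
        self.stream.write(self._compressor.compress(data))

    def reset(self):
        # Finish the current zlib stream (remaining data plus the Adler-32
        # checksum) without closing the underlying file, then start afresh.
        self.stream.write(self._compressor.flush())
        self._compressor = zlib.compressobj()


buf = io.BytesIO()
writer = IncrementalZlibWriter(buf)
writer.write(b"x")
writer.write(b"x")
writer.reset()
print(zlib.decompress(buf.getvalue()))  # b'xx'
print(buf.closed)                       # False
```

Whether `reset()` or some new method is the right place to end the stream is an open design question. The sketch only shows that the trailer can be written while the BytesIO is still open.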

Unfortunately, there are 120 codec files in the encodings directory, and it's unclear how many of them would need to be modified. Judging by how many of them define StreamWriter as "class StreamWriter(Codec,codecs.StreamWriter)", it could be a lot of them. Was each of these 120 files hand-written?
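A single generic writer driven by `codecs.getincrementalencoder()` might avoid editing each codec file. This is a hypothetical sketch: the class `IncrementalStreamWriter` and its `finish()` method are illustrative names, not stdlib API. It works for any codec that provides an incremental encoder.

```python
import codecs
import io
import zlib


class IncrementalStreamWriter:
    """Hypothetical codec-agnostic writer built on an incremental encoder."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self._encoder = codecs.getincrementalencoder(encoding)(errors)
        self._empty = None  # empty str/bytes matching the input type seen so far

    def write(self, data):
        self._empty = data[:0]
        self.stream.write(self._encoder.encode(data))

    def finish(self):
        # Flush anything the encoder still holds (final=True) and reset it,
        # but leave the underlying stream open so its contents stay usable.
        if self._empty is not None:
            self.stream.write(self._encoder.encode(self._empty, final=True))
        self._encoder.reset()
        self._empty = None


buf = io.BytesIO()
w = IncrementalStreamWriter(buf, "zlib_codec")
w.write(b"x")
w.write(b"x")
w.finish()
print(zlib.decompress(buf.getvalue()))  # b'xx'

text_buf = io.BytesIO()
t = IncrementalStreamWriter(text_buf, "utf-8")
t.write("h\u00e9")
t.finish()
print(text_buf.getvalue())  # b'h\xc3\xa9'
```

If `finish()` is called before any write, nothing is emitted, not even an empty zlib stream. A real implementation would have to decide whether that is acceptable.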
msg153372 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-02-14 21:48
See also issue #7475.
History
Date                 User            Action  Args
2012-02-14 21:48:22  haypo           set     messages: + msg153372
2012-02-14 18:53:02  r.david.murray  set     nosy: + haypo
2012-02-14 18:43:08  amcnabb         set     messages: + msg153358
2012-02-11 23:34:41  jcea            set     nosy: + jcea
                                             messages: + msg153142
                                             versions: + Python 3.3, - Python 3.2
2012-01-26 19:40:48  amcnabb         create