Issue13881
Created on 2012-01-26 19:40 by amcnabb, last changed 2012-02-14 21:48 by haypo.
Messages (4)
msg152029 - Author: Andrew McNabb (amcnabb) - Date: 2012-01-26 19:40
The stream encoder for the zlib_codec doesn't use the incremental encoder, so it has limited usefulness in practice. This is easiest to show with an example.
Here is the behavior with the stream encoder:
>>> filelike = io.BytesIO()
>>> wrapped = codecs.getwriter('zlib_codec')(filelike)
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00y'
>>> wrapped.write(b'x')
>>> filelike.getvalue()
b'x\x9c\xab\x00\x00\x00y\x00yx\x9c\xab\x00\x00\x00y\x00y'
>>>
However, this is the behavior of the incremental encoder:
>>> ienc = codecs.getincrementalencoder('zlib_codec')()
>>> ienc.encode(b'x')
b'x\x9c'
>>> ienc.encode(b'x', final=True)
b'\xab\xa8\x00\x00\x01j\x00\xf1'
>>>
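The stream encoder apparently compresses each write as a complete, self-contained zlib stream (header, compressed data, and Adler-32 checksum; note the repeated b'x\x9c' header), whereas the incremental encoder buffers its input until it has enough to compress meaningfully and produces a single stream.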
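The difference can also be checked programmatically. The sketch below counts how many complete zlib streams each encoder emits for the same two writes. It assumes Python 3.3 or later (for `zlib.decompressobj().eof`), and the helper names are hypothetical, chosen only for illustration.

```python
import codecs
import io
import zlib


def stream_writer_output(chunks):
    """Write each chunk through the zlib_codec StreamWriter and return the bytes."""
    buf = io.BytesIO()
    writer = codecs.getwriter('zlib_codec')(buf)
    for chunk in chunks:
        writer.write(chunk)
    return buf.getvalue()


def incremental_output(chunks):
    """Feed the chunks to the zlib_codec IncrementalEncoder and return the bytes."""
    enc = codecs.getincrementalencoder('zlib_codec')()
    parts = [enc.encode(chunk) for chunk in chunks]
    parts.append(enc.encode(b'', final=True))  # flush buffered data and the checksum
    return b''.join(parts)


def count_zlib_streams(data):
    """Count the complete zlib streams concatenated in data."""
    count = 0
    while data:
        d = zlib.decompressobj()
        d.decompress(data)
        if not d.eof:
            raise ValueError('data ends with an incomplete zlib stream')
        count += 1
        data = d.unused_data  # bytes that follow the end of this stream
    return count


print(count_zlib_streams(stream_writer_output([b'x', b'x'])))  # 2
print(count_zlib_streams(incremental_output([b'x', b'x'])))    # 1
```

A consumer that decompresses the stream writer's output with a single `zlib.decompress()` call or a single decompressor object only gets the first write back; the remaining writes end up in `unused_data`.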
Fixing this may require addressing a separate issue with stream encoders. Unlike gzip.GzipFile, which leaves a caller-supplied file object open, closing a stream encoder closes the underlying file. If that file is a BytesIO, closing it frees its buffer, so the completed output can no longer be retrieved.
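A small sketch of the closing behaviour, comparing a zlib_codec StreamWriter with gzip.GzipFile when both wrap a caller-supplied BytesIO. The function names are hypothetical. The codec case relies on codecs.StreamWriter forwarding attributes it does not define, such as close(), to the wrapped stream.

```python
import codecs
import gzip
import io


def codec_writer_leaves_buffer_open():
    """Close a zlib_codec StreamWriter and report whether its BytesIO survives."""
    buf = io.BytesIO()
    writer = codecs.getwriter('zlib_codec')(buf)
    writer.write(b'hello')
    writer.close()  # not defined on StreamWriter; forwarded to buf.close()
    return not buf.closed


def gzip_leaves_buffer_open():
    """Close a GzipFile and report whether the caller's BytesIO survives."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(b'hello')
    # GzipFile only closes file objects it opened itself.
    return not buf.closed


print(codec_writer_leaves_buffer_open())  # False: buf.getvalue() would now raise ValueError
print(gzip_leaves_buffer_open())          # True: the compressed data is still readable
```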
msg153142 - Author: Jesús Cea Avión (jcea) - Date: 2012-02-11 23:34
Andrew, could you possibly write a patch and a test for 3.3?
msg153358 - Author: Andrew McNabb (amcnabb) - Date: 2012-02-14 18:43
It looks like encodings/zlib_codec.py defines a custom IncrementalEncoder and IncrementalDecoder, but its StreamWriter and StreamReader rely on the standard implementations in codecs.StreamWriter and codecs.StreamReader. One solution might be to have zlib_codec.StreamWriter inherit from zlib_codec.IncrementalEncoder instead of from zlib_codec.Codec, although I'm not familiar enough with the codecs library to know whether this is the best approach.

Unfortunately, there are 120 codec files in the encodings directory, and it's unclear how many of them would need to be modified. Judging by how many of them define StreamWriter as "class StreamWriter(Codec,codecs.StreamWriter)", it could be a lot. Was each of these 120 files hand-written?
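One possible shape for the change suggested above, sketched as a standalone subclass rather than a patch to encodings/zlib_codec.py. The class name IncrementalZlibWriter and its finish() method are hypothetical. Composing the existing zlib_codec IncrementalEncoder, instead of inheriting from it, is a design choice made here for illustration, not a fix taken from CPython.

```python
import codecs
import io
import zlib


class IncrementalZlibWriter(codecs.StreamWriter):
    """Hypothetical StreamWriter that sends every write through one
    zlib_codec IncrementalEncoder, so all writes share a single zlib stream."""

    def __init__(self, stream, errors='strict'):
        super().__init__(stream, errors)
        self._encoder = codecs.getincrementalencoder('zlib_codec')(errors)

    def write(self, data):
        # zlib may buffer small inputs internally, so this can emit nothing.
        out = self._encoder.encode(data)
        if out:
            self.stream.write(out)

    def finish(self):
        """Flush pending data and the Adler-32 trailer, leaving the stream open."""
        self.stream.write(self._encoder.encode(b'', final=True))
        self._encoder.reset()  # any later writes start a fresh zlib stream


buf = io.BytesIO()
writer = IncrementalZlibWriter(buf)
writer.write(b'hello ')
writer.write(b'world')
writer.finish()
print(zlib.decompress(buf.getvalue()))  # b'hello world'
```

Because finish() deliberately does not close the underlying stream, buf.getvalue() is still available afterwards, which also sidesteps the BytesIO problem raised in the first message.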
msg153372 - Author: STINNER Victor (haypo) - Date: 2012-02-14 21:48
See also issue #7475.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2012-02-14 21:48:22 | haypo | set | messages: + msg153372 |
| 2012-02-14 18:53:02 | r.david.murray | set | nosy: + haypo |
| 2012-02-14 18:43:08 | amcnabb | set | messages: + msg153358 |
| 2012-02-11 23:34:41 | jcea | set | nosy: + jcea; messages: + msg153142; versions: + Python 3.3, - Python 3.2 |
| 2012-01-26 19:40:48 | amcnabb | create | |
