classification
Title: Inconsistency in the zlib module
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ellison Marks, gregory.p.smith, martin.panter, xiang.zhang
Priority: normal Keywords:

Created on 2017-04-05 23:28 by Ellison Marks, last changed 2017-05-20 01:49 by xiang.zhang.

Messages (8)
msg291201 - (view) Author: Ellison Marks (Ellison Marks) Date: 2017-04-05 23:28
In the zlib module, three of the methods support the wbits parameter, those being zlib.compressobj, zlib.decompress and zlib.decompressobj. zlib.compress does not support the wbits parameter. Looking at the source for these functions, those that support the wbits parameter use the "advanced" version of the zlib functions, deflateInit2 and inflateInit2, whereas zlib.compress uses the "basic" function deflateInit.

The effect of this is that while you can decode from zlib data with non-default wbits values in one call with zlib.decompress, you cannot encode to zlib data with non-default wbits in one call with zlib.compress.  You need to take to extra step of creating a compression object with the appropriate values, then use that to compress the data. eg:

zlib.compress(data) # can't use wbits here

vs.

compressor = zlib.compressobj(wbits=16+zlib.MAX_WBITS)
compressor.compress(data) + compressor.flush()

Some quick benchmarking shows little speed difference between the two implementations:

$ python -m timeit -s 'import zlib' -s 'import random' -s 'import string' -s 's="".join(random.choice(string.printable) for _ in xrange(10000000))' 'zlib.compress(s)'
10 loops, best of 3: 356 msec per loop

$ python -m timeit -s 'import zlib' -s 'import random' -s 'import string' -s 's="".join(random.choice(string.printable) for _ in xrange(10000000))' 'compressor=zlib.compressobj()' 'compressor.compress(s)+compressor.flush()'
10 loops, best of 3: 364 msec per loop

so I can't see any downside of switching zlib.compress to the "advanced" implementation and exposing the extra parameters to python.
msg291247 - (view) Author: Ellison Marks (Ellison Marks) Date: 2017-04-06 21:58
I made a try at a patch for this.

My C is rudimentary at best, so I was hoping someone could look it over before I submitted a PR?

https://github.com/gotyaoi/cpython/commit/2906fc9069ce6ec4888a547b5088ef9177a21c9a
msg291252 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-07 05:32
Such a change in my mind is an enhancement and only could get into 3.7.

IMHO zlib.compress() and zlib.decompress() are for simple usage, zlib.compressobj() and zlib.decompressobj() for advanced usage if you want more controls. Actually I would prefer simplifying zlib.decompress() signature (dropping the wbits parameter) than adding more complex to zlib.compress(). Of course we can't do that to maintain backwards compatibility. :-)

-1 on this change.
msg291253 - (view) Author: Ellison Marks (Ellison Marks) Date: 2017-04-07 05:56
I'm not sure I agree with that. The docs for compressobj just say

"Returns a compression object, to be used for compressing data streams that won’t fit into memory at once."

Which I don't think says much about the complexity aspect. Whether you're compressing a smaller bit of data or a stream, I think the optional parameters in compressobj are just as applicable to compress. When you've got an in-memory chunk of data, it seems to be going out of the way to construct a compressobj just to get at the optional parameters.
msg291311 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-04-08 02:08
I don’t have a strong opinion on adding the missing parameters to the one-shot “compress” function, though it does seem beneficial to have a consistent set of parameters supported across the relevant APIs.
msg293943 - (view) Author: Ellison Marks (Ellison Marks) Date: 2017-05-18 23:31
Erm, is there anyone else we should poke for their opinion then?
msg293962 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-05-19 17:09
I think adding these parameters to the zlib.compress() function is a fine addition for 3.7.  They are more convenient for those (few) who do need them.

I'd go ahead and turn it into a cpython PR.
msg293990 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-20 01:49
Since Martin and Gregory support it, Ellison if you'd like you could make a PR for it. But code change alone is not enough. You need also adding corresponding tests and docs.
History
Date User Action Args
2017-05-20 01:49:44xiang.zhangsetmessages: + msg293990
2017-05-19 17:09:12gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg293962
2017-05-18 23:31:02Ellison Markssetmessages: + msg293943
2017-04-08 02:08:09martin.pantersetmessages: + msg291311
2017-04-07 05:56:17Ellison Markssetmessages: + msg291253
2017-04-07 05:32:05xiang.zhangsetnosy: + xiang.zhang, martin.panter

messages: + msg291252
versions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6
2017-04-06 21:58:09Ellison Markssetmessages: + msg291247
2017-04-05 23:28:22Ellison Markscreate