Title: max_wbits set incorrectly to -zlib.MAX_WBITS in tarfile, shouldn't be negative
Type: performance Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
Status: closed Resolution: duplicate
Dependencies: Superseder: raw deflate format and zlib module
View: 5784
Assigned To: docs@python Nosy List: BreamoreBoy, docs@python, edulix, lars.gustaebel, martin.panter, nadeem.vawda, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-08-07 11:02 by edulix, last changed 2015-03-23 03:53 by martin.panter. This issue is now closed.

File name Uploaded Description Edit
max_wbits.patch edulix, 2014-08-07 11:02 Patch to remove the minus symbol from zlib.MAX_WBITS
Messages (4)
msg225006 - (view) Author: Eduardo Robles Elvira (edulix) * Date: 2014-08-07 11:02
I think I have found a small typo-bug in, that seems to
be present in cpython upstream, which makes tarfile compression slower.
The issue can be seen here, in line 415 [1] of

        self.cmp = self.zlib.compressobj(9, self.zlib.DEFLATED,

The minus sign doesn't make sense to me, because zlib.MAX_WBITS is 15,
and according to the documentation [2] wbits is by default 15 and
"This should be an integer from 8 to 15. Higher values give better
compression, but use more memory". -15 is not even in range. So I guess the expected value should be the zlib default, 15. Or maybe another value, but it should at least be in range. 

I marked it as "performance" type bug as this only really affects performance.

I might have gotten it all wrong and it's fine the way it is, but I thought it'd be best to report it, as it looked fishy to me.

msg238909 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-22 14:45
Is the thinking and hence the patch correct here?
msg238962 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 00:16
I suspect the patch is wrong and the zlib documentation needs fixing instead. See <> for a possible explanation of the negative sign, although it would make more sense for compress() which would actually create a “gzip” header.
msg238976 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 03:53
Eduardo’s patch causes many tests to fail. Abbreviated test output:

$ ./python -bWall -m test -v test_tarfile
ERROR: test_compare_members (test.test_tarfile.GzipStreamReadTest)
ERROR: test_empty_tarfile (test.test_tarfile.GzipStreamReadTest)
ERROR: test_fileobj_regular_file (test.test_tarfile.GzipStreamReadTest)
ERROR: test_ignore_zeros (test.test_tarfile.GzipStreamReadTest)
ERROR: test_non_existent_tarfile (test.test_tarfile.GzipStreamReadTest)
ERROR: test_null_tarfile (test.test_tarfile.GzipStreamReadTest)
ERROR: test_provoke_stream_error (test.test_tarfile.GzipStreamReadTest)
ERROR: test_read_through (test.test_tarfile.GzipStreamReadTest)
tarfile.ReadError: invalid compressed data

ERROR: test_stream_padding (test.test_tarfile.GzipStreamWriteTest)
zlib.error: Error -3 while decompressing data: invalid stored block lengths

FAIL: test_detect_file (test.test_tarfile.GzipDetectReadTest)
FAIL: test_detect_fileobj (test.test_tarfile.GzipDetectReadTest)

Ran 387 tests in 14.216s

FAILED (failures=2, errors=9, skipped=8)

There is already Issue 5784 open about clarifying the “wbits” parameter, so I think we can close this as a duplicate.
Date User Action Args
2015-03-23 03:53:50martin.pantersetstatus: open -> closed
superseder: raw deflate format and zlib module
resolution: duplicate
messages: + msg238976
2015-03-23 00:16:13martin.pantersetnosy: + martin.panter, docs@python
messages: + msg238962

assignee: docs@python
components: + Documentation
2015-03-22 14:45:46BreamoreBoysetnosy: + BreamoreBoy
messages: + msg238909
2014-08-29 22:29:27pitrousetnosy: + serhiy.storchaka
2014-08-29 20:39:27terry.reedysetnosy: + lars.gustaebel
stage: patch review

versions: - Python 3.1, Python 3.2, Python 3.3
2014-08-07 12:51:07serhiy.storchakasetnosy: + nadeem.vawda
2014-08-07 11:02:32edulixcreate