classification
Title: raw deflate format and zlib module
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: akuchling, docs@python, martin.panter, nadeem.vawda, phr, python-dev, serhiy.storchaka, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2009-04-17 22:10 by phr, last changed 2016-05-27 23:00 by martin.panter. This issue is now closed.

Files
File name Uploaded Description
patch-5784.txt akuchling, 2015-04-18 12:02
patch-5784.txt akuchling, 2015-04-18 16:19
zlib-wbits.v3.patch martin.panter, 2015-11-29 05:44
Messages (19)
msg86094 - Author: paul rubin (phr) Date: 2009-04-17 22:10
The zlib module doesn't support raw deflate format, so it doesn't
completely interoperate with php's "gzdeflate" function and fails to
decompress some strings that web browsers can decompress.

A workaround is to use a special zlib feature and pass the value -15 as
the "wbits" arg: 

plaintext = zlib.decompress(compressed_text, -15)
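A minimal sketch of the workaround, assuming Python 3. The zlib module has no `deflate()` function; the call that reads raw deflate data is `zlib.decompress()` with `wbits=-15`, passed positionally here because the keyword form only arrived in Python 3.6. PHP's `gzdeflate()` output is emulated by compressing with a negative `wbits`; `fake_gzdeflate` is a hypothetical helper name, not a PHP or Python API.

```python
import zlib

def fake_gzdeflate(data):
    """Return a raw deflate stream (no zlib header or trailer), like PHP's gzdeflate()."""
    # compressobj(level, method, wbits): a negative wbits selects raw deflate.
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush()

compressed_text = fake_gzdeflate(b"<?php echo 'hello'; ?>")

# The default wbits (zlib.MAX_WBITS, i.e. 15) expects a zlib header, so this raises.
try:
    zlib.decompress(compressed_text)
except zlib.error as exc:
    print("default wbits failed:", exc)

# wbits=-15 tells zlib there is no header or trailer to parse.
plaintext = zlib.decompress(compressed_text, -15)
print(plaintext)
```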

I don't know if it's appropriate to mess with the code, but at minimum I
urge that the workaround be mentioned in the docs.  We had a tremendous
disruption where I work because of a malicious raw-deflated PHP script
that we couldn't decompress with Python for analysis.  We had to resort
to decompressing it inside a PHP container, which the script then
(through a slip-up on my part) escaped from.

Help us Python-Kenobi, save us from PHP ;-)
msg86095 - Author: paul rubin (phr) Date: 2009-04-17 22:28
I should have mentioned, the docs do say "When wbits is negative, the
standard gzip header is suppressed; this is an undocumented feature of
the zlib library, used for compatibility with unzip’s compression file
format" but this wasn't enough at the time to figure out the issue.  I
suggest adding something like "and the 'raw deflate' format supported by
PHP and some web browsers."

I'd better research the exact situation a bit further, for the sake of
documenting it accurately, if others here think it's a good idea.
msg109661 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-09 01:36
A doc addition would seem like a good idea, so I am changing this to a doc issue for the current versions. Can you suggest specific text and a specific location to place it?

A behavior change could only go into 3.2. I do not know who, if anyone, maintains zlib. A reference to the desired algorithm might be necessary.
msg238975 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 03:46
According to <http://zlib.net/manual.html#Advanced>, the deflateInit2(windowBits) parameter can be:

* +8 to +15 to include a “zlib” header and trailer
* −8 to −15 to write a raw Deflate stream with no header or trailer
* 16 + (8 to 15) to include a basic “gzip” header and trailer

The inflateInit2(windowBits) parameter can also be set to the above values to specify what header and trailer to expect, and it can also be set to 0 to read the window size from the “zlib” header itself.

Assuming that the Python module passes “wbits” straight through to the underlying zlib library, I think these points could be clarified in the Python documentation.
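A minimal sketch of the three `windowBits` ranges above as seen from Python 3, assuming `wbits` is passed through unchanged to `deflateInit2()` and `inflateInit2()`. `compress_with_wbits` and the sample data are illustrative only; the first two bytes of each stream show which container was written.

```python
import zlib

DATA = b"To be, or not to be: that is the question. " * 20

def compress_with_wbits(data, wbits):
    """Compress data; wbits selects both the window size and the container format."""
    # Positional arguments: level, method, wbits.
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return co.compress(data) + co.flush()

zlib_stream = compress_with_wbits(DATA, 15)       # +8..+15: zlib header and trailer
raw_stream = compress_with_wbits(DATA, -15)       # -8..-15: raw deflate, nothing around it
gzip_stream = compress_with_wbits(DATA, 16 + 15)  # 16 + (8..15): gzip header and trailer

print(zlib_stream[:2].hex())  # 789c: zlib header for a 32 KiB window at the default level
print(gzip_stream[:2].hex())  # 1f8b: gzip magic number

# Decompression needs a wbits value that expects the same container.
for stream, wbits in [(zlib_stream, 15), (raw_stream, -15), (gzip_stream, 16 + 15)]:
    assert zlib.decompress(stream, wbits) == DATA
```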
msg241411 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2015-04-18 12:02
Here's a short patch that expands the discussion of wbits, and duplicates it under both the compressobj() and decompress() functions.  Should I avoid the duplication and just have a reference?
msg241414 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-18 12:42
Looks good in general (apart from one grammar comment).

It might be best to include only one copy and reference it from the other places. There are actually three places “wbits” is allowed that I can see:

* compressobj()
* decompress()
* decompressobj()

Maybe just pointing from decompress() and decompressobj() back to the compressobj() description would be good enough, unless you know whether wbits=0 is also accepted for decompression.
msg241421 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2015-04-18 16:19
Thanks! Here's an updated version with some more rewriting -- the list is now in only one place and is linked-to from the decompression documentation.
msg242069 - Author: paul rubin (phr) Date: 2015-04-26 18:47
Hey, thanks for updating this.  I still remember the nasty incident that got me filing this report in the first place.  I'll look at the patch more closely when I get a chance, but my immediate comment is that it's worth adding a sentence saying explicitly to use wbits=-15 if you need to interoperate with other libraries, such as PHP's, that strip off the header.
msg254825 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-18 00:52
Looking at the current zlib module source code, I confirmed that wbits is passed directly to the deflateInit2(windowBits) and inflateInit2(windowBits) parameters. So the following modes could also be added to the documentation for decompression:

* Zero: automatically determine the window size from the zlib header
* 32 + window-size logarithm (8 to 15): automatically accept either a zlib or gzip header
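A short sketch of the two decompression-only modes listed above, assuming Python 3 and a zlib library recent enough to accept a zero `wbits`. The sample data is made up; one value, `32 + zlib.MAX_WBITS`, reads both a zlib and a gzip stream.

```python
import zlib

DATA = b"Something is rotten in the state of Denmark. " * 10

zlib_stream = zlib.compress(DATA)  # zlib container
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gzip_stream = co.compress(DATA) + co.flush()  # gzip container

# 32 + (8..15): zlib inflate detects a zlib or a gzip header automatically.
auto = 32 + zlib.MAX_WBITS
assert zlib.decompress(zlib_stream, auto) == DATA
assert zlib.decompress(gzip_stream, auto) == DATA

# 0: read the window size from the zlib header (zlib container only;
# assumes the underlying zlib library is recent enough to support it).
assert zlib.decompress(zlib_stream, 0) == DATA
```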

Also, the compressobj() doc string is out of date. It mentions 8–15 but none of the other options.

Paul: Perhaps it would be better to say “wbits” corresponds to the format of the stream, rather than mentioning just the window size as currently proposed. Then hopefully the reader would look to see what value is needed for the raw Deflate format.
msg255560 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-29 05:44
Here is a patch with the following changes:

* Clarified that wbits affects the container format as well as the window size
* Undid some word wrapping to make the diff simpler
* Added zero and 32 + n for decompression
* Added full list of options under decompressobj(), and link decompress() to that. Otherwise we end up saying decompression generates a header, when it really parses the header.
* Added tests for various wbits values
* Compressing with window bits = 8 is not actually supported (Zlib bumps it to 9: <https://github.com/madler/zlib/commit/8e34b3a#diff-8940271ef2146523af486ca4408361daR264>. The change log says “Force windowBits > 8 to avoid a bug in the encoder for a window size of 256 bytes”.)
* Updated doc strings
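decompressobj() takes the same wbits values as decompress(), which matters when compressed data arrives in pieces. A minimal sketch, assuming Python 3.3+ for the `eof` attribute: a raw deflate stream followed by some trailing bytes (the `b"TRAILER"` marker is made up for illustration) is fed in 16-byte chunks, and whatever follows the end of the stream is collected in `unused_data`.

```python
import zlib

DATA = b"The rest is silence. " * 100

co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_stream = co.compress(DATA) + co.flush()
payload = raw_stream + b"TRAILER"  # hypothetical bytes that follow the stream

# Same wbits convention as decompress(): negative means raw deflate.
d = zlib.decompressobj(-zlib.MAX_WBITS)
pieces = []
for i in range(0, len(payload), 16):  # feed the input in small chunks
    pieces.append(d.decompress(payload[i:i + 16]))
pieces.append(d.flush())

result = b"".join(pieces)
print(d.eof, d.unused_data)  # end of stream reached; leftover input kept aside
```

Python's own gzip module reads its 8-byte trailer the same way, through `unused_data` of a raw deflate decompressor.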
msg266484 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-27 08:45
New changeset 4c88d6d18e85 by Martin Panter in branch '3.5':
Issue #5784: Expand documentation and tests for zlib wbits parameter
https://hg.python.org/cpython/rev/4c88d6d18e85

New changeset 4d4f27fc70d5 by Martin Panter in branch 'default':
Issue #5784: Merge zlib from 3.5
https://hg.python.org/cpython/rev/4d4f27fc70d5

New changeset e5fc74588cea by Martin Panter in branch '2.7':
Issue #5784: Expand documentation and tests for zlib wbits parameter
https://hg.python.org/cpython/rev/e5fc74588cea
msg266485 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-27 09:12
Failure on Python 2.7:

http://buildbot.python.org/all/builders/x86%20OpenIndiana%202.7/builds/3348/steps/test/logs/stdio

======================================================================
ERROR: test_wbits (test.test_zlib.CompressObjectTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/32bits/2.7.cea-indiana-x86/build/Lib/test/test_zlib.py", line 534, in test_wbits
    self.assertEqual(zlib.decompress(zlib15, 0), HAMLET_SCENE)
error: Error -2 while preparing to decompress data: inconsistent stream state
msg266487 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-27 11:33
New changeset ca49614989dd by Martin Panter in branch '3.5':
Issue #5784: wbits=0 apparently added in zlib v1.2.3.5
https://hg.python.org/cpython/rev/ca49614989dd

New changeset 1771f0ac9fc2 by Martin Panter in branch 'default':
Issue #5784: Merge zlib from 3.5
https://hg.python.org/cpython/rev/1771f0ac9fc2
msg266496 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-27 13:47
New changeset 0df93ab07a8f by Martin Panter in branch '2.7':
Issue #5784: Cannot test wbits=0 unless we know we have zlib v1.2.3.5
https://hg.python.org/cpython/rev/0df93ab07a8f
msg266497 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-27 14:00
Apparently zlib only supports windowBits (wbits) = 0 since v1.2.3.5. I added a version check in the Python 3 tests, which seems to have solved the buildbot problems (OpenIndiana and OS X buildbots). In Python 2 I removed the test, because Python 2 has no zlib.ZLIB_RUNTIME_VERSION to check.
msg266498 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-27 14:08
> In Python 2 I removed the test, because Python 2 has no zlib.ZLIB_RUNTIME_VERSION to check.

It makes sense. You didn't touch the C code of the module, only doc.
msg266505 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-27 18:16
I would write the test as:

    v = tuple(map(int, zlib.ZLIB_RUNTIME_VERSION.split(".")))
    supports_wbits_0 = v >= (1, 2, 3, 5)
msg266509 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-27 19:57
I agree with Serhiy :-) I was about to propose exactly the same change.
msg266521 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-27 23:00
> v = tuple(map(int, zlib.ZLIB_RUNTIME_VERSION.split(".")))
> supports_wbits_0 = v >= (1, 2, 3, 5)

That was basically my first thought. But I didn’t want to presume that every element of the version is an integer. For instance, the current string in the “develop” branch of zlib has "1.2.8.1-motley".
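A sketch of a more tolerant version check that addresses the concern about suffixes such as "1.2.8.1-motley": it keeps the leading digits of each dot-separated component and stops at the first non-numeric suffix. `parse_zlib_version` is a hypothetical helper, not part of the zlib module or the CPython test suite, and `zlib.ZLIB_RUNTIME_VERSION` exists only on Python 3.3+.

```python
import re
import zlib

def parse_zlib_version(version):
    """Return the leading integer components of a zlib version string as a tuple."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # component does not start with a digit
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # stop at a suffix such as "1-motley"
    return tuple(parts)

# Tuple comparison: (1, 2, 3) < (1, 2, 3, 5) and (1, 2, 11) > (1, 2, 3, 5).
supports_wbits_0 = parse_zlib_version(zlib.ZLIB_RUNTIME_VERSION) >= (1, 2, 3, 5)
print(supports_wbits_0)
```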
History
Date User Action Args
2016-05-27 23:00:55 martin.panter set messages: + msg266521
2016-05-27 19:57:11 vstinner set messages: + msg266509
2016-05-27 18:16:46 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg266505
2016-05-27 14:08:34 vstinner set messages: + msg266498
2016-05-27 14:00:48 martin.panter set status: open -> closed
resolution: fixed
messages: + msg266497
stage: commit review -> resolved
2016-05-27 13:47:46 python-dev set messages: + msg266496
2016-05-27 11:33:31 python-dev set messages: + msg266487
2016-05-27 09:12:14 vstinner set nosy: + vstinner
messages: + msg266485
2016-05-27 08:45:57 python-dev set nosy: + python-dev
messages: + msg266484
2016-05-27 07:22:32 martin.panter set stage: patch review -> commit review
versions: - Python 3.4
2015-11-29 05:44:43 martin.panter set files: + zlib-wbits.v3.patch
keywords: + patch
messages: + msg255560
versions: + Python 3.5, Python 3.6, - Python 3.2, Python 3.3
2015-11-18 00:52:06 martin.panter set messages: + msg254825
2015-04-26 18:47:33 phr set messages: + msg242069
2015-04-18 16:19:32 akuchling set files: + patch-5784.txt
messages: + msg241421
2015-04-18 12:42:47 martin.panter set messages: + msg241414
2015-04-18 12:38:14 akuchling set stage: needs patch -> patch review
2015-04-18 12:02:33 akuchling set files: + patch-5784.txt
nosy: + akuchling
messages: + msg241411
2015-03-23 03:53:50 martin.panter link issue22163 superseder
2015-03-23 03:46:45 martin.panter set nosy: + martin.panter
messages: + msg238975
2012-11-11 10:11:03 serhiy.storchaka set stage: needs patch
versions: + Python 3.4, - Python 3.1
2012-01-26 13:05:18 nadeem.vawda set nosy: + nadeem.vawda
2011-03-09 02:26:09 terry.reedy set nosy: terry.reedy, phr, docs@python
versions: + Python 3.3
2010-08-07 18:31:32 terry.reedy set versions: - Python 2.6
2010-07-09 01:36:54 terry.reedy set versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5
nosy: + terry.reedy, docs@python
messages: + msg109661
assignee: docs@python
components: + Documentation, - Library (Lib)
2009-04-17 22:28:59 phr set messages: + msg86095
2009-04-17 22:10:48 phr set type: enhancement
components: + Library (Lib)
versions: + Python 2.5
2009-04-17 22:10:20 phr create