classification
Title: Deprecate the zlib decompressor’s flush() method
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: nadeem.vawda Nosy List: docs@python, martin.panter, nadeem.vawda, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-01-09 01:26 by martin.panter, last changed 2016-02-02 11:33 by martin.panter.

Files
File name Uploaded Description Edit
max_length.patch martin.panter, 2015-01-09 01:26 review
deprecate-flush.patch martin.panter, 2015-11-18 07:43 review
deprecate-flush.v2.patch martin.panter, 2016-02-02 11:33 review
Messages (9)
msg233709 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-09 01:26
This simple patch documents that max_length has to be non-zero. The implementation actually uses zero as a special value to indicate max_length was not specified.

Also, I wonder what the point of the Decompressor.flush() method is. Reading the module code and zlib manual, it looks like it would return the same data as decompressor.decompress(decompressor.unconsumed_tail), except that it uses the Z_FINISH flag instead of Z_SYNC_FLUSH, so that zlib can optimize by assuming no more data is to be processed. Since unconsumed_tail is read-only and only relevant when using max_length, and flush does not support max_length, I wonder if the flush() method should be deprecated.

To further test this theory, I modified the test_zlib.py file so that all appropriate flush() calls are equivalent to this:

self.assertFalse(dco.flush())

and the tests all still pass.

BTW, it looks to me like zlib_Decompress_flush_impl() is setting avail_out too high (blessing unallocated memory) when the total length is more than half of UNIT_MAX, but I am not in a position to test this.
msg233710 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-09 02:08
The processing of unconsumed_tail in flush() was introduced via Issue 16411. Before that I suspect flush() was assumed to only be called when max_length was not used.

The decompress() method changed from Z_NO_FLUSH to Z_SYNC_FLUSH in Feb 2001; see revision 01c2470eeb2e. I guess previously flush() was necessary to get all your data. The max_length parameter was added later in October that year; see Issue 403753 and revision 60e12f83d2d2.
msg254828 - (view) Author: Roundup Robot (python-dev) Date: 2015-11-18 02:59
New changeset 106c49edbb12 by Martin Panter in branch '2.7':
Issue #23200: Document that max_length=0 is not supported
https://hg.python.org/cpython/rev/106c49edbb12

New changeset 660bdfaada64 by Martin Panter in branch '3.4':
Issue #23200: Document that max_length=0 is not supported
https://hg.python.org/cpython/rev/660bdfaada64

New changeset d64bb25cf797 by Martin Panter in branch '3.5':
Issue #23200: Merge zlib doc from 3.4 into 3.5
https://hg.python.org/cpython/rev/d64bb25cf797

New changeset cdd403dd82cb by Martin Panter in branch 'default':
Issue #23200: Merge zlib doc from 3.5
https://hg.python.org/cpython/rev/cdd403dd82cb
msg254836 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-18 07:43
See Issue 224981 (bug) and Issue 403373 (patch) about the Z_SYNC_FLUSH change (the numbers in the commit message are different and do not work).

I committed my doc change to 2.7 and 3.4+, which leaves the main problem of what to do about flush(). I propose deprecate-flush.patch for 3.6, which deprecates this method.

I ran the test suite with -Werror, so I hopefully got all the instances of flush() in the library.
msg254837 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-18 08:07
Besides unconsumed_tail there is an internal buffer. Even if flush() is no longer needed in the zlib decompresser (I don't know), I doubt about bz2 and lzma.
msg254841 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-18 11:05
For the bz2 and lzma modules, neither decompressor classes have a flush() method, only the compressors.
msg255564 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-29 07:11
And regarding other internal buffers and state, calling flush() or decompress() guarantees you get as much data as possible, but there is no error checking. You have to check the eof attribute to ensure everything is done, otherwise I don’t think there is a way to differentiate a truncated stream from a properly ended stream.
msg256142 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-09 07:58
After reading zlib manual it looks to me that your are right, flush() has no other effect except a little less memory consumption. This can be not important in the context of Python. Hope Nadeem will confirm this.
msg259380 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-02 11:33
New version of the patch merged with current code and cleaning up the zipfile EOF detection.
History
Date User Action Args
2016-02-02 11:33:17martin.pantersetfiles: + deprecate-flush.v2.patch

messages: + msg259380
2015-12-09 07:58:43serhiy.storchakasetassignee: docs@python -> nadeem.vawda
type: enhancement
messages: + msg256142
2015-11-29 07:11:05martin.pantersetmessages: + msg255564
2015-11-18 11:05:21martin.pantersetmessages: + msg254841
2015-11-18 08:07:07serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg254837
2015-11-18 07:44:01martin.pantersetfiles: + deprecate-flush.patch
versions: + Python 3.6, - Python 3.4
title: Clarify max_length and flush() for zlib decompression -> Deprecate the zlib decompressor’s flush() method
messages: + msg254836

components: + Library (Lib), - Documentation
2015-11-18 02:59:12python-devsetnosy: + python-dev
messages: + msg254828
2015-02-28 12:08:49serhiy.storchakasetnosy: + nadeem.vawda

stage: patch review
2015-01-09 02:08:57martin.pantersetmessages: + msg233710
2015-01-09 01:26:21martin.pantercreate