Issue 23200: Deprecate the zlib decompressor’s flush() method

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67389

classification

Title:	Deprecate the zlib decompressor’s flush() method
Type:	enhancement	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	nadeem.vawda	Nosy List:	docs@python, martin.panter, nadeem.vawda, python-dev, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2015-01-09 01:26 by martin.panter, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description	Edit
max_length.patch	martin.panter, 2015-01-09 01:26		review
deprecate-flush.patch	martin.panter, 2015-11-18 07:43		review
deprecate-flush.v2.patch	martin.panter, 2016-02-02 11:33		review

Messages (9)
msg233709 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-09 01:26
This simple patch documents that max_length has to be non-zero. The implementation actually uses zero as a special value to indicate max_length was not specified. Also, I wonder what the point of the Decompressor.flush() method is. Reading the module code and zlib manual, it looks like it would return the same data as decompressor.decompress(decompressor.unconsumed_tail), except that it uses the Z_FINISH flag instead of Z_SYNC_FLUSH, so that zlib can optimize by assuming no more data is to be processed. Since unconsumed_tail is read-only and only relevant when using max_length, and flush does not support max_length, I wonder if the flush() method should be deprecated. To further test this theory, I modified the test_zlib.py file so that all appropriate flush() calls are equivalent to this: self.assertFalse(dco.flush()) and the tests all still pass. BTW, it looks to me like zlib_Decompress_flush_impl() is setting avail_out too high (blessing unallocated memory) when the total length is more than half of UNIT_MAX, but I am not in a position to test this.
msg233710 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-01-09 02:08
The processing of unconsumed_tail in flush() was introduced via Issue 16411. Before that I suspect flush() was assumed to only be called when max_length was not used. The decompress() method changed from Z_NO_FLUSH to Z_SYNC_FLUSH in Feb 2001; see revision 01c2470eeb2e. I guess previously flush() was necessary to get all your data. The max_length parameter was added later in October that year; see Issue 403753 and revision 60e12f83d2d2.
msg254828 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-11-18 02:59
New changeset 106c49edbb12 by Martin Panter in branch '2.7': Issue #23200: Document that max_length=0 is not supported https://hg.python.org/cpython/rev/106c49edbb12 New changeset 660bdfaada64 by Martin Panter in branch '3.4': Issue #23200: Document that max_length=0 is not supported https://hg.python.org/cpython/rev/660bdfaada64 New changeset d64bb25cf797 by Martin Panter in branch '3.5': Issue #23200: Merge zlib doc from 3.4 into 3.5 https://hg.python.org/cpython/rev/d64bb25cf797 New changeset cdd403dd82cb by Martin Panter in branch 'default': Issue #23200: Merge zlib doc from 3.5 https://hg.python.org/cpython/rev/cdd403dd82cb
msg254836 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-11-18 07:43
See Issue 224981 (bug) and Issue 403373 (patch) about the Z_SYNC_FLUSH change (the numbers in the commit message are different and do not work). I committed my doc change to 2.7 and 3.4+, which leaves the main problem of what to do about flush(). I propose deprecate-flush.patch for 3.6, which deprecates this method. I ran the test suite with -Werror, so I hopefully got all the instances of flush() in the library.
msg254837 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-11-18 08:07
Besides unconsumed_tail there is an internal buffer. Even if flush() is no longer needed in the zlib decompresser (I don't know), I doubt about bz2 and lzma.
msg254841 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-11-18 11:05
For the bz2 and lzma modules, neither decompressor classes have a flush() method, only the compressors.
msg255564 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-11-29 07:11
And regarding other internal buffers and state, calling flush() or decompress() guarantees you get as much data as possible, but there is no error checking. You have to check the eof attribute to ensure everything is done, otherwise I don’t think there is a way to differentiate a truncated stream from a properly ended stream.
msg256142 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-09 07:58
After reading zlib manual it looks to me that your are right, flush() has no other effect except a little less memory consumption. This can be not important in the context of Python. Hope Nadeem will confirm this.
msg259380 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-02-02 11:33
New version of the patch merged with current code and cleaning up the zipfile EOF detection.

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67389
2016-02-02 11:33:17	martin.panter	set	files: + deprecate-flush.v2.patch messages: + msg259380
2015-12-09 07:58:43	serhiy.storchaka	set	assignee: docs@python -> nadeem.vawda type: enhancement messages: + msg256142
2015-11-29 07:11:05	martin.panter	set	messages: + msg255564
2015-11-18 11:05:21	martin.panter	set	messages: + msg254841
2015-11-18 08:07:07	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg254837
2015-11-18 07:44:01	martin.panter	set	files: + deprecate-flush.patch versions: + Python 3.6, - Python 3.4 title: Clarify max_length and flush() for zlib decompression -> Deprecate the zlib decompressor’s flush() method messages: + msg254836 components: + Library (Lib), - Documentation
2015-11-18 02:59:12	python-dev	set	nosy: + python-dev messages: + msg254828
2015-02-28 12:08:49	serhiy.storchaka	set	nosy: + nadeem.vawda stage: patch review
2015-01-09 02:08:57	martin.panter	set	messages: + msg233710
2015-01-09 01:26:21	martin.panter	create