Title: zlib does not indicate end of compressed stream properly
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 2.7
Status: closed
Resolution: out of date
Dependencies:
Superseder: zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
Assigned To:
Nosy List: amaury.forgeotdarc, ezio.melotti, nadeem.vawda, solinym, travis
Priority: normal
Keywords: patch

Created on 2009-02-10 19:46 by travis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

zlibmodule.diff travis, 2009-02-12 17:00
zlib_finished_test.txt solinym, 2009-08-19 21:39 patch to test for end-of-compressed-stream indicator
zlibmodule.c.diff solinym, 2009-08-21 16:39 diff to zlibmodule.c
solinym, 2009-08-21 16:41 diff to
solinym, 2009-08-21 20:07 complete version of diff to
msg81590 - (view) Author: Travis Hassloch (travis) Date: 2009-02-10 19:46
Underlying zlib can determine when it has hit the end of a compressed
stream without reading past the end.  Python zlib implementation requires
that one read past the end before it signals the end by putting data in
Decompress.unused_data.  This complicates interfacing with mixed
compressed/uncompressed streams.
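
A minimal sketch of the behaviour described above, assuming a Python 3 interpreter. The names `payload` and `TRAILER` are illustrative and not taken from the report. When a decompression object receives exactly the compressed stream, `unused_data` stays empty, so it gives no sign that the stream has ended; it only becomes non-empty once bytes past the end have been fed in.

```python
import zlib

payload = b"example payload " * 4          # illustrative data
compressed = zlib.compress(payload)

# Case 1: feed exactly the compressed stream.
# All of the payload comes back, but unused_data stays empty,
# so it cannot be used to tell that the stream is complete.
dco = zlib.decompressobj()
out = dco.decompress(compressed)
exact_unused = dco.unused_data

# Case 2: feed the compressed stream followed by extra bytes.
# Only now does unused_data reveal where the stream ended.
dco2 = zlib.decompressobj()
out2 = dco2.decompress(compressed + b"TRAILER")
extra_unused = dco2.unused_data
```

In a mixed stream this means the caller has to hand over bytes that belong to the uncompressed part before it can learn where the compressed part stopped.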
msg81780 - (view) Author: Travis Hassloch (travis) Date: 2009-02-12 17:00
Here is a patch which adds a member called is_finished to decompression
objects that allows client code to know when it has reached the end of
the compressed stream.
msg90523 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-14 22:41
Thanks for the patch!
Can you provide tests too?
msg90817 - (view) Author: Travis H. (solinym) Date: 2009-07-22 15:58
What kind of tests did you have in mind?

Unit tests in python, or something else?
msg90820 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-22 20:35
Yes, I think that the right place where to add the tests is
msg91749 - (view) Author: Travis H. (solinym) Date: 2009-08-19 21:39
Attaching unit test diff

Output of "diff -u"
msg91757 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-20 00:09
Some comments about the patch:
- In zlibmodule.c, the is_finished member should be an int, and converted 
to a PyObject only when requested.
- The test should check that is_finished is False one byte before the 
compressed part, and becomes True when the decompressor reads the last 
compressed byte.  I don't think that dco.flush() is necessary for the 
- Also, the last check could be more precise: assertEquals(y1 + y2, 
HAMLET_SCENE) and assertEquals(dco.unused_data, HAMLET_SCENE)
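
The checks suggested above could look roughly like the sketch below. The `is_finished` attribute from the patch is not part of any released Python, so this sketch assumes Python 3.3 or later and uses the `eof` attribute of decompression objects as a stand-in. The `HAMLET_SCENE` value here is a placeholder, not the text used in the real test suite.

```python
import zlib

# Placeholder for the test suite's HAMLET_SCENE constant (assumption).
HAMLET_SCENE = b"To be, or not to be, that is the question. " * 8
x = zlib.compress(HAMLET_SCENE)

dco = zlib.decompressobj()

# Feed everything except the last compressed byte: the trailing
# checksum is incomplete, so the stream must not be reported as finished.
y1 = dco.decompress(x[:-1])
eof_before_last_byte = dco.eof

# Feed the last compressed byte followed by uncompressed data.
y2 = dco.decompress(x[-1:] + HAMLET_SCENE)
eof_after_last_byte = dco.eof
```

No call to `flush()` is made in this sketch; the two `decompress()` calls already return the complete output.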
msg91832 - (view) Author: Travis H. (solinym) Date: 2009-08-21 16:39
zlibmodule.c.diff Implements all the suggested features, but I'm not
exactly sure whether it handles reference counts properly.
msg91833 - (view) Author: Travis H. (solinym) Date: 2009-08-21 16:41
Diff to tests

Implements all suggested changes save one:

I wasn't sure how to test that is_finished is clear one byte before the
end of the compressed section.  Instead, I test that it is clear before
I call the compression routine.
msg91840 - (view) Author: Travis H. (solinym) Date: 2009-08-21 20:07
Figured out how to test is_finished attribute of the zlib module properly.
msg91846 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-21 21:28
Hm, I tried a modified version of your first test, and I found another 
problem with the current zlib library;
starting with the input:
x = x1 + x2 + HAMLET_SCENE    # both compressed and uncompressed data

The following scenario is OK:
dco.decompress(x) # returns HAMLET_SCENE
dco.unused_data   # returns HAMLET_SCENE

But this one:
for c in x:
    dco.decompress(c) # will return HAMLET_SCENE, in several pieces
dco.unused_data   # only one character, the last of (c in x)!

This is a bug IMO: unused_data should accumulate all the extra uncompressed 
msg174057 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-28 16:26
This bug (zlib not providing a way to detect end-of-stream) has already
been fixed - see issue 12646.

I've opened issue 16350 for the unused_data problem.
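
For the mixed compressed/uncompressed use case from the original report, a reader could be sketched as follows, assuming Python 3.3 or later, where decompression objects have an `eof` attribute. The helper name `split_mixed` and its `chunk_size` parameter are hypothetical and only serve the illustration. The loop stops feeding data as soon as `eof` is set, so it does not depend on how `unused_data` behaves across repeated calls.

```python
import zlib

def split_mixed(stream, chunk_size=16):
    """Split `stream` into (decompressed part, raw bytes after it).

    Hypothetical helper: feeds the stream in small chunks and stops
    as soon as the decompressor reports the end of the zlib stream.
    """
    dco = zlib.decompressobj()
    pieces = []
    pos = 0
    while not dco.eof and pos < len(stream):
        pieces.append(dco.decompress(stream[pos:pos + chunk_size]))
        pos += chunk_size
    # Bytes of the last chunk that lay past the end of the compressed
    # stream, plus everything that was never fed to the decompressor.
    remainder = dco.unused_data + stream[pos:]
    return b"".join(pieces), remainder

body = b"compressed section " * 4        # illustrative data
trailer = b"plain trailer"               # illustrative data
result = split_mixed(zlib.compress(body) + trailer)
result_bytewise = split_mixed(zlib.compress(body) + trailer, chunk_size=1)
```

If the input is truncated, the loop ends with `eof` still false, and a caller may want to treat that as an error rather than accept the partial output.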
