classification
Title: zlib does not indicate end of compressed stream properly
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder: zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
View: 12646
Assigned To: Nosy List: amaury.forgeotdarc, ezio.melotti, nadeem.vawda, solinym, travis
Priority: normal Keywords: patch

Created on 2009-02-10 19:46 by travis, last changed 2012-10-28 16:26 by nadeem.vawda. This issue is now closed.

Files
File name Uploaded Description
zlibmodule.diff travis, 2009-02-12 17:00
zlib_finished_test.txt solinym, 2009-08-19 21:39 patch to test for end-of-compressed-stream indicator
zlibmodule.c.diff solinym, 2009-08-21 16:39 diff to zlibmodule.c
test_zlib.py.diff solinym, 2009-08-21 16:41 diff to test_zlib.py
test_zlib.py.diff solinym, 2009-08-21 20:07 complete version of diff to test_zlib.py
Messages (12)
msg81590 - Author: Travis Hassloch (travis) Date: 2009-02-10 19:46
The underlying zlib library can determine when it has hit the end of a
compressed stream without reading past the end.  Python's zlib module
requires that one read past the end before it signals the end by putting
data in Decompress.unused_data.  This complicates interfacing with mixed
compressed/uncompressed streams.
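A minimal sketch of the behaviour described above (not part of the original report; the names `payload` and `trailer` are made up for illustration). It assumes only the standard `zlib.compress`, `zlib.decompressobj` and `Decompress.unused_data` API:

```python
import zlib

payload = b"compressed part " * 8      # hypothetical compressed section
trailer = b"plain trailer"             # hypothetical uncompressed section
compressed = zlib.compress(payload)

# Feed exactly the compressed bytes: nothing past the end has been read,
# so unused_data stays empty and gives no hint that the stream is over.
dco = zlib.decompressobj()
out = dco.decompress(compressed)
unused_exact = dco.unused_data

# Only when bytes beyond the end are supplied does unused_data fill up.
dco2 = zlib.decompressobj()
out2 = dco2.decompress(compressed + trailer)
unused_past_end = dco2.unused_data
```

With `unused_data` as the only signal, a caller reading a mixed stream has to over-read and then push the surplus bytes back, which is the complication the report refers to.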
msg81780 - Author: Travis Hassloch (travis) Date: 2009-02-12 17:00
Here is a patch which adds a member called is_finished to decompression
objects that allows client code to know when it has reached the end of
the compressed stream.
msg90523 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-14 22:41
Thanks for the patch!
Can you provide tests too?
msg90817 - Author: Travis H. (solinym) Date: 2009-07-22 15:58
What kind of tests did you have in mind?

Unit tests in python, or something else?
msg90820 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-22 20:35
Yes, I think the right place to add the tests is
Lib/test/test_zlib.py
msg91749 - Author: Travis H. (solinym) Date: 2009-08-19 21:39
Attaching unit test diff

Output of "diff -u test_zlib.py~ test_zlib.py"
msg91757 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-20 00:09
Some comments about the patch:
- In zlibmodule.c, the is_finished member should be an int, and converted 
to a PyObject only when requested.
- The test should check that is_finished is False one byte before the 
compressed part, and becomes True when the decompressor reads the last 
compressed byte.  I don't think that dco.flush() is necessary for the 
test.
- Also, the last check could be more precise: assertEquals(y1 + y2, 
HAMLET_SCENE) and assertEquals(dco.unused_data, HAMLET_SCENE)
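The checks suggested in this list could be sketched roughly as follows. This is an illustration only, not the attached patch: the `is_finished` attribute from the patch was never released, so the sketch assumes the `Decompress.eof` attribute that Python 3.3 and later provide for the same purpose, and it uses a stand-in value for `HAMLET_SCENE`.

```python
import zlib

# Stand-in for the HAMLET_SCENE constant used in Lib/test/test_zlib.py.
HAMLET_SCENE = b"To be, or not to be, that is the question.\n" * 20

x = zlib.compress(HAMLET_SCENE)
dco = zlib.decompressobj()

# Feed everything except the last compressed byte: the stream is not done.
y1 = dco.decompress(x[:-1])
eof_before_last_byte = dco.eof

# Feed the last compressed byte on its own: the end is now detected
# without reading anything past the compressed part.
y2 = dco.decompress(x[-1:])
eof_after_last_byte = dco.eof
unused_at_end = dco.unused_data

# Feed trailing uncompressed data; it should land in unused_data.
y3 = dco.decompress(HAMLET_SCENE)
```

In a unittest this would translate to asserting that `eof_before_last_byte` is false, `eof_after_last_byte` is true, `y1 + y2` equals `HAMLET_SCENE`, and `dco.unused_data` equals `HAMLET_SCENE`. No call to `dco.flush()` is involved.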
msg91832 - Author: Travis H. (solinym) Date: 2009-08-21 16:39
zlibmodule.c.diff implements all the suggested changes, but I'm not
exactly sure whether it handles reference counts properly.
msg91833 - Author: Travis H. (solinym) Date: 2009-08-21 16:41
Diff to tests

Implements all suggested changes save one:

I wasn't sure how to test that is_finished is clear one byte before the
end of the compressed section.  Instead, I test that it is clear before
I call the compression routine.
msg91840 - Author: Travis H. (solinym) Date: 2009-08-21 20:07
Figured out how to test is_finished attribute of the zlib module properly.
msg91846 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-21 21:28
Hm, I tried a modified version of your first test and found another
problem with the current zlib library.
Starting with the input:
x = x1 + x2 + HAMLET_SCENE    # both compressed and uncompressed data

The following scenario is OK:
dco.decompress(x) # returns HAMLET_SCENE
dco.unused_data   # returns HAMLET_SCENE

But this one:
for c in x:
    dco.decompress(c) # will return HAMLET_SCENE, in several pieces
dco.unused_data   # only one character: the last c in x!

This is a bug IMO: unused_data should accumulate all the extra uncompressed 
data.
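For comparison, a small sketch of the byte-at-a-time scenario, written for Python 3 (where iterating over `bytes` yields integers, so one-byte slices are fed instead). The names `payload` and `trailer` are illustrative. It assumes an interpreter where `unused_data` accumulates across calls after the end of the stream, which is the behaviour requested here and, as far as I know, what current CPython releases do:

```python
import zlib

payload = b"To be, or not to be, that is the question.\n" * 20
trailer = b"uncompressed tail"
x = zlib.compress(payload) + trailer   # compressed and uncompressed data

dco = zlib.decompressobj()
pieces = []
for i in range(len(x)):
    # Slicing keeps each chunk a one-byte bytes object.
    pieces.append(dco.decompress(x[i:i + 1]))

recovered = b"".join(pieces)
leftover = dco.unused_data
```

If the accumulation works as requested, `recovered` equals `payload` and `leftover` equals the whole of `trailer` rather than only its final byte.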
msg174057 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-28 16:26
This bug (zlib not providing a way to detect end-of-stream) has already
been fixed - see issue 12646.

I've opened issue 16350 for the unused_data problem.
History
Date User Action Args
2012-10-28 16:26:20 nadeem.vawda set status: open -> closed
    superseder: zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams
    messages: + msg174057
    resolution: out of date
    stage: test needed -> resolved
2012-01-26 13:03:45 nadeem.vawda set nosy: + nadeem.vawda
2009-08-21 21:28:26 amaury.forgeotdarc set messages: + msg91846
2009-08-21 20:07:56 solinym set files: + test_zlib.py.diff
    messages: + msg91840
2009-08-21 16:41:08 solinym set files: + test_zlib.py.diff
    messages: + msg91833
2009-08-21 16:39:53 solinym set files: + zlibmodule.c.diff
    messages: + msg91832
2009-08-20 00:09:17 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg91757
2009-08-19 21:39:51 solinym set files: + zlib_finished_test.txt
    messages: + msg91749
2009-07-22 20:35:20 ezio.melotti set messages: + msg90820
2009-07-22 15:58:31 solinym set nosy: + solinym
    messages: + msg90817
2009-07-14 22:41:20 ezio.melotti set priority: normal
    versions: + Python 2.7, Python 3.2, - Python 3.0
    nosy: + ezio.melotti
    messages: + msg90523
    stage: test needed
2009-07-14 22:38:08 ezio.melotti link issue6485 superseder
2009-02-12 17:00:08 travis set files: + zlibmodule.diff
    keywords: + patch
    messages: + msg81780
2009-02-10 19:46:19 travis create