classification
Title: Error decompressing valid zlib data
Type: behavior
Stage: resolved
Components: Tests
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: gregory.p.smith, matthew.brett, pitrou
Priority: normal
Keywords: patch

Created on 2010-05-09 22:44 by matthew.brett, last changed 2010-05-11 23:39 by pitrou. This issue is now closed.

Files
File name | Uploaded | Description
mat.bin | matthew.brett, 2010-05-09 22:44 | binary zlib-compressed data causing decompression error
zlib-8672.patch | pitrou, 2010-05-10 23:00 |
Messages (9)
msg105420 - (view) Author: Matthew Brett (matthew.brett) Date: 2010-05-09 22:44
I have a valid zlib compressed string, attached here as 'mat.bin' (1.7M), that causes an error when decompressed with zlib.decompress:

>>> import zlib
>>> data = open('mat.bin', 'rb').read()
>>> out = zlib.decompress(data)
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
error: Error -5 while decompressing data

I know these data are valid, because I get the string I was expecting with:

>>> dc_obj = zlib.decompressobj()
>>> out = dc_obj.decompress(data)

As expected, there is no remaining data after this read:

>>> assert dc_obj.flush() == ''
>>> 

I believe that the behavior of zlib.decompress(data) and zlib.decompressobj().decompress(data) should be equivalent, and that the error for zlib.decompress(data) is therefore the symptom of a bug.
msg105470 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-10 22:11
After a bit of debugging, it seems your data is not actually a complete zlib stream (*). What did you generate it with?

(*) in technical terms, zlib never returns Z_STREAM_END when decompressing your data. The decompressobj ignores this, but the top-level decompress() function considers it an error.
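
To illustrate the distinction, here is a minimal sketch that is not taken from the report itself. It assumes Python 3.3 or later (for the `eof` attribute of decompressor objects) and simulates an incomplete stream by dropping the 4-byte Adler-32 trailer from a freshly compressed payload. The payload and variable names are made up for the example.

```python
import zlib

# Hypothetical payload; any bytes would do.
payload = b"example payload " * 64
complete = zlib.compress(payload)

# Assumption: cutting off the 4-byte Adler-32 trailer leaves the deflate
# data intact but means zlib can never report the end of the stream.
truncated = complete[:-4]

# The one-shot function insists on reaching the end of the stream.
try:
    zlib.decompress(truncated)
    one_shot_failed = False
except zlib.error as exc:
    one_shot_failed = True
    print("zlib.decompress raised:", exc)

# A decompressor object hands back whatever it managed to decode.
dco = zlib.decompressobj()
recovered = dco.decompress(truncated) + dco.flush()

# eof stays False because the end-of-stream marker was never seen.
print("bytes recovered:", len(recovered), "eof:", dco.eof)
```

Checking `eof` after decompressing is one way for a caller to notice that the data came from a stream zlib regards as unfinished.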
msg105474 - (view) Author: Matthew Brett (matthew.brett) Date: 2010-05-10 22:30
Hi,

> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> After a bit of debugging, it seems your data is not actually a complete zlib stream (*). What did you generate it with?
>
> (*) in technical terms, the zlib never returns Z_STREAM_END when decompressing your data. The decompressobj ignores it, but the top-level decompress() function considers it an error.

Thanks for the debugging.  The stream comes from within a matlab 'mat'
file.  I maintain the scipy matlab file readers; the variables within
these files are zlib compressed streams.

 Is there (should there be) a safe and maintained way to allow me to
read a stream that does not return Z_STREAM_END?
msg105475 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-10 22:36
> Thanks for the debugging.  The stream comes from within a matlab 'mat'
> file.  I maintain the scipy matlab file readers; the variables within
> these files are zlib compressed streams.

So this would be a Matlab issue, right?

>  Is there (should there be) a safe and maintained way to allow me to
> read a stream that does not return Z_STREAM_END?

Decompressor objects allow you to do that, but I cannot tell you how
"maintained" it is. If it has to be maintained, we could add a unit
test for it so that regressions get detected. It would be nice if you
could provide a very short zlib stream reproducing the issue.
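
One way a reader could lean on decompressor objects for this is sketched below. The helper name `decompress_lenient` and its `chunk_size` parameter are hypothetical and not part of the zlib module, and the `eof` attribute it relies on assumes Python 3.3 or later.

```python
import zlib


def decompress_lenient(data, chunk_size=64 * 1024):
    """Decompress a zlib stream, tolerating a missing end-of-stream marker.

    Returns a (payload, complete) tuple. complete is False when zlib
    never reported the end of the stream. Corrupt input can still
    raise zlib.error.
    """
    dco = zlib.decompressobj()
    parts = []
    # Feed the input in slices so large inputs are not copied at once.
    for start in range(0, len(data), chunk_size):
        parts.append(dco.decompress(data[start:start + chunk_size]))
        if dco.eof:
            # End of stream reached; any remaining bytes are trailing data.
            break
    parts.append(dco.flush())
    return b"".join(parts), dco.eof
```

Returning the flag alongside the payload leaves the caller free to decide whether an unfinished stream deserves a warning or can be accepted silently.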
msg105477 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-10 22:39
I also think we should improve the zlib module's error messages. I've added a patch in issue8681 for that. With that patch, the message you would have encountered would have been "Error -5 while decompressing data: incomplete or truncated stream", which is considerably more informative.
msg105478 - (view) Author: Matthew Brett (matthew.brett) Date: 2010-05-10 22:48
>> Thanks for the debugging.  The stream comes from within a matlab 'mat'
>> file.  I maintain the scipy matlab file readers; the variables within
>> these files are zlib compressed streams.
>
> So this would be a Matlab issue, right?

Yes, except scipy and numpy aim in part to be an open-source
replacement for matlab, so we very much want to be able to read their
files.

>>  Is there (should there be) a safe and maintained way to allow me to
>> read a stream that does not return Z_STREAM_END?
>
> Decompressor objects allow you to do that, but I cannot tell you how
> "maintained" it is. If it has to be maintained, we could add an unit
> test for it so that regressions get detected. It would be nice if you
> could provide a very short zlib stream reproducing the issue

This is the only .mat file stream I have come across so far that causes
the error.  Is it possible to knock a portion off the end of a valid
stream to reproduce the problem?
msg105480 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-10 23:00
Ok, it turned out to be quite easy indeed. Here is a patch adding a test.
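
The attached patch is not reproduced on this page. As a rough illustration of the kind of regression test discussed above, the following sketch truncates a valid stream and checks both code paths. The class name, test name, and the choice to cut exactly the 4-byte Adler-32 trailer are assumptions made for the example, not a copy of zlib-8672.patch.

```python
import unittest
import zlib


class IncompleteStreamTest(unittest.TestCase):
    def test_decompress_incomplete_stream(self):
        data = b"foo"
        stream = zlib.compress(data)
        # Sanity check: the complete stream round-trips.
        self.assertEqual(zlib.decompress(stream), data)

        # Knock the checksum trailer off the end of the valid stream.
        truncated = stream[:-4]

        # The top-level function rejects the incomplete stream ...
        self.assertRaises(zlib.error, zlib.decompress, truncated)

        # ... while a decompressor object still yields the data.
        dco = zlib.decompressobj()
        out = dco.decompress(truncated) + dco.flush()
        self.assertEqual(out, data)
```

Saved in a module, this can be run with `python -m unittest`.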
msg105544 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-05-11 21:00
The patch looks good.
msg105558 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-11 23:39
The patch was committed in r81094 (2.7), r81095 (2.6), r81096 (3.2) and r81097 (3.1). Thank you!
History
Date | User | Action | Args
2010-05-11 23:39:18 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg105558; stage: patch review -> resolved
2010-05-11 21:00:23 | gregory.p.smith | set | messages: + msg105544
2010-05-10 23:00:44 | pitrou | set | nosy: gregory.p.smith, pitrou, matthew.brett; components: + Tests, - Library (Lib); stage: needs patch -> patch review
2010-05-10 23:00:32 | pitrou | set | files: + zlib-8672.patch; keywords: + patch; messages: + msg105480
2010-05-10 22:48:39 | matthew.brett | set | messages: + msg105478
2010-05-10 22:39:50 | pitrou | set | nosy: + gregory.p.smith; messages: + msg105477
2010-05-10 22:36:06 | pitrou | set | messages: + msg105475
2010-05-10 22:30:56 | matthew.brett | set | messages: + msg105474
2010-05-10 22:11:01 | pitrou | set | nosy: + pitrou; messages: + msg105470
2010-05-09 22:49:10 | pitrou | set | stage: needs patch; components: + Library (Lib), - IO; versions: + Python 2.7, Python 3.2
2010-05-09 22:44:04 | matthew.brett | create |