
Author tadek
Recipients tadek
Date 2008-05-13.22:16:17
Message-id <1210716983.8.0.306490627236.issue2846@psf.upfronthosting.co.za>
Content
There are cases where gzip has to handle zero-padded data, for example
when a compressed tar archive is created with a pipe:

tar cz /dev/null > foo.tgz

ls -la foo.tgz
-rw-r----- 1 tadek tadek 10240 May 13 23:40 foo.tgz

tar tvfz foo.tgz
crw-rw-rw- root/root       1,3 2007-10-18 18:27:25 dev/null


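The file is 10240 bytes long, a full tar record (20 blocks of 512
bytes), even though the compressed data is much smaller; the rest is
zero padding. This is known behavior (http://www.gzip.org/#faq8), and
recent versions of gzip handle it gracefully by skipping all zero bytes
after the end of the compressed data (see gzip.c:1394-1406 in version
1.3.12).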
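
The skipping behavior can be sketched with Python's standard zlib module. The approach is to decompress one gzip member, discard any NUL bytes that follow its trailer, and then treat whatever remains as the next member. `decompress_padded` is a hypothetical helper for illustration only. It is not a translation of gzip.c and not part of Python's gzip module.

```python
import zlib


def decompress_padded(data: bytes) -> bytes:
    """Decompress concatenated gzip members, ignoring NUL padding after
    each member (hypothetical helper, for illustration only)."""
    out = []
    while data:
        # wbits=16+MAX_WBITS: expect a gzip header and trailer, not raw zlib.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        out.append(d.decompress(data))
        if not d.eof:
            raise EOFError("compressed data ended inside a gzip member")
        # Bytes after this member's trailer: drop zero padding; anything
        # left is parsed as another member (garbage raises zlib.error).
        data = d.unused_data.lstrip(b"\x00")
    return b"".join(out)
```

For example, `decompress_padded(gzip.compress(b"x") + bytes(100))` returns `b"x"`. Leading zeros are deliberately not skipped, so a file that does not start with a gzip header is still rejected.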

The Python gzip module fails on such files with an IOError:

#:~/python2.5/py2.5$ tar cz /dev/null > foo.tgz
tar: Removing leading `/' from member names
#:~/python2.5/py2.5$ bin/python
Python 2.5.2 (r252:60911, May 14 2008, 00:02:24)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> f=gzip.open("foo.tgz")
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 220, in read
    self._read(readsize)
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 263, in _read
    self._read_gzip_header()
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 164, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file
>>>
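
The session above can be reproduced without tar by padding an ordinary gzip file with NUL bytes up to 10240 bytes. This sketch is written for Python 3, not the 2.5 interpreter shown, and the payload and temporary file are arbitrary. Python 2.5.2 raised the IOError above on such a file. Current Python 3 releases skip the trailing zeros and return the payload.

```python
import gzip
import os
import tempfile

RECORD_SIZE = 10240  # tar's default record size: 20 blocks of 512 bytes

# An ordinary gzip member padded with NUL bytes to one full tar record,
# imitating the 10240-byte foo.tgz above.
member = gzip.compress(b"example payload\n")
padded = member + b"\x00" * (RECORD_SIZE - len(member))

fd, path = tempfile.mkstemp(suffix=".tgz")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(padded)
    assert os.path.getsize(path) == RECORD_SIZE
    # Python 2.5.2 raised IOError('Not a gzipped file') here, as in the
    # session above; Python 3 skips the padding and reads the payload.
    with gzip.open(path, "rb") as f:
        data = f.read()
finally:
    os.remove(path)

print(data)
```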

The proposed patch fixes this by reading past (skipping) all zero bytes
at the end of the file. I tested it with regular archives, zero-padded
archives, concatenated archives, and concatenated zero-padded archives.
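
The four kinds of input listed above can be turned into a small regression check. This is a sketch against Python 3's gzip module, which skips NUL bytes between and after members. It is not the test that accompanied the patch. The payload and the 512-byte padding length are arbitrary choices.

```python
import gzip
import io

payload = b"some data\n" * 100
member = gzip.compress(payload)
zeros = b"\x00" * 512  # arbitrary amount of zero padding

# The four kinds of archive the patch was tested with.
cases = {
    "regular": (member, payload),
    "zero-padded": (member + zeros, payload),
    "concatenated": (member + member, payload * 2),
    "concatenated zero-padded": (member + zeros + member + zeros, payload * 2),
}

for name, (blob, expected) in cases.items():
    # The file-object API and the one-shot helper should agree.
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
        assert f.read() == expected, name
    assert gzip.decompress(blob) == expected, name
    print(f"{name}: ok")
```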

Regards,
Tadek
History
Date                 User   Action  Args
2008-05-13 22:16:26  tadek  set     spambayes_score: 1.67214e-06 -> 1.6721423e-06
                                    recipients: + tadek
2008-05-13 22:16:23  tadek  set     spambayes_score: 1.67214e-06 -> 1.67214e-06
                                    messageid: <1210716983.8.0.306490627236.issue2846@psf.upfronthosting.co.za>
2008-05-13 22:16:22  tadek  link    issue2846 messages
2008-05-13 22:16:21  tadek  create