
classification
Title: Gzip cannot handle zero-padded output + patch
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brian.curtin Nosy List: brian.curtin, pitrou, tadek
Priority: normal Keywords: needs review, patch

Created on 2008-05-13 22:16 by tadek, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
python2.5.2-gzip.patch tadek, 2008-05-13 22:16 Patch to fix zero-padded archive handling in gzip.
issue2846.diff brian.curtin, 2010-01-13 14:22 change, tests, docs against r77470
Messages (6)
msg66806 - (view) Author: Tadek Pietraszek (tadek) Date: 2008-05-13 22:16
There are cases when gzip produces/receives a zero-padded output, for
example when creating a compressed tar archive with a pipe:

tar cz /dev/null > foo.tgz

ls -la foo.tgz
-rw-r----- 1 tadek tadek 10240 May 13 23:40 foo.tgz

tar tvfz foo.tgz
crw-rw-rw- root/root       1,3 2007-10-18 18:27:25 dev/null


This is a known behavior (http://www.gzip.org/#faq8) and recent versions
of gzip handle it gracefully by skipping all zero bytes after the end of
the file (see gzip.c:1394-1406 in the version 1.3.12).

The Python gzip module fails on those files with an IOError:

#:~/python2.5/py2.5$ tar cz /dev/null > foo.tgz
tar: Removing leading `/' from member names
#:~/python2.5/py2.5$ bin/python
Python 2.5.2 (r252:60911, May 14 2008, 00:02:24)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> f=gzip.open("foo.tgz")
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 220, in read
    self._read(readsize)
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 263, in _read
    self._read_gzip_header()
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 164, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file
>>>

The proposed patch fixes this behavior by consuming all zero bytes at
the end of the file. I tested that it works with regular archives,
zero-padded archives, concatenated archives, and concatenated zero-padded
archives.
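
For illustration only, the idea can be sketched outside the gzip module. This is not the attached patch: the helper name decompress_zero_padded is hypothetical, and it uses zlib directly (wbits value 31 selects gzip framing) rather than touching GzipFile internals. It decompresses each member in turn and drops any zero bytes that follow it.

```python
import gzip
import zlib


def decompress_zero_padded(data: bytes) -> bytes:
    """Decompress one or more gzip members, ignoring NUL padding after each.

    Hypothetical helper for illustration; not the code from the patch.
    """
    chunks = []
    while data:
        # 31 = 16 + zlib.MAX_WBITS: expect a gzip header and trailer.
        member = zlib.decompressobj(31)
        chunks.append(member.decompress(data))
        chunks.append(member.flush())
        if not member.eof:
            raise EOFError("truncated gzip member")
        # Skip zero padding before the next member (or the end of input).
        data = member.unused_data.lstrip(b"\x00")
    return b"".join(chunks)


# Two concatenated members, each followed by zero padding.
padded = (
    gzip.compress(b"hello ") + b"\x00" * 16
    + gzip.compress(b"world") + b"\x00" * 512
)
result = decompress_zero_padded(padded)
```

Non-zero garbage after a member is still rejected, since zlib raises an error when the next bytes are not a valid gzip header.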

Regards,
Tadek
msg97684 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-01-13 04:33
Here is tadek's patch updated for trunk, with a test added to it.

I feel like this should be documented somewhere, but Doc/Library/gzip.rst doesn't feel right. Maybe it just needs a mention in the "What's new" or something?
msg97686 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-01-13 05:14
Updated patch with some documentation
msg97694 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-13 07:13
There is no need to write:

       try:
           [...]
       except IOError as err:
           self.fail(err)

Just let the exception be raised and produce an error.
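
A minimal sketch of what such a test could look like follows. It is not the test from issue2846.diff; the class name, test name, and the run_tests helper are assumptions made for this example. It relies on unittest reporting any uncaught exception as a test error.

```python
import gzip
import io
import unittest


class ZeroPaddedTest(unittest.TestCase):
    # Hypothetical test case; names are illustrative, not from the patch.
    def test_zero_padded_file(self):
        padded = gzip.compress(b"payload") + b"\x00" * 50
        # No try/except wrapper: if reading raises, unittest records
        # the exception as an error for this test on its own.
        with gzip.GzipFile(fileobj=io.BytesIO(padded)) as f:
            self.assertEqual(f.read(), b"payload")


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ZeroPaddedTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```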
msg97720 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-01-13 14:22
Thanks for taking a look! Patch updated with that try/except removed.
msg97721 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-13 14:42
Thank you Brian. I've committed the patch into trunk and py3k. I haven't backported it to 2.6 and 3.1, since it's more a new feature than a bug fix.
History
Date User Action Args
2022-04-11 14:56:34 admin set github: 47095
2010-01-13 14:42:16 pitrou set status: open -> closed; resolution: fixed; messages: + msg97721; stage: patch review -> resolved
2010-01-13 14:22:58 brian.curtin set files: + issue2846.diff; messages: + msg97720
2010-01-13 14:21:49 brian.curtin set files: - issue2846.diff
2010-01-13 07:13:21 pitrou set nosy: + pitrou; messages: + msg97694
2010-01-13 05:14:56 brian.curtin set files: + issue2846.diff; messages: + msg97686
2010-01-13 05:13:44 brian.curtin set files: - issue2846.diff
2010-01-13 04:33:47 brian.curtin set files: + issue2846.diff; priority: normal; assignee: brian.curtin; versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5; keywords: + needs review; nosy: + brian.curtin; messages: + msg97684; stage: patch review
2008-05-13 22:16:22 tadek create