
Author martin.panter
Recipients Ericg, martin.panter, ned.deily
Date 2015-05-28.00:26:39
Message-id <1432772800.88.0.542267139732.issue24301@psf.upfronthosting.co.za>
In-reply-to
Content
I suspect Eric’s file has non-zero, non-gzip garbage bytes appended to the end of it. Assuming I am right, here is a way to reproduce that scenario:

>>> from gzip import GzipFile
>>> from io import BytesIO
>>> file = BytesIO()
>>> with GzipFile(fileobj=file, mode="wb") as z:
...     z.write(b"data")
... 
4
>>> file.write(b"garbage")
7
>>> file.seek(0)
0
>>> GzipFile(fileobj=file).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/gzip.py", line 274, in read
    return self._buffer.read(size)
  File "/home/proj/python/cpython/Lib/gzip.py", line 461, in read
    if not self._read_gzip_header():
  File "/home/proj/python/cpython/Lib/gzip.py", line 409, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'ga')

This is a bit different to Issue 1508475. That one is about cases where the “gzip” trailer has been truncated, although the compressed data is probably intact. This case is the converse: extra data has been added.

All of the “gzip”, “bzip2” and XZ Utils (for LZMA) command-line decompressors happily extract the compressed data without an error exit status, but emit warning messages:

gzip: stdin: decompression OK, trailing garbage ignored
bzip2: (stdin): trailing garbage after EOF ignored
xz: (stdin): Unexpected end of input

In Python, the “bz2” and “lzma” modules successfully extract the compressed data, and ignore the non-compressed garbage at the end without even a warning. On the other hand, the “gzip” module has special code to ignore trailing zero bytes (Issue 2846), but treats any other trailing non-gzip data as an error.
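
For comparison, here is a rough sketch of the bz2 and lzma behaviour described above (assuming the current file-object implementations; output shown as I would expect it):

>>> import bz2, lzma
>>> from io import BytesIO
>>> # Valid bz2 stream followed by non-bz2 garbage; BZ2File stops quietly
>>> bz2.BZ2File(BytesIO(bz2.compress(b"data") + b"garbage")).read()
b'data'
>>> # Same story with LZMAFile
>>> lzma.LZMAFile(BytesIO(lzma.compress(b"data") + b"garbage")).read()
b'data'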

So I think a strong argument could be made for the ability to extract all the compressed data even if there is garbage appended. The question is, how would this support be added? Perhaps the mechanism chosen could also be integrated with a fix for Issue 1508475. Some options:

* Silently ignore the condition by default like the other compression modules (consistent, but could silently swallow real errors)
* An optional new GzipFile(strict=False) mode
* Perhaps an exception deferred until close() is called
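
In the meantime, it is already possible to dig the data out by dropping down to zlib. This is only a rough sketch to show the idea, not a proposal for the gzip API; it continues the session from the reproduction above:

>>> import zlib
>>> # wbits = 16 + MAX_WBITS tells zlib to expect a gzip wrapper
>>> decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
>>> decomp.decompress(file.getvalue())
b'data'
>>> decomp.unused_data  # everything after the first gzip member
b'garbage'

The zlib wrapper still verifies the CRC-32 and length fields in the gzip trailer, so the compressed data itself is checked; only the bytes after the end of the member are left untouched in unused_data.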