msg47892 - (view) |
Author: Bastian Kleineidam (calvin) |
Date: 2005-03-08 13:57 |
The GzipFile algorithm crashes when reading a corrupted
.gz file (attached as t.gz) with a missing CRC checksum
at the end.
Tested with python2.3, python2.4 and CVS python on a
Debian Linux system.
$ python2.3 t.py
Traceback (most recent call last):
File "t.py", line 4, in ?
print gzip.GzipFile('', 'rb', 9, fileobj).read()
File "/usr/lib/python2.3/gzip.py", line 217, in read
self._read(readsize)
File "/usr/lib/python2.3/gzip.py", line 289, in _read
self._read_eof()
File "/usr/lib/python2.3/gzip.py", line 305, in _read_eof
crc32 = read32(self.fileobj)
File "/usr/lib/python2.3/gzip.py", line 40, in read32
return struct.unpack("<l", input.read(4))[0]
struct.error: unpack str size does not match format
The attached patch (against current CVS) tries to cope
with this situation by
a) detecting the missing data by examining the rewind
value and
b) assuming that EOF is reached and returning the
buffered uncompressed data (by raising EOFError)
For history I encountered this kind of bug when
downloading HTML pages with Content-Encoding: gzip. It
seems some versions of the mod_gzip Apache module are
producing corrupted gzip data.
|
msg47893 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2007-03-06 13:02 |
I don't think it is right to treat this as EOF. I haven't fully understood the problem (can you explain the format error encountered: what is expected, what is sent instead?), however, I notice that gzip itself also complains about an "unexpected EOF" error, so I don't think we should silently decompress the file. Producing an exception indicating the proper problem and including the data received so far in it would be more appropriate.
Also, can you provide an example file for the test suite that isn't copyrighted by somebody else?
|
msg56754 - (view) |
Author: Bastian Kleineidam (calvin) |
Date: 2007-10-25 20:58 |
Here is a new test script that works with simple strings and no file
objects. It reproduces the error by cutting off the last two bytes of
the GZIP data.
The resulting struct error is due to the read() methods missing a check
that the requested amount of data is actually returned. In this case
read(4) returned 2 bytes instead of 4, and the struct raises an error.
I think the easiest way to handle this is to introduce a
read_save(fileobj, size) method that checks that the read() data is of
the requested size, else raise an error (perhaps an IOError?).
btw: you can remove the t.{gz,py} files, the test_gzip_error.py replaces
them.
|
msg82185 - (view) |
Author: Daniel Diniz (ajaksu2) *  |
Date: 2009-02-15 23:41 |
Confirmed on trunk with test_gzip_error.py:
struct.error: unpack requires a string argument of length 4
|
msg143719 - (view) |
Author: Yoav Weiss (yeeeev) |
Date: 2011-09-08 08:04 |
What is the reason that the currently submitted patch is not good enough and current stage is "needs patch"?
The current patch seem to solve this issue, which is a very common one when dealing with gzip files coming from the Internet.
In any case, an indication on *why* the current patch is not good enough will help create a better patch that may be good enough.
|
msg179969 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-14 19:03 |
At the moment gzip can raise two errors on unexpected EOF: struct.error from struct.unpack() or TypeError from ord(). Both bz2 and lzma raise EOFError in such cases.
The proposed patch converts both truncated gzip errors to EOFError as for bz2 and lzma. Added similar tests for gzip, bz2 and lzma.
I doubt about backward compatibility. It's obvious that struct.error and TypeError are unintentional, and EOFError is purposed for this case. However users can catch undocumented but de facto exceptions and doesn't expect EOFError.
|
msg180265 - (view) |
Author: Nadeem Vawda (nadeem.vawda) *  |
Date: 2013-01-19 21:57 |
I've reviewed the patch and posted some comments on Rietveld.
> I doubt about backward compatibility. It's obvious that struct.error and TypeError are unintentional, and EOFError is purposed for this case. However users can catch undocumented but de facto exceptions and doesn't expect EOFError.
I think it's fine for us to change it to raise EOFError in these cases.
|
msg180288 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-20 11:51 |
Here is an updated patch addressing Nadeem Vawda's comments. Thank you.
|
msg180289 - (view) |
Author: Nadeem Vawda (nadeem.vawda) *  |
Date: 2013-01-20 12:00 |
The updated patch looks good to me.
|
msg180399 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-01-22 15:17 |
New changeset 174332b89a0d by Serhiy Storchaka in branch '3.2':
Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
http://hg.python.org/cpython/rev/174332b89a0d
New changeset 87171e88847b by Serhiy Storchaka in branch '3.3':
Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
http://hg.python.org/cpython/rev/87171e88847b
New changeset f2f947cdc5fe by Serhiy Storchaka in branch 'default':
Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
http://hg.python.org/cpython/rev/f2f947cdc5fe
New changeset 214d8909513d by Serhiy Storchaka in branch '2.7':
Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
http://hg.python.org/cpython/rev/214d8909513d
|
msg180513 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-24 07:43 |
Actually previous patch doesn't fix original problem, it only ensure that GzipFile consistent with BZ2File and LZMAFile.
To fix original problem we need other patch, and this patch looks as new feature for 3.4. Here is a sample patch for LZMAFile. BZ2File patch will be similar, and GzipFile patch will be more different and complex.
Now error doesn't raised immediately when read the file unexpectedly ended if some data can be read. Instead maximal possible part of read data returned and exception raising deferred to next read (see tests).
Perhaps we need a new flag for constructor or for read() which enables a new behavior (what will be a good name for this?). Or we can use a special value for size argument which means "read to the end as much as possible" (we can differentiate the behavior for size<0 and size=None). Unconditional enabling a new behavior for size >=0 is safe.
|
msg181237 - (view) |
Author: Nadeem Vawda (nadeem.vawda) *  |
Date: 2013-02-03 01:32 |
I think the new behavior should be controlled by a constructor flag, maybe
named "defer_errors". I don't like the idea of adding the flag to read(),
since that makes us diverge from the standard file interface. Making a
distinction between size<0 and size=None seems confusing and error-prone,
not to mention that we (again) would have read() work differently from most
other file classes.
I'd prefer it if the new behavior is not enabled by default for size>=0,
even if this wouldn't break well-behaved code. Having a flag that only
controls the size<0 case is inelegant, and I don't think we should change
the default behavior unless there is a clear benefit to doing so.
|
msg184259 - (view) |
Author: Matthias Klose (doko) *  |
Date: 2013-03-15 21:23 |
this change breaks a test case in the bzr testsuite; will try to get to it next week. See https://launchpad.net/bugs/1116079
|
msg184345 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-03-16 19:46 |
tuned_gzip does dangerous things, it overloads private methods of GzipFile.
From Bazaar 2.3 Release Notes:
* Stop using ``bzrlib.tuned_gzip.GzipFile``. It is incompatible with
python-2.7 and was only used for Knit format repositories, which haven't
been recommended since 2007. The file itself will be removed in the next
release. (John Arbash Meinel)
Current version is 2.6b2. bzrlib.tuned_gzip.GzipFile should be removed two releases ago.
|
msg184688 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-03-19 21:58 |
I will be offline some time. Feel free to revert these changes in 2.7-3.3 if it
is necessary.
|
msg185555 - (view) |
Author: Matthias Klose (doko) *  |
Date: 2013-03-30 13:00 |
another test case failure with this patch:
https://launchpad.net/ubuntu/+archive/test-rebuild-20130329/+build/4416983
reproducible with feedparser 5.1.3 from pypi, on x86 (but not x86_64).
ERROR: test_gzip_struct_error (__main__.TestCompression)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./feedparsertest.py", line 433, in test_gzip_struct_error
f = feedparser.parse('http://localhost:8097/tests/compression/gzip-struct-error.gz')
File "/build/buildd/feedparser-5.1.2/feedparser/feedparser.py", line 3836, in parse
data = gzip.GzipFile(fileobj=_StringIO(data)).read()
File "/usr/lib/python2.7/gzip.py", line 253, in read
while self._read(readsize):
File "/usr/lib/python2.7/gzip.py", line 323, in _read
self._read_eof()
File "/usr/lib/python2.7/gzip.py", line 340, in _read_eof
crc32, isize = struct.unpack("<II", self._read_exact(8))
File "/usr/lib/python2.7/gzip.py", line 189, in _read_exact
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
----------------------------------------------------------------------
Ran 4237 tests in 5.190s
FAILED (errors=1)
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 43939)
|
msg188313 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-05-03 17:39 |
In both cases broken applications use the undocumented implementation details. But such changes are extremely strong for bugfix release and they should not be done without a special need. I propose to revert these changes in 2.7, 3.2 and 3.3 (possibly leaving in the default branch). Unfortunately, I was not online right before the release of the latest bugfix release and failed to do this. Fortunately, it is now possible to fix this in regression fix releases.
|
msg188946 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-05-11 19:08 |
New changeset abc780332b60 by Benjamin Peterson in branch '2.7':
backout 214d8909513d for regressions (#1159051)
http://hg.python.org/cpython/rev/abc780332b60
|
msg189002 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-05-12 10:32 |
New changeset 854ba6f414a8 by Georg Brandl in branch '3.2':
Issue #1159051: Back out a fix for handling corrupted gzip files that
http://hg.python.org/cpython/rev/854ba6f414a8
|
msg189010 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-05-12 10:47 |
New changeset 9c2831fe84e9 by Georg Brandl in branch '3.3':
Back out patch for #1159051, which caused backwards compatibility problems.
http://hg.python.org/cpython/rev/9c2831fe84e9
New changeset 5400e8fbc1de by Georg Brandl in branch 'default':
null-merge reversion of #1159051 patch from 3.3
http://hg.python.org/cpython/rev/5400e8fbc1de
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:10 | admin | set | github: 41668 |
2013-05-12 10:47:24 | python-dev | set | messages:
+ msg189010 |
2013-05-12 10:46:25 | georg.brandl | set | versions:
- Python 2.7, Python 3.2, Python 3.3 |
2013-05-12 10:46:19 | georg.brandl | set | priority: release blocker -> normal |
2013-05-12 10:32:41 | python-dev | set | messages:
+ msg189002 |
2013-05-11 19:08:39 | python-dev | set | messages:
+ msg188946 |
2013-05-03 17:39:58 | serhiy.storchaka | set | priority: normal -> release blocker versions:
+ Python 2.7, Python 3.2, Python 3.3 nosy:
+ benjamin.peterson, georg.brandl
messages:
+ msg188313
type: enhancement -> behavior |
2013-03-30 13:00:16 | doko | set | messages:
+ msg185555 |
2013-03-19 21:58:04 | serhiy.storchaka | set | messages:
+ msg184688 |
2013-03-19 21:06:33 | doko | set | priority: release blocker -> normal |
2013-03-16 19:46:48 | serhiy.storchaka | set | messages:
+ msg184345 |
2013-03-15 21:23:08 | doko | set | priority: normal -> release blocker nosy:
+ larry, doko messages:
+ msg184259
|
2013-02-03 01:32:54 | nadeem.vawda | set | messages:
+ msg181237 |
2013-01-24 07:43:51 | serhiy.storchaka | set | files:
+ lzma_deferred_error.patch type: behavior -> enhancement messages:
+ msg180513
versions:
- Python 2.7, Python 3.2, Python 3.3 |
2013-01-22 15:17:01 | python-dev | set | nosy:
+ python-dev messages:
+ msg180399
|
2013-01-20 12:00:13 | nadeem.vawda | set | messages:
+ msg180289 |
2013-01-20 11:51:52 | serhiy.storchaka | set | files:
+ gzip_eof-3.4_2.patch
messages:
+ msg180288 |
2013-01-19 21:57:12 | nadeem.vawda | set | messages:
+ msg180265 |
2013-01-14 19:03:15 | serhiy.storchaka | set | files:
+ gzip_eof-3.4.patch versions:
+ Python 3.3, Python 3.4, - Python 3.1 nosy:
+ nadeem.vawda
messages:
+ msg179969
stage: needs patch -> patch review |
2013-01-13 20:31:38 | serhiy.storchaka | set | assignee: serhiy.storchaka
nosy:
+ serhiy.storchaka |
2011-09-08 08:04:03 | yeeeev | set | nosy:
+ yeeeev messages:
+ msg143719
|
2010-08-21 12:35:59 | BreamoreBoy | set | stage: needs patch type: behavior versions:
+ Python 3.1, Python 2.7, Python 3.2 |
2009-02-15 23:41:25 | ajaksu2 | set | nosy:
+ ajaksu2 messages:
+ msg82185 |
2007-10-25 20:58:44 | calvin | set | files:
+ test_gzip_error.py messages:
+ msg56754 |
2005-03-08 13:57:41 | calvin | create | |