This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: bz2: regression wrt supporting files with trailing garbage after EOF
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: nadeem.vawda Nosy List: Fabio.Erculiani, nadeem.vawda, python-dev, serhiy.storchaka
Priority: high Keywords: 3.2regression

Created on 2013-11-30 10:16 by Fabio.Erculiani, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
sys-libs:zlib-1.2.3-r1~1.tbz2 Fabio.Erculiani, 2013-11-30 10:16
Messages (4)
msg204793 - Author: Fabio Erculiani (Fabio.Erculiani) Date: 2013-11-30 10:16
In Sabayon Linux and Gentoo Linux, distro package metadata is appended to the end of bz2 files. The bz2 modules in Python 2.7, 3.1 and 3.2 handled the attached file just fine: the trailing garbage was simply ignored, as the bunzip2 utility does.

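Example test code:
import bz2

path = "sys-libs:zlib-1.2.3-r1~1.tbz2"  # the attached file
with bz2.BZ2File(path, mode="rb") as f:
    # Read the whole file in 1 KiB chunks until read() returns b"".
    data = f.read(1024)
    while data:
        data = f.read(1024)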

The code above no longer works with Python 3.3.3; at some point it raises the following exception (which originates in the bz2 module's C code):

  File "/usr/lib64/python3.3/bz2.py", line 278, in read
    return self._read_block(size)
  File "/usr/lib64/python3.3/bz2.py", line 239, in _read_block
    while n > 0 and self._fill_buffer():
  File "/usr/lib64/python3.3/bz2.py", line 203, in _fill_buffer
    self._buffer = self._decompressor.decompress(rawblock)
OSError: Invalid data stream

Please restore the compatibility with bz2 files with trailing garbage after EOF.
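The attached file is not strictly needed to reproduce this. The following is a minimal sketch that builds an in-memory stand-in: one valid bzip2 stream followed by bytes that are not bzip2 data. The trailing bytes and the helper name `read_in_chunks` are hypothetical placeholders, not the real distro metadata format. Per the report, an affected version (3.3.3) raises OSError in the read loop; on a version with the fix it prints True.

```python
import bz2
import io

def read_in_chunks(fileobj, chunk_size=1024):
    """Read a bz2-compressed binary file object to the end in fixed-size
    chunks, like the loop in the report, and return all the data."""
    chunks = []
    with bz2.BZ2File(fileobj, mode="rb") as f:
        data = f.read(chunk_size)
        while data:
            chunks.append(data)
            data = f.read(chunk_size)
    return b"".join(chunks)

# Hypothetical stand-in for the attached package: a valid bzip2 stream
# followed by non-bzip2 bytes (placeholder for the appended metadata).
payload = b"package contents\n" * 1000
blob = bz2.compress(payload) + b"trailing distro metadata"

print(read_in_chunks(io.BytesIO(blob)) == payload)
```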
msg204804 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-30 11:50
decompress() is affected too.

>>> import bz2
>>> bz2.decompress(bz2.compress(b'abcd') + b'xyz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/bz2.py", line 505, in decompress
    results.append(decomp.decompress(data))
OSError: Invalid data stream

On 3.2 it returns b'abcd'.
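A quick check of the behaviour expected once this is fixed, assuming a Python version that includes the fix. Since 3.3, bz2.decompress() also accepts several concatenated bzip2 streams, so the sketch checks both properties together: every real stream is decompressed, and trailing bytes that are not a bzip2 stream are dropped, matching the 3.2 result above.

```python
import bz2

single = bz2.compress(b"abcd")

# Trailing non-bzip2 bytes after the stream are ignored, as on 3.2.
print(bz2.decompress(single + b"xyz"))  # b'abcd'

# Two concatenated streams are both decompressed (multi-stream input).
print(bz2.decompress(single + bz2.compress(b"efgh")))  # b'abcdefgh'

# Both together: the streams are decompressed, the garbage is dropped.
print(bz2.decompress(single + bz2.compress(b"efgh") + b"xyz"))  # b'abcdefgh'
```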
msg204946 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2013-12-01 17:58
I'll have a patch for this in the next couple of days, along with a similar
one for the lzma module, which has the same issue (even though it's not a
regression in that case).

In the meantime, you can work around this by feeding the compressed data
to a BZ2Decompressor yourself: it stops at the end of the bz2 stream and
stores any leftover data in its 'unused_data' attribute.
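A minimal sketch of that workaround, assuming the file holds a single bzip2 stream followed by arbitrary data (as with the packages in the report). The helper name `read_first_bz2_stream` is hypothetical. It feeds chunks to a BZ2Decompressor until the decompressor's `eof` flag is set, so it never tries to decompress the trailing bytes.

```python
import bz2
import io

def read_first_bz2_stream(fileobj, chunk_size=1024):
    """Decompress the first bzip2 stream from a binary file object.

    Returns (data, unused): unused holds the bytes from the last chunk
    read that came after the end-of-stream marker; any further trailing
    data is left unread in fileobj.
    """
    decomp = bz2.BZ2Decompressor()
    parts = []
    while not decomp.eof:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # The file ended before the stream was complete.
            raise EOFError("file ended before the bzip2 end-of-stream marker")
        parts.append(decomp.decompress(chunk))
    return b"".join(parts), decomp.unused_data

blob = bz2.compress(b"abcd") + b"xyz"
data, unused = read_first_bz2_stream(io.BytesIO(blob))
print(data, unused)  # b'abcd' b'xyz'
```

Stopping as soon as `eof` is set matters: calling decompress() again on a finished BZ2Decompressor raises EOFError, so the loop must not feed it the trailing data.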
msg205258 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-04 22:30
New changeset c5349a560703 by Nadeem Vawda in branch '3.3':
#19839: Fix regression in bz2 module's handling of non-bzip2 data at EOF.
http://hg.python.org/cpython/rev/c5349a560703

New changeset bec2033ee2ec by Nadeem Vawda in branch '3.3':
#19839: Fix lzma module's handling of non-lzma data at EOF.
http://hg.python.org/cpython/rev/bec2033ee2ec

New changeset 1f1498fe50e5 by Nadeem Vawda in branch 'default':
Closes #19839: Fix regression in bz2 module's handling of non-bzip2 data at EOF.
http://hg.python.org/cpython/rev/1f1498fe50e5
History
Date User Action Args
2022-04-11 14:57:54 admin set github: 64038
2013-12-04 22:30:13 python-dev set status: open -> closed
    nosy: + python-dev
    messages: + msg205258
    resolution: fixed
    stage: needs patch -> resolved
2013-12-01 17:58:30 nadeem.vawda set assignee: nadeem.vawda
    messages: + msg204946
    stage: needs patch
2013-12-01 00:09:31 pitrou set keywords: + 3.2regression
    priority: normal -> high
2013-11-30 11:50:31 serhiy.storchaka set messages: + msg204804
2013-11-30 10:35:46 serhiy.storchaka set nosy: + nadeem.vawda, serhiy.storchaka
    type: crash -> behavior
    versions: + Python 3.4
2013-11-30 10:17:10 Fabio.Erculiani set type: crash
2013-11-30 10:16:59 Fabio.Erculiani create