
classification
Title: bz2.BZ2Decompressor.decompress fails on large files
Type: crash
Stage: resolved
Components: Extension Modules
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: nadeem.vawda
Nosy List: Laurent.Gautier, benjamin.peterson, georg.brandl, loewis, nadeem.vawda, python-dev, serhiy.storchaka
Priority: normal
Keywords:

Created on 2012-03-24 16:15 by Laurent.Gautier, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
testbz2.py nadeem.vawda, 2012-03-24 17:33
Messages (18)
msg156698 - (view) Author: Laurent Gautier (Laurent.Gautier) Date: 2012-03-24 16:15
The call ends with:
Objects/stringobject.c:3884: bad argument to internal function

sys.version:
'2.7.2 (default, Jun 13 2011, 15:14:50) \n[GCC 4.4.5]'
(on 64bit Linux)
msg156701 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-24 16:36
I can't reproduce this. Can you please provide a test script along with input data that allows us to reproduce this error?
msg156705 - (view) Author: Laurent Gautier (Laurent.Gautier) Date: 2012-03-24 16:45
Wow! Quick follow-up.
 
The data file is about 1.6 GB. Is there a preferred way to pass it on? (I suspect that the bug tracker is not the preferred way.)

The code goes like:

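import bz2

# Read the whole compressed file into memory.
with open("foobar.bz2", "rb") as f:
    src_buf = f.read()

# Decompress everything in a single call; this is the call that fails.
decomp = bz2.BZ2Decompressor()
tmp = decomp.decompress(src_buf)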
msg156709 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-03-24 17:33
I have been able to reproduce it; see attached script. It happens for
inputs of 2GB (decompressed), but not for ones of 1GB.

It seems that bz2module.c doesn't guard against 32-bit overflows when
handling the size of the decompressed data. This affects both the
BZ2Decompressor object's decompress() method, and the module-level
decompress() function. All Python versions prior to 3.3 are affected.
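
To illustrate why 2GB fails where 1GB does not, here is a minimal sketch. It assumes the size ends up in a signed 32-bit C int at some point; the thread only says "32-bit overflows", so the exact variable and type in bz2module.c are not confirmed here. The helper as_int32 is a hypothetical name used only for this illustration.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def as_int32(n):
    """Reinterpret n as a signed 32-bit C int; ctypes wraps instead of raising."""
    return ctypes.c_int32(n).value


# 1GB still fits in a signed 32-bit int, 2GB does not and wraps around.
for label, size in [("1GB", 2**30), ("2GB", 2**31)]:
    status = "fits" if size <= INT_MAX else "overflows"
    print(label, size, "->", as_int32(size), status)
```

A size that has wrapped like this is the kind of value an internal function would reject as a bad argument.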
msg156710 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-03-24 17:35
(the contents of the input file don't matter; I just pulled out a
bunch of zeros from /dev/zero and compressed them with bzip2.)
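
A rough way to build such an input without an external file is sketched below. This is not the attached testbz2.py (its contents are not shown here); the function names and the default size are assumptions chosen so the example runs quickly. On an affected build, the size would have to be raised to about 2GB of decompressed data, which needs a matching amount of memory.

```python
import bz2


def make_compressed_zeros(total_bytes, chunk_size=1 << 20):
    """Return bzip2 data that decompresses to total_bytes zero bytes."""
    compressor = bz2.BZ2Compressor()
    zeros = b"\0" * chunk_size
    parts = []
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        # Feed the zeros in chunks so the uncompressed input is never held whole.
        parts.append(compressor.compress(zeros[:n]))
        remaining -= n
    parts.append(compressor.flush())
    return b"".join(parts)


def decompress_all(data):
    """Decompress in a single call, as in the original report."""
    return bz2.BZ2Decompressor().decompress(data)


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # small demo size; the bug needs about 2GB of output
    out = decompress_all(make_compressed_zeros(size))
    print(len(out) == size)
```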
msg156711 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-03-24 17:52
This should be fixed for 2.7.3. I'll have a patch ready in the next day
or two.
msg156713 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-03-24 19:31
This isn't a regression, is it? If it's not, I don't think it's essential to get into 2.7.3.
msg156714 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-03-24 19:35
No, it's been around since at least 2.6. I wasn't really sure what the
protocol was for bugs found during the RC process. It'd be nice to get
a fix for this into 2.7.3 (and 3.2.3), but it's not urgent.
msg156715 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-03-24 19:37
Nadeem: the final release candidate of 2.7.3 was already made. Any further change would require another release candidate, which in turn would delay the release further. This has to wait for 2.7.4.
msg156717 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-03-24 19:38
That's fine by me, then. Sorry for the confusion.
msg173471 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-21 19:22
New changeset ebb8c7d79f52 by Nadeem Vawda in branch '3.2':
Issue #14398: Fix size truncation and overflow bugs in bz2 module.
http://hg.python.org/cpython/rev/ebb8c7d79f52

New changeset 25fdf297c077 by Nadeem Vawda in branch '3.3':
Merge #14398: Fix size truncation and overflow bugs in bz2 module.
http://hg.python.org/cpython/rev/25fdf297c077

New changeset d6bf506ea13f by Nadeem Vawda in branch 'default':
Merge #14398: Fix size truncation and overflow bugs in bz2 module.
http://hg.python.org/cpython/rev/d6bf506ea13f
msg173479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-21 20:20
What about 2.7?
msg173481 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-21 20:30
I'm working on it now. Will push in the next 15 minutes or so.
msg173483 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-21 21:09
New changeset f03a335621ce by Nadeem Vawda in branch '2.7':
Issue #14398: Fix size truncation and overflow bugs in bz2 module.
http://hg.python.org/cpython/rev/f03a335621ce
msg173484 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-21 21:12
All fixed, along with some other similar but harder-to-trigger bugs.

Thanks for the bug report, Laurent!
msg187083 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-04-16 14:16
Why does only 2.7 have tests?
msg187298 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2013-04-18 21:40
An oversight on my part, I think. I'll add tests for 3.x this weekend.
msg187533 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2013-04-21 22:30
Hmm, so actually most of the bugs fixed in 2.7 and 3.2 weren't present
in 3.3 and 3.4, and those versions already had tests equivalent to the
tests I added for 2.7/3.2.

As for the changes that I did make to 3.3/3.4:

- two of the three cover cases that only occur if the output data is
  larger than ~32GiB. Even if we have a buildbot with enough memory for
  it (which I don't think we do), actually running such tests would take
  forever and then some.

- the third is for a condition that's actually pretty much impossible to
  trigger - grow_buffer() has to be called on a buffer that is already at
  least 8*((size_t)-1)/9 bytes long. On a 64-bit system this is
  astronomically large, while on a 32-bit system the OS will probably
  have reserved more than 1/9th of the virtual address space for itself,
  so it won't be possible to allocate a large enough buffer.
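
The threshold in the last point can be worked through numerically. The sketch below assumes that the buffer grows by roughly one eighth of its current size per step (a factor of 9/8), which is what the 8/9 threshold suggests; the actual growth formula in grow_buffer() is not quoted in this thread, and the function names here are hypothetical.

```python
def size_max(bits):
    """Largest value of an unsigned size_t with the given width."""
    return 2**bits - 1


def overflow_threshold(bits):
    """Buffer length 8*((size_t)-1)/9, as mentioned in the message above."""
    return 8 * size_max(bits) // 9


def growth_overflows(size, bits):
    """True if size + size/8 no longer fits in a size_t of the given width."""
    return size + size // 8 > size_max(bits)


for bits in (32, 64):
    limit = overflow_threshold(bits)
    print(bits, "bit threshold:", limit, "bytes, about", limit / 2**30, "GiB")
```

On a 32-bit system the threshold is about 8/9 of the 4GiB address space, which is why a buffer that large cannot normally be allocated.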
History
Date | User | Action | Args
2022-04-11 14:57:28 | admin | set | github: 58606
2013-04-21 22:30:05 | nadeem.vawda | set | status: open -> closed; messages: + msg187533
2013-04-18 21:40:44 | nadeem.vawda | set | status: closed -> open; messages: + msg187298
2013-04-16 14:16:35 | benjamin.peterson | set | messages: + msg187083
2012-10-21 21:12:54 | nadeem.vawda | set | status: open -> closed; resolution: fixed; messages: + msg173484; stage: needs patch -> resolved
2012-10-21 21:09:42 | python-dev | set | messages: + msg173483
2012-10-21 20:30:19 | nadeem.vawda | set | messages: + msg173481
2012-10-21 20:20:08 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg173479; versions: + Python 3.3, Python 3.4
2012-10-21 19:22:09 | python-dev | set | nosy: + python-dev; messages: + msg173471
2012-03-24 19:38:53 | nadeem.vawda | set | messages: + msg156717
2012-03-24 19:37:11 | loewis | set | messages: + msg156715
2012-03-24 19:35:51 | nadeem.vawda | set | priority: release blocker -> normal; messages: + msg156714
2012-03-24 19:31:42 | benjamin.peterson | set | messages: + msg156713
2012-03-24 17:52:25 | nadeem.vawda | set | priority: normal -> release blocker; nosy: + georg.brandl, benjamin.peterson; messages: + msg156711
2012-03-24 17:39:16 | nadeem.vawda | set | versions: + Python 3.2
2012-03-24 17:35:02 | nadeem.vawda | set | messages: + msg156710
2012-03-24 17:33:46 | nadeem.vawda | set | files: + testbz2.py; assignee: nadeem.vawda; components: + Extension Modules; nosy: + nadeem.vawda; messages: + msg156709; stage: needs patch
2012-03-24 16:45:34 | Laurent.Gautier | set | messages: + msg156705
2012-03-24 16:36:13 | loewis | set | nosy: + loewis; messages: + msg156701
2012-03-24 16:15:19 | Laurent.Gautier | create |