Author martin.panter
Recipients Ben Cipollini, Matthew.Brett, martin.panter, nadeem.vawda, twouters
Date 2015-11-15.11:13:22
Message-id <1447586002.89.0.504314186433.issue25626@psf.upfronthosting.co.za>
In-reply-to
Content
Thanks for the report. Can you confirm whether this demo illustrates your problem? On my machine, with only 2 GiB of memory, I instead get a MemoryError, which seems reasonable for my situation.

from gzip import GzipFile
from io import BytesIO
file = BytesIO()
writer = GzipFile(fileobj=file, mode="wb")
writer.write(b"data")
writer.close()
file.seek(0)
reader = GzipFile(fileobj=file, mode="rb")
data = reader.read(2**32)  # Ideally this should return b"data"

Assuming that triggers the OverflowError, the heart of the problem is that the zlib.decompressobj.decompress() method does not accept values that large for its max_length limit:

>>> import zlib
>>> decompressor = zlib.decompressobj(wbits=16 + 15)
>>> decompressor.decompress(file.getvalue(), 2**32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large for C unsigned int

I think the ideal fix would be to cap the limit at 2**32 - 1 in the zlib library. Would this be okay for a 3.5.1 bug fix release, or would it be considered a feature change?
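For illustration, the cap could look like the following wrapper (a minimal sketch in Python rather than the C-level change; the names UINT_MAX and capped_decompress are my own, not part of zlib):

```python
import zlib

UINT_MAX = 2**32 - 1  # assumed width of the C "unsigned int" parameter

def capped_decompress(decompressor, data, max_length):
    # Clamp the caller's limit to what the C layer accepts, so callers
    # passing 2**32 or more get the largest representable limit instead
    # of an OverflowError.
    return decompressor.decompress(data, min(max_length, UINT_MAX))
```

The caller-visible behaviour only changes for limits at or above 2**32, which in practice means "no effective limit" anyway.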

Failing that, another option would be to cap the limit in the gzip library, and just document the zlib limitation. I already have a patch in Issue 23200 documenting another quirk when max_length=0.
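In the meantime, callers can sidestep the limit by reading in chunks smaller than 2**32; a minimal sketch (the helper name read_all is my own):

```python
def read_all(reader, chunk_size=2**30):
    # Read the whole stream in sub-4-GiB chunks so that no single
    # read() call passes a length the C layer cannot represent.
    parts = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)
```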

The same problem may also apply to the lzma and bz2 modules; I need to check.
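A quick probe along these lines could check each decompressor (a sketch with a made-up helper name; on a 64-bit build of a fixed interpreter the call may simply succeed):

```python
import bz2

def accepts_large_limit(decompressor, blob):
    # Hypothetical probe: report whether the given decompressor object
    # accepts a max_length at or above 2**32, or rejects it with
    # OverflowError like zlib does here.
    try:
        decompressor.decompress(blob, 2**32)
        return True
    except OverflowError:
        return False
```

For example, accepts_large_limit(bz2.BZ2Decompressor(), bz2.compress(b"data")) exercises the bz2 path; the lzma module's LZMADecompressor could be probed the same way.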