Author martin.panter
Recipients Ben Cipollini, Matthew.Brett, martin.panter, nadeem.vawda, twouters
Date 2015-11-15.11:13:22
Message-id <1447586002.89.0.504314186433.issue25626@psf.upfronthosting.co.za>
In-reply-to
Content
Thanks for the report. Can you confirm whether this demo illustrates your problem? On my machine, with only 2 GiB of memory, I instead get a MemoryError, which seems reasonable for my situation.

from gzip import GzipFile
from io import BytesIO
file = BytesIO()
writer = GzipFile(fileobj=file, mode="wb")
writer.write(b"data")
writer.close()
file.seek(0)
reader = GzipFile(fileobj=file, mode="rb")
data = reader.read(2**32)  # Ideally this should return b"data"

Assuming that triggers the OverflowError, the heart of the problem is that the zlib.decompressobj.decompress() method does not accept values that large for its max_length limit:

>>> import zlib
>>> decompressor = zlib.decompressobj(wbits=16 + 15)
>>> decompressor.decompress(file.getvalue(), 2**32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large for C unsigned int

I think the ideal fix would be to cap the limit at 2**32 - 1 in the zlib library. Would this be okay for a 3.5.1 bug fix release, or would it be considered a feature change?
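For illustration, the cap could look like the following wrapper (a minimal sketch in Python rather than the C-level change; the names UINT_MAX and capped_decompress are my own, not part of zlib):

```python
import zlib

UINT_MAX = 2**32 - 1  # assumed width of the C "unsigned int" parameter

def capped_decompress(decompressor, data, max_length):
    # Clamp the caller's limit to what the C layer accepts, so callers
    # passing 2**32 or more get the largest representable limit instead
    # of an OverflowError.
    return decompressor.decompress(data, min(max_length, UINT_MAX))
```

The caller-visible behaviour only changes for limits at or above 2**32, which in practice means "no effective limit" anyway.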

Failing that, another option would be to cap the limit in the gzip library, and just document the zlib limitation. I already have a patch in Issue 23200 documenting another quirk when max_length=0.
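In the meantime, callers can sidestep the limit by reading in chunks smaller than 2**32; a minimal sketch (the helper name read_all is my own):

```python
def read_all(reader, chunk_size=2**30):
    # Read the whole stream in sub-4-GiB chunks so that no single
    # read() call passes a length the C layer cannot represent.
    parts = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)
```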

The same problem may also apply to the lzma and bz2 modules; I need to check.
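A quick probe along these lines could check each decompressor (a sketch with a made-up helper name; on a 64-bit build of a fixed interpreter the call may simply succeed):

```python
import bz2

def accepts_large_limit(decompressor, blob):
    # Hypothetical probe: report whether the given decompressor object
    # accepts a max_length at or above 2**32, or rejects it with
    # OverflowError like zlib does here.
    try:
        decompressor.decompress(blob, 2**32)
        return True
    except OverflowError:
        return False
```

For example, accepts_large_limit(bz2.BZ2Decompressor(), bz2.compress(b"data")) exercises the bz2 path; the lzma module's LZMADecompressor could be probed the same way.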