Message254684
Thanks for the report. Can you confirm whether this demo illustrates your problem? On my machine, which has only 2 GiB of memory, I get a MemoryError instead, which seems reasonable in my situation.
from gzip import GzipFile
from io import BytesIO
file = BytesIO()
writer = GzipFile(fileobj=file, mode="wb")
writer.write(b"data")
writer.close()
file.seek(0)
reader = GzipFile(fileobj=file, mode="rb")
data = reader.read(2**32) # Ideally this should return b"data"
Assuming that triggers the OverflowError, the heart of the problem is that the decompress() method of a zlib decompressor object does not accept such large values for the length limit:
>>> import zlib
>>> decompressor = zlib.decompressobj(wbits=16 + 15)
>>> decompressor.decompress(file.getvalue(), 2**32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large for C unsigned int
I think the ideal fix would be to cap the limit at 2**32 - 1 in the zlib library. Would this be okay for a 3.5.1 bug fix release, or would it be considered a feature change?
Failing that, another option would be to cap the limit in the gzip library, and just document the zlib limitation. I already have a patch in Issue 23200 documenting another quirk when max_length=0.
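For what it's worth, the call-site cap could look something like this sketch (decompress_capped and UINT_MAX are names I made up for illustration; this is not the actual patch):

```python
import zlib

UINT_MAX = 2**32 - 1  # largest value that fits in a C unsigned int

def decompress_capped(decompressor, data, max_length):
    # Clamp the limit before it reaches the C layer.  This only
    # changes behaviour when one call would produce more than
    # UINT_MAX bytes; any remainder stays in unconsumed_tail.
    return decompressor.decompress(data, min(max_length, UINT_MAX))

# Build a gzip-format stream, matching wbits=16 + 15 above
compressor = zlib.compressobj(wbits=16 + 15)
blob = compressor.compress(b"data") + compressor.flush()

decompressor = zlib.decompressobj(wbits=16 + 15)
print(decompress_capped(decompressor, blob, 2**32))  # b'data'
```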
The same problem may also apply to the LZMA and bzip2 modules; I need to check.
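A quick probe along these lines could check it (just a sketch; whether the oversized limit is accepted may depend on the Python version and platform, so I print rather than assume an outcome):

```python
import bz2
import lzma

# Probe whether BZ2Decompressor and LZMADecompressor choke on an
# oversized max_length the way zlib's decompressor does.
samples = [
    (bz2.BZ2Decompressor, bz2.compress(b"data")),
    (lzma.LZMADecompressor, lzma.compress(b"data")),
]
for factory, blob in samples:
    decompressor = factory()
    try:
        out = decompressor.decompress(blob, max_length=2**32)
        print(factory.__name__, "accepted the limit:", out)
    except OverflowError as exc:
        print(factory.__name__, "raised OverflowError:", exc)
```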
Date                | User          | Action | Args
2015-11-15 11:13:22 | martin.panter | set    | recipients: + martin.panter, twouters, nadeem.vawda, Matthew.Brett, Ben Cipollini
2015-11-15 11:13:22 | martin.panter | set    | messageid: <1447586002.89.0.504314186433.issue25626@psf.upfronthosting.co.za>
2015-11-15 11:13:22 | martin.panter | link   | issue25626 messages
2015-11-15 11:13:22 | martin.panter | create |