classification
Title: Within zipfile, use of zlib.crc32 raises OverflowError at argument-parsing time on large strings
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: 27130 Superseder:
Assigned To: Nosy List: Danny.Yoo, gregory.p.smith, martin.panter, nadeem.vawda, serhiy.storchaka
Priority: normal Keywords:

Created on 2015-01-23 23:49 by Danny.Yoo, last changed 2016-08-07 16:32 by gregory.p.smith. This issue is now closed.

Messages (5)
msg234587 - Author: Danny Yoo (Danny.Yoo) Date: 2015-01-23 23:49
Reproduction steps:

---
$ python2.7 -c "import zlib;zlib.crc32('a'*(1<<31))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OverflowError: size does not fit in an int
---

We ran into this bug in zlib.crc32 when using zipfile.writestr() with a very large string; as soon as zipfile tried to write the crc checksum, it raised this error.


Python 3 does not appear to suffer from this bug.
msg234588 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-01-23 23:56
This bug prevents zipfile's writestr() from writing large data (longer than UINT_MAX) to a 64-bit zip file.

The zlib.crc32 function, as written, cannot accept input whose size does not fit in an unsigned int.

https://hg.python.org/cpython/file/94ec4d8cf104/Modules/zlibmodule.c#l964

Python 3 has updated this to call the zlib crc32 function multiple times in this situation:

https://hg.python.org/cpython/file/93888975606b/Modules/zlibmodule.c#l1210

So the fix exists; we just need to apply the same approach in 2.7.
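
Until 2.7 is patched, the multi-call idea can be approximated from Python code. The following is only a sketch, not the C-level fix: crc32_chunked and chunk_size are hypothetical names, and it relies solely on zlib.crc32 accepting a running checksum as its second argument.

```python
import zlib


def crc32_chunked(data, chunk_size=1 << 20):
    """Return the CRC-32 of data, feeding zlib.crc32 one slice at a time.

    Hypothetical helper: each call sees at most chunk_size bytes, so no
    single call receives a length that overflows a C int.
    """
    crc = 0
    for start in range(0, len(data), chunk_size):
        # The second argument carries the running checksum forward.
        crc = zlib.crc32(data[start:start + chunk_size], crc)
    # Mask so the result is unsigned on both Python 2 and Python 3.
    return crc & 0xffffffff
```

The result should match a single zlib.crc32 call (masked with 0xffffffff) on inputs small enough for that call to succeed.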
msg234589 - Author: Danny Yoo (Danny.Yoo) Date: 2015-01-24 00:52
Unfortunately, fixing just zlib.crc32 isn't quite enough for our purposes.  We will still see an OverflowError in zipfile if compression is selected.


Demonstration code:

############################################
import zipfile

## Possible workaround: monkey-patch crc32 from binascii?!
import binascii
zipfile.crc32 = binascii.crc32

content = 'a'*(1<<31)
filename = '/tmp/zip_test.zip'

zf = zipfile.ZipFile(filename, "w",
                     compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True)
zf.writestr('big', content)
zf.close()

zf = zipfile.ZipFile(filename, "r", allowZip64=True)
print zf.open('big').read() == content
#############################################


This will raise the following error under Python 2.7.6:

#############################################
$ python zip_test.py
Traceback (most recent call last):
  File "zip_test.py", line 13, in <module>
    zf.writestr('big', content)
  File "/usr/lib/python2.7/zipfile.py", line 1228, in writestr
    bytes = co.compress(bytes) + co.flush()
OverflowError: size does not fit in an int
#############################################



If we use compression=zipfile.ZIP_STORED, we don't see this error, but it kind of misses a major point of using zipfile.
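
The failing call in the traceback passes the whole string to co.compress() at once. A compression object also accepts its input in pieces, so a chunked loop should avoid the oversized argument. This is a sketch under assumptions: deflate_chunked and chunk_size are hypothetical names, it is not what zipfile does internally, and the wbits value of -15 (raw deflate, as stored in zip archives) is chosen for illustration.

```python
import zlib


def deflate_chunked(data, chunk_size=1 << 20):
    """Raw-deflate data by feeding the compressor one slice at a time.

    Hypothetical helper: no single compress() call receives more than
    chunk_size bytes.
    """
    # wbits=-15 requests a raw deflate stream without a zlib header.
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(co.compress(data[start:start + chunk_size]))
    # flush() emits whatever the compressor is still buffering.
    pieces.append(co.flush())
    return b''.join(pieces)
```

This only illustrates the technique; using it with zipfile would still require changing or wrapping writestr().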
msg266467 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-27 02:38
Apparently crc32() was fixed in Python 3 via Issue 10276.

See also Issue 27130 about 64-bit support more generally in zlib.
msg272123 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2016-08-07 16:32
This appears to have been fixed by the recent commits for issue27130, at least https://hg.python.org/cpython/rev/2192edcfea02.

greg:cpython/build27$ ./python -c "import zlib;zlib.crc32('a'*(1<<31))"
greg:cpython/build27$ ./python ../zipfile_2gb_test.py
True
greg:cpython/build27$ ls -al /tmp/zip_test.zip
-rw-rw-r-- 1 greg greg 2087407 Aug  7 09:28 /tmp/zip_test.zip
greg:~/sandbox/python/cpython/build27$ unzip -t /tmp/zip_test.zip
Archive:  /tmp/zip_test.zip
    testing: big                      OK
No errors detected in compressed data of /tmp/zip_test.zip.
greg:cpython/build27$ unzip -l /tmp/zip_test.zip
Archive:  /tmp/zip_test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
2147483648  2016-08-07 09:27   big
---------                     -------
2147483648                     1 file
History
Date User Action Args
2016-08-07 16:32:30 gregory.p.smith set status: open -> closed
    resolution: fixed
    messages: + msg272123
    stage: needs patch -> resolved
2016-06-03 19:54:43 gregory.p.smith set dependencies: + zlib: OverflowError while trying to compress 2^32 bytes or more
    stage: needs patch
2016-05-27 02:38:15 martin.panter set nosy: + martin.panter
    messages: + msg266467
2015-03-20 19:40:27 serhiy.storchaka set nosy: + serhiy.storchaka
2015-02-05 19:36:38 serhiy.storchaka set nosy: + nadeem.vawda
2015-01-24 00:55:22 Danny.Yoo set title: zlib.crc32 raises OverflowError at argument-parsing time on large strings -> Within zipfile, use of zlib.crc32 raises OverflowError at argument-parsing time on large strings
2015-01-24 00:52:11 Danny.Yoo set messages: + msg234589
2015-01-23 23:56:30 gregory.p.smith set messages: + msg234588
2015-01-23 23:49:26 Danny.Yoo create