classification
Title: zipfile writes incorrect local file header for large files in zip64
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Paul, alanmcintyre, amaury.forgeotdarc, craigds, enlavin, eric.araujo, lambacck, nadeem.vawda, segfault42
Priority: normal Keywords: patch

Created on 2010-08-31 01:02 by craigds, last changed 2011-11-03 12:17 by nadeem.vawda.

Files
File name Uploaded Description
zipfile_zip64_header.patch craigds, 2010-08-31 01:02
zipfile-huge-files.diff alanmcintyre, 2010-09-07 04:57
Messages (7)
msg115250 - Author: Craig de Stigter (craigds) Date: 2010-08-31 01:02
Steps to reproduce:

# create a large (>4gb) file
f = open('foo.txt', 'wb')
text = 'a' * 1024**2
for i in xrange(5 * 1024):
    f.write(text)
f.close()

# now zip the file
import zipfile
z = zipfile.ZipFile('foo.zip', mode='w', allowZip64=True)
z.write('foo.txt')
z.close()


Now inspect the local file header using a hex editor. The written header is incorrect: the uncompressed and compressed size fields should be written as 0xffffffff, and the Zip64 'extra field' should contain the actual sizes.
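
The same check can be done from Python instead of a hex editor. The following is a minimal sketch, not part of zipfile: the names LOCAL_HEADER, read_local_header and has_zip64_extra are hypothetical helpers, and the layout assumed is the standard ZIP local file header (30 fixed bytes, then the file name, then the extra field). It is demonstrated on a tiny in-memory archive so that it runs quickly.

```python
import io
import struct
import zipfile

# Fixed part of a local file header: signature, version needed, flags,
# compression method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length (30 bytes).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
ZIP64_EXTRA_ID = 0x0001  # header ID of the Zip64 extended information field


def read_local_header(data, offset=0):
    """Parse the local file header found at `offset` in `data` (bytes)."""
    (sig, _version, flags, _method, _mtime, _mdate, crc,
     comp_size, file_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")
    start = offset + LOCAL_HEADER.size
    return {
        "flags": flags,
        "crc": crc,
        "compress_size": comp_size,
        "file_size": file_size,
        "name": data[start:start + name_len],
        "extra": data[start + name_len:start + name_len + extra_len],
    }


def has_zip64_extra(extra):
    """Walk the (id, length, payload) records of an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, length = struct.unpack_from("<HH", extra, pos)
        if header_id == ZIP64_EXTRA_ID:
            return True
        pos += 4 + length
    return False


# Small demonstration archive; a correct >4 GB member would instead show
# both sizes as 0xffffffff together with a Zip64 extra record.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("foo.txt", b"abc")
header = read_local_header(buf.getvalue())
print(header["name"], hex(header["file_size"]), has_zip64_extra(header["extra"]))
```

For foo.zip itself, reading the first few hundred bytes of the file and passing them to read_local_header should be enough, since the first member's local header sits at offset 0.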


Tested on Python 2.5 but looking at the latest code in 3.2 it still looks broken.

The problem is that the ZipInfo.FileHeader() is written before the file size is populated, so Zip64 extensions are not written. Later, the sizes in the header are written, but Zip64 extensions are not taken into account and the file size is simply truncated to 32 bits (7 GB becomes 3 GB, for instance).
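
The wrap-around is plain 32-bit truncation. A short sketch of the arithmetic (wrapped_size is a hypothetical helper, not a zipfile function):

```python
UINT32_MASK = 0xFFFFFFFF  # the size fields in the local header are 32 bits wide
GIB = 1024 ** 3


def wrapped_size(real_size):
    """Return what survives of real_size in an unsigned 32-bit field."""
    return real_size & UINT32_MASK


print(wrapped_size(7 * GIB) // GIB)  # a 7 GiB file is recorded as 3 GiB
print(wrapped_size(5 * GIB) // GIB)  # the 5 GiB foo.txt is recorded as 1 GiB
```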

My patch fixes the problem on Python 2.5; it might need minor porting to fix trunk. It works by assigning the uncompressed file size to the ZipInfo header initially, then writing the header. Later on, I rewrite the header (this is okay since the header size will not have increased).
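
The idea can be sketched independently of zipfile. This is not the attached patch: build_header and write_stored_member are hypothetical names, only stored (uncompressed) members are handled, and the timestamp fields are left at zero for brevity. The point it illustrates is that the Zip64 decision is made from the size known up front, so the rewritten header has exactly the same length as the placeholder.

```python
import io
import struct
import zlib

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte local file header
UINT32_MAX = 0xFFFFFFFF


def build_header(name, crc, comp_size, file_size, use_zip64):
    """Build a local file header, optionally with a Zip64 extra record."""
    extra = b""
    version = 20
    if use_zip64:
        # Zip64 extra: ID 0x0001, 16 payload bytes (original size, compressed size).
        extra = struct.pack("<HHQQ", 0x0001, 16, file_size, comp_size)
        comp_size = file_size = UINT32_MAX
        version = 45
    fixed = LOCAL_HEADER.pack(b"PK\x03\x04", version, 0, 0, 0, 0,
                              crc, comp_size, file_size, len(name), len(extra))
    return fixed + name + extra


def write_stored_member(out, name, chunks, expected_size):
    """Write placeholder header, then data, then rewrite the header in place."""
    use_zip64 = expected_size > UINT32_MAX  # decided once, before any data is written
    start = out.tell()
    out.write(build_header(name, 0, expected_size, expected_size, use_zip64))
    crc = 0
    size = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
        size += len(chunk)
        out.write(chunk)
    end = out.tell()
    out.seek(start)
    out.write(build_header(name, crc, size, size, use_zip64))  # same length as before
    out.seek(end)
    return crc, size


out = io.BytesIO()
print(write_stored_member(out, b"foo.txt", [b"a" * 10, b"b" * 5], 15))
```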
msg115466 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2010-09-03 16:53
A tip about versions: Development happens on the current active branch, py3k (future 3.2 version), and bug or doc fixes are backported to the stable versions 2.7 and 3.1. Security fixes go into 2.6 too.

Can you reproduce your bug in 2.7, 3.1 and 3.2?

Adding Alan to nosy since he’s listed in Misc/maintainers.rst.
msg115514 - Author: Craig de Stigter (craigds) Date: 2010-09-03 21:47
Yes, the bug still exists in Python 3.1.2. However, struct.pack() no longer silently ignores overflow, so I get this error instead:


>>> z.write('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/zipfile.py", line 1095, in write
    zinfo.file_size))
struct.error: argument out of range
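
The error can be reproduced without creating a 5 GB file. A minimal sketch, assuming the size is packed into an unsigned 32-bit field ("<L" here; the exact format string used at that line of zipfile.py is not shown in the traceback, and fits_in_uint32 is a hypothetical helper):

```python
import struct


def fits_in_uint32(value):
    """Return True if `value` can be packed as an unsigned 32-bit integer."""
    try:
        struct.pack("<L", value)
    except struct.error:
        # Python 3 raises instead of silently wrapping the value.
        return False
    return True


print(fits_in_uint32(0xFFFFFFFF))
print(fits_in_uint32(5 * 1024 ** 3))
```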
msg115660 - Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2010-09-05 17:42
Thanks for the patch, Craig; I should have some time later today or tomorrow to do a review.  Did you have a patch for the test suite(s) as well?  If not, I can just make sure your test case is covered in test_zipfile64.
msg115672 - Author: Craig de Stigter (craigds) Date: 2010-09-05 21:16
Hi, sorry, no. I haven't had time to add a real test for this.
msg115741 - Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2010-09-07 04:57
Here's an updated patch for the py3k trunk with tests.  This pretty much doubles the runtime of test_zipfile64.py.  The patch also removes some unnecessary code from the existing test_zipfile64 tests.

Note: It looks like writestr will also suffer from a struct.pack overflow if it's given a ZipInfo with general purpose flag bit 3 (0x08, the data descriptor flag) set. I won't have time to address that until next weekend, probably.
msg146923 - Author: Nadeem Vawda (nadeem.vawda) (Python committer) Date: 2011-11-03 12:17
Issue 6434 was marked as a duplicate of this issue.
History
Date User Action Args
2011-11-03 12:17:55 nadeem.vawda set versions: + Python 3.3, - Python 3.1
    nosy: + amaury.forgeotdarc, nadeem.vawda, lambacck, segfault42, enlavin, Paul
    messages: + msg146923
    stage: needs patch
2011-11-03 12:17:17 nadeem.vawda link issue6434 superseder
2010-09-07 04:57:46 alanmcintyre set files: + zipfile-huge-files.diff
    messages: + msg115741
2010-09-05 21:16:38 craigds set messages: + msg115672
2010-09-05 17:42:17 alanmcintyre set messages: + msg115660
2010-09-03 21:47:12 craigds set messages: + msg115514
2010-09-03 16:53:46 eric.araujo set nosy: + eric.araujo, alanmcintyre
    messages: + msg115466
    versions: - Python 2.6, Python 2.5, Python 3.3
2010-08-31 01:02:17 craigds create