Title: zipfile writes incorrect local file header for large files in zip64
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
Assigned To: serhiy.storchaka Nosy List: Kristof.Keppens, Nico.Möller, Paul, Ruben.Gonzalez, alanmcintyre, amaury.forgeotdarc, christian.heimes, craigds, dandrzejewski, enlavin, eric.araujo, gregory.p.smith, jhenry82, lambacck, loewis, nadeem.vawda, python-dev, ronaldoussoren, segfault42, serhiy.storchaka
Created on 2010-08-31 01:02 by craigds, last changed 2022-04-11 14:57 by admin. This issue is now closed.

msg115250 - (view) Author: Craig de Stigter (craigds) Date: 2010-08-31 01:02
Steps to reproduce:

# create a large (>4gb) file
f = open('foo.txt', 'wb')
text = 'a' * 1024**2
for i in xrange(5 * 1024):
    f.write(text)
f.close()

# now zip the file
import zipfile
z = zipfile.ZipFile('', mode='w', allowZip64=True)
z.write('foo.txt')
z.close()

Now inspect the file headers using a hex editor. The written headers are incorrect. The filesize and compressed size should be written as 0xffffffff and the 'extra field' should contain the actual sizes.
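
As an alternative to a hex editor, the header can be decoded with struct. The following is only a minimal sketch, assuming the standard 30-byte local file header layout and the Zip64 extended information extra field (header ID 0x0001); the names LOCAL_HEADER and parse_local_header are made up for this illustration and are not part of zipfile.

```python
import struct

# Fixed 30-byte part of a ZIP local file header (little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
ZIP64_EXTRA_ID = 0x0001  # header ID of the Zip64 extended information field


def parse_local_header(data, offset=0):
    """Return the 32-bit sizes from the local header and any Zip64 sizes."""
    (sig, _ver, _flags, _method, _time, _date, _crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")
    extra_start = offset + LOCAL_HEADER.size + name_len
    extra = data[extra_start:extra_start + extra_len]
    zip64 = None
    pos = 0
    # The extra field is a sequence of (id, length, payload) records.
    while pos + 4 <= len(extra):
        field_id, field_len = struct.unpack_from("<HH", extra, pos)
        if field_id == ZIP64_EXTRA_ID and field_len >= 16:
            # In a local header: uncompressed size first, then compressed size.
            zip64 = struct.unpack_from("<QQ", extra, pos + 4)
        pos += 4 + field_len
    return {"compressed": csize, "uncompressed": usize, "zip64": zip64}
```

For a correctly written large member, both 32-bit fields should read 0xffffffff and the "zip64" entry should carry the real sizes.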

Tested on Python 2.5 but looking at the latest code in 3.2 it still looks broken.

The problem is that the ZipInfo.FileHeader() is written before the filesize is populated, so Zip64 extensions are not written. Later, the sizes in the header are written, but Zip64 extensions are not taken into account and the filesize is just wrapped (7gb becomes 3gb, for instance).
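
The wrap-around is plain 32-bit truncation. A small sketch of the arithmetic (the helper name wrapped_size is only for illustration):

```python
GIB = 1024 ** 3
MASK_32 = 0xFFFFFFFF  # largest value a 4-byte header field can hold


def wrapped_size(size):
    """Size as it appears when only the low 32 bits are stored."""
    return size & MASK_32


print(wrapped_size(7 * GIB) // GIB)  # prints 3
```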

My patch fixes the problem on Python 2.5; it might need minor porting to fix trunk. It works by assigning the uncompressed filesize to the ZipInfo header initially, then writing the header. Then later on, I re-write the header (this is okay since the header size will not have increased).
msg115466 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-03 16:53
A tip about versions: Development happens on the current active branch, py3k (future 3.2 version), and bug or doc fixes are backported to the stable versions 2.7 and 3.1. Security fixes go into 2.6 too.

Can you reproduce your bug in 2.7, 3.1 and 3.2?

Adding Alan to nosy since he’s listed in Misc/maintainers.rst.
msg115514 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-03 21:47
Yes, the bug still exists in Python 3.1.2. However, struct.pack() no longer silently ignores overflow, so I get this error instead:

>>> z.write('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/", line 1095, in write
struct.error: argument out of range
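
The same error can be triggered without creating a 5 GB file. This is a minimal sketch for Python 3 and assumes the failing call packs the size into an unsigned 32-bit field, which the traceback above does not show directly:

```python
import struct

size = 5 * 1024 ** 3  # 5 GiB, does not fit in an unsigned 32-bit field

try:
    struct.pack("<L", size)
except struct.error as exc:
    overflow_error = exc  # raised instead of silently wrapping
else:
    overflow_error = None

# An 8-byte field, as used by the Zip64 extra record, holds the value fine.
packed = struct.pack("<Q", size)
```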
msg115660 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-05 17:42
Thanks for the patch, Craig; I should have some time later today or tomorrow to do a review.  Did you have a patch for the test suite(s) as well?  If not, I can just make sure your test case is covered in test_zipfile64.
msg115672 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-05 21:16
Hi, sorry, no, I haven't had time to add a real test for this.
msg115741 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-07 04:57
Here's an updated patch for the py3k trunk with tests.  This pretty much doubles the runtime of  The patch also removes some unnecessary code from the existing test_zipfile64 tests.

Note: It looks like writestr will also suffer from a struct.pack overflow if it's given a ZipInfo with the third general purpose flag bit set.  I won't have time to address that until next weekend, probably.
msg146923 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-11-03 12:17
Issue 6434 was marked as a duplicate of this issue.
msg156442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-20 17:52
I am afraid that the problem is more complicated. With allowZip64=True, all files need to be written with this extension, because otherwise the size of the local file header may change, and then it is not possible to simply go back after compression and rewrite it.

Now it appears that the Zip64 option simply does not work.
msg170645 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-18 13:44
If I understand you correctly it should be easy to fix. The code in close() has to check if any file is beyond the ZIP64 limit and then write all headers with extra args. Is that correct?
msg171010 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-22 17:56
No, on the contrary, it is not so easy to fix, and the patch is incorrect.
Sorry that this was not clear. The size of the header with extra args
depends on the size of the file. The file size can change during
compression, and the compressed size may be larger than the uncompressed size,
exceeding the 32-bit boundary. By rewriting the header with extra args, we can
overwrite compressed data.

I had put off the issue for further, more careful research. Thanks for the

One solution is always (even for smallest files) to write 64-bit sizes when 
allowZip64 is true.
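
The overwrite hazard can be shown with a toy format. This is only a sketch, not zipfile's real layout: the "header" here is a 4-byte size, optionally followed by an 8-byte size, and write_member is a hypothetical helper.

```python
import io
import struct


def write_member(buf, payload, reserve_zip64, final_needs_zip64):
    """Toy writer: placeholder header, payload, then seek back and rewrite."""
    start = buf.tell()
    # First pass: header written before the final sizes are known.
    buf.write(struct.pack("<L", 0))
    if reserve_zip64:
        buf.write(struct.pack("<Q", 0))
    data_start = buf.tell()
    buf.write(payload)
    end = buf.tell()
    # Second pass: rewrite the header in place with the real size.
    buf.seek(start)
    if final_needs_zip64:
        buf.write(struct.pack("<L", 0xFFFFFFFF))
        buf.write(struct.pack("<Q", len(payload)))  # overruns if not reserved
    else:
        buf.write(struct.pack("<L", len(payload)))
    buf.seek(end)
    return data_start


payload = b"ABCDEFGHIJ"

safe = io.BytesIO()
safe_off = write_member(safe, payload, reserve_zip64=True, final_needs_zip64=True)

broken = io.BytesIO()
broken_off = write_member(broken, payload, reserve_zip64=False, final_needs_zip64=True)
```

When the 64-bit field was reserved up front, the payload survives the rewrite; when it was not, the larger header clobbers the first bytes of the data.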
msg171025 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-23 09:54
I see two reasonable solutions to the issue (everything written below applies only when allowZip64=True):

1) Always write the Zip64 extended information extra field. This approach always succeeds, but the zipfile size will increase by 20 bytes for each file.

The first patch (zipfile_zip64_always.patch) uses this approach.

2) Write the Zip64 extended information extra field only if the assumed file size exceeds a certain limit. In very rare cases this makes it impossible to compress a file that could be compressed the first way. However, in most cases it produces the same file as before the patch.

The second patch (zipfile_zip64_try.patch) is based on Alan's patch and uses the second approach. The probability of errors is reduced, and they are now detected and do not lead to silent data corruption.
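
The two approaches can be sketched roughly as follows. This is an illustration under assumptions, not the code from either patch: the function names, the ASSUMED_LIMIT value and the choice of RuntimeError are all hypothetical.

```python
import struct

# Zip64 extra record in a local header: id, payload length, two 8-byte sizes.
ZIP64_EXTRA = struct.Struct("<HHQQ")  # 20 bytes, the per-file overhead
ASSUMED_LIMIT = (1 << 31) - 1  # hypothetical threshold for this sketch


def use_zip64_always(expected_size):
    """Approach 1: always reserve the Zip64 extra field."""
    return True


def use_zip64_if_large(expected_size, limit=ASSUMED_LIMIT):
    """Approach 2: reserve it only when the expected size exceeds the limit."""
    return expected_size > limit


def check_final_sizes(reserved_zip64, file_size, compress_size,
                      limit=ASSUMED_LIMIT):
    """Approach 2 has to fail loudly if the initial guess turned out wrong."""
    if not reserved_zip64 and (file_size > limit or compress_size > limit):
        raise RuntimeError("sizes exceed the limit but no Zip64 field was reserved")
```

The check after compression is what turns the rare bad guess into a detected error instead of a damaged archive.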

Both patches are for Python 3.3. If any patch is good, I'll backport it for the older versions.
msg172648 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-11 15:08
What is the conclusion about the patches? Which variant should I backport to older versions?
msg172652 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2012-10-11 15:22
I'd write the extended header when the current file size is larger than the zip64 limit (that is, when 'st.st_size > ZIP64_LIMIT' in the write method).

That way the minimal header size is used whenever possible.

As you noted this can cause problems when the file grows beyond the limit while it is stored in the zipfile, but IMHO storing data while it is modified is asking for problems anyway.

BTW, I haven't actually reviewed the patch yet.
msg175471 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 20:38
Please, review the patches.
msg176538 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 12:26
Patches updated to resolve merge conflict with issue11981.

Please review and apply either of these patches. This is needed for some
of my other zipfile patches.
msg178603 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 19:11
Which variant of the patches should I commit? Or should I prepare another?
msg179013 - (view) Author: Nico Möller (Nico.Möller) Date: 2013-01-04 10:21
I most definitely need a patch for 2.7.3 

Would be awesome if you could provide a patch for that version.
msg179019 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-04 13:27
Here are patches of the second variant for 2.7 and 3.2.
msg179987 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-14 22:45
New changeset ce869b05762c by Serhiy Storchaka in branch '2.7':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.

New changeset b93848ca7760 by Serhiy Storchaka in branch '3.2':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.

New changeset 656a45738e5e by Serhiy Storchaka in branch '3.3':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.

New changeset 628a6af64a46 by Serhiy Storchaka in branch 'default':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
msg179989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-14 22:49
Fixed. Thank you for the report, Craig de Stigter.
