classification
Title: zipfile writes incorrect local file header for large files in zip64
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Kristof.Keppens, Nico.Möller, Paul, Ruben.Gonzalez, alanmcintyre, amaury.forgeotdarc, christian.heimes, craigds, dandrzejewski, enlavin, eric.araujo, gregory.p.smith, jhenry82, lambacck, loewis, nadeem.vawda, python-dev, ronaldoussoren, segfault42, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2010-08-31 01:02 by craigds, last changed 2013-01-14 22:49 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
zipfile_zip64_header.patch craigds, 2010-08-31 01:02
zipfile-huge-files.diff alanmcintyre, 2010-09-07 04:57
zipfile_zip64_always.patch serhiy.storchaka, 2012-09-23 09:54 Always write Zip64 extra
zipfile_zip64_try.patch serhiy.storchaka, 2012-09-23 09:55 Try to write Zip64 extra only if needed
zipfile_zip64_always_2.patch serhiy.storchaka, 2012-11-28 12:25 Always write Zip64 extra
zipfile_zip64_try_2.patch serhiy.storchaka, 2012-11-28 12:26 Try to write Zip64 extra only if needed
zipfile_zip64_try_2-2.7.patch serhiy.storchaka, 2013-01-04 13:27
zipfile_zip64_try_2-3.2.patch serhiy.storchaka, 2013-01-04 13:27
Messages (20)
msg115250 - (view) Author: Craig de Stigter (craigds) Date: 2010-08-31 01:02
Steps to reproduce:

# create a large (>4gb) file
f = open('foo.txt', 'wb')
text = 'a' * 1024**2
for i in xrange(5 * 1024):
    f.write(text)
f.close()

# now zip the file
import zipfile
z = zipfile.ZipFile('foo.zip', mode='w', allowZip64=True)
z.write('foo.txt')
z.close()


Now inspect the file headers using a hex editor. The written headers are incorrect. The filesize and compressed size should be written as 0xffffffff and the 'extra field' should contain the actual sizes.
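
As an alternative to a hex editor, the local file header can be decoded with a short script. This is only a minimal sketch and not part of the original report: the helper name read_local_header is hypothetical, and the layout assumed is the standard ZIP local file header (30 fixed bytes, then the file name and the extra field), with the Zip64 extra field identified by header ID 0x0001.

```python
import struct

# Fixed 30-byte part of a ZIP local file header (little-endian).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
ZIP64_EXTRA_ID = 0x0001  # header ID of the Zip64 extended information field


def read_local_header(fp):
    """Decode the local file header at the current position of binary file fp."""
    (signature, _version, _flags, _method, _mtime, _mdate, _crc,
     compress_size, file_size, name_len, extra_len) = LOCAL_HEADER.unpack(
        fp.read(LOCAL_HEADER.size))
    if signature != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name = fp.read(name_len)
    extra = fp.read(extra_len)

    # Walk the extra field records looking for the Zip64 one.
    zip64_sizes = None
    pos = 0
    while pos + 4 <= len(extra):
        field_id, field_len = struct.unpack_from("<HH", extra, pos)
        if field_id == ZIP64_EXTRA_ID and field_len >= 16:
            # In a local header: uncompressed size, then compressed size.
            zip64_sizes = struct.unpack_from("<QQ", extra, pos + 4)
        pos += 4 + field_len

    return {
        "name": name,
        "compress_size": compress_size,  # raw 32-bit field
        "file_size": file_size,          # raw 32-bit field
        "zip64_sizes": zip64_sizes,      # (file_size, compress_size) or None
    }
```

For a correctly written entry larger than 4 GiB, both 32-bit fields should read 0xffffffff and zip64_sizes should hold the real sizes. Calling it as read_local_header(open('foo.zip', 'rb')) inspects the first entry of the archive.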


Tested on Python 2.5 but looking at the latest code in 3.2 it still looks broken.

The problem is that ZipInfo.FileHeader() is written before the file size is populated, so the Zip64 extensions are not written. Later, the sizes in the header are filled in, but the Zip64 extensions are not taken into account and the file size simply wraps around modulo 2**32 (7 GB becomes 3 GB, for instance).
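
The wrap-around can be checked with a little arithmetic. The sketch below is illustrative only; wrapped_size is a hypothetical helper that mimics storing a size in an unsigned 32-bit header field by silently truncating it.

```python
GIB = 1024 ** 3


def wrapped_size(size):
    """Return what remains of size after truncation to an unsigned 32-bit field."""
    return size & 0xFFFFFFFF


# A 7 GiB file ends up recorded as 3 GiB, because 7 GiB - 4 GiB = 3 GiB.
print(wrapped_size(7 * GIB) // GIB)
```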

My patch fixes the problem on Python 2.5; it might need minor porting to apply to trunk. It works by assigning the uncompressed file size to the ZipInfo header initially, then writing the header. Later on, I rewrite the header (this is okay since the header size will not have increased).
msg115466 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-03 16:53
A tip about versions: Development happens on the current active branch, py3k (future 3.2 version), and bug or doc fixes are backported to the stable versions 2.7 and 3.1. Security fixes go into 2.6 too.

Can you reproduce your bug in 2.7, 3.1 and 3.2?

Adding Alan to nosy since he’s listed in Misc/maintainers.rst.
msg115514 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-03 21:47
Yes, the bug still exists in Python 3.1.2. However, struct.pack() no longer silently ignores overflow, so I get this error instead:


>>> z.write('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/zipfile.py", line 1095, in write
    zinfo.file_size))
struct.error: argument out of range
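
The overflow can be reproduced in isolation on Python 3. This is a minimal sketch; fits_in_32_bits is a hypothetical helper, and "<L" is the unsigned 32-bit little-endian format used here as a stand-in for the size fields of the header.

```python
import struct


def fits_in_32_bits(value):
    """Return True if value can be packed as an unsigned 32-bit integer."""
    try:
        struct.pack("<L", value)
    except struct.error:
        return False
    return True


print(fits_in_32_bits(0xFFFFFFFF))      # largest value that still fits
print(fits_in_32_bits(5 * 1024 ** 3))   # the 5 GiB size from the reproduction steps
```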
msg115660 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-05 17:42
Thanks for the patch, Craig; I should have some time later today or tomorrow to do a review.  Did you have a patch for the test suite(s) as well?  If not, I can just make sure your test case is covered in test_zipfile64.
msg115672 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-05 21:16
Hi, sorry, no, I haven't had time to add a real test for this.
msg115741 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-07 04:57
Here's an updated patch for the py3k trunk with tests.  This pretty much doubles the runtime of test_zipfile64.py.  The patch also removes some unnecessary code from the existing test_zipfile64 tests.

Note: It looks like writestr will also suffer from a struct.pack overflow if it's given a ZipInfo with the third general purpose flag bit set.  I won't have time to address that until next weekend, probably.
msg146923 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-11-03 12:17
Issue 6434 was marked as a duplicate of this issue.
msg156442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-20 17:52
I am afraid that the problem is more complicated. With allowZip64=True, every file would need to be written with this extension, because otherwise the size of the local file header may change, and then it is not possible to simply go back after compression and rewrite it.

Now it appears that the Zip64 option simply does not work.
msg170645 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-18 13:44
Serhiy:
If I understand you correctly it should be easy to fix. The code in close() has to check if any file is beyond the ZIP64 limit and then write all headers with extra args. Is that correct?
msg171010 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-22 17:56
No, on the contrary, it is not that easy to fix, and the patch is incorrect.
Sorry that I was not clear earlier. The size of the header with the extra
field depends on the size of the file. The file size can change in the process
of compressing, and the compressed size may be larger than the uncompressed
size, exceeding the 32-bit boundary. By rewriting the header with the extra
field, we can overwrite compressed data.
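
That compressed output can be larger than its input is easy to demonstrate with incompressible data. The following is only a sketch under stated assumptions: it uses a raw deflate stream (wbits=-15), which is assumed here to match what a deflated ZIP entry contains, and seeded pseudo-random bytes as the incompressible input.

```python
import random
import zlib

# Seeded pseudo-random bytes: effectively incompressible, but reproducible.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

# Raw deflate stream (no zlib header or trailer).
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
compressed = compressor.compress(data) + compressor.flush()

# The deflate block framing makes the output slightly larger than the input.
print(len(data), len(compressed))
```

A file just below the 32-bit boundary could therefore cross it after compression, which is the case described above.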

I had put the issue off for further, more careful research. Thanks for the
reminder.

One solution is to always write 64-bit sizes (even for the smallest files)
when allowZip64 is true.
msg171025 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-23 09:54
I see two reasonable solutions to the issue (everything below applies only when allowZip64=True):

1) Always write the Zip64 extended information extra field. This approach always succeeds, but the zipfile size will increase by 20 bytes for each file.

The first patch (zipfile_zip64_always.patch) uses this approach.

2) Write the Zip64 extended information extra field only if the expected file size exceeds a certain limit. In very rare cases this makes it impossible to compress a file that could be compressed with the first approach. However, in most cases it produces the same file as before the patch.

The second patch (zipfile_zip64_try.patch) is based on Alan's patch and uses the second approach. The probability of errors is reduced, and errors are now detected instead of leading to silent data corruption.

Both patches are for Python 3.3. If any patch is good, I'll backport it for the older versions.
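
The 20-byte figure is consistent with the layout of a Zip64 extra field in a local file header. The sketch below is an illustration rather than code from either patch: zip64_local_extra is a hypothetical helper, and the layout assumed is a 2-byte header ID (0x0001), a 2-byte data length (16), then the 8-byte uncompressed and compressed sizes.

```python
import struct

ZIP64_EXTRA_ID = 0x0001  # header ID of the Zip64 extended information field


def zip64_local_extra(file_size, compress_size):
    """Build a Zip64 extra field carrying both 64-bit sizes."""
    return struct.pack("<HHQQ", ZIP64_EXTRA_ID, 16, file_size, compress_size)


extra = zip64_local_extra(5 * 1024 ** 3, 5 * 1024 ** 3)
print(len(extra))  # 2 + 2 + 8 + 8 bytes
```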
msg172648 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-11 15:08
What is the conclusion about the patches? Which variant should I backport to older versions?
msg172652 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2012-10-11 15:22
I'd write the extended header when the current file size is larger than the zip64 limit (that is, when 'st.st_size > ZIP64_LIMIT' in the write method).

That way the minimal header size is used whenever possible.
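
The check suggested here can be sketched as follows. This is an assumption-laden illustration, not the committed fix: needs_zip64 is a hypothetical helper, and the local ZIP64_LIMIT constant is assumed to mirror the threshold of the same name in the zipfile module.

```python
import os

# Assumed to mirror zipfile's ZIP64_LIMIT threshold.
ZIP64_LIMIT = (1 << 31) - 1


def needs_zip64(path, limit=ZIP64_LIMIT):
    """Return True if the file at path is larger than the given limit."""
    st = os.stat(path)
    return st.st_size > limit
```

The decision is made from the size on disk before any data is written, so the header size is fixed up front; a file that grows past the limit while it is being stored would not be covered.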

As you noted this can cause problems when the file grows beyond the limit while it is stored in the zipfile, but IMHO storing data while it is modified is asking for problems anyway.

BTW, I haven't actually reviewed the patch yet.
msg175471 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 20:38
Please review the patches.
msg176538 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 12:26
Patches updated to resolve merge conflict with issue11981.

Please review and apply either of these patches. This is needed for some
of my other zipfile patches.
msg178603 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 19:11
Which variant of the patches should I commit? Or should I prepare another?
msg179013 - (view) Author: Nico Möller (Nico.Möller) Date: 2013-01-04 10:21
I most definitely need a patch for 2.7.3 

Would be awesome if you could provide a patch for that version.
msg179019 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-04 13:27
Here are patches of the second variant for 2.7 and 3.2.
msg179987 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-14 22:45
New changeset ce869b05762c by Serhiy Storchaka in branch '2.7':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/ce869b05762c

New changeset b93848ca7760 by Serhiy Storchaka in branch '3.2':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/b93848ca7760

New changeset 656a45738e5e by Serhiy Storchaka in branch '3.3':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/656a45738e5e

New changeset 628a6af64a46 by Serhiy Storchaka in branch 'default':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/628a6af64a46
msg179989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-14 22:49
Fixed. Thank you for the report, Craig de Stigter.
History
Date User Action Args
2013-01-14 22:49:08 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg179989; stage: patch review -> resolved
2013-01-14 22:45:09 python-dev set nosy: + python-dev; messages: + msg179987
2013-01-04 13:27:38 serhiy.storchaka set files: + zipfile_zip64_try_2-2.7.patch, zipfile_zip64_try_2-3.2.patch; messages: + msg179019
2013-01-04 10:21:58 Nico.Möller set nosy: + Nico.Möller; messages: + msg179013
2012-12-30 19:11:38 serhiy.storchaka set messages: + msg178603
2012-12-29 22:08:10 serhiy.storchaka set assignee: serhiy.storchaka
2012-11-28 12:26:01 serhiy.storchaka set files: + zipfile_zip64_always_2.patch, zipfile_zip64_try_2.patch; messages: + msg176538
2012-11-26 20:32:14 jhenry82 set nosy: + jhenry82
2012-11-12 20:38:26 serhiy.storchaka set messages: + msg175471
2012-10-19 08:54:37 Ruben.Gonzalez set nosy: + Ruben.Gonzalez
2012-10-11 15:22:27 ronaldoussoren set messages: + msg172652
2012-10-11 15:08:29 serhiy.storchaka set messages: + msg172648; versions: + Python 3.4
2012-09-23 09:55:46 serhiy.storchaka set files: + zipfile_zip64_try.patch; stage: needs patch -> patch review
2012-09-23 09:54:19 serhiy.storchaka set files: + zipfile_zip64_always.patch; nosy: + loewis, gregory.p.smith, ronaldoussoren; messages: + msg171025
2012-09-22 17:56:23 serhiy.storchaka set messages: + msg171010
2012-09-18 13:44:30 christian.heimes set keywords: + needs review; nosy: + christian.heimes; messages: + msg170645
2012-09-18 13:25:53 Kristof.Keppens set nosy: + Kristof.Keppens
2012-03-20 17:52:08 serhiy.storchaka set messages: + msg156442
2012-03-20 17:13:23 serhiy.storchaka set nosy: + serhiy.storchaka
2012-03-20 14:35:55 dandrzejewski set nosy: + dandrzejewski
2011-11-03 12:17:55 nadeem.vawda set versions: + Python 3.3, - Python 3.1; nosy: + amaury.forgeotdarc, nadeem.vawda, lambacck, segfault42, enlavin, Paul; messages: + msg146923; stage: needs patch
2011-11-03 12:17:17 nadeem.vawda link issue6434 superseder
2010-09-07 04:57:46 alanmcintyre set files: + zipfile-huge-files.diff; messages: + msg115741
2010-09-05 21:16:38 craigds set messages: + msg115672
2010-09-05 17:42:17 alanmcintyre set messages: + msg115660
2010-09-03 21:47:12 craigds set messages: + msg115514
2010-09-03 16:53:46 eric.araujo set nosy: + eric.araujo, alanmcintyre; messages: + msg115466; versions: - Python 2.6, Python 2.5, Python 3.3
2010-08-31 01:02:17 craigds create