classification
Title: Zipfile generates Zipfile error in zip with 0 total number of disk in Zip64 end of central directory locator
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Guillaume.Carre, Ramsey Kant, akuchling, alanmcintyre, cheryl.sabella, miss-islington, serhiy.storchaka, takluyver, twouters
Priority: normal Keywords: patch

Created on 2014-07-29 21:42 by Guillaume.Carre, last changed 2019-05-28 23:33 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5985 merged fran6co, 2018-03-05 15:43
PR 13641 merged miss-islington, 2019-05-28 23:15
Messages (12)
msg224257 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2014-07-29 21:42
I've got a zip file with a Zip64 end of central directory locator in which:
- total number of disks = 0000
- number of the disk with the start of the zip64 end of central directory = 0000

The check at line 176 of zipfile.py rejects this file:
    if diskno != 0 or disks != 1:
        raise BadZipfile("zipfiles that span multiple disks are not supported")

I believe the test should be changed to:
    if diskno != 0 or disks > 1:
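
The two checks can be compared side by side. The following is a minimal sketch, not the actual zipfile.py code: it assumes the 20-byte locator layout described in APPNOTE.TXT, and `check_locator` and its `tolerate_zero_disks` flag are hypothetical names used only for illustration.

```python
import struct

# Zip64 end of central directory locator, as laid out in APPNOTE.TXT:
#   4 bytes  signature "PK\x06\x07"
#   4 bytes  number of the disk with the start of the zip64 EOCD record
#   8 bytes  relative offset of the zip64 EOCD record
#   4 bytes  total number of disks
LOCATOR_FORMAT = "<4sLQL"
LOCATOR_SIGNATURE = b"PK\x06\x07"


def check_locator(raw, tolerate_zero_disks=False):
    """Hypothetical helper: validate a raw 20-byte locator, return the record offset."""
    sig, diskno, reloff, disks = struct.unpack(LOCATOR_FORMAT, raw)
    if sig != LOCATOR_SIGNATURE:
        raise ValueError("not a zip64 end of central directory locator")
    if tolerate_zero_disks:
        multi_disk = diskno != 0 or disks > 1   # proposed check
    else:
        multi_disk = diskno != 0 or disks != 1  # check quoted above
    if multi_disk:
        raise ValueError("zipfiles that span multiple disks are not supported")
    return reloff


# A locator like the one described in this report: both disk fields are 0.
locator = struct.pack(LOCATOR_FORMAT, LOCATOR_SIGNATURE, 0, 1234, 0)
```

With the quoted check, `check_locator(locator)` raises; with `tolerate_zero_disks=True` it returns the record offset, while a locator that really claims more than one disk is still rejected.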
msg313639 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 11:42
Do you know what tool created the zip file? I can't find anything in the spec (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) to say whether 0 is a valid value for the number of disks.
msg313642 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 12:08
I found source code for some other projects handling the same data. They all seem to agree that it should be 1:

- Golang's zip reading code: https://github.com/golang/go/blob/f7ac70a56604033e2b1abc921d3f0f6afc85a7b3/src/archive/zip/reader.go#L536-L538
- A C contrib file with zlib: https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/contrib/minizip/zip.c#L620-L624
- Code from Info-ZIP, which is used by many Linux distros, is a bit less clear, but it has an illuminating comment:

    if ((G.ecrec.number_this_disk != 0xFFFF) &&
        (G.ecrec.number_this_disk != ecloc64_total_disks - 1)) {
      /* Note: For some unknown reason, the developers at PKWARE decided to
         store the "zip64 total disks" value as a counter starting from 1,
         whereas all other "split/span volume" related fields use 0-based
         volume numbers. Sigh... */

So I think you've got an invalid zip file. If it's otherwise valid, there might be a case for Python tolerating that particular mistake. But it's probably better to fix whatever is making the incorrect zip file, because other tools are also going to reject it.
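
The off-by-one convention in that comment can be spelled out in a few lines. This is only a Python paraphrase of the quoted condition, not Info-ZIP's code, and `disk_fields_consistent` is a hypothetical name.

```python
def disk_fields_consistent(number_this_disk, total_disks):
    """Paraphrase of the quoted Info-ZIP condition (hypothetical helper)."""
    # Disk numbers are 0-based, while "zip64 total disks" counts from 1,
    # so the last (or only) disk should be numbered total_disks - 1.
    # 0xFFFF is exempted, as in the quoted condition.
    return number_this_disk == 0xFFFF or number_this_disk == total_disks - 1
```

For a single-volume archive the expected pair is disk number 0 with a total of 1. A total of 0 fails this test, which is consistent with other tools rejecting such a file.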
msg313657 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2018-03-12 15:24
Hi,
In my case the zip file was created from the Windows 7 context menu (Send to).
Regards,
Guillaume

msg313659 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 16:01
If every Windows 7 computer is generating zipfiles which are invalid in this way, that would be a pretty strong argument for Python (and other tools) to accept it. But if that was the case, I would also expect that there would be many more issues about it.

Are the files you're compressing large (multi-GB)? Python only uses the zip64 format when the files are too big for the older zip format; maybe Windows is doing the same. Even in that case, I'm still surprised that more people don't hit it.
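
One way to see whether a given archive uses zip64 at all is to look for the locator directly in front of the end of central directory record. The sketch below assumes the archive fits in memory and that the last `PK\x05\x06` in the data is the real end record (an archive comment containing those bytes would confuse it). `read_zip64_locator` is a hypothetical helper, not part of the zipfile module.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"       # end of central directory record
LOCATOR_SIGNATURE = b"PK\x06\x07"    # zip64 end of central directory locator
LOCATOR_SIZE = 20


def read_zip64_locator(data):
    """Hypothetical helper: return (diskno, disks) from the zip64 locator, or None."""
    eocd = data.rfind(EOCD_SIGNATURE)
    if eocd < LOCATOR_SIZE:
        return None  # no end record found, or no room for a locator before it
    raw = data[eocd - LOCATOR_SIZE:eocd]
    sig, diskno, _offset, disks = struct.unpack("<4sLQL", raw)
    if sig != LOCATOR_SIGNATURE:
        return None  # plain (non-zip64) archive
    return diskno, disks


# A tiny archive written by Python does not need zip64, so no locator is present.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
small_archive = buf.getvalue()
```

Running `read_zip64_locator` on the bytes of a problematic archive would show the `diskno` and `disks` values that the check in zipfile.py sees.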
msg313677 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2018-03-12 18:44
Yes, these were pretty large zips, 30 to 60 GB, with thousands of small files in them. I've fixed this locally on our servers, and we've been happy with it, even after accepting similar-sized files from Linux machines.
I'm also quite surprised that this hasn't been reported by others.

msg340163 - (view) Author: Ramsey Kant (Ramsey Kant) Date: 2019-04-13 15:03
I would second this PR.  The Win32 API that creates ZIP64 files produces ZIP64 files with the "diskno" as 0 and "disks" as 0 (instead of "1" as indicated by the spec).
msg342361 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-05-13 17:07
I also ran across this issue today, where the 'disks' value in a Zip file is 0. I'm trying to find out what software was used to create them, but it's quite plausible that it's Windows as Ramsey Kant suggests. So I think this fix should get applied to 3.8 and 3.7. Would it help if I produced a patch?

The PKWare Zip specification that takluyver links above has been updated -- it now carries an April 29th update date -- but none of the changes are relevant to this.

Interestingly, the 'ditto' command on MacOS X (which can also unpack zip files) doesn't complain about the disk number either. I was unable to figure out where the source code for ditto is; I couldn't find it on opensource.apple.com.
msg342367 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2019-05-13 17:48
Oh, I missed that there was already a patch. BTW, I found two dissections of zip files that also show disk numbers of 0: the one at https://rzymek.github.io/post/excel-zip64/ is exploring an Excel Zip issue, and the forensic tutorial at https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html is discussing the format.
msg343825 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-05-28 23:15
New changeset ab0716ed1ea2957396054730afbb80c1825f9786 by Cheryl Sabella (Francisco Facioni) in branch 'master':
bpo-22102: Fixes zip files with disks set to 0 (GH-5985)
https://github.com/python/cpython/commit/ab0716ed1ea2957396054730afbb80c1825f9786
msg343827 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-05-28 23:19
@Guillaume.Carre, thank you for the report and @fran6co, thank you for the contribution.
msg343831 - (view) Author: miss-islington (miss-islington) Date: 2019-05-28 23:33
New changeset 0eb69990c85b6c82c677d5a43e3df28836ae845e by Miss Islington (bot) in branch '3.7':
bpo-22102: Fixes zip files with disks set to 0 (GH-5985)
https://github.com/python/cpython/commit/0eb69990c85b6c82c677d5a43e3df28836ae845e
History
Date User Action Args
2019-05-28 23:33:23 miss-islington set nosy: + miss-islington; messages: + msg343831
2019-05-28 23:19:15 cheryl.sabella set status: open -> closed; resolution: fixed; messages: + msg343827; stage: patch review -> resolved
2019-05-28 23:15:34 miss-islington set pull_requests: + pull_request13536
2019-05-28 23:15:22 cheryl.sabella set nosy: + cheryl.sabella; messages: + msg343825
2019-05-13 17:48:18 akuchling set messages: + msg342367
2019-05-13 17:07:11 akuchling set nosy: + akuchling; messages: + msg342361
2019-04-14 04:39:24 xtreak set nosy: + twouters, alanmcintyre, serhiy.storchaka; versions: - Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.9
2019-04-13 15:03:24 Ramsey Kant set nosy: + Ramsey Kant; messages: + msg340163; versions: + Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
2018-03-12 18:44:20 Guillaume.Carre set messages: + msg313677
2018-03-12 16:01:19 takluyver set messages: + msg313659
2018-03-12 15:24:21 Guillaume.Carre set messages: + msg313657
2018-03-12 12:08:12 takluyver set messages: + msg313642
2018-03-12 11:42:28 takluyver set nosy: + takluyver; messages: + msg313639
2018-03-05 15:43:02 fran6co set keywords: + patch; stage: patch review; pull_requests: + pull_request5752
2014-07-29 21:42:10 Guillaume.Carre create