classification
Title: Zipfile generates Zipfile error in zip with 0 total number of disk in Zip64 end of central directory locator
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Guillaume.Carre, takluyver
Priority: normal
Keywords: patch

Created on 2014-07-29 21:42 by Guillaume.Carre, last changed 2018-03-12 18:44 by Guillaume.Carre.

Pull Requests
URL Status Linked Edit
PR 5985 open fran6co, 2018-03-05 15:43
Messages (6)
msg224257 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2014-07-29 21:42
I've got a zip file with a Zip64 end of central directory locator in which:
- total number of disks = 0000
- number of the disk with the start of the zip64 end of central directory = 0000

The check at line 176 of zipfile.py rejects this file:
    if diskno != 0 or disks != 1:
        raise BadZipfile("zipfiles that span multiple disks are not supported")

I believe the test should be changed to:
    if diskno != 0 or disks > 1:
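
A minimal sketch of the two checks side by side, assuming the locator layout described in the PKWARE APPNOTE (4-byte signature, 4-byte disk number, 8-byte offset, 4-byte total disk count). The helper names `parse_locator`, `is_single_disk_strict` and `is_single_disk_lenient` are hypothetical and are not part of zipfile.py; they only illustrate how the current test and the proposed test differ for a locator that reports 0 disks.

```python
import struct

# Zip64 end of central directory locator, 20 bytes, little-endian:
# signature, disk holding the zip64 EOCD record, offset of that record,
# total number of disks.
LOCATOR_FORMAT = "<4sLQL"
LOCATOR_SIGNATURE = b"PK\x06\x07"
LOCATOR_SIZE = struct.calcsize(LOCATOR_FORMAT)


def parse_locator(data):
    """Return (diskno, offset, disks) from a 20-byte locator."""
    if len(data) != LOCATOR_SIZE:
        raise ValueError("locator must be exactly %d bytes" % LOCATOR_SIZE)
    sig, diskno, offset, disks = struct.unpack(LOCATOR_FORMAT, data)
    if sig != LOCATOR_SIGNATURE:
        raise ValueError("not a Zip64 end of central directory locator")
    return diskno, offset, disks


def is_single_disk_strict(diskno, disks):
    # Mirrors the current test: exactly one disk is required.
    return diskno == 0 and disks == 1


def is_single_disk_lenient(diskno, disks):
    # Mirrors the proposed test: a total of 0 disks is tolerated as well.
    return diskno == 0 and disks <= 1


# A hand-built locator that reports 0 total disks, as in the report above.
sample = struct.pack(LOCATOR_FORMAT, LOCATOR_SIGNATURE, 0, 1234, 0)
diskno, offset, disks = parse_locator(sample)
print(is_single_disk_strict(diskno, disks))
print(is_single_disk_lenient(diskno, disks))
```

Both checks still reject any archive whose locator reports two or more disks; they differ only for a total of 0.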
msg313639 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 11:42
Do you know what tool created the zip file? I can't find anything in the spec (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ) to say whether 0 is a valid value for the number of disks.
msg313642 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 12:08
I found source code for some other projects handling the same data. They all seem to agree that it should be 1:

- Golang's zip reading code: https://github.com/golang/go/blob/f7ac70a56604033e2b1abc921d3f0f6afc85a7b3/src/archive/zip/reader.go#L536-L538
- A C contrib file with zlib: https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/contrib/minizip/zip.c#L620-L624
- Code from Info-ZIP, which is used by many Linux distros, is a bit less clear, but it has an illuminating comment:

    if ((G.ecrec.number_this_disk != 0xFFFF) &&
        (G.ecrec.number_this_disk != ecloc64_total_disks - 1)) {
      /* Note: For some unknown reason, the developers at PKWARE decided to
         store the "zip64 total disks" value as a counter starting from 1,
         whereas all other "split/span volume" related fields use 0-based
         volume numbers. Sigh... */

So I think you've got an invalid zip file. If it's otherwise valid, there might be a case for Python tolerating that particular mistake. But it's probably better to fix whatever is making the incorrect zip file, because other tools are also going to reject it.
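
To see which value a given archive actually carries, something like the following sketch could be used. It assumes the locator sits immediately before the classic end of central directory record, and it finds that record with a plain backwards search for its signature, which is a simplification (an archive comment containing the same bytes would confuse it). `find_zip64_locator` is a hypothetical helper, not a zipfile API.

```python
import io
import struct

EOCD_FORMAT = "<4s4H2LH"            # classic end of central directory record
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
LOCATOR_FORMAT = "<4sLQL"           # zip64 end of central directory locator
LOCATOR_SIGNATURE = b"PK\x06\x07"
LOCATOR_SIZE = struct.calcsize(LOCATOR_FORMAT)
MAX_COMMENT = 0xFFFF                # longest possible archive comment


def find_zip64_locator(fileobj):
    """Return (diskno, offset, disks), or None if no locator is found.

    fileobj must be a seekable binary file object.
    """
    fileobj.seek(0, io.SEEK_END)
    file_size = fileobj.tell()
    # Read only the tail that can contain the locator, EOCD and comment.
    tail_size = min(file_size, LOCATOR_SIZE + EOCD_SIZE + MAX_COMMENT)
    fileobj.seek(file_size - tail_size)
    tail = fileobj.read(tail_size)

    eocd_pos = tail.rfind(EOCD_SIGNATURE)
    if eocd_pos < LOCATOR_SIZE:
        # Either no EOCD record (-1) or no room for a locator before it.
        return None
    candidate = tail[eocd_pos - LOCATOR_SIZE:eocd_pos]
    sig, diskno, offset, disks = struct.unpack(LOCATOR_FORMAT, candidate)
    if sig != LOCATOR_SIGNATURE:
        return None
    return diskno, offset, disks
```

Called as `find_zip64_locator(open(path, "rb"))`, where `path` is a placeholder for the archive in question, the last element of the result is the "total number of disks" field that the check in zipfile.py compares against 1.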
msg313657 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2018-03-12 15:24
Hi,
In my case the zip file was created from the Windows 7 context menu (Send to).
Regards,
Guillaume

msg313659 - (view) Author: Thomas Kluyver (takluyver) * Date: 2018-03-12 16:01
If every Windows 7 computer is generating zipfiles which are invalid in this way, that would be a pretty strong argument for Python (and other tools) to accept it. But if that was the case, I would also expect that there would be many more issues about it.

Are the files you're compressing large (multi-GB)? Python only uses the zip64 format when the files are too big for the older zip format; maybe Windows is doing the same. Even in that case, I'm still surprised that more people don't hit it.
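
A small sketch of that behaviour on the Python side: an archive holding one tiny member, written with `allowZip64=True`, should contain no zip64 locator at all. The helper `has_zip64_locator` is hypothetical and uses a naive byte search for the locator signature, which is good enough for a tiny archive like this one but is not a real parser.

```python
import io
import zipfile

ZIP64_LOCATOR_SIGNATURE = b"PK\x06\x07"


def has_zip64_locator(zip_bytes):
    # Naive check: look for the locator signature anywhere in the archive.
    return ZIP64_LOCATOR_SIGNATURE in zip_bytes


# Write a tiny archive in memory; zip64 is allowed but not needed here.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", allowZip64=True) as archive:
    archive.writestr("hello.txt", "hello")

small_archive = buffer.getvalue()
print(has_zip64_locator(small_archive))
```

Only archives that exceed the classic limits (sizes or offsets beyond 4 GiB, or more than 65535 members) get the zip64 records, which is why the field in question is rarely exercised by small test files.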
msg313677 - (view) Author: Guillaume Carre (Guillaume.Carre) Date: 2018-03-12 18:44
Yes, these were pretty large zips, 30 to 60 GB, with thousands of small files in
them. I've fixed it locally on our servers, and we've been happy even after
accepting similar-sized files from Linux machines.
I'm also quite surprised that this hasn't been reported by others.

History
Date User Action Args
2018-03-12 18:44:20 Guillaume.Carre set messages: + msg313677
2018-03-12 16:01:19 takluyver set messages: + msg313659
2018-03-12 15:24:21 Guillaume.Carre set messages: + msg313657
2018-03-12 12:08:12 takluyver set messages: + msg313642
2018-03-12 11:42:28 takluyver set nosy: + takluyver
    messages: + msg313639
2018-03-05 15:43:02 fran6co set keywords: + patch
    stage: patch review
    pull_requests: + pull_request5752
2014-07-29 21:42:10 Guillaume.Carre create