Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zipfile generates Zipfile error in zip with 0 total number of disk in Zip64 end of central directory locator #66300

Closed
GuillaumeCarre mannequin opened this issue Jul 29, 2014 · 12 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@GuillaumeCarre
Copy link
Mannequin

GuillaumeCarre mannequin commented Jul 29, 2014

BPO 22102
Nosy @Yhg1s, @akuchling, @takluyver, @serhiy-storchaka, @csabella, @miss-islington
PRs
  • bpo-22102: Fixes zip files with disks set to 0 #5985
  • [3.7] bpo-22102: Fixes zip files with disks set to 0 (GH-5985) #13641
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-28.23:19:15.242>
    created_at = <Date 2014-07-29.21:42:10.253>
    labels = ['3.8', 'type-bug', 'library']
    title = 'Zipfile generates Zipfile error in zip with 0 total number of disk in Zip64 end of central directory locator'
    updated_at = <Date 2019-05-28.23:33:23.991>
    user = 'https://bugs.python.org/GuillaumeCarre'

    bugs.python.org fields:

    activity = <Date 2019-05-28.23:33:23.991>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-28.23:19:15.242>
    closer = 'cheryl.sabella'
    components = ['Library (Lib)']
    creation = <Date 2014-07-29.21:42:10.253>
    creator = 'Guillaume.Carre'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 22102
    keywords = ['patch']
    message_count = 12.0
    messages = ['224257', '313639', '313642', '313657', '313659', '313677', '340163', '342361', '342367', '343825', '343827', '343831']
    nosy_count = 9.0
    nosy_names = ['twouters', 'akuchling', 'alanmcintyre', 'takluyver', 'serhiy.storchaka', 'Guillaume.Carre', 'cheryl.sabella', 'miss-islington', 'Ramsey Kant']
    pr_nums = ['5985', '13641']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22102'
    versions = ['Python 3.8']

    @GuillaumeCarre
    Copy link
    Mannequin Author

    GuillaumeCarre mannequin commented Jul 29, 2014

    I've got a zip file with a Zip64 end of central directory locator in which:

    • total number of disks = 0000
    • number of the disk with the start of the zip64 end of central directory = 0000

    According to the test line 176 in zipfile.py this fails:
    if diskno != 0 or disks != 1:
    raise BadZipfile("zipfiles that span multiple disks are not supported")

    I believe the test should be changed to
    if diskno != 0 or disks > 1:

    @GuillaumeCarre GuillaumeCarre mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 29, 2014
    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Mar 12, 2018

    Do you know what tool created the zip file? I can't find anything in the spec (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ) to say whether 0 is a valid value for the number of disks.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Mar 12, 2018

    I found source code for some other projects handling the same data. They all seem to agree that it should be 1:

    So I think you've got an invalid zip file. If it's otherwise valid, there might be a case for Python tolerating that particular mistake. But it's probably better to fix whatever is making the incorrect zip file, because other tools are also going to reject it.

    @GuillaumeCarre
    Copy link
    Mannequin Author

    GuillaumeCarre mannequin commented Mar 12, 2018

    Hi,
    In my case the zip file was created from windows 7 context menu (send to)
    Regards,
    Guillaume

    On Mon, Mar 12, 2018 at 5:08 AM, Thomas Kluyver <report@bugs.python.org>
    wrote:

    Thomas Kluyver <thomas@kluyver.me.uk> added the comment:

    I found source code for some other projects handling the same data. They
    all seem to agree that it should be 1:

    • Golang's zip reading code: https://github.com/golang/go/blob/
      f7ac70a56604033e2b1abc921d3f0f6afc85a7b3/src/archive/zip/
      reader.go#L536-L538

    • A C contrib file with zlib: https://github.com/madler/zlib/blob/
      cacf7f1d4e3d44d871b605da3b647f07d718623f/contrib/minizip/zip.c#L620-L624

    • Code from Info-ZIP, which is used by many Linux distros, is a bit less
      clear, but it has an illuminating comment:

      if ((G.ecrec.number_this_disk != 0xFFFF) &&
      (G.ecrec.number_this_disk != ecloc64_total_disks - 1)) {
      /* Note: For some unknown reason, the developers at PKWARE decided to
      store the "zip64 total disks" value as a counter starting from 1,
      whereas all other "split/span volume" related fields use 0-based
      volume numbers. Sigh... */

    So I think you've got an invalid zip file. If it's otherwise valid, there
    might be a case for Python tolerating that particular mistake. But it's
    probably better to fix whatever is making the incorrect zip file, because
    other tools are also going to reject it.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue22102\>


    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Mar 12, 2018

    If every Windows 7 computer is generating zipfiles which are invalid in this way, that would be a pretty strong argument for Python (and other tools) to accept it. But if that was the case, I would also expect that there would be many more issues about it.

    Are the files you're compressing large (multi-GB)? Python only uses the zip64 format when the files are too big for the older zip format; maybe Windows is doing the same. Even in that case, I'm still surprised that more people don't hit it.

    @GuillaumeCarre
    Copy link
    Mannequin Author

    GuillaumeCarre mannequin commented Mar 12, 2018

    Yes these were pretty large zip 30 to 60Gb with thousands of small files in
    them I've fixed locally on our servers and we've been happy even after
    accepting similar sized files from linux machine.
    I'm also quite surprised about this not being reported by others.

    On Mon, Mar 12, 2018 at 9:01 AM, Thomas Kluyver <report@bugs.python.org>
    wrote:

    Thomas Kluyver <thomas@kluyver.me.uk> added the comment:

    If every Windows 7 computer is generating zipfiles which are invalid in
    this way, that would be a pretty strong argument for Python (and other
    tools) to accept it. But if that was the case, I would also expect that
    there would be many more issues about it.

    Are the files you're compressing large (multi-GB)? Python only uses the
    zip64 format when the files are too big for the older zip format; maybe
    Windows is doing the same. Even in that case, I'm still surprised that more
    people don't hit it.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue22102\>


    @RamseyKant
    Copy link
    Mannequin

    RamseyKant mannequin commented Apr 13, 2019

    I would second this PR. The Win32 API that creates ZIP64 files produces ZIP64 files with the "diskno" as 0 and "disks" as 0 (instead of "1" as indicated by the spec).

    @RamseyKant RamseyKant mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Apr 13, 2019
    @tirkarthi tirkarthi removed 3.7 (EOL) end of life 3.9 only security fixes labels Apr 14, 2019
    @akuchling
    Copy link
    Member

    I also ran across this issue today, where the 'disks' value in a Zip file is 0. I'm trying to find out what software was used to create them, but it's quite plausible that it's Windows as Ramsey Kant suggests. So I think this fix should get applied to 3.8 and 3.7. Would it help if I produced a patch?

    The PKWare Zip specification that takluyver links above has been updated -- it now has an April 29th updated -- but none of the changes are relevant to this.

    Interestingly, the 'ditto' command on MacOS X (which can also unpack zip files) doesn't complain about the disk number either. I was unable to figure out where the source code for ditto is; I couldn't find it on opensource.apple.com.

    @akuchling
    Copy link
    Member

    Oh, I missed that there was already a patch. BTW, I found two dissections of zip files that also show disk numbers of 0: the one at https://rzymek.github.io/post/excel-zip64/ is exploring an Excel Zip issue, and the forensic tutorial at https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html is discussing the format.

    @csabella
    Copy link
    Contributor

    New changeset ab0716e by Cheryl Sabella (Francisco Facioni) in branch 'master':
    bpo-22102: Fixes zip files with disks set to 0 (GH-5985)
    ab0716e

    @csabella
    Copy link
    Contributor

    @Guillaume.Carre, thank you for the report and @fran6co, thank you for the contribution.

    @miss-islington
    Copy link
    Contributor

    New changeset 0eb6999 by Miss Islington (bot) in branch '3.7':
    bpo-22102: Fixes zip files with disks set to 0 (GH-5985)
    0eb6999

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants