ValueError in zipfile.ZipFile #83245

Closed
jvoisin mannequin opened this issue Dec 16, 2019 · 8 comments
Labels: 3.9 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (an unexpected behavior, bug, or error)

Comments

@jvoisin
Mannequin

jvoisin mannequin commented Dec 16, 2019

BPO 39064
Nosy @serhiy-storchaka, @iritkatriel, @dignissimus
PRs
  • bpo-39064: make ZipFile raise BadZipFile instead of ValueError when reading a corrupt file #30863
  • gh-83245: Raise BadZipFile instead of ValueError when reading a corrupt file #32291
Files
  • crash-4da08e9ababa495ac51ecad588fd61081a66b5bb6e7a0e791f44907fa274ec62

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-12-16.12:58:42.379>
    labels = ['type-bug', 'library', '3.9']
    title = 'ValueError in zipfile.ZipFile'
    updated_at = <Date 2022-04-03.18:36:35.140>
    user = 'https://bugs.python.org/jvoisin'

    bugs.python.org fields:

    activity = <Date 2022-04-03.18:36:35.140>
    actor = 'sam_ezeh'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-12-16.12:58:42.379>
    creator = 'jvoisin'
    dependencies = []
    files = ['48782']
    hgrepos = []
    issue_num = 39064
    keywords = ['patch']
    message_count = 8.0
    messages = ['358484', '410707', '410760', '411523', '416632', '416634', '416635', '416636']
    nosy_count = 4.0
    nosy_names = ['serhiy.storchaka', 'jvoisin', 'iritkatriel', 'sam_ezeh']
    pr_nums = ['30863', '32291']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39064'
    versions = ['Python 3.9']

    @jvoisin
    Mannequin Author

    jvoisin mannequin commented Dec 16, 2019

    The attached file produces the following stacktrace when opened via zipfile.ZipFile, on Python 3.7.5rc1:

    $ cat ziprepro.py 
    import zipfile
    import sys
    
    zipfile.ZipFile(sys.argv[1])
    
    $ python3 ziprepro.py crash-4da08e9ababa495ac51ecad588fd61081a66b5bb6e7a0e791f44907fa274ec62
    Traceback (most recent call last):
      File "ziprepro.py", line 4, in <module>
        zipfile.ZipFile(sys.argv[1])
      File "/usr/lib/python3.7/zipfile.py", line 1225, in __init__
        self._RealGetContents()
      File "/usr/lib/python3.7/zipfile.py", line 1310, in _RealGetContents
        fp.seek(self.start_dir, 0)
    ValueError: cannot fit 'int' into an offset-sized integer
    

    ValueError isn't documented as a possible exception when using zipfile.ZipFile ( https://docs.python.org/3/library/zipfile.html ).
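For callers who have to open untrusted archives on an interpreter that still shows this behaviour, one possible workaround is to treat a `ValueError` raised while opening the archive as a corrupt-file error. This is only a sketch: `open_untrusted_zip` is a hypothetical helper, not part of `zipfile`, and it assumes the source is a path or an open, readable binary file and that no `mode` argument is passed, so a `ValueError` can reasonably be attributed to the archive contents.

```python
import io
import zipfile


def open_untrusted_zip(source):
    """Open `source` for reading and report corruption as BadZipFile.

    Hypothetical helper: it converts a ValueError raised by the
    zipfile.ZipFile constructor (for example from a bad seek offset)
    into zipfile.BadZipFile, so callers need a single except clause.
    """
    try:
        return zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        # Already the documented exception for malformed archives.
        raise
    except ValueError as exc:
        # Assumption: with default arguments, a ValueError here comes from
        # reading a corrupt archive rather than from a programming error.
        raise zipfile.BadZipFile(f"corrupt archive: {exc}") from exc
```

A caller would then wrap `open_untrusted_zip(path)` in a single `except zipfile.BadZipFile:` block instead of listing both exception types.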

    @jvoisin (mannequin) added the labels 3.7 (EOL), stdlib, and type-bug on Dec 16, 2019
    @iritkatriel
    Member

    It's unlikely that anyone will download a binary from bpo and open it. Can you help us reproduce the issue without that?

    The first question is whether you can reproduce this on a version of Python that is still maintained (3.9 or higher).

    @jvoisin
    Mannequin Author

    jvoisin mannequin commented Jan 17, 2022

    Yes, I can reproduce it:

    $ python3 --version
    Python 3.9.9
    
    $ python3.9 ziprepo.py ./crash-4da08e9ababa495ac51ecad588fd61081a66b5bb6e7a0e791f44907fa274ec62 
    Traceback (most recent call last):
      File "/home/jvoisin/Downloads/ziprepo.py", line 4, in <module>
        zipfile.ZipFile(sys.argv[1])
      File "/usr/lib/python3.9/zipfile.py", line 1257, in __init__
        self._RealGetContents()
      File "/usr/lib/python3.9/zipfile.py", line 1342, in _RealGetContents
        fp.seek(self.start_dir, 0)
    ValueError: cannot fit 'int' into an offset-sized integer
    $
    

    > It's unlikely that anyone will download a binary from bpo and open it. Can you help us reproduce the issue without that?

    The binary is a corrupted zip file meant to be opened with zipfile.ZipFile(); it can't be executed on its own.

    @iritkatriel added the 3.9 label and removed the 3.7 (EOL) label on Jan 17, 2022
    @iritkatriel
    Member

    It's easy enough to convert the exception type (see patch), but I don't know how to write a unit test for this.

    @serhiy-storchaka
    Member

    Try to create a normal ZIP file (it can be empty), then set some byte to FF (or a pair of bytes to FFFF, or 4 consecutive bytes to FFFFFFFF) until you get exactly the same error. Then you can just add the binary dump of that file to the tests.
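The search described above could be automated roughly as follows. This is a sketch under assumptions, not code from the patch: `empty_zip_bytes` and `find_failing_mutations` are hypothetical helper names, the archive is kept in memory with `io.BytesIO`, and every exception type is recorded because the exact type and message depend on the Python version and on whether the archive is a real file or an in-memory buffer.

```python
import io
import zipfile


def empty_zip_bytes() -> bytes:
    """Return the bytes of a valid ZIP archive that has no members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w"):
        pass  # closing the archive writes the end-of-central-directory record
    return buf.getvalue()


def find_failing_mutations(data: bytes):
    """Overwrite 1, 2 or 4 consecutive bytes with 0xFF at every offset.

    Returns a list of (offset, width, exception_name, message) tuples for
    the mutations that make zipfile.ZipFile raise when opening the result.
    """
    failures = []
    for width in (1, 2, 4):
        for offset in range(len(data) - width + 1):
            mutated = bytearray(data)
            mutated[offset:offset + width] = b"\xff" * width
            try:
                zipfile.ZipFile(io.BytesIO(bytes(mutated))).close()
            except Exception as exc:  # record any failure, whatever its type
                failures.append((offset, width, type(exc).__name__, str(exc)))
    return failures


if __name__ == "__main__":
    for offset, width, name, message in find_failing_mutations(empty_zip_bytes()):
        print(f"offset={offset:2d} width={width} {name}: {message}")
```

Any mutation that reproduces the error of interest can then be copied into a test as a bytes literal.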

    @dignissimus
    Mannequin

    dignissimus mannequin commented Apr 3, 2022

    One way to do this is to make the computed central directory offset negative: take a zip file that contains just an EOCD (end of central directory) record, then set the total size of the central directory records to a positive value.

    Python 3.11.0a4+ (heads/bpo-39064:eb1935dacf, Apr  3 2022, 19:09:53) [GCC 11.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import zipfile
    >>> import io
    >>> b = [80, 75, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    >>> b[12] = 1
    >>> f = io.BytesIO(bytes(b))
    >>> zipfile.ZipFile(f)
    Traceback (most recent call last):
      File "/run/media/sam/OS/Git/cpython/Lib/zipfile.py", line 1370, in _RealGetContents
        fp.seek(self.start_dir, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: negative seek value -1
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/run/media/sam/OS/Git/cpython/Lib/zipfile.py", line 1284, in __init__
        self._RealGetContents()
        ^^^^^^^^^^^^^^^^^^^^^^^
      File "/run/media/sam/OS/Git/cpython/Lib/zipfile.py", line 1372, in _RealGetContents
        raise BadZipFile("Bad offset for central directory")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    zipfile.BadZipFile: Bad offset for central directory
    >>> 
    

    @iritkatriel
    Member

    Sam, can you put that in a PR please?

    @dignissimus
    Mannequin

    dignissimus mannequin commented Apr 3, 2022

    Yes, of course.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    serhiy-storchaka added a commit that referenced this issue May 23, 2022
    …pt ZIP file (GH-32291)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 23, 2022
    … corrupt ZIP file (pythonGH-32291)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    (cherry picked from commit 202ed25)
    
    Co-authored-by: Sam Ezeh <sam.z.ezeh@gmail.com>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 23, 2022
    … corrupt ZIP file (pythonGH-32291)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    (cherry picked from commit 202ed25)
    
    Co-authored-by: Sam Ezeh <sam.z.ezeh@gmail.com>
    miss-islington added a commit that referenced this issue May 25, 2022
    …a corrupt ZIP file (GH-32291) (GH-93141)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    (cherry picked from commit 202ed25)
    
    
    Co-authored-by: Sam Ezeh <sam.z.ezeh@gmail.com>
    
    Automerge-Triggered-By: GH:serhiy-storchaka
    miss-islington added a commit that referenced this issue May 25, 2022
    …a corrupt ZIP file (GH-32291) (GH-93140)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    (cherry picked from commit 202ed25)
    
    
    Co-authored-by: Sam Ezeh <sam.z.ezeh@gmail.com>
    
    Automerge-Triggered-By: GH:serhiy-storchaka