Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile.ZipFile is closed when zipfile.Path is closed #88804

Closed
christian-steinmeyer mannequin opened this issue Jul 14, 2021 · 14 comments
Closed

zipfile.ZipFile is closed when zipfile.Path is closed #88804

christian-steinmeyer mannequin opened this issue Jul 14, 2021 · 14 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@christian-steinmeyer
Copy link
Mannequin

christian-steinmeyer mannequin commented Jul 14, 2021

BPO 44638
Nosy @jaraco, @miss-islington, @tirkarthi, @jdevries3133, @christian-steinmeyer
PRs
  • bpo-44638: Add a reference to the zipp project and hint as to how to use it. #27188
  • Superseder
  • bpo-40564: Using zipfile.Path with several files prematurely closes zip
  • Files
  • zipfile.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-07-16.01:12:12.249>
    created_at = <Date 2021-07-14.14:48:27.833>
    labels = ['3.8', 'type-bug', 'library']
    title = 'zipfile.ZipFile is closed when zipfile.Path is closed'
    updated_at = <Date 2021-07-16.13:15:04.558>
    user = 'https://github.com/christian-steinmeyer'

    bugs.python.org fields:

    activity = <Date 2021-07-16.13:15:04.558>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-07-16.01:12:12.249>
    closer = 'jaraco'
    components = ['Library (Lib)']
    creation = <Date 2021-07-14.14:48:27.833>
    creator = 'christian.steinmeyer'
    dependencies = []
    files = ['50150']
    hgrepos = []
    issue_num = 44638
    keywords = []
    message_count = 13.0
    messages = ['397483', '397485', '397520', '397524', '397590', '397591', '397592', '397593', '397594', '397595', '397596', '397601', '397620']
    nosy_count = 5.0
    nosy_names = ['jaraco', 'miss-islington', 'xtreak', 'jack__d', 'christian.steinmeyer']
    pr_nums = ['27188']
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '40564'
    type = 'behavior'
    url = 'https://bugs.python.org/issue44638'
    versions = ['Python 3.8']

    @christian-steinmeyer
    Copy link
    Mannequin Author

    christian-steinmeyer mannequin commented Jul 14, 2021

    When executing the code below with the attached zip file (or any other that has one or more files directly at root level), I get a "ValueError: seek of closed file". It seems, the zipfile handle being part of the TestClass instance is being closed, when the zipfile.Path is garbage collected, when it is no longer referenced. Since zipfile.Path even takes a zipfile.Zipfile as an argument, I don't think it is intended? It surprised me at least.

    import zipfile
    
    
    class TestClass:
        def __init__(self, path):
            self.zip_file = zipfile.ZipFile(path)
    
        def iter_dir(self):
            return [each.name for each in zipfile.Path(self.zip_file).iterdir()]
    
        def read(self, filename):
            with self.zip_file.open(filename) as file:
                print(file.read())
    
    root = "zipfile.zip"
    test = TestClass(root)
    files = test.iter_dir()
    test.read(files[0])
    

    @christian-steinmeyer christian-steinmeyer mannequin added 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 14, 2021
    @tirkarthi
    Copy link
    Member

    This seems similar to https://bugs.python.org/issue40564

    @jdevries3133
    Copy link
    Mannequin

    jdevries3133 mannequin commented Jul 15, 2021

    I'm not able to reproduce this on my machine; the script runs without any issue.

    the TestClass instance is being closed

    What do you mean by this statement? You aren't doing anything to TestClass or its instance ("test") in this script. They remain in scope, so they will always be referenced.

    @christian-steinmeyer
    Copy link
    Mannequin Author

    christian-steinmeyer mannequin commented Jul 15, 2021

    I work on macOS 11.4 (20F71) (Kernel Version: Darwin 20.5.0).
    My python version is 3.8.9 and zipp is at 3.5.0 (but 3.4.1 behaves the same for me).
    For me, this is behavior is reproducible.

    Let me try to clarify what I mean.

    test = TestClass(root)  # this creates a zipfile handle  (an instance of zipfile.ZipFile) at test.zip_file
    
    files = test.iter_dir()  # this creates multiple instances of zipfile.Path() as part of the list comprehension and these are deferenced afterwards. I found that test.zip_file.fp is closed after this line executes, which to me suggests that the closing of the zipfile.Path also closes the zipfile.ZipFile that was used to create the zipfile.Path.
    
    test.read(files[0])  # this should in theory try to read from the test.zip_file for the first time, but fails because it is closed as per the above.
    Here is the full stack trace:
    Traceback (most recent call last):
      File "test.py", line 20, in <module>
        test.read(files[0])
      File "test.py", line 12, in read
        with self.zip_file.open(filename) as file:
      File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/zipfile.py", line 1530, in open
        fheader = zef_file.read(sizeFileHeader)
      File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/zipfile.py", line 763, in read
        self._file.seek(self._pos)
    ValueError: seek of closed file

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    I was able to replicate the error using the script as posted:

    draft $ cat > issue44638.py
    import zipfile
    
    
    class TestClass:
        def __init__(self, path):
            self.zip_file = zipfile.ZipFile(path)
    
        def iter_dir(self):
            return [each.name for each in zipfile.Path(self.zip_file).iterdir()]
    
        def read(self, filename):
            with self.zip_file.open(filename) as file:
                print(file.read())
    
    root = "zipfile.zip"
    test = TestClass(root)
    files = test.iter_dir()
    test.read(files[0])
    draft $ python -m zipfile -c zipfile.zip issue44638.py
    draft $ python issue44638.py
    Traceback (most recent call last):
      File "/Users/jaraco/draft/issue44638.py", line 18, in <module>
        test.read(files[0])
      File "/Users/jaraco/draft/issue44638.py", line 12, in read
        with self.zip_file.open(filename) as file:
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
        fheader = zef_file.read(sizeFileHeader)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
        self._file.seek(self._pos)
    ValueError: seek of closed file
    

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    Here's a much simpler repro that avoids the class construction but triggers the same error:

    import zipfile
    
    
    zip_file = zipfile.ZipFile('zipfile.zip')
    names = [each.name for each in zipfile.Path(zip_file).iterdir()]
    with zip_file.open(names[0]) as file:
        print(file.read())
    

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    Even simpler:

    import zipfile
    
    
    zip_file = zipfile.ZipFile('zipfile.zip')
    names = [each.name for each in zipfile.Path(zip_file).iterdir()]
    zip_file.open(names[0])
    

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    This also reproduces the failure:

    zip_file = zipfile.ZipFile('zipfile.zip')
    path = zipfile.Path(zip_file)
    name = zip_file.namelist()[0]
    del path
    zip_file.open(name)
    

    Removing del path bypasses the issue. Something about the destructor for zipfile.Path is causing the closing of the handle for zip_file.

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    Even simpler:

    zip_file = zipfile.ZipFile('zipfile.zip')
    name = zip_file.namelist()[0]
    zipfile.Path(zip_file)
    zip_file.open(name)
    

    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    Changing the repro to:

    import zipfile
    
    try:
        import zipp
    except ImportError:
        import zipfile as zipp
    
    zip_file = zipfile.ZipFile('zipfile.zip')
    name = zip_file.namelist()[0]
    zipp.Path(zip_file)
    zip_file.open(name)
    

    I'm able now to test against zipfile or zipp. And I notice that the issue occurs only on zipp<3.2 or Python<3.10.

    draft $ pip-run -q 'zipp<3.3' -- issue44638.py
    draft $ pip-run -q 'zipp<3.2' -- issue44638.py
    Traceback (most recent call last):
      File "/Users/jaraco/draft/issue44638.py", line 11, in <module>
        zip_file.open(name)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
        fheader = zef_file.read(sizeFileHeader)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
        self._file.seek(self._pos)
    ValueError: seek of closed file
    
    draft $ python3.10 issue44638.py
    draft $ python3.9 issue44638.py
    Traceback (most recent call last):
      File "/Users/jaraco/draft/issue44638.py", line 11, in <module>
        zip_file.open(name)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
        fheader = zef_file.read(sizeFileHeader)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
        self._file.seek(self._pos)
    ValueError: seek of closed file
    

    Looking at the changelog (https://zipp.readthedocs.io/en/latest/history.html#v3-2-0), it's clear now that this issue is a duplicate of bpo-40564 and the problem goes away using the original repro and Python 3.10:

    draft $ cat > issue44638.py
    import zipfile
    
    
    class TestClass:
        def __init__(self, path):
            self.zip_file = zipfile.ZipFile(path)
    
        def iter_dir(self):
            return [each.name for each in zipfile.Path(self.zip_file).iterdir()]
    
        def read(self, filename):
            with self.zip_file.open(filename) as file:
                print(file.read())
    
    root = "zipfile.zip"
    test = TestClass(root)
    files = test.iter_dir()
    test.read(files[0])
    draft $ python3.10 issue44638.py
    b'import zipfile\n\n\nclass TestClass:\n    def __init__(self, path):\n        self.zip_file = zipfile.ZipFile(path)\n\n    def iter_dir(self):\n        return [each.name for each in zipfile.Path(self.zip_file).iterdir()]\n\n    def read(self, filename):\n        with self.zip_file.open(filename) as file:\n            print(file.read())\n\nroot = "zipfile.zip"\ntest = TestClass(root)\nfiles = test.iter_dir()\ntest.read(files[0])\n'
    
    

    The solution is to use zipp>=3.2 or Python 3.10.

    @jaraco jaraco closed this as completed Jul 16, 2021
    @jaraco jaraco closed this as completed Jul 16, 2021
    @jaraco
    Copy link
    Member

    jaraco commented Jul 16, 2021

    My python version is 3.8.9 and zipp is at 3.5.0 (but 3.4.1 behaves the same for me).

    It's not enough to have zipp 3.5.0. You need to use zipp.Path over zipfile.Path.

    @christian-steinmeyer
    Copy link
    Mannequin Author

    christian-steinmeyer mannequin commented Jul 16, 2021

    Thank you for the in depth look Jason!
    Especially that last comment was very useful to me. Perhaps it would make sense to add something like this to the documentation of zipfile.
    I'm not sure what would be the best hint, but perhaps in zipfile.Path's documentation a hint that zipp.Path can be used to access newer functionality even for older python versions (if what I understand is correct) might be useful to others as well. Because as of now, I cannot find an equivalent hint yet.

    @miss-islington
    Copy link
    Contributor

    New changeset 29358e9 by Jason R. Coombs in branch 'main':
    bpo-44638: Add a reference to the zipp project and hint as to how to use it. (GH-27188)
    29358e9

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @smheidrich
    Copy link
    Contributor

    If a backport isn't desired (as mentioned in #84744 (comment)), would it at least be possible to put a note about this bug into the zipfile.Path docs of Python versions < 3.10? Would've saved me a couple of hours just now and I'm probably not the only one :-)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants