classification
Title: zipfile.ZipFile is closed when zipfile.Path is closed
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Using zipfile.Path with several files prematurely closes zip (bpo-40564)
Assigned To: Nosy List: christian.steinmeyer, jack__d, jaraco, miss-islington, xtreak
Priority: normal Keywords:

Created on 2021-07-14 14:48 by christian.steinmeyer, last changed 2021-07-16 13:15 by miss-islington. This issue is now closed.

Files
File name Uploaded Description
zipfile.zip christian.steinmeyer, 2021-07-14 14:48
Pull Requests
URL Status Linked
PR 27188 merged jaraco, 2021-07-16 12:57
Messages (13)
msg397483 - (view) Author: Christian Steinmeyer (christian.steinmeyer) Date: 2021-07-14 14:48
When executing the code below with the attached zip file (or any other that has one or more files directly at the root level), I get "ValueError: seek of closed file". It seems that the zipfile handle held by the `TestClass` instance is closed when the `zipfile.Path` is garbage collected, once it is no longer referenced. Since `zipfile.Path` even takes a `zipfile.ZipFile` as an argument, I don't think this is intended. It surprised me, at least.


```
import zipfile


class TestClass:
    def __init__(self, path):
        self.zip_file = zipfile.ZipFile(path)

    def iter_dir(self):
        return [each.name for each in zipfile.Path(self.zip_file).iterdir()]

    def read(self, filename):
        with self.zip_file.open(filename) as file:
            print(file.read())

root = "zipfile.zip"
test = TestClass(root)
files = test.iter_dir()
test.read(files[0])
```
msg397485 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2021-07-14 15:26
This seems similar to https://bugs.python.org/issue40564
msg397520 - (view) Author: Jack DeVries (jack__d) * Date: 2021-07-15 00:43
I'm not able to reproduce this on my machine; the script runs without any issue.

> the `TestClass` instance is being closed

What do you mean by this statement? You aren't doing anything to TestClass or its instance ("test") in this script. They remain in scope, so they will always be referenced.
msg397524 - (view) Author: Christian Steinmeyer (christian.steinmeyer) Date: 2021-07-15 07:11
I work on macOS 11.4 (20F71) (Kernel Version: Darwin 20.5.0).
My python version is 3.8.9 and zipp is at 3.5.0 (but 3.4.1 behaves the same for me).
For me, this behavior is reproducible.

Let me try to clarify what I mean. 

test = TestClass(root)  # this creates a zipfile handle  (an instance of zipfile.ZipFile) at test.zip_file

files = test.iter_dir()  # this creates multiple instances of zipfile.Path() as part of the list comprehension, and these are dereferenced afterwards. I found that test.zip_file.fp is closed after this line executes, which suggests to me that closing the zipfile.Path also closes the zipfile.ZipFile that was used to create it.

test.read(files[0])  # this should in theory try to read from the test.zip_file for the first time, but fails because it is closed as per the above.
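
The observation about `test.zip_file.fp` can be checked with a small self-contained script. This is only a sketch: it builds a throwaway archive (`example.zip` and `hello.txt` are made-up names) instead of using the attached file, and it keeps its own reference to the underlying file object so that its state can be inspected before and after the `zipfile.Path` objects go away.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive with one file at the root level.
    archive = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello.txt", "hello")

    zip_file = zipfile.ZipFile(archive)
    underlying = zip_file.fp  # our own reference to the open file object
    closed_before = underlying.closed

    # The zipfile.Path objects only live for the duration of this line.
    names = [each.name for each in zipfile.Path(zip_file).iterdir()]
    closed_after = underlying.closed

    zip_file.close()

print("closed before Path:", closed_before)
print("closed after Path: ", closed_after)
```

If the behavior described here is present, the second line should report that the file is closed; on a version without the problem it should still be open.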

Here is the full stack trace:
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    test.read(files[0])
  File "test.py", line 12, in read
    with self.zip_file.open(filename) as file:
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/zipfile.py", line 1530, in open
    fheader = zef_file.read(sizeFileHeader)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/zipfile.py", line 763, in read
    self._file.seek(self._pos)
ValueError: seek of closed file
msg397590 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 00:26
I was able to replicate the error using the script as posted:

```
draft $ cat > issue44638.py
import zipfile


class TestClass:
    def __init__(self, path):
        self.zip_file = zipfile.ZipFile(path)

    def iter_dir(self):
        return [each.name for each in zipfile.Path(self.zip_file).iterdir()]

    def read(self, filename):
        with self.zip_file.open(filename) as file:
            print(file.read())

root = "zipfile.zip"
test = TestClass(root)
files = test.iter_dir()
test.read(files[0])
draft $ python -m zipfile -c zipfile.zip issue44638.py
draft $ python issue44638.py
Traceback (most recent call last):
  File "/Users/jaraco/draft/issue44638.py", line 18, in <module>
    test.read(files[0])
  File "/Users/jaraco/draft/issue44638.py", line 12, in read
    with self.zip_file.open(filename) as file:
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
    fheader = zef_file.read(sizeFileHeader)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
    self._file.seek(self._pos)
ValueError: seek of closed file
```
msg397591 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 00:39
Here's a much simpler repro that avoids the class construction but triggers the same error:

```
import zipfile


zip_file = zipfile.ZipFile('zipfile.zip')
names = [each.name for each in zipfile.Path(zip_file).iterdir()]
with zip_file.open(names[0]) as file:
    print(file.read())
```
msg397592 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 00:40
Even simpler:

```
import zipfile


zip_file = zipfile.ZipFile('zipfile.zip')
names = [each.name for each in zipfile.Path(zip_file).iterdir()]
zip_file.open(names[0])
```
msg397593 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 00:55
This also reproduces the failure:

```
zip_file = zipfile.ZipFile('zipfile.zip')
path = zipfile.Path(zip_file)
name = zip_file.namelist()[0]
del path
zip_file.open(name)
```

Removing `del path` bypasses the issue. Something about the destructor for zipfile.Path is causing the closing of the handle for zip_file.
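
As an illustration of how a destructor on one object can close a handle that another object still uses, here is a toy sketch. `Handle` and `shallow_clone` are hypothetical names and are not zipfile code; the assumption (not confirmed in this message) is that `zipfile.Path` ends up holding a second object that shares the original's open file.

```python
import io


class Handle:
    """Toy stand-in for an object that owns a file and closes it on deletion."""

    def __init__(self, fp):
        self.fp = fp

    def close(self):
        self.fp.close()

    def __del__(self):
        # Closing in the destructor is what makes sharing dangerous.
        self.close()


def shallow_clone(source):
    """Create a second Handle that shares the same attributes (and file)."""
    clone = Handle.__new__(Handle)
    vars(clone).update(vars(source))
    return clone


original = Handle(io.BytesIO(b"data"))
clone = shallow_clone(original)
del clone  # the clone's destructor closes the file both objects share

# True on CPython, where the clone is finalized as soon as it is unreferenced
print(original.fp.closed)
```

The original object never called `close()`, yet its file is closed, which matches the symptom seen with `del path`.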
msg397594 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 01:00
Even simpler:

```
zip_file = zipfile.ZipFile('zipfile.zip')
name = zip_file.namelist()[0]
zipfile.Path(zip_file)
zip_file.open(name)
```
msg397595 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 01:11
Changing the repro to:

```
import zipfile

try:
    import zipp
except ImportError:
    import zipfile as zipp

zip_file = zipfile.ZipFile('zipfile.zip')
name = zip_file.namelist()[0]
zipp.Path(zip_file)
zip_file.open(name)
```

I'm now able to test against either zipfile or zipp, and I notice that the issue occurs only on zipp<3.2 or Python<3.10.

```
draft $ pip-run -q 'zipp<3.3' -- issue44638.py
draft $ pip-run -q 'zipp<3.2' -- issue44638.py
Traceback (most recent call last):
  File "/Users/jaraco/draft/issue44638.py", line 11, in <module>
    zip_file.open(name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
    fheader = zef_file.read(sizeFileHeader)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
    self._file.seek(self._pos)
ValueError: seek of closed file
```

```
draft $ python3.10 issue44638.py
draft $ python3.9 issue44638.py
Traceback (most recent call last):
  File "/Users/jaraco/draft/issue44638.py", line 11, in <module>
    zip_file.open(name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1518, in open
    fheader = zef_file.read(sizeFileHeader)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 741, in read
    self._file.seek(self._pos)
ValueError: seek of closed file
```

Looking at the changelog (https://zipp.readthedocs.io/en/latest/history.html#v3-2-0), it's clear now that this issue is a duplicate of bpo-40564 and the problem goes away using the original repro and Python 3.10:

```
draft $ cat > issue44638.py
import zipfile


class TestClass:
    def __init__(self, path):
        self.zip_file = zipfile.ZipFile(path)

    def iter_dir(self):
        return [each.name for each in zipfile.Path(self.zip_file).iterdir()]

    def read(self, filename):
        with self.zip_file.open(filename) as file:
            print(file.read())

root = "zipfile.zip"
test = TestClass(root)
files = test.iter_dir()
test.read(files[0])
draft $ python3.10 issue44638.py
b'import zipfile\n\n\nclass TestClass:\n    def __init__(self, path):\n        self.zip_file = zipfile.ZipFile(path)\n\n    def iter_dir(self):\n        return [each.name for each in zipfile.Path(self.zip_file).iterdir()]\n\n    def read(self, filename):\n        with self.zip_file.open(filename) as file:\n            print(file.read())\n\nroot = "zipfile.zip"\ntest = TestClass(root)\nfiles = test.iter_dir()\ntest.read(files[0])\n'

```

The solution is to use zipp>=3.2 or Python 3.10 or later.
msg397596 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-07-16 01:16
> My python version is 3.8.9 and zipp is at 3.5.0 (but 3.4.1 behaves the same for me).

It's not enough to have `zipp` 3.5.0 installed. You need to use `zipp.Path` instead of `zipfile.Path`.
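
A minimal sketch of that substitution, assuming that any installed `zipp` is at least 3.2 and that falling back to the standard library is acceptable when it is missing. The alias `ZipPath` and the throwaway archive (`example.zip`, `hello.txt`) are made up for the example.

```python
import os
import tempfile
import zipfile

try:
    # Prefer the backport when it is installed (assumed to be zipp>=3.2).
    from zipp import Path as ZipPath
except ImportError:
    # Fall back to the standard library; affected before Python 3.10.
    from zipfile import Path as ZipPath

with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive with one file at the root level.
    archive = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello.txt", "hello")

    with zipfile.ZipFile(archive) as zip_file:
        # List the root entries through Path, then read via the same ZipFile.
        names = [each.name for each in ZipPath(zip_file).iterdir()]
        with zip_file.open(names[0]) as member:
            content = member.read()

print(names, content)
```

With this import order, code running on Python 3.8 or 3.9 only gets the fixed behavior if `zipp` is actually installed; the fallback keeps the script importable but does not work around the issue there.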
msg397601 - (view) Author: Christian Steinmeyer (christian.steinmeyer) Date: 2021-07-16 07:01
Thank you for the in-depth look, Jason!
That last comment was especially useful to me. Perhaps it would make sense to add something like this to the zipfile documentation.
I'm not sure what the best hint would be, but a note in zipfile.Path's documentation that zipp.Path can be used to access newer functionality even on older Python versions (if I understand correctly) might be useful to others as well. As of now, I cannot find an equivalent hint.
msg397620 - (view) Author: miss-islington (miss-islington) Date: 2021-07-16 13:15
New changeset 29358e93f2bb60983271c14ce4c2f3eab35a60ca by Jason R. Coombs in branch 'main':
bpo-44638: Add a reference to the zipp project and hint as to how to use it. (GH-27188)
https://github.com/python/cpython/commit/29358e93f2bb60983271c14ce4c2f3eab35a60ca
History
Date | User | Action | Args
2021-07-16 13:15:04 | miss-islington | set | nosy: + miss-islington; messages: + msg397620
2021-07-16 12:57:57 | jaraco | set | pull_requests: + pull_request25724
2021-07-16 07:01:22 | christian.steinmeyer | set | messages: + msg397601
2021-07-16 01:16:45 | jaraco | set | messages: + msg397596
2021-07-16 01:12:12 | jaraco | set | status: open -> closed; superseder: Using zipfile.Path with several files prematurely closes zip; resolution: duplicate; stage: resolved
2021-07-16 01:11:49 | jaraco | set | messages: + msg397595
2021-07-16 01:00:15 | jaraco | set | messages: + msg397594
2021-07-16 00:55:55 | jaraco | set | messages: + msg397593
2021-07-16 00:40:57 | jaraco | set | messages: + msg397592
2021-07-16 00:39:26 | jaraco | set | messages: + msg397591
2021-07-16 00:26:56 | jaraco | set | messages: + msg397590
2021-07-15 07:11:29 | christian.steinmeyer | set | messages: + msg397524
2021-07-15 00:43:44 | jack__d | set | nosy: + jack__d; messages: + msg397520
2021-07-14 15:26:08 | xtreak | set | nosy: + jaraco, xtreak; messages: + msg397485
2021-07-14 14:48:27 | christian.steinmeyer | create |