classification
Title: Using zipfile.Path with several files prematurely closes zip
Type: behavior Stage:
Components: IO Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bustawin, jaraco, xtreak
Priority: normal Keywords:

Created on 2020-05-08 15:47 by bustawin, last changed 2020-09-09 23:04 by jaraco.

Files
File name Uploaded Description Edit
zipfile.zip bustawin, 2020-05-08 15:47
Messages (8)
msg368449 - (view) Author: Xavier (bustawin) Date: 2020-05-08 15:47
Given a .zip file with more than one inner file, when reading those inner files using zipfile.Path the zip module closes the .zip file prematurely, causing an error.

Given the following code (example zipfile is attached, but any should work).

with zipfile.ZipFile('foo.zip') as file:
    for name in ['file-1.txt', 'file-2.txt']:
        p = zipfile.Path(file, name)
        with p.open() as inner:
            print(list(inner)) # ValueError: seek of closed file

But the following code does not fail:

with zipfile.ZipFile('foo.zip') as file:
    for name in ['file-1.txt', 'file-2.txt']:
        with file.open(name) as inner:
            print(list(inner)) # Works!


Python 3.8.2 macOS 10.15.4.
msg375919 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-08-26 08:12
Bisection show this to be caused due to e5bd73632e77dc5ab0cab77e48e94ca5e354be8a . See also issue41640 for a similar report.
msg375961 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:22
I've attempted to replicate the issue in the [zipp](https://github.com/jaraco/zipp) test suite with this test:

```diff
diff --git a/test_zipp.py b/test_zipp.py
index a6fbf39..dc7c8aa 100644
--- a/test_zipp.py
+++ b/test_zipp.py
@@ -259,3 +259,10 @@ class TestPath(unittest.TestCase):
     def test_implied_dirs_performance(self):
         data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
         zipp.CompleteDirs._implied_dirs(data)
+
+    def test_read_does_not_close(self):
+        for alpharep in self.zipfile_alpharep():
+            for rep in range(2):
+                p_ = zipp.Path(alpharep, 'a.txt')
+                with p_.open() as inner:
+                    print(list(inner))
```

But the test passes.
msg375962 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:44
I am able to replicate the failure using the ondisk fixture:

```diff
diff --git a/test_zipp.py b/test_zipp.py
index a6fbf39..539d0a9 100644
--- a/test_zipp.py
+++ b/test_zipp.py
@@ -259,3 +259,11 @@ class TestPath(unittest.TestCase):
     def test_implied_dirs_performance(self):
         data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
         zipp.CompleteDirs._implied_dirs(data)
+
+    def test_read_does_not_close(self):
+        for alpharep in self.zipfile_ondisk():
+            with zipfile.ZipFile(alpharep) as file:
+                for rep in range(2):
+                    p_ = zipp.Path(file, 'a.txt')
+                    with p_.open() as inner:
+                        print(list(inner))
```
msg375963 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:55
I suspect the issue lies in how the CompleteDirs.make [replaces one instance with another](https://github.com/jaraco/zipp/blob/8ad959e61cd4be40baab93528775c2bf03c8f4e1/zipp.py#L112-L114) in order to provide a complete list of implied directories and to memoize the names lookup.

Because constructing a zipfile.Path object around a zipfile.ZipFile copies the underlying state, closing one will have the affect of closing the other.

I believe this issue is the same root issue as issue41350.
msg375964 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 22:06
I see a few options here:

- Implement CompleteDirs/FastLookup as adapters instead of subclasses, allowing the original ZipFile object to represent the state in a single place. This approach would likely be slower due to the indirection on all operations through the wrapper.
- Instead of constructing a new object and copying the state, CompleteDirs.make could mutate the existing ZipFile class, replacing `source.__class__` with the new class. This approach is messy and the caller would still need to be aware that this change could be applied to the zipfile object.
- Consider this use-case unsupported and document that any ZipFile object passed into Path is no longer viable and should not be referenced for another purpose.
- Eliminate the class-based performance optimizations and replace them with some functional/imperative form. This approach may provide less separation of concerns.
- Disable the close-on-delete behavior for the subclasses when state is copied from another.
msg375965 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 22:26
Implementing that last option:

```
diff --git a/zipp.py b/zipp.py
index 697d4a9..f244cba 100644
--- a/zipp.py
+++ b/zipp.py
@@ -111,6 +111,7 @@ class CompleteDirs(zipfile.ZipFile):
 
         res = cls.__new__(cls)
         vars(res).update(vars(source))
+        res.close = lambda: None
         return res
 
 
```

Does appear to address the issue. I'm not super happy about the implementation, though.
msg376668 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-09-09 23:04
In jaraco/zipp, I've implemented three of the options above:

- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-1
- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-2
- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-5
History
Date User Action Args
2020-09-09 23:04:27jaracosetmessages: + msg376668
2020-08-29 12:59:33xtreaklinkissue41640 superseder
2020-08-26 22:26:58jaracosetmessages: + msg375965
2020-08-26 22:06:56jaracosetmessages: + msg375964
2020-08-26 21:55:17jaracosetmessages: + msg375963
2020-08-26 21:44:02jaracosetmessages: + msg375962
2020-08-26 21:22:55jaracosetmessages: + msg375961
2020-08-26 08:12:56xtreaksetnosy: + xtreak
messages: + msg375919
2020-05-08 16:52:16brett.cannonsetnosy: + jaraco
2020-05-08 15:47:01bustawincreate