This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Using zipfile.Path with several files prematurely closes zip
Type: behavior Stage: resolved
Components: IO Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: bustawin, jaraco, xtreak
Priority: normal Keywords: patch

Created on 2020-05-08 15:47 by bustawin, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile.zip bustawin, 2020-05-08 15:47
Pull Requests
URL Status Linked Edit
PR 22371 merged jaraco, 2020-09-23 03:19
Messages (12)
msg368449 - (view) Author: Xavier (bustawin) Date: 2020-05-08 15:47
Given a .zip file with more than one inner file, when reading those inner files using zipfile.Path the zip module closes the .zip file prematurely, causing an error.

Given the following code (example zipfile is attached, but any should work).

with zipfile.ZipFile('foo.zip') as file:
    for name in ['file-1.txt', 'file-2.txt']:
        p = zipfile.Path(file, name)
        with p.open() as inner:
            print(list(inner)) # ValueError: seek of closed file

But the following code does not fail:

with zipfile.ZipFile('foo.zip') as file:
    for name in ['file-1.txt', 'file-2.txt']:
        with file.open(name) as inner:
            print(list(inner)) # Works!


Python 3.8.2 macOS 10.15.4.
msg375919 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-08-26 08:12
Bisection show this to be caused due to e5bd73632e77dc5ab0cab77e48e94ca5e354be8a . See also issue41640 for a similar report.
msg375961 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:22
I've attempted to replicate the issue in the [zipp](https://github.com/jaraco/zipp) test suite with this test:

```diff
diff --git a/test_zipp.py b/test_zipp.py
index a6fbf39..dc7c8aa 100644
--- a/test_zipp.py
+++ b/test_zipp.py
@@ -259,3 +259,10 @@ class TestPath(unittest.TestCase):
     def test_implied_dirs_performance(self):
         data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
         zipp.CompleteDirs._implied_dirs(data)
+
+    def test_read_does_not_close(self):
+        for alpharep in self.zipfile_alpharep():
+            for rep in range(2):
+                p_ = zipp.Path(alpharep, 'a.txt')
+                with p_.open() as inner:
+                    print(list(inner))
```

But the test passes.
msg375962 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:44
I am able to replicate the failure using the ondisk fixture:

```diff
diff --git a/test_zipp.py b/test_zipp.py
index a6fbf39..539d0a9 100644
--- a/test_zipp.py
+++ b/test_zipp.py
@@ -259,3 +259,11 @@ class TestPath(unittest.TestCase):
     def test_implied_dirs_performance(self):
         data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
         zipp.CompleteDirs._implied_dirs(data)
+
+    def test_read_does_not_close(self):
+        for alpharep in self.zipfile_ondisk():
+            with zipfile.ZipFile(alpharep) as file:
+                for rep in range(2):
+                    p_ = zipp.Path(file, 'a.txt')
+                    with p_.open() as inner:
+                        print(list(inner))
```
msg375963 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 21:55
I suspect the issue lies in how the CompleteDirs.make [replaces one instance with another](https://github.com/jaraco/zipp/blob/8ad959e61cd4be40baab93528775c2bf03c8f4e1/zipp.py#L112-L114) in order to provide a complete list of implied directories and to memoize the names lookup.

Because constructing a zipfile.Path object around a zipfile.ZipFile copies the underlying state, closing one will have the affect of closing the other.

I believe this issue is the same root issue as issue41350.
msg375964 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 22:06
I see a few options here:

- Implement CompleteDirs/FastLookup as adapters instead of subclasses, allowing the original ZipFile object to represent the state in a single place. This approach would likely be slower due to the indirection on all operations through the wrapper.
- Instead of constructing a new object and copying the state, CompleteDirs.make could mutate the existing ZipFile class, replacing `source.__class__` with the new class. This approach is messy and the caller would still need to be aware that this change could be applied to the zipfile object.
- Consider this use-case unsupported and document that any ZipFile object passed into Path is no longer viable and should not be referenced for another purpose.
- Eliminate the class-based performance optimizations and replace them with some functional/imperative form. This approach may provide less separation of concerns.
- Disable the close-on-delete behavior for the subclasses when state is copied from another.
msg375965 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-08-26 22:26
Implementing that last option:

```
diff --git a/zipp.py b/zipp.py
index 697d4a9..f244cba 100644
--- a/zipp.py
+++ b/zipp.py
@@ -111,6 +111,7 @@ class CompleteDirs(zipfile.ZipFile):
 
         res = cls.__new__(cls)
         vars(res).update(vars(source))
+        res.close = lambda: None
         return res
 
 
```

Does appear to address the issue. I'm not super happy about the implementation, though.
msg376668 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-09-09 23:04
In jaraco/zipp, I've implemented three of the options above:

- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-1
- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-2
- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-option-5
msg377323 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-09-22 14:09
I've also created this alternative to option 2:

- https://github.com/jaraco/zipp/tree/bugfix/bpo-40564-mixin

This alternative approach uses a mix-in rather than subclasses, creating a new class on-demand. I was hoping this approach would enable just augmenting the instance rather than affecting `source.__class__`, but when I got to the implementation, I found that `source.__class__` had to be mutated regardless.

This approach does have an advantage over option 2 in that it would support other ZipFile subclasses for source. It has the disadvantage in that it creates a new subclass for every instance created.

I've thought about it a lot and while I'm not particularly happy with any of the approaches, I'm leaning toward option 2.
msg377384 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-09-23 13:29
I've released zipp 3.2.0 that includes a fix for this issue (option 2). Please test it out.

Because this change affects the user's expectation about the effect of what's passed in, I'm hesitant to backport it to 3.8 and 3.9. It's too late to go into 3.9.0, so the earliest it can appear is in 3.9.1.

Please advise - does the undesirable behavior warrant a bugfix backport, or is it sufficient to direct users impacted by this issue prior to Python 3.10 to use the `zipp` backport?
msg377883 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-10-03 14:58
New changeset ebbe8033b1c61854c4b623aaf9c3e170d179f875 by Jason R. Coombs in branch 'master':
bpo-40564: Avoid copying state from extant ZipFile. (GH-22371)
https://github.com/python/cpython/commit/ebbe8033b1c61854c4b623aaf9c3e170d179f875
msg377884 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-10-03 14:59
Fix merged into master. Please use zipp 3.2.0 on Python 3.9 and earlier for the improved behavior or report back here if a backport is needed (and why).
History
Date User Action Args
2022-04-11 14:59:30adminsetgithub: 84744
2021-11-27 16:55:03iritkatriellinkissue23991 superseder
2021-07-16 01:12:12jaracolinkissue44638 superseder
2021-04-24 19:09:29jaracolinkissue41350 superseder
2020-10-03 14:59:58jaracosetresolution: fixed
2020-10-03 14:59:40jaracosetstatus: open -> closed

stage: patch review -> resolved
messages: + msg377884
versions: + Python 3.10, - Python 3.8
2020-10-03 14:58:43jaracosetmessages: + msg377883
2020-09-23 13:29:58jaracosetmessages: + msg377384
2020-09-23 03:19:57jaracosetkeywords: + patch
stage: patch review
pull_requests: + pull_request21408
2020-09-22 14:09:07jaracosetmessages: + msg377323
2020-09-09 23:04:27jaracosetmessages: + msg376668
2020-08-29 12:59:33xtreaklinkissue41640 superseder
2020-08-26 22:26:58jaracosetmessages: + msg375965
2020-08-26 22:06:56jaracosetmessages: + msg375964
2020-08-26 21:55:17jaracosetmessages: + msg375963
2020-08-26 21:44:02jaracosetmessages: + msg375962
2020-08-26 21:22:55jaracosetmessages: + msg375961
2020-08-26 08:12:56xtreaksetnosy: + xtreak
messages: + msg375919
2020-05-08 16:52:16brett.cannonsetnosy: + jaraco
2020-05-08 15:47:01bustawincreate