Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using zipfile.Path with several files prematurely closes zip #84744

Closed
bustawin mannequin opened this issue May 8, 2020 · 12 comments
Closed

Using zipfile.Path with several files prematurely closes zip #84744

bustawin mannequin opened this issue May 8, 2020 · 12 comments
Labels
3.10 only security fixes topic-IO type-bug An unexpected behavior, bug, or error

Comments

@bustawin
Copy link
Mannequin

bustawin mannequin commented May 8, 2020

BPO 40564
Nosy @jaraco, @tirkarthi, @bustawin
PRs
  • bpo-40564: Avoid copying state from extant ZipFile. #22371
  • Files
  • zipfile.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-03.14:59:40.890>
    created_at = <Date 2020-05-08.15:47:01.274>
    labels = ['type-bug', '3.10', 'expert-IO']
    title = 'Using zipfile.Path with several files prematurely closes zip'
    updated_at = <Date 2020-10-03.14:59:58.354>
    user = 'https://github.com/bustawin'

    bugs.python.org fields:

    activity = <Date 2020-10-03.14:59:58.354>
    actor = 'jaraco'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-03.14:59:40.890>
    closer = 'jaraco'
    components = ['IO']
    creation = <Date 2020-05-08.15:47:01.274>
    creator = 'bustawin'
    dependencies = []
    files = ['49142']
    hgrepos = []
    issue_num = 40564
    keywords = ['patch']
    message_count = 12.0
    messages = ['368449', '375919', '375961', '375962', '375963', '375964', '375965', '376668', '377323', '377384', '377883', '377884']
    nosy_count = 3.0
    nosy_names = ['jaraco', 'xtreak', 'bustawin']
    pr_nums = ['22371']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40564'
    versions = ['Python 3.10']

    @bustawin
    Copy link
    Mannequin Author

    bustawin mannequin commented May 8, 2020

    Given a .zip file with more than one inner file, when reading those inner files using zipfile.Path the zip module closes the .zip file prematurely, causing an error.

    Given the following code (example zipfile is attached, but any should work).

    with zipfile.ZipFile('foo.zip') as file:
        for name in ['file-1.txt', 'file-2.txt']:
            p = zipfile.Path(file, name)
            with p.open() as inner:
                print(list(inner)) # ValueError: seek of closed file

    But the following code does not fail:

    with zipfile.ZipFile('foo.zip') as file:
        for name in ['file-1.txt', 'file-2.txt']:
            with file.open(name) as inner:
                print(list(inner)) # Works!

    Python 3.8.2 macOS 10.15.4.

    @bustawin bustawin mannequin added 3.8 only security fixes topic-IO type-bug An unexpected behavior, bug, or error labels May 8, 2020
    @tirkarthi
    Copy link
    Member

    Bisection show this to be caused due to e5bd736 . See also bpo-41640 for a similar report.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 26, 2020

    I've attempted to replicate the issue in the zipp test suite with this test:

    diff --git a/test_zipp.py b/test_zipp.py
    index a6fbf39..dc7c8aa 100644
    --- a/test_zipp.py
    +++ b/test_zipp.py
    @@ -259,3 +259,10 @@ class TestPath(unittest.TestCase):
         def test_implied_dirs_performance(self):
             data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
             zipp.CompleteDirs._implied_dirs(data)
    +
    +    def test_read_does_not_close(self):
    +        for alpharep in self.zipfile_alpharep():
    +            for rep in range(2):
    +                p_ = zipp.Path(alpharep, 'a.txt')
    +                with p_.open() as inner:
    +                    print(list(inner))

    But the test passes.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 26, 2020

    I am able to replicate the failure using the ondisk fixture:

    diff --git a/test_zipp.py b/test_zipp.py
    index a6fbf39..539d0a9 100644
    --- a/test_zipp.py
    +++ b/test_zipp.py
    @@ -259,3 +259,11 @@ class TestPath(unittest.TestCase):
         def test_implied_dirs_performance(self):
             data = ['/'.join(string.ascii_lowercase + str(n)) for n in range(10000)]
             zipp.CompleteDirs._implied_dirs(data)
    +
    +    def test_read_does_not_close(self):
    +        for alpharep in self.zipfile_ondisk():
    +            with zipfile.ZipFile(alpharep) as file:
    +                for rep in range(2):
    +                    p_ = zipp.Path(file, 'a.txt')
    +                    with p_.open() as inner:
    +                        print(list(inner))

    @jaraco
    Copy link
    Member

    jaraco commented Aug 26, 2020

    I suspect the issue lies in how the CompleteDirs.make replaces one instance with another in order to provide a complete list of implied directories and to memoize the names lookup.

    Because constructing a zipfile.Path object around a zipfile.ZipFile copies the underlying state, closing one will have the affect of closing the other.

    I believe this issue is the same root issue as bpo-41350.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 26, 2020

    I see a few options here:

    • Implement CompleteDirs/FastLookup as adapters instead of subclasses, allowing the original ZipFile object to represent the state in a single place. This approach would likely be slower due to the indirection on all operations through the wrapper.
    • Instead of constructing a new object and copying the state, CompleteDirs.make could mutate the existing ZipFile class, replacing source.__class__ with the new class. This approach is messy and the caller would still need to be aware that this change could be applied to the zipfile object.
    • Consider this use-case unsupported and document that any ZipFile object passed into Path is no longer viable and should not be referenced for another purpose.
    • Eliminate the class-based performance optimizations and replace them with some functional/imperative form. This approach may provide less separation of concerns.
    • Disable the close-on-delete behavior for the subclasses when state is copied from another.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 26, 2020

    Implementing that last option:

    diff --git a/zipp.py b/zipp.py
    index 697d4a9..f244cba 100644
    --- a/zipp.py
    +++ b/zipp.py
    @@ -111,6 +111,7 @@ class CompleteDirs(zipfile.ZipFile):
     
             res = cls.__new__(cls)
             vars(res).update(vars(source))
    +        res.close = lambda: None
             return res
     
     
    

    Does appear to address the issue. I'm not super happy about the implementation, though.

    @jaraco
    Copy link
    Member

    jaraco commented Sep 9, 2020

    @jaraco
    Copy link
    Member

    jaraco commented Sep 22, 2020

    I've also created this alternative to option 2:

    This alternative approach uses a mix-in rather than subclasses, creating a new class on-demand. I was hoping this approach would enable just augmenting the instance rather than affecting source.__class__, but when I got to the implementation, I found that source.__class__ had to be mutated regardless.

    This approach does have an advantage over option 2 in that it would support other ZipFile subclasses for source. It has the disadvantage in that it creates a new subclass for every instance created.

    I've thought about it a lot and while I'm not particularly happy with any of the approaches, I'm leaning toward option 2.

    @jaraco
    Copy link
    Member

    jaraco commented Sep 23, 2020

    I've released zipp 3.2.0 that includes a fix for this issue (option 2). Please test it out.

    Because this change affects the user's expectation about the effect of what's passed in, I'm hesitant to backport it to 3.8 and 3.9. It's too late to go into 3.9.0, so the earliest it can appear is in 3.9.1.

    Please advise - does the undesirable behavior warrant a bugfix backport, or is it sufficient to direct users impacted by this issue prior to Python 3.10 to use the zipp backport?

    @jaraco
    Copy link
    Member

    jaraco commented Oct 3, 2020

    New changeset ebbe803 by Jason R. Coombs in branch 'master':
    bpo-40564: Avoid copying state from extant ZipFile. (GH-22371)
    ebbe803

    @jaraco
    Copy link
    Member

    jaraco commented Oct 3, 2020

    Fix merged into master. Please use zipp 3.2.0 on Python 3.9 and earlier for the improved behavior or report back here if a backport is needed (and why).

    @jaraco jaraco added 3.10 only security fixes and removed 3.8 only security fixes labels Oct 3, 2020
    @jaraco jaraco closed this as completed Oct 3, 2020
    @jaraco jaraco added 3.10 only security fixes and removed 3.8 only security fixes labels Oct 3, 2020
    @jaraco jaraco closed this as completed Oct 3, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants