Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile: Raise ValueError for i/o operations on closed zipfile.ZipExtFile #81704

Closed
danifus mannequin opened this issue Jul 9, 2019 · 6 comments
Closed

zipfile: Raise ValueError for i/o operations on closed zipfile.ZipExtFile #81704

danifus mannequin opened this issue Jul 9, 2019 · 6 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@danifus
Copy link
Mannequin

danifus mannequin commented Jul 9, 2019

BPO 37523
Nosy @serhiy-storchaka, @danifus
PRs
  • bpo-37523: Raise ValueError for I/O operations on a closed zipfile.ZipExtFile #14658
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-11-30.08:32:21.123>
    created_at = <Date 2019-07-09.07:06:51.578>
    labels = ['type-bug', 'library', '3.9']
    title = 'zipfile: Raise ValueError for i/o operations on closed zipfile.ZipExtFile'
    updated_at = <Date 2019-11-30.08:32:21.116>
    user = 'https://github.com/danifus'

    bugs.python.org fields:

    activity = <Date 2019-11-30.08:32:21.116>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-11-30.08:32:21.123>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2019-07-09.07:06:51.578>
    creator = 'dhillier'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37523
    keywords = ['patch']
    message_count = 6.0
    messages = ['347524', '355458', '355615', '355621', '355631', '357657']
    nosy_count = 2.0
    nosy_names = ['serhiy.storchaka', 'dhillier']
    pr_nums = ['14658']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37523'
    versions = ['Python 3.9']

    @danifus
    Copy link
    Mannequin Author

    danifus mannequin commented Jul 9, 2019

    After closing a file object opened from a ZipFile, attempting i/o operations raises AttributeError because the underlying fd has been set to None. We should be raising ValueErrors consistent with io.FileIO behaviour.

    Similar inconsistencies exist for the following operations on a closed ZipExtFile:

    • seek
    • seekable
    • read
    • readable
    • tell

    @danifus danifus mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 9, 2019
    @serhiy-storchaka
    Copy link
    Member

    Could you please test what performance effect it has on read()?

    @danifus
    Copy link
    Mannequin Author

    danifus mannequin commented Oct 29, 2019

    Here's the script I used for profiling and the results I observed with and without the closed check in read:

    import zipfile
    
    test_zip = "time_test.zip"
    test_name = "test_name.txt"
    
    with zipfile.ZipFile(test_zip, "w") as zf:
        zf.writestr(test_name, "Hi there! " * 300)
    
    with zipfile.ZipFile(test_zip) as zf:
        for i in range(100000):
            zf.read(test_name)

    # Current code (no closed check), three different profiling sessions:
    # ncalls tottime percall cumtime percall filename:lineno(function)
    \bpo-100000 0.612 0.000 6.638 0.000 zipfile.py:884(read)
    \bpo-100000 0.598 0.000 6.489 0.000 zipfile.py:884(read)
    \bpo-100000 0.600 0.000 6.485 0.000 zipfile.py:884(read)

    # With closed check, three different profiling sessions:
    # ncalls tottime percall cumtime percall filename:lineno(function)
    \bpo-100000 0.632 0.000 6.587 0.000 zipfile.py:884(read)
    \bpo-100000 0.623 0.000 6.564 0.000 zipfile.py:884(read)
    \bpo-100000 0.638 0.000 6.700 0.000 zipfile.py:884(read)

    -------

    I based this change on the what BytesIO does: https://github.com/python/cpython/blob/master/Lib/_pyio.py#L912

    Let me know if you want me to make any changes.

    @serhiy-storchaka
    Copy link
    Member

    Thank you Daniel. But profiling is not good for benchmarking, it adds too much overhead. It can help to find narrow places, but for measuring effect of optimization you need to measure the raw time on non-debug build.

    Semantically your changes are correct. I just want to know whether this adds a significant overhead. If it is, we should consider some kind of optimizations. It does not matter for Python implementation of BytesIO because in normal we use the C implementation.

    @danifus
    Copy link
    Mannequin Author

    danifus mannequin commented Oct 29, 2019

    Good point. Thanks for the advice. I've updated it to use timeit. Does that give a better indication?

    import zipfile
    
    test_zip = "time_test.zip"
    test_name = "test_name.txt"

    # with zipfile.ZipFile(test_zip, "w") as zf:
    # zf.writestr(test_name, "Hi there! " * 300)

    def test():
        with zipfile.ZipFile(test_zip) as zf:
            for i in range(100000):
                zf.read(test_name)
    
    
    if __name__ == "__main__":
        import timeit
        print(timeit.repeat("test()", setup="from __main__ import test", number=1, repeat=10))

    On my machine (running a usual workload) the best of 10 was:

    master:
    3.812s

    check closed:
    3.874s

    But I think my machine had a quite a bit of variance between runs.

    @serhiy-storchaka
    Copy link
    Member

    New changeset 8d62df6 by Serhiy Storchaka (Daniel Hillier) in branch 'master':
    bpo-37523: Raise ValueError for I/O operations on a closed zipfile.ZipExtFile. (GH-14658)
    8d62df6

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Nov 30, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant