This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: zipfile: Raise ValueError for i/o operations on closed zipfile.ZipExtFile
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: dhillier, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-07-09 07:06 by dhillier, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 14658 merged dhillier, 2019-07-09 07:27
Messages (6)
msg347524 - (view) Author: Daniel Hillier (dhillier) * Date: 2019-07-09 07:06
After closing a file object opened from a ZipFile, attempting i/o operations raises AttributeError because the underlying fd has been set to None. We should be raising ValueErrors consistent with io.FileIO behaviour.

Similar inconsistencies exist for the following operations on a closed ZipExtFile:
- seek
- seekable
- read
- readable
- tell
msg355458 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-27 09:18
Could you please test what performance effect it has on read()?
msg355615 - (view) Author: Daniel Hillier (dhillier) * Date: 2019-10-29 05:46
Here's the script I used for profiling and the results I observed with and without the closed check in read:

import zipfile

test_zip = "time_test.zip"
test_name = "test_name.txt"

with zipfile.ZipFile(test_zip, "w") as zf:
    zf.writestr(test_name, "Hi there! " * 300)

with zipfile.ZipFile(test_zip) as zf:
    for i in range(100000):
        zf.read(test_name)

# Current code (no closed check), three different profiling sessions:
# ncalls  tottime  percall  cumtime  percall filename:lineno(function)
# 100000    0.612    0.000    6.638    0.000 zipfile.py:884(read)
# 100000    0.598    0.000    6.489    0.000 zipfile.py:884(read)
# 100000    0.600    0.000    6.485    0.000 zipfile.py:884(read)

# With closed check, three different profiling sessions:
# ncalls  tottime  percall  cumtime  percall filename:lineno(function)
# 100000    0.632    0.000    6.587    0.000 zipfile.py:884(read)
# 100000    0.623    0.000    6.564    0.000 zipfile.py:884(read)
# 100000    0.638    0.000    6.700    0.000 zipfile.py:884(read)

-------

I based this change on the what BytesIO does: https://github.com/python/cpython/blob/master/Lib/_pyio.py#L912

Let me know if you want me to make any changes.
msg355621 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-29 07:01
Thank you Daniel. But profiling is not good for benchmarking, it adds too much overhead. It can help to find narrow places, but for measuring effect of optimization you need to measure the raw time on non-debug build.

Semantically your changes are correct. I just want to know whether this adds a significant overhead. If it is, we should consider some kind of optimizations. It does not matter for Python implementation of BytesIO because in normal we use the C implementation.
msg355631 - (view) Author: Daniel Hillier (dhillier) * Date: 2019-10-29 08:45
Good point. Thanks for the advice. I've updated it to use timeit. Does that give a better indication?

import zipfile

test_zip = "time_test.zip"
test_name = "test_name.txt"

# with zipfile.ZipFile(test_zip, "w") as zf:
#     zf.writestr(test_name, "Hi there! " * 300)

def test():
    with zipfile.ZipFile(test_zip) as zf:
        for i in range(100000):
            zf.read(test_name)


if __name__ == "__main__":
    import timeit
    print(timeit.repeat("test()", setup="from __main__ import test", number=1, repeat=10))


On my machine (running a usual workload) the best of 10 was:

master:
3.812s

check closed:
3.874s

But I think my machine had a quite a bit of variance between runs.
msg357657 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-30 08:30
New changeset 8d62df60d8733d0fa9aee14ef746d0009a7a9726 by Serhiy Storchaka (Daniel Hillier) in branch 'master':
bpo-37523: Raise ValueError for I/O operations on a closed zipfile.ZipExtFile. (GH-14658)
https://github.com/python/cpython/commit/8d62df60d8733d0fa9aee14ef746d0009a7a9726
History
Date User Action Args
2022-04-11 14:59:17adminsetgithub: 81704
2019-11-30 08:32:21serhiy.storchakasetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.9, - Python 3.7, Python 3.8
2019-11-30 08:30:53serhiy.storchakasetmessages: + msg357657
2019-10-29 08:45:45dhilliersetmessages: + msg355631
2019-10-29 07:01:37serhiy.storchakasetmessages: + msg355621
2019-10-29 05:46:54dhilliersetmessages: + msg355615
2019-10-27 09:18:00serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg355458
2019-07-09 07:27:20dhilliersetkeywords: + patch
stage: patch review
pull_requests: + pull_request14465
2019-07-09 07:06:51dhilliercreate