New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zipfile: Raise ValueError for i/o operations on closed zipfile.ZipExtFile #81704
Comments
After closing a file object opened from a ZipFile, attempting i/o operations raises AttributeError because the underlying fd has been set to None. We should be raising ValueErrors consistent with io.FileIO behaviour. Similar inconsistencies exist for the following operations on a closed ZipExtFile:
|
Could you please test what performance effect it has on read()? |
Here's the script I used for profiling and the results I observed with and without the closed check in read: import zipfile
test_zip = "time_test.zip"
test_name = "test_name.txt"
with zipfile.ZipFile(test_zip, "w") as zf:
zf.writestr(test_name, "Hi there! " * 300)
with zipfile.ZipFile(test_zip) as zf:
for i in range(100000):
zf.read(test_name) # Current code (no closed check), three different profiling sessions: # With closed check, three different profiling sessions: ------- I based this change on the what BytesIO does: https://github.com/python/cpython/blob/master/Lib/_pyio.py#L912 Let me know if you want me to make any changes. |
Thank you Daniel. But profiling is not good for benchmarking, it adds too much overhead. It can help to find narrow places, but for measuring effect of optimization you need to measure the raw time on non-debug build. Semantically your changes are correct. I just want to know whether this adds a significant overhead. If it is, we should consider some kind of optimizations. It does not matter for Python implementation of BytesIO because in normal we use the C implementation. |
Good point. Thanks for the advice. I've updated it to use timeit. Does that give a better indication? import zipfile
test_zip = "time_test.zip"
test_name = "test_name.txt" # with zipfile.ZipFile(test_zip, "w") as zf: def test():
with zipfile.ZipFile(test_zip) as zf:
for i in range(100000):
zf.read(test_name)
if __name__ == "__main__":
import timeit
print(timeit.repeat("test()", setup="from __main__ import test", number=1, repeat=10)) On my machine (running a usual workload) the best of 10 was: master: check closed: But I think my machine had a quite a bit of variance between runs. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: