ZipFile unzip is unbuffered #54585
Comments
Unzipping with the zipfile module is always unbuffered (tested with v3.1.2 on Windows XP, 32-bit). This means that doing many small reads is a lot slower than reading a chunk of data into a buffer and then reading from that buffer. It seems logical that the module should default to buffered reading and/or accept a buffered argument. Likewise, the documentation should make clear that no buffering is involved when doing a read, which runs contrary to the default behavior of a normal file read.
I should clarify that this is the zipfile constructor I am using: zipfile.ZipFile(filename, mode='r', allowZip64=True)
Actually, reading from the zip file is buffered (at least 4 KiB of uncompressed data at a time). Can you provide tests, scripts, and data that show the problem?
See the attached script, which opens a zip file containing a single file and reads it many times using both the unbuffered and the buffered idioms. This was tested on Windows using Python 3.2. You will have to supply your own file to test it on. Sorry. Example output: Enter filename: test.zip
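The attachment itself is not reproduced in this thread. As a rough, hedged sketch of the two idioms being compared, the snippet below builds a small archive in memory and reads its single member both ways. The helper names (`build_archive`, `chunks_by_read`, `chunks_by_slice`) and the member name `data.bin` are assumptions made for illustration; they are not taken from the attached script.

```python
import io
import zipfile
from typing import Iterator


def build_archive(payload: bytes, name: str) -> bytes:
    """Create a one-member zip archive in memory (a stand-in for test.zip)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def chunks_by_read(archive: bytes, name: str, size: int) -> Iterator[bytes]:
    """Idiom 1: call read(size) on the open member once per small chunk."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf, zf.open(name) as member:
        while True:
            chunk = member.read(size)
            if not chunk:  # empty bytes object signals end of member
                break
            yield chunk


def chunks_by_slice(archive: bytes, name: str, size: int) -> Iterator[bytes]:
    """Idiom 2: read the whole member once, then slice the bytes object."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        data = zf.read(name)
    for start in range(0, len(data), size):
        yield data[start:start + size]


payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
archive = build_archive(payload, "data.bin")

# Both idioms should produce the same sequence of 4-byte chunks.
same = list(chunks_by_read(archive, "data.bin", 4)) == list(
    chunks_by_slice(archive, "data.bin", 4)
)
print("identical chunks:", same)
```

Both idioms return the same data; they differ only in how many calls are made into the zipfile machinery per chunk.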
This is not because the zipfile module is unbuffered. This is the difference between an expensive function call and cheap bytes slicing. Replace Here is a patch that speeds up (+20%) reading from a zip file in small chunks. Microbenchmark: ./python -m zipfile -c test.zip python Python 3.3 (vanilla): 1 loops, best of 3: 36.4 sec per loop
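The full benchmark command is not preserved above. One way to observe the call-versus-slice difference locally is a small `timeit` harness like the sketch below. It is not the benchmark used for the patch: the archive contents, the chunk size, and the function names are all assumptions, and no timings are claimed here since they depend on the machine and Python version.

```python
import io
import timeit
import zipfile

CHUNK = 4          # deliberately tiny reads (assumed value)
NAME = "data.bin"  # hypothetical member name

# Build a small archive in memory instead of relying on a test.zip on disk.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, mode="w", compression=zipfile.ZIP_DEFLATED) as _zf:
    _zf.writestr(NAME, bytes(range(256)) * 2000)  # 512,000 bytes
ARCHIVE = _buf.getvalue()


def per_chunk_calls() -> int:
    """One method call on the zip member per chunk."""
    count = 0
    with zipfile.ZipFile(io.BytesIO(ARCHIVE)) as zf, zf.open(NAME) as member:
        while member.read(CHUNK):
            count += 1
    return count


def slice_after_one_read() -> int:
    """A single read of the whole member, then plain bytes slicing."""
    with zipfile.ZipFile(io.BytesIO(ARCHIVE)) as zf:
        data = zf.read(NAME)
    count = 0
    for start in range(0, len(data), CHUNK):
        _chunk = data[start:start + CHUNK]
        count += 1
    return count


if __name__ == "__main__":
    for fn in (per_chunk_calls, slice_after_one_read):
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print(f"{fn.__name__}: best of 3: {best:.3f} sec per loop")
```

Both functions touch the same number of chunks, so any gap in the reported times reflects per-call overhead rather than a difference in the amount of data decompressed.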
Thank you, Martin, now I understand why the Rietveld review did not work.
The patch has been updated to reflect Martin's stylistic comments. Sorry for the delay, Martin. I did not receive an email with your review from 2012-05-13, and only today accidentally discovered your comments in Rietveld. It seems to have been some bug in Rietveld.
Martin, is the patch good now?
Any chance of committing the patch before the final feature freeze?
Patch looks fine to me. Antoine, can you commit this? I'm currently away from the computer that
New changeset 0e8285321659 by Antoine Pitrou in branch 'default':
Ok, done.