classification
Title: low performance of zipfile readline()
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: volker_siepmann (1)
Priority: Keywords

Created on 2009-10-27 07:13 by volker_siepmann, last changed 2009-10-27 07:13 by volker_siepmann.

Messages (1)
msg94545 - Author: Volker Siepmann (volker_siepmann) Date: 2009-10-27 07:13
The readline() method of ZipExtFile in zipfile reads chunks of at most
100 bytes (zipfile.py, line 525) into the line buffer. Reading a file of
500 MBytes line by line therefore takes about 5 million chunk reads.
Raising the limit from 100 to 10000 bytes improves performance by orders
of magnitude, while costing only about 10 kB of memory.

My fix in zipfile.py, line 525:

buf = self.read(min(size, 10000)) # was 100 before
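
To illustrate why the chunk size matters, here is a minimal sketch that counts the read() calls needed to split a stream into lines for two different chunk sizes. It is not the zipfile implementation: CountingReader, read_lines and count_reads are hypothetical names made up for this example, and the line-buffer loop is only a simplified stand-in for what ZipExtFile.readline() does.

```python
import io


class CountingReader(object):
    """Hypothetical helper: wraps a binary stream and counts read() calls."""

    def __init__(self, raw):
        self.raw = raw
        self.calls = 0

    def read(self, n):
        self.calls += 1
        return self.raw.read(n)


def read_lines(reader, chunk_size):
    """Yield lines, refilling a line buffer chunk_size bytes at a time.

    Simplified stand-in for a readline() loop, not the zipfile code.
    """
    linebuffer = b""
    while True:
        nl = linebuffer.find(b"\n")
        if nl >= 0:
            # A complete line is buffered: hand it out, keep the rest.
            yield linebuffer[:nl + 1]
            linebuffer = linebuffer[nl + 1:]
            continue
        buf = reader.read(chunk_size)
        if not buf:
            # End of stream: emit a trailing line without newline, if any.
            if linebuffer:
                yield linebuffer
            return
        linebuffer += buf


def count_reads(data, chunk_size):
    """Return (number of lines, number of read() calls) for data."""
    reader = CountingReader(io.BytesIO(data))
    lines = list(read_lines(reader, chunk_size))
    return len(lines), reader.calls


# 12500 lines of 80 bytes each: 1,000,000 bytes in total.
data = (b"x" * 79 + b"\n") * 12500
small = count_reads(data, 100)
large = count_reads(data, 10000)
print("chunk size 100:   %d lines, %d read() calls" % small)
print("chunk size 10000: %d lines, %d read() calls" % large)
```

For this 1 MB input the small chunk size needs 10001 read() calls and the large one 101 (the extra call in each case is the final empty read at end of stream). In the real module every chunk read carries additional per-call overhead, so the sketch shows only the call count, not the actual speedup.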

Best regards / Volker Siepmann
History
Date User Action Args
2009-10-27 07:13:49 volker_siepmann create