classification
Title: low performance of zipfile readline()
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Cannot use both read and readline method in same ZipExtFile object (issue 7610)
Assigned To:
Nosy List: amaury.forgeotdarc, volker_siepmann
Priority: normal
Keywords:

Created on 2009-10-27 07:13 by volker_siepmann, last changed 2010-02-09 16:08 by amaury.forgeotdarc. This issue is now closed.

Messages (2)
msg94545 - Author: Volker Siepmann (volker_siepmann) Date: 2009-10-27 07:13
The readline() method of ZipExtFile in zipfile reads chunks of at most
100 bytes (zipfile.py, line 525) into the line buffer. Reading a 500 MB
file line by line therefore takes about 5 million chunk reads.
Raising the limit from 100 to 10000 bytes improves performance by orders
of magnitude, while it only requires about 10 kB of memory.

My fix in zipfile.py, line 525:

buf = self.read(min(size, 10000)) # was 100 before
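
To illustrate the effect of the chunk size, here is a minimal sketch that
counts the underlying read() calls for two chunk sizes. It is not the actual
zipfile code: ChunkedLineReader, count_reads and DATA are hypothetical names
made up for this example, it is written for Python 3, and it reads from an
in-memory io.BytesIO rather than a real zip member, so it shows only the
number of calls, not the decompression cost.

```python
import io


class ChunkedLineReader:
    """Hypothetical stand-in for a line buffer filled in fixed-size chunks.

    This is an illustration only, not the ZipExtFile implementation.
    """

    def __init__(self, raw, chunk_size):
        self.raw = raw                # underlying binary stream
        self.chunk_size = chunk_size  # maximum bytes requested per read()
        self.linebuffer = b""
        self.read_calls = 0           # number of read() calls issued so far

    def readline(self):
        # Pull chunks until the buffer holds a newline or the stream ends.
        while b"\n" not in self.linebuffer:
            buf = self.raw.read(self.chunk_size)
            self.read_calls += 1
            if not buf:
                break
            self.linebuffer += buf

        nl = self.linebuffer.find(b"\n")
        if nl < 0:
            # No newline left: return whatever remains (possibly empty).
            line, self.linebuffer = self.linebuffer, b""
        else:
            line = self.linebuffer[:nl + 1]
            self.linebuffer = self.linebuffer[nl + 1:]
        return line


def count_reads(data, chunk_size):
    """Read all lines of data; return (line count, read() call count)."""
    reader = ChunkedLineReader(io.BytesIO(data), chunk_size)
    lines = 0
    while reader.readline():
        lines += 1
    return lines, reader.read_calls


# 12500 lines of 80 bytes each, 1,000,000 bytes in total (made-up test data).
DATA = (b"x" * 79 + b"\n") * 12500

if __name__ == "__main__":
    for size in (100, 10000):
        lines, calls = count_reads(DATA, size)
        print("chunk size %5d: %d lines, %d read() calls" % (size, lines, calls))
```

With this data the 100-byte reader issues roughly 100 times as many read()
calls as the 10000-byte reader for the same number of lines.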

Best regards / Volker Siepmann
msg99121 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-09 16:08
Already fixed with issue7610.
History
Date                 User                Action  Args
2010-02-09 16:08:54  amaury.forgeotdarc  set     status: open -> closed
                                                 nosy: + amaury.forgeotdarc
                                                 messages: + msg99121
                                                 superseder: Cannot use both read and readline method in same ZipExtFile object
                                                 resolution: duplicate
2009-10-27 07:13:49  volker_siepmann     create