classification
Title: Scalable zipfile extension
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.5
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: akuchling, deufeufeu, jafo, loewis, ronaldoussoren
Priority: normal
Keywords: patch

Created on 2003-09-27 08:09 by deufeufeu, last changed 2007-02-13 08:59 by loewis. This issue is now closed.

Files
File name     Uploaded                      Description
zipfile.diff  deufeufeu, 2003-09-27 08:09   patch made using cvs diff -c
Messages (7)
msg44702 - Author: Marc De Falco (deufeufeu) Date: 2003-09-27 08:09
While playing around with large zipfiles (> 10000 files),
I've encountered long loading times, even when I only
use 30 of the files after loading the archive.
So I've added a 'differed' parameter to
ZipFile.__init__ in order to load headers on demand.
As it's not really a good idea to activate it for all
zipfiles, it defaults to False.
I've updated the documentation too.

Thanks, and keep up the good work ;)

P.S.: I don't know whether it can be added to 2.3 or has to be
included in 2.4, so I've chosen the 2.4 group.
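The on-demand idea can be sketched in Python under a few stated assumptions. Opening the archive parses only the end-of-central-directory record and the central directory, which already hold each member's name, compressed size, compression method and local-header offset; a member's local header is seeked to and parsed only when that member is read. LazyZipIndex, Entry and the local_headers_read counter are hypothetical names, not part of the zipfile API, and this is not the code from zipfile.diff. The sketch also assumes no archive comment, no ZIP64 records, and only stored or deflated members.

```python
import io
import struct
import zipfile
import zlib
from typing import BinaryIO, Dict, NamedTuple

# Fixed-size record layouts from the ZIP format (little-endian).
EOCD = struct.Struct("<4s4H2LH")            # end of central directory, 22 bytes
CENTRAL = struct.Struct("<4s4B4HL2L5H2L")   # central directory entry, 46 bytes
LOCAL = struct.Struct("<4s2B4HL2L2H")       # local file header, 30 bytes


class Entry(NamedTuple):
    header_offset: int   # where the member's local header starts
    compress_size: int   # taken from the central directory
    compress_type: int   # 0 = stored, 8 = deflated


class LazyZipIndex:
    """Hypothetical reader: index from the central directory, local headers on demand."""

    def __init__(self, fp: BinaryIO) -> None:
        self.fp = fp
        self.local_headers_read = 0
        # Assumption: no archive comment, so the EOCD record is the last 22 bytes.
        fp.seek(-EOCD.size, 2)
        eocd = EOCD.unpack(fp.read(EOCD.size))
        if eocd[0] != b"PK\x05\x06":
            raise ValueError("end of central directory record not found")
        total, cd_size, cd_offset = eocd[4], eocd[5], eocd[6]
        fp.seek(cd_offset)
        cd = fp.read(cd_size)
        self.entries: Dict[str, Entry] = {}
        pos = 0
        for _ in range(total):
            rec = CENTRAL.unpack_from(cd, pos)
            if rec[0] != b"PK\x01\x02":
                raise ValueError("bad central directory entry")
            flags, method, csize = rec[5], rec[6], rec[10]
            name_len, extra_len, comment_len = rec[12], rec[13], rec[14]
            start = pos + CENTRAL.size
            name = cd[start:start + name_len].decode("utf-8" if flags & 0x800 else "cp437")
            self.entries[name] = Entry(rec[18], csize, method)
            pos = start + name_len + extra_len + comment_len

    def read(self, name: str) -> bytes:
        entry = self.entries[name]
        # Only now is this member's local header touched.
        self.fp.seek(entry.header_offset)
        hdr = LOCAL.unpack(self.fp.read(LOCAL.size))
        self.local_headers_read += 1
        if hdr[0] != b"PK\x03\x04":
            raise ValueError("bad local file header")
        # The local name/extra lengths may differ from the central copy, so skip
        # past them using the local values before reading the compressed bytes.
        self.fp.seek(hdr[10] + hdr[11], 1)
        data = self.fp.read(entry.compress_size)
        if entry.compress_type == 0:       # stored
            return data
        if entry.compress_type == 8:       # deflated: raw stream, no zlib header
            return zlib.decompress(data, -15)
        raise NotImplementedError(f"compression method {entry.compress_type}")


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(10000):
            zf.writestr(f"file_{i}.txt", f"contents of file {i}\n")
    idx = LazyZipIndex(buf)
    print(len(idx.entries), "entries indexed,", idx.local_headers_read, "local headers read")
    print(idx.read("file_42.txt"), idx.local_headers_read)
```

In the demo, indexing 10000 members leaves local_headers_read at 0, and reading one member touches exactly one local header. That is the cost profile the patch was after, without needing an opt-in flag.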
msg44703 - Author: Sean Reifschneider (jafo) * (Python committer) Date: 2006-05-25 14:36

There is a Summer of Code project to rewrite the zipfile
module, so this patch is moot.

Sean
msg44704 - Author: Sean Reifschneider (jafo) * (Python committer) Date: 2006-05-25 14:38

Actually, we'll leave it open until the Summer of Code
implementation is completed and accepted.

Sean
msg44705 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-27 18:22

Patch [1446489] "zipfile: support for ZIP64" also addresses this, as a side
effect of adding ZIP64 support (for very big zipfiles).

BTW, I don't quite understand why this patch was put on hold just because a
rewrite of the zipfile module is planned.

W.r.t. this patch: why is the on-demand loading optional? Loading the per-file
headers when the zipfile is opened is not necessary for normal operation; the
current zipfile module basically does a full verification of the zipfile every
time it is opened. I don't think the Info-ZIP tools do this either (probably
because verification is very expensive).
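For contrast, the eager pass described above can be approximated with the real zipfile API. ZipFile.infolist() and ZipInfo.header_offset come from the central directory; the sketch then seeks to every member's local header, checks its signature and compares the stored file name, which costs one seek and one small read per member on every open. verify_local_headers is a hypothetical name, and the function only approximates the kind of check the 2.4-era module performed when opening an archive. It is not that module's code.

```python
import io
import struct
import zipfile
from typing import BinaryIO

# Local file header layout from the ZIP format (30 bytes, little-endian).
LOCAL = struct.Struct("<4s2B4HL2L2H")


def verify_local_headers(fp: BinaryIO) -> int:
    """Hypothetical eager check: touch every member's local header.

    Returns the number of headers checked; raises zipfile.BadZipFile on a bad
    signature or on a name that disagrees with the central directory.
    """
    with zipfile.ZipFile(fp) as zf:   # member list comes from the central directory
        infos = zf.infolist()
    for info in infos:
        fp.seek(info.header_offset)                # one seek per member...
        hdr = LOCAL.unpack(fp.read(LOCAL.size))    # ...plus one small read
        if hdr[0] != b"PK\x03\x04":
            raise zipfile.BadZipFile(f"bad local header for {info.filename!r}")
        raw_name = fp.read(hdr[10])
        encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
        if raw_name.decode(encoding) != info.orig_filename:
            raise zipfile.BadZipFile(f"name mismatch for {info.filename!r}")
    return len(infos)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(1000):
            zf.writestr(f"member_{i}.txt", b"x" * i)
    print(verify_local_headers(buf), "local headers checked")
```

The work grows with the number of members rather than with the number of members actually used. That is why doing it unconditionally at open time hurts archives with thousands of entries.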
msg44706 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-12-22 18:58
According to http://mail.python.org/pipermail/python-dev/2006-November/069969.html, the author of the zipfile rewrite isn't quite happy with it.  It doesn't look like the new module will be API-compatible with zipfile, so I think this patch should still be considered for inclusion.
msg44707 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-12-22 19:00
Patch #1446489, mentioned in Ronald's 2006-05-27 14:22 comment, was committed and is in Python 2.5.  Is this patch still relevant?
msg44708 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-02-13 08:59
I'm rejecting the patch, for the following reasons:
- I agree with ronaldoussoren that this deferred loading already happens in the 2.5 version (specifically, it happens inside read)
- I also agree that making it an optional parameter is unnecessary; it just complicates the interface. I also think the proposed parameter name ('differed') is misspelled and should have been 'deferred' (unless I'm missing a meaning of 'differed')
- the implementation of the patch is unacceptable because it duplicates code.

As for why this patch was "put on hold": it was not because a rewrite was planned. The patch was contributed in 2003, and jafo's (first) comment was in 2006. The patch was "on hold" because nobody found the time to review it (just like the 400 or so other patches).
History
Date                 User       Action  Args
2003-09-27 08:09:38  deufeufeu  create