Author ronaldoussoren
Date 2006-05-23.13:10:00
Content

I've found some time to work on this and have attached
zipfile-zip64-version2.patch. This version:

* Makes zip64 behaviour optional (off by default, because zip(1) doesn't
support zip64)

* Is significantly faster for large zipfiles because it doesn't scan the entire 
zipfile just to check that the file headers are consistent with the central 
directory w.r.t. filename (this check is now done when trying to read a file)

* Updates the reference documentation.

* Adds unit tests. There are two sets of tests: one set exercises the zip64
extensions on small files by lowering the zip64 cutoff point and is run every
time; the other set tests huge zipfiles and is run only when the largefile
feature is enabled for the test run.
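
To illustrate the deferred consistency check from the second bullet, here is a minimal sketch. It assumes a current Python 3 zipfile module (where the exception is spelled BadZipFile) rather than the patched module itself, and the archive contents are made up for the example. Opening and listing consult only the central directory; the filename mismatch is reported when the member is read.

```python
import io
import zipfile

# Build a tiny one-member archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"payload")

# Corrupt only the local file header: it starts at offset 0 and its
# filename follows the 30 fixed-size bytes. The central directory is untouched.
raw = bytearray(buf.getvalue())
assert raw[30:35] == b"a.txt"
raw[30:35] = b"b.txt"

damaged = zipfile.ZipFile(io.BytesIO(bytes(raw)))
names = damaged.namelist()      # opening and listing use the central directory only

try:
    damaged.read("a.txt")       # the header/directory comparison happens here
    mismatch_detected = False
except zipfile.BadZipFile:
    mismatch_detected = True
```

The trade-off is that a damaged local header goes unnoticed until that particular member is accessed.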

There is one backward-incompatible change: ZipInfo objects no longer have a 
file_offset attribute. That was the other reason for scanning the entire zipfile 
when opening it. IMNSHO this should have been a private attribute and the 
cost of this feature is not worth its *very* limited usefulness. As an indication 
of its cost: I got a 6x speedup when I removed the calculation of the 
file_offset attribute, something that adds up when you are dealing with huge 
zipfiles (I wrote this patch because I'm dealing with 10+GByte zipfiles with 
tens of thousands of files at work).
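
For code that really does need the position of a member's data, the value can be computed on demand for a single member. The sketch below assumes the ZipInfo.header_offset attribute as it exists in the current zipfile module and the standard 30-byte local file header layout; data_offset and LOCAL_HEADER are hypothetical names, not part of the patch.

```python
import io
import struct
import zipfile

# Fixed part of a local file header: signature, five 16-bit fields,
# CRC and two sizes (32-bit each), then filename and extra-field lengths.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")   # 30 bytes

def data_offset(fileobj, info):
    """Return the offset of the member's data (hypothetical helper)."""
    fileobj.seek(info.header_offset)
    fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise zipfile.BadZipFile("bad local file header signature")
    name_len, extra_len = fields[-2], fields[-1]
    return info.header_offset + LOCAL_HEADER.size + name_len + extra_len

# Usage: locate the raw bytes of an uncompressed (stored) member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("first.txt", b"one")
    zf.writestr("second.txt", b"two!")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("second.txt")
    start = data_offset(buf, info)
    stored = buf.getvalue()[start:start + info.compress_size]
```

This costs one seek and one small read per member of interest, instead of a pass over every header when the archive is opened.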

I noticed that zipfile raises RuntimeError in some places. I've changed one of 
those to zipfile.BadZipfile, but others remain. I don't like this; most of them 
should be replaced by TypeError or ValueError exceptions.
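
The distinction matters to callers, who want to tell a damaged archive apart from their own misuse of the API. A minimal sketch, assuming a current Python 3 zipfile (BadZipFile is the modern spelling of BadZipfile; reading from a closed archive raises ValueError since Python 3.6 and RuntimeError before that); classify is a hypothetical helper.

```python
import io
import zipfile

def classify(action):
    """Run action() and report which kind of failure occurred, if any."""
    try:
        action()
    except zipfile.BadZipFile:
        return "corrupt archive"
    except (ValueError, RuntimeError):   # RuntimeError only on older versions
        return "caller error"
    return "ok"

# Data that is not a zip file at all.
not_a_zip = io.BytesIO(b"this is definitely not a zip archive")
corrupt_result = classify(lambda: zipfile.ZipFile(not_a_zip))

# A valid archive that is used after it has been closed.
buf = io.BytesIO()
closed = zipfile.ZipFile(buf, "w")
closed.writestr("a.txt", b"data")
closed.close()
misuse_result = classify(lambda: closed.read("a.txt"))
```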

BTW, this patch also supports storing files >4GByte in the zipfile, but that 
feature isn't very useful because zipfile doesn't have an API for reading file 
data incrementally.
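
For context: later versions of the zipfile module added ZipFile.open(), which returns a file-like object for a member. The sketch below assumes that later API, not the module as it stood when this patch was written; iter_chunks is a hypothetical helper.

```python
import io
import zipfile

def iter_chunks(zf, name, chunk_size=64 * 1024):
    """Yield a member's decompressed data in pieces of at most chunk_size bytes."""
    with zf.open(name) as member:
        while True:
            chunk = member.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage: stream a member without holding all of it in memory at once.
payload = bytes(range(256)) * 40
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("big.bin", payload)

with zipfile.ZipFile(buf) as zf:
    chunks = list(iter_chunks(zf, "big.bin", chunk_size=1000))
```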
History
Date                 User   Action  Args
2007-08-23 15:46:47  admin  link    issue1446489 messages
2007-08-23 15:46:47  admin  create