
classification
Title: zipfile: support for ZIP64
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: ronaldoussoren Nosy List: anthonybaxter, georg.brandl, gregory.p.smith, ronaldoussoren
Priority: normal Keywords: patch

Created on 2006-03-09 14:58 by ronaldoussoren, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded
zipfile-zip64.patch ronaldoussoren, 2006-03-09 15:28
zipfile-zip64-version2.patch ronaldoussoren, 2006-05-23 13:10
zipfile64-version3.patch ronaldoussoren, 2006-05-26 08:26
zipfile64-version4.patch ronaldoussoren, 2006-05-30 13:28
zipfile64-version-5.patch ronaldoussoren, 2006-06-11 21:09
Messages (12)
msg49695 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-03-09 14:58
The attached patch implements support for ZIP64, that is, zipfiles
containing very large (>4GByte) files and zipfiles that are themselves
larger than 4GByte.

The output of this patch can be read by pkzip (see below for the actual 
version I used for testing).


msg49696 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-03-09 15:28
Oops, I've uploaded the wrong file. zipfile-zip64.patch is the correct one.

I've tested the correctness of created archives using this version of pkzip:

pkzipc -version
PKZIP(R) Server  Version 8  ZIP Compression Utility for Linux X86
Copyright (C) 1989-2005 PKWARE, Inc.  All Rights Reserved. Evaluation 
Version
PKZIP Reg. U.S. Pat. and Tm. Off.  Patent No. 5,051,745
Patent Pending

Version 8.40.66
msg49697 - (view) Author: Anthony Baxter (anthonybaxter) (Python triager) Date: 2006-04-02 05:02
I'd like to see a testcase and possibly a note for the
documentation about the new semantics. Also, should it be
possible to say "don't use the ZIP64 extension, instead
raise an Error" for people who don't want to generate these?
 
msg49698 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-04-02 19:13
The "don't use the ZIP64 extension" flag is a good idea: zipfiles that use this
extension aren't readable by the infozip tools (zip and unzip on most unix
systems).

I'll add tests and documentation in the near future.

The version of zipfile that I'm currently using also contains a patch for
speeding up the opening of zipfiles. For the type of files I'm dealing with
(about 11GByte large with tens of thousands of files) the speedup is very
significant. I suppose it's better to file that as a separate patch after this
one has been approved.
msg49699 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-16 07:41
Since 2.5 beta is coming close, have you made progress on
the tests/docs?
msg49700 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-16 07:55
I haven't had time to work on this; all the time I had for python related stuff
has been eaten by finishing PyObjC's port to intel macs and the universal binary
patches.

The former is now done and the latter almost, so I'll have some time to work on
this again, especially because I'm using this patch at work and might be able to
claim some time to work on this during work-hours.
msg49701 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-23 13:10
I've found some time to work on this. I've added
zipfile-zip64-version2.patch. This version:

* Makes zip64 behaviour optional (defaults to off because zip(1) doesn't
support zip64)

* Is significantly faster for large zipfiles because it doesn't scan the entire 
zipfile just to check that the file headers are consistent with the central 
directory w.r.t. filename (this check is now done when trying to read a file)

* Updates the reference documentation.

* Adds unittests. There are two sets of tests: one set tests the behaviour of
the zip64 extensions using small files by lowering the zip64 cutoff point and is
run every time; the other set tests huge zipfiles and is run only when the
largefile feature is enabled when running the tests.
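
The small-file testing trick described in the last bullet can be sketched roughly as follows. This is an illustration against the zipfile module as it was eventually released, not code from the patch: it assumes the cutoff is the module-level `zipfile.ZIP64_LIMIT`, that the optional behaviour is the `allowZip64` argument, and that `zipfile.LargeZipFile` is raised when ZIP64 would be needed but is disallowed. The helper name `write_small_archive` and the sizes are made up for the example.

```python
import io
import zipfile


def write_small_archive(allow_zip64, payload=b"x" * 2000, limit=1000):
    """Hypothetical helper: write and re-read one member with a lowered cutoff."""
    saved_limit = zipfile.ZIP64_LIMIT
    zipfile.ZIP64_LIMIT = limit  # pretend anything above `limit` bytes is "huge"
    try:
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", allowZip64=allow_zip64) as zf:
            zf.writestr("data.bin", payload)
        # Read the archive back to check that the member round-trips.
        with zipfile.ZipFile(buf) as zf:
            return zf.read("data.bin")
    finally:
        # Always restore the real cutoff, even if writing or reading failed.
        zipfile.ZIP64_LIMIT = saved_limit


# With ZIP64 allowed, the 2000-byte member round-trips.
data = write_small_archive(allow_zip64=True)

# With ZIP64 disallowed, the same write is refused.
try:
    write_small_archive(allow_zip64=False)
    refused = False
except zipfile.LargeZipFile:
    refused = True
```

Restoring the limit in a `finally` block keeps a failure here from leaking the lowered cutoff into unrelated code.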

There is one backward incompatible change: ZipInfo objects no longer have a
file_offset attribute. That was the other reason for scanning the entire zipfile 
when opening it. IMNSHO this should have been a private attribute and the 
cost of this feature is not worth its *very* limited usefulness. As an indication 
of its cost: I got a 6x speedup when I removed the calculation of the 
file_offset attribute, something that adds up when you are dealing with huge 
zipfiles (I wrote this patch because I'm dealing with 10+GByte zipfiles with 
tens of thousands of files at work).

I noticed that zipfile raises RuntimeError in some places. I've changed one of
those to zipfile.BadZipfile, but others remain. I don't like this; most of them
should be replaced by TypeError or ValueError exceptions.

BTW. This patch also supports storing files >4GByte in the zipfile, but that 
feature isn't very useful because zipfile doesn't have an API for reading file 
data incrementally.
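
For context on the incremental-reading limitation: the patch under discussion does not add such an API, but later Python releases (2.6 onwards) provide `ZipFile.open()`, which returns a file-like object for a member. A minimal sketch of chunked reading with that later API, using a small in-memory archive and an arbitrary chunk size chosen for the example:

```python
import io
import zipfile

CHUNK_SIZE = 16 * 1024  # arbitrary chunk size for the illustration

# Build a small in-memory archive to read from.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("big.bin", b"x" * 100_000)

# Read the member piece by piece instead of loading it all with read().
total = 0
with zipfile.ZipFile(buf) as zf:
    with zf.open("big.bin") as member:
        for chunk in iter(lambda: member.read(CHUNK_SIZE), b""):
            total += len(chunk)
```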
msg49702 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-26 08:26
I've attached yet another version. This version reintroduces some functionality
that was unintentionally removed and fixes a lame bug that caused
test_zipimport to fail.
msg49703 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-30 13:28
I've added some more tests for pre-existing functionality. The unittests are still 
far from comprehensive, but at least touch upon most functionality of zipfile.

Does anyone feel like reviewing this? I'd like to get this into python2.5.
msg49704 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2006-06-11 20:33
reading zipfile64-version64.patch:

* why does the zipfile module import itself?

* Why is the default ZIP64 limit 1 << 30? Shouldn't that be
(1 << 31) - 1 (or slightly less) for maximum compatibility with
existing <2GiB zip files or zips with data just under 2GiB?
Don't force zip64's use unless the size actually exceeds a
32-bit signed integer.

* assert diskno == 0 and assert nodisks == 1 should be
turned into BadZipFile exceptions with an explanation that
multi-disk zip files aren't supported.

* in main() document the -t option in the usage string.

* TestZip64InSmallFiles changes zipfile.ZIP64_LIMIT but will
not restore the value if a test fails (that could lead to
other unrelated test failures). This is not a problem in the
hopefully normal case of all tests passing. Use a try:
finally: to make sure that gets reset.

* documentation: "Is does optionally handle" is awkward.
How about "It can handle"?
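
A note on the limit expression discussed in the list above: in Python, binary minus binds more tightly than the shift operator, so an unparenthesised `1 << 31 - 1` is the same value as `1 << 30`. The intended "largest 32-bit signed integer" needs explicit parentheses. A quick check (the variable names are only for illustration):

```python
unparenthesised = 1 << 31 - 1   # parsed as 1 << (31 - 1)
intended = (1 << 31) - 1        # largest 32-bit signed integer

print(unparenthesised)  # 1073741824
print(intended)         # 2147483647
```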


The removal of the file_offset attribute makes sense but
does make me wonder how much existing code it could break.
I suggest leaving file_offset out and, if any python 2.5
beta tester complains, restoring it or making the scan that
looks up file offsets a ZipFile option (defaulting to True).
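
Returning to the diskno/nodisks item in the review: assertions disappear when Python runs with -O and give no explanation to the caller, which is why an explicit exception is preferable. A minimal sketch of the suggested change, assuming a hypothetical helper `check_single_disk` (the real check sits inline in zipfile's parsing code and may be phrased differently):

```python
import zipfile


def check_single_disk(diskno, nodisks):
    """Hypothetical helper: reject multi-disk archives with a clear error."""
    if diskno != 0 or nodisks != 1:
        raise zipfile.BadZipFile("multi-disk zip files are not supported")


check_single_disk(0, 1)  # a single-disk archive passes silently

try:
    check_single_disk(1, 2)
    message = None
except zipfile.BadZipFile as exc:
    message = str(exc)
```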

msg49705 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-06-11 21:09
* The import of zipfile itself is a bug.

* The limit should indeed be raised to (1 << 31) - 1.

* The diskno and nodisks assertions are present in the current version of
zipfile, but I agree that those should be changed into exceptions.

* I've updated main to document and actually allow the -t option.

* TestZip64InSmallFiles restores the ZIP64_LIMIT in the tearDown method,
   isn't that good enough?
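
On the tearDown question: unittest runs tearDown whenever setUp has succeeded, even if the test method itself fails, so restoring the limit there does cover the failing-test case. A small sketch that demonstrates this with a deliberately failing test; the class and function names are invented for the example, and the test case is defined inside a function only so that test collectors do not pick it up:

```python
import io
import unittest
import zipfile


def run_failing_test_with_lowered_limit():
    """Run one deliberately failing test that lowers ZIP64_LIMIT in setUp."""

    class LowLimitCase(unittest.TestCase):
        def setUp(self):
            self._saved_limit = zipfile.ZIP64_LIMIT
            zipfile.ZIP64_LIMIT = 5

        def tearDown(self):
            # Runs after the test method whether it passed or failed.
            zipfile.ZIP64_LIMIT = self._saved_limit

        def test_deliberate_failure(self):
            self.fail("simulated test failure")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LowLimitCase)
    runner = unittest.TextTestRunner(stream=io.StringIO())  # keep output quiet
    return runner.run(suite)


limit_before = zipfile.ZIP64_LIMIT
result = run_failing_test_with_lowered_limit()
limit_after = zipfile.ZIP64_LIMIT
```

The one case tearDown does not cover is a failure inside setUp itself, after the limit has already been changed.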

I sure hope that nobody actually uses the file_offset attribute. The only use
case I can think of for it is to reimplement the read method. If it turns out
that this change does break existing code we could add yet another option, but
let's wait with that until someone actually complains.

I've uploaded a new version of the patch that fixes all these issues.

BTW. Thanks for the review.
msg49706 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-07-18 12:37
This is part of 2.5, so there is no need to keep this item open.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 43003
2006-03-09 14:58:19 ronaldoussoren create