Issue 467924: Improve the ZipFile Interface


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/35279

classification

Title:	Improve the ZipFile Interface
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	alanmcintyre, crhode, georg.brandl, jvr, myers_carpenter, mzimmerman, scott_daniels
Priority:	normal	Keywords:

Created on 2001-10-04 15:54 by anonymous, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name	Uploaded	Description
zipfile_extract.diff	alanmcintyre, 2008-01-05 00:30	Add extract/extractall methods to ZipFile

Messages (12)
msg53289	Author: Nobody/Anonymous (nobody)	Date: 2001-10-04 15:54
There exist two methods to write to a ZipFile:

    write(self, filename, arcname=None, compress_type=None)
    writestr(self, zinfo, bytes)

but only one to read from it:

    read(self, name)

Additionally, the two 'write's behave differently with respect to compression.

---

(a) 'read' does not fit 'write', since 'write' takes a file and adds it to a ZipFile, but 'read' is not the reverse operation. 'read' should be called 'readstr', since that matches 'writestr' much better.

(b) It is confusing what 'write' and 'read' actually mean. Does 'write' write a file, or write into the ZipFile? It would be more obvious if ZipFile had 4 methods which fit together pair-wise:

    writestr(self, zinfo, bytes)                            # same as now
    readstr(self, name)                                     # returns bytes (as string), currently called 'read'
                                                            # 'read' could still live but should be deprecated
    add(self, filename, arcname=None, compress_type=None)   # currently 'write'
                                                            # 'write' could still live but should be deprecated
    extract(self, name, filename, arcname=None)             # new, desired functionality

(c) BOTH 'writestr' and 'add' should by default use the 'compress_type' that was passed to the constructor of 'ZipFile'. Currently 'write' does, but 'writestr' via zinfo does not: 'ZipInfo' strictly sets the compression to 'ZIP_STORED' :-( It should not do that! Instead it should:
  - allow more parameters in the constructor signature, so that the compression type (and some other attributes, too) can be passed
  - default to 'None', so that 'writestr' can see this and then take the default from the 'ZipFile' instance.
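For comparison, here is a minimal sketch against the current Python 3 zipfile API (not the 2001 API under discussion) of how the existing methods actually pair up: read() returns a member's bytes whether the member was added with writestr() or write(), so it mirrors writestr() rather than write(). The function name pairing_demo() and the member names are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def pairing_demo():
    """Return the bytes read back for a writestr() member and a write() member."""
    with tempfile.TemporaryDirectory() as tmp:
        # A real file on disk, so that write() has something to add.
        src = os.path.join(tmp, "notes.txt")
        with open(src, "wb") as f:
            f.write(b"from disk")

        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("memo.txt", b"from memory")  # bytes -> member
            zf.write(src, arcname="notes.txt")       # file on disk -> member

    # The archive lives in memory, so it outlives the temporary directory.
    with zipfile.ZipFile(buf) as zf:
        # read() hands back bytes in both cases, i.e. it mirrors writestr();
        # the disk-level counterpart of write() would be an extract() method.
        return zf.read("memo.txt"), zf.read("notes.txt")
```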
msg53290	Author: Just van Rossum (jvr) *	Date: 2003-01-05 20:54
In Python 2.3, writestr() has an enhanced signature: the first arg may now also be an archive name, in which case the correct default settings are used (i.e. the compression value is taken from the ZipFile). See patch #651621.

extract() could be moderately useful (although I don't understand the 'arcname' arg; how is that different from 'name'?) but would have to deal with file modes (bin/text). The file mode isn't stored in the archive, so it would have to (optionally) be supplied by the caller.
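The behaviour described here can be checked with a small sketch, assuming a current Python 3 interpreter: passing an archive name to writestr() picks up the ZipFile's own compression, while a bare ZipInfo still defaults to ZIP_STORED unless writestr()'s per-call compress_type argument (present in modern versions) overrides it. The helper name compression_of_members() and the member names are hypothetical.

```python
import io
import zipfile


def compression_of_members():
    """Map each member name to the compression method it was stored with."""
    data = b"hello " * 100
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Archive name: the ZipFile's own compression (DEFLATED) is used.
        zf.writestr("by_name.txt", data)
        # Bare ZipInfo: its compress_type defaults to ZIP_STORED.
        zf.writestr(zipfile.ZipInfo("by_zinfo.txt"), data)
        # ZipInfo plus an explicit per-call override.
        zf.writestr(zipfile.ZipInfo("override.txt"), data,
                    compress_type=zipfile.ZIP_DEFLATED)
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}
```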
msg53291	Author: Matt Zimmerman (mzimmerman)	Date: 2003-07-31 14:22
It would also be very useful to be able to have ZipFile read/write the uncompressed file data from/to a file-like object, instead of just strings and files (respectively). I would like to use this module to work with zip files containing large files, but this is unworkable because the current implementation would use excessive amounts of memory.

Currently, read() reads all of the compressed data into memory, then uncompresses it into memory. For files which may be hundreds of megabytes compressed, this is undesirable. Likewise for write(), I would like to be able to stream data into a zip file, passing in a ZipInfo to specify the metadata as is done with writestr().

The implementation of this functionality is quite straightforward, but I am not sure whether (or how) the interface should change. Some other parts of the library allow a file object to be passed to the same interface that accepts a filename. The object is examined to see if it has the necessary read/write methods and, if not, it is assumed to be a filename. Would this be the correct way to do it?

I, too, am a bit irked by the lack of symmetry exhibited by read vs. write/writestr, and think that the interface suggested above would be a significant improvement.
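In later versions the streaming access asked for here is available through ZipFile.open(): reading since Python 2.6 and writing (mode="w") since Python 3.6. A minimal sketch, assuming Python 3.6+, that copies data into and out of an archive in fixed-size chunks instead of holding a whole member in memory; the function names and the 64 KiB chunk size are hypothetical choices for illustration.

```python
import io
import shutil
import zipfile

CHUNK_SIZE = 64 * 1024  # bytes per copy step; an arbitrary choice


def stream_into_zip(zf, arcname, src):
    """Compress a readable binary file object into the archive chunk by chunk.

    Writing through ZipFile.open() needs Python 3.6+. If the member may
    exceed 2 GiB, pass force_zip64=True to open(), since the final size
    is not known when the member header is written.
    """
    with zf.open(arcname, mode="w") as dest:
        shutil.copyfileobj(src, dest, CHUNK_SIZE)


def stream_out_of_zip(zf, arcname, dest):
    """Decompress one member into a writable binary file object chunk by chunk."""
    with zf.open(arcname) as src:
        shutil.copyfileobj(src, dest, CHUNK_SIZE)


def roundtrip(payload):
    """Stream payload into an in-memory archive and back out again."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        stream_into_zip(zf, "big.bin", io.BytesIO(payload))
    out = io.BytesIO()
    with zipfile.ZipFile(archive) as zf:
        stream_out_of_zip(zf, "big.bin", out)
    return out.getvalue()
```

With real large files, src and dest would be file objects opened with open(..., "rb") and open(..., "wb"); the in-memory roundtrip() is only there to make the sketch easy to try.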
msg53292	Author: Myers Carpenter (myers_carpenter)	Date: 2004-05-09 18:23
The zipfile interface should match the tarfile interface. At the minimum it should work for this example:

    import zipfile
    zip = zipfile.open("sample.zip", "r")
    for zipinfo in zip:
        print zipinfo.name, "is", zipinfo.size, "bytes in size and is", zip.extract(zipinfo)
    zip.close()

This closely matches the 'tarfile' module.
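As far as I know, no module-level zipfile.open() was added and ZipFile is still not iterable, so the proposed loop does not run as written. A minimal sketch of the closest equivalent with the current Python 3 API iterates over infolist() and uses ZipInfo.filename, ZipInfo.file_size and ZipInfo.is_dir() (the last needs Python 3.6+). The names describe() and build_sample() are hypothetical.

```python
import io
import zipfile


def describe(file):
    """Return tarfile-style descriptions of every member of a zip archive."""
    lines = []
    with zipfile.ZipFile(file) as zf:
        # ZipFile itself is not iterable; infolist() yields ZipInfo objects
        # in archive order.
        for info in zf.infolist():
            kind = "a directory" if info.is_dir() else "a regular file"
            lines.append(f"{info.filename} is {info.file_size} bytes in size and is {kind}.")
    return lines


def build_sample():
    """Create a small in-memory archive to try describe() on."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sub/", b"")          # a directory entry
        zf.writestr("sub/a.txt", b"abc")  # a 3-byte regular file
    return buf
```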
msg53293	Author: Chuck Rhode (crhode)	Date: 2005-09-22 15:56
I've been trying to read map files put out by the Census Bureau. These ZIP archives are downloaded from government contractors' sites by county. Within each county archive are several ZIP files for each map layer (roads, streams, waterbodies, etc.). Each contains the elements of an ESRI shapefile database (.shp, .shx, and .dbf files).

This doesn't make a lot of sense to me, either, because there's no compression advantage to making an archive of an archive. The technique is used purely for organizational purposes because ZIP does not compress subdirectories. Note: I've never seen a TAR of TAR files because TAR does compress subdirectories.

What I've been struggling with is a way to leave these archives in their compressed form and still do Python I/O on them. There is a tree organization to them, after all, just as with traditional os.path directories. I've designed some objects that let me retrieve the most recent file, ZIP member, or TAR member by name from a given path to a repository of such archives. What I get is a StreamIO object that I can subsequently put back where it came from.

What would be nice is if there already were objects available to manipulate normal os.path directories commingled with ZIP and TAR archives. What would be nicer is if I/O could be opened at the character/line level transparently, without regard to whether the source/destination was a file or an archive member within such a structure. In the days of hardware compression and on-the-fly encryption/decryption of I/O, is this too much to ask?

-ccr-
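Reading a member buried inside nested archives needs no extraction to disk, because ZipFile accepts any seekable binary file object. A minimal sketch, assuming Python 3: each inner archive is read into an io.BytesIO and opened as a ZipFile in turn (so every intermediate archive is held in memory once). The function read_nested() and the names "roads.zip" and "roads.shp" are hypothetical.

```python
import io
import zipfile


def read_nested(zf, path):
    """Return the bytes of a member reached through nested zip archives.

    `path` is a sequence such as ["roads.zip", "roads.shp"]: every element
    except the last names a zip archive stored inside the previous one.
    """
    head, *rest = path
    if not rest:
        return zf.read(head)
    # Load the inner archive into memory and open it as a ZipFile; nothing
    # is extracted to disk.
    with zipfile.ZipFile(io.BytesIO(zf.read(head))) as inner:
        return read_nested(inner, rest)
```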
msg53294	Author: Scott David Daniels (scott_daniels) *	Date: 2005-09-25 20:20
I am currently working on an expanded zipfile module that:
  (a) has a more easily extensible class
  (b) allows BZIP2 compression (my original need)
  (c) allows file-like (read) access to the elements of a ZipFile
  (d) provides for a single "writer" which can be used to generate file contents "incrementally" while possibly reading from other "files" in the zipfile
  (e) allows the opening of embedded zips "in-place"

What I don't have at the moment is a good set of tests or good documentation of how to use it. Anyone interested in collaborating, let me know.

--Scott David Daniels
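Two of the listed features later landed in the standard module: file-like read access through ZipFile.open() (Python 2.6) and bzip2 compression through the ZIP_BZIP2 constant (Python 3.3, which requires the bz2 module to be available). A minimal sketch, assuming Python 3.3+, that stores a member with bzip2 and then reads only its first few bytes through the file-like interface; the function and member names are hypothetical.

```python
import io
import zipfile


def bzip2_roundtrip(data):
    """Store data with bzip2, then read its first 7 bytes file-style."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_BZIP2) as zf:
        zf.writestr("layer.dbf", data)
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo("layer.dbf")
        # open() returns a file-like object, so a partial read does not
        # require handing the whole member back as one bytes object.
        with zf.open("layer.dbf") as member:
            head = member.read(7)
        return info.compress_type, head
```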
msg53295	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-09-25 23:29
Scott, I had put together some enhancements to ZipFile read/write, including test cases, but haven't had time to advocate getting them into 2.5. You can find them here: https://sourceforge.net/tracker/?func=detail&aid=1121142&group_id=5470&atid=305470

If it seems like it would be helpful, I can go round up the most recent version (which I've been using in a production environment) and send it to you.
msg57981	Author: Georg Brandl (georg.brandl) *	Date: 2007-11-30 13:56
Alan's patch has since been committed. Is there any more work on this item?
msg57990	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-11-30 16:21
There was another issue that also asked for an extract feature, and if I recall correctly I said I'd try to work on it (I think I have some code somewhere for it but I'll have to look). Tonight or tomorrow I will see if I can find that other issue and let you know about it, and maybe take a look around at the various zipfile improvement/change requests to see if they've been completely addressed.
msg59269	Author: Alan McIntyre (alanmcintyre) *	Date: 2008-01-05 00:30
I attached a patch with the following changes (as zipfile_extract.diff):

(1) Add a note to the docs (under writestr) about how the compression is selected if a ZipInfo is passed as the zinfo_or_arcname parameter. If anybody thinks it's a good idea to add a compression argument to the ZipInfo constructor, I can modify the patch/docs accordingly.

(2) Add an extract method to ZipFile and associated test/documentation changes.
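For reference, a minimal sketch of how the extraction methods behave in current Python 3 (I believe both ZipFile.extract() and ZipFile.extractall() first shipped in Python 2.6): extract() writes a single member, recreating its directory structure under the target path, and returns the path of the file it created, while extractall() writes every member, or only a given members subset. The function name extract_demo() and the member names are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def extract_demo():
    """Extract one member and then all members into a temporary directory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("roads/roads.shp", b"shape data")
        zf.writestr("roads/roads.dbf", b"table data")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(buf) as zf:
            # extract() writes one member (creating roads/ as needed)
            # and returns the path of the file it created.
            single = zf.extract("roads/roads.shp", path=os.path.join(tmp, "one"))
            # extractall() writes every member (or a `members` subset).
            zf.extractall(path=os.path.join(tmp, "all"))
        with open(single, "rb") as f:
            content = f.read()
        listing = sorted(os.listdir(os.path.join(tmp, "all", "roads")))
        return os.path.relpath(single, tmp), content, listing
```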
msg59272	Author: Alan McIntyre (alanmcintyre) *	Date: 2008-01-05 00:39
Are the method renames/additions suggested in the original issue worth doing? When I first started using this module, I found the documentation easy and thorough enough to understand how to use it, so I would vote for just leaving the ZipFile interface the way it is.
msg59476	Author: Georg Brandl (georg.brandl) *	Date: 2008-01-07 18:48
I committed your patch (after reviewing the docs) as r59834. I think there is no more to do here.

History
Date	User	Action	Args
2022-04-10 16:04:30	admin	set	github: 35279
2008-01-07 18:48:55	georg.brandl	set	status: open -> closed resolution: accepted messages: + msg59476
2008-01-05 00:39:26	alanmcintyre	set	messages: + msg59272
2008-01-05 00:30:54	alanmcintyre	set	files: + zipfile_extract.diff messages: + msg59269
2007-11-30 16:21:47	alanmcintyre	set	messages: + msg57990
2007-11-30 13:56:06	georg.brandl	set	nosy: + georg.brandl messages: + msg57981
2001-10-04 15:54:14	anonymous	create