Issue 1121142: ZipFile.open - read-only file-like obj for files in archive

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41565

classification

Title:	ZipFile.open - read-only file-like obj for files in archive
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.5

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	alanmcintyre, glyf, loewis
Priority:	normal	Keywords:	patch

Created on 2005-02-12 00:08 by alanmcintyre, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
zipfile_patch2.tgz	alanmcintyre, 2005-02-27 20:51	Upated patch files
zipfile_patch3.tgz	alanmcintyre, 2005-04-13 06:29	Revision 3 of patch
zipfile_patch4.tgz	alanmcintyre, 2005-04-13 16:34	Revision 4 of patch
zipfile_patch5.tgz	alanmcintyre, 2005-04-27 02:23	Revision 5 of patch
zipfile_patch6.tgz	alanmcintyre, 2007-02-18 18:41	Revision 6 of patch for 2.6 trunk
zipfile_patch7.tgz	alanmcintyre, 2007-02-28 00:29	Revision 7 (hopefully final)

Messages (16)
msg47752 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-02-12 00:08
I originally started working on updating patch 992750, but decided after a short while to just start from scratch, so I'm posting it as a new patch. Sorry if this isn't appropriate. This patch provides a new open() method on ZipFile; this method returns a file-like object for the requested item in the archive. This file-like object only provides a read() method. ZipFile.read was modified to use the new open method (this was suggested by loewis in reference to patch 992750). The patched zipfile.py passed the existing tests in the test_zipfile.py from CVS. New tests were added to verify the operation of the object returned by open(). These tests were modeled after existing tests for ZipFile.read(); two read fixed-size chunks from the file-like object, and two others read random-sized chunks. I have only run the tests on Windows XP, using Python2.4 from the official Windows installer. I will test the patch out on Linux over the weekend. If the patch is accepted I'll also generate and submit patches for the appropriate documentation as well.
msg47753 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-02-27 06:28
Logged In: YES user_id=1115903 zipfile_patch2.tgz: I updated the file-like object to support readline, readlines, __iter__ and next(). Added tests for those new methods. Also added a patch for the documentation. Passes regression tests on 2.5a0 built from CVS HEAD with MSVC .NET on Windows XP.
msg47754 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2005-03-01 07:59
Logged In: YES user_id=21627 The patch is reversed: usually, diff is invoked as "-c old new". I think it is almost right, but I have a few remarks: - by tradition, open() should have a mode argument, defaulting to 'r'; it would be ok to raise exceptions if it is anything else. However, do consider implementing universal newlines; allowing 'b' as a no-op might also be reasonable. - I wonder what happens if the compression rate is < 1. It would appear that the code might use too few rawbytes. I suggest to recursively invoke read in this case. - I wonder whether it could ever happen that there is still data to uncompress in the zlib object, ie. whether it might be necessary to invoke .flush() after exhausting the rawbytes (and discard the zlib object afterwards) - it appears that atleast the builtin file object implements .read(0) as returning an empty string; the manual says that the entire file is meant only if size is omitted or negative.
msg47755 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-03-14 14:37
Logged In: YES user_id=1115903 Hmm...I could have sworn I did the diff in the correct order. I'll make sure next time. :) Here's my comments on your remarks (in order): - I'm adding support for universal newlines, and will reject all modes that aren't legal combinations of r, U, and b. - I'll see if I can make a Zip file store something with compression < 1, and if I can I'll add a test case for it. - I'll try work a .flush() in there on the compression object and come up with a test case if possible - .read(0) and .readline(0) will both return an empty string with no side-effects, since this seems to be what builtin files do. Right now ZipExtFile uses the ZipFile's file object, so you pretty much have to do whatever you're going to do with the ZipExtFile instance you get back from .open() before you use that ZipFile instance in a way that moves the file pointer around. I'm sure that somebody will eventually try to use the ZipFile in this way, so it appears my options are either to (1) give the ZipExtFile its own file object to use (when possible), or (2) make sure this limitation is documented. #1 feels (to me) to be the "right thing" to do, so that's what I'll try unless there's a good reason I shouldn't.
msg47756 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-04-13 05:58
Logged In: YES user_id=1115903 Uploaded the third revision of this patch; passes all regression tests against current CVS on WinXP. I think all the issues Martin brought up have been addressed except perhaps for the case of compression rate <1. I will have a look at that when I have time; just wanted to get an update here before the patch started to look abandoned. :)
msg47757 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-04-13 16:34
Logged In: YES user_id=1115903 I found a problem with my universal newline handling code in readline(); if the first byte of an '\r\n' pair was read from file but the second byte didn't come in on that same read, it resulted in the next line having an inappropriate '\n' prepended to it. zipfile_patch4.tgz has a fix for this included (with everything else, of course). I'm going to test the open() capability in a real application with some reasonably large zip files (containing files up to just short of 2GB) over the next few days, so if any bugs or performance problems show up I may have some more changes.
msg47758 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-04-27 02:23
Logged In: YES user_id=1115903 After testing on my large batch of large Zip files, I made one fix (version 4 of the patch didn't read all the content of large compressed archive items). The current set of changes is attached as zipfile_patch5.tgz.
msg47759 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2005-05-31 02:56
Logged In: YES user_id=1115903 Revision 5 of this patch has been in constant use with Python 2.4.1 in an application at my job for about a month; it seems to be stable and useful in that regard. If anybody has time to review and accept (or reject) it I would appreciate it.
msg47760 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-02-13 09:14
Can you please update the patch to the current subversion trunk? I'd like to apply it, but the code of zipfile has changed so that the patch is currently out-of-date. When redoing it, notice that the read implementation has changed (I couldn't figure out how you moved code around). Please do use the trunk (not Python 2.5), and please submit the patch as a single 'svn diff' output (rather than a tar file containing multiple individual diff files).
msg47761 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-02-14 17:13
I will see if I can make some progress on this over the weekend.
msg47762 - (view)	Author: Glyph Lefkowitz (glyf)	Date: 2007-02-17 03:48
Twisted also contains an implementation of this functionality, available from http://twistedmatrix.com/trac/browser/trunk/twisted/python/zipstream.py As far as I can tell it doesn't have anything to recommend it over the attached patch, however (in fact, the test coverage of the attached patch looks better), but perhaps it has some behavior which might be desirable to steal.
msg47763 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-02-18 16:44
Thanks, glyph; I'll definitely have a look and see what I can steal. :-)
msg47764 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-02-18 18:41
Here is a version of the patch against the current trunk. It passes a "make test" on my Gentoo laptop at the moment, but I think I still need to add some more tests (like reading from encrypted compressed files), and I need to try out the test_zipfile64 stuff. I don't have time today to look through the module Glyph suggested, so it may be that after I do that I'll have some more tweaks to make. I'm pretty sure I'll be able to spend some more time on Wednesday this week. I also included the full contents of zipfile.py and test_zipfile.py just in case that's useful to somebody. File Added: zipfile_patch6.tgz
msg47765 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-02-28 00:29
This version of the patch includes the updates to the documentation. It passes regression tests as well as the test_zipfile64 tests on my Gentoo laptop. File Added: zipfile_patch7.tgz
msg47766 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-03-06 10:49
Thanks for the patch. Committed as r54152. If you want to produce further changes, please submit them as new patches against the subversion trunk. Two remarks: - I recall discussion of adding an extract method. Are you still interested in this? - The previous version required that the mode is 'r' or 'a' to invoke .read(), and raised a RuntimeError in this case. Your version doesn't (but instead raises KeyError if the archive member to be read wasn't written yet). Was this change deliberate?
msg47767 - (view)	Author: Alan McIntyre (alanmcintyre) *	Date: 2007-03-06 14:29
Thanks for committing. :) I will probably try adding the extract method and submit a patch for that soon. I had noticed the mode problem just a day or two ago while I was writing some more tests to improve coverage of the zipfile module, and I will include a fix for that (and the new tests) in a separate patch.

History
Date	User	Action	Args
2022-04-11 14:56:09	admin	set	github: 41565
2005-02-12 00:08:30	alanmcintyre	create