
Author gambl
Recipients Arthur.Darcet, Dave Sawyer, christian.heimes, chroipahtz, eric.araujo, gambl, lars.gustaebel, loewis, rhettinger, rossmclendon, sandro.tosi, serhiy.storchaka, terry.reedy, ubershmekel, victorlee129
Date 2015-04-09.15:21:56
Message-id <1428592919.73.0.650964401403.issue6818@psf.upfronthosting.co.za>
Content
Hi,

I've recently been working on a Python module for the Adobe Universal Container Format (UCF), which extends the zip specification. As part of this, I wanted to be able to remove and rename files in an archive.

I came across this issue while writing the module, realised there wasn't currently a solution, and went down the rabbit hole.

I've attached a patch that supports removing and renaming files in a zip archive. The same code is also available in a Git repository, separated out into a class that extends ZipFile: https://github.com/gambl/zipextended.

The patch provides four main new "public" methods for the zipfile module:

- remove(self, zinfo_or_arcname)
- rename(self, zinfo_or_arcname, filename)
- commit(self)
- clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False)

The patch is partly modelled on the rubyzip solution. Remove and rename initially only update the ZipFile's infolist. Changes are then persisted by a commit function, which can be called manually and is also called automatically on close. Commit clones the zip file, with the necessary changes, to a temporary file and replaces the original file once that operation has completed successfully.
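
A minimal sketch of this staged design follows. It uses only the public standard-library zipfile API. `StagedZipEdits` is a hypothetical name and not the patch's code. Unlike the patch, it reads (decompresses) and rewrites (recompresses) every member it keeps, and it tracks members by name only.

```python
import os
import tempfile
import zipfile


class StagedZipEdits:
    """Hypothetical sketch: stage removals and renames, apply them on commit.

    Unlike the patch, every kept member is decompressed and recompressed.
    """

    def __init__(self, path):
        self.path = path
        self._removed = set()  # arcnames staged for removal
        self._renamed = {}     # old arcname -> new arcname

    def remove(self, arcname):
        # Only records the change; the archive on disk is not touched yet.
        self._removed.add(arcname)

    def rename(self, arcname, new_name):
        self._renamed[arcname] = new_name

    def commit(self):
        if not self._removed and not self._renamed:
            return
        # Build the new archive in a temporary file beside the original...
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
        os.close(fd)
        try:
            with zipfile.ZipFile(self.path) as src:
                with zipfile.ZipFile(tmp_path, "w") as dst:
                    for info in src.infolist():
                        if info.filename in self._removed:
                            continue
                        new_name = self._renamed.get(info.filename, info.filename)
                        new_info = zipfile.ZipInfo(new_name, date_time=info.date_time)
                        new_info.compress_type = info.compress_type
                        new_info.external_attr = info.external_attr
                        new_info.comment = info.comment
                        dst.writestr(new_info, src.read(info))
            # ...and only replace the original once writing has succeeded.
            os.replace(tmp_path, self.path)
        except BaseException:
            os.remove(tmp_path)
            raise
        self._removed.clear()
        self._renamed.clear()

    def close(self):
        # Mirrors the behaviour described above: pending changes are
        # committed on close.
        self.commit()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Writing to a sibling temporary file keeps the final os.replace() on the same filesystem, so the swap is a rename rather than a copy.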

Alternatively, you can remove files without modifying the original by calling the clone method directly. This follows Serhiy's suggestion of filtering the content rather than modifying the original: you pass a list of filenames or ZipInfo objects for the files to be included in the clone.
So that clone can run without decompressing and then recompressing the files in the archive, I have added two methods, write_compressed and read_compressed.
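
The reading half of this idea can be sketched with the standard library alone. A member's compressed bytes start after its 30-byte local file header plus the variable-length name and extra fields, and they run for compress_size bytes. `read_raw_member` is a hypothetical helper and not the patch's read_compressed. The write side is not shown.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a ZIP local file header: signature, version,
# flags, method, time, date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
_LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def read_raw_member(fileobj, info):
    """Hypothetical helper: return a member's bytes exactly as stored.

    Nothing is decompressed, so for ZIP_DEFLATED members the result is a
    raw DEFLATE stream.
    """
    fileobj.seek(info.header_offset)
    fields = _LOCAL_HEADER.unpack(fileobj.read(_LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise zipfile.BadZipFile("bad local file header signature")
    # Name/extra lengths come from the local header, which may differ
    # from the lengths recorded in the central directory.
    fileobj.seek(fields[-2] + fields[-1], io.SEEK_CUR)
    return fileobj.read(info.compress_size)


# Usage: the raw bytes of a deflated member inflate back to the original.
buf = io.BytesIO()
payload = b"hello " * 100
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("greeting.txt", payload)
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("greeting.txt")
    raw = read_raw_member(buf, info)
assert len(raw) == info.compress_size
assert zlib.decompress(raw, -15) == payload  # -15: raw DEFLATE, no header
```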

I have also attempted to address Serhiy's concern regarding tricky.zip, namely "hidden files" in between members of the archive. By default, the clone method retains any hidden files and maintains their relative order in the archive. You can also elect to ignore the hidden files and clone only the files listed in the central directory.
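
One way to locate such hidden data is sketched below. It assumes that "hidden" means bytes not accounted for by any central directory entry, then walks the members in offset order and records the uncovered ranges. `hidden_ranges` and `member_end` are hypothetical names. The sketch does not inspect the region between the last member and the central directory, and it assumes non-ZIP64 data descriptors.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header.
_LOCAL_HEADER = struct.Struct("<4s5H3L2H")
_HAS_DATA_DESCRIPTOR = 0x08  # general purpose flag bit 3


def member_end(fileobj, info):
    """Offset just past a member's local header, data and data descriptor."""
    fileobj.seek(info.header_offset)
    fields = _LOCAL_HEADER.unpack(fileobj.read(_LOCAL_HEADER.size))
    end = (info.header_offset + _LOCAL_HEADER.size
           + fields[-2] + fields[-1] + info.compress_size)
    if info.flag_bits & _HAS_DATA_DESCRIPTOR:
        # Assumes a non-ZIP64 descriptor: an optional 4-byte signature,
        # then CRC-32, compressed size and uncompressed size (4 bytes each).
        fileobj.seek(end)
        end += 16 if fileobj.read(4) == b"PK\x07\x08" else 12
    return end


def hidden_ranges(fileobj, zf):
    """Hypothetical helper: (start, end) byte ranges before or between
    members that no central directory entry accounts for."""
    gaps = []
    position = 0
    for info in sorted(zf.infolist(), key=lambda i: i.header_offset):
        if info.header_offset > position:
            gaps.append((position, info.header_offset))
        position = max(position, member_end(fileobj, info))
    return gaps


# Usage: bytes prepended to an archive (e.g. a stub) show up as a gap.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as zf:
    zf.writestr("a.txt", b"A" * 10)
    zf.writestr("b.txt", b"B" * 10)
stubbed = io.BytesIO(b"STUB!!!" + plain.getvalue())
with zipfile.ZipFile(stubbed) as zf:
    print(hidden_ranges(stubbed, zf))  # [(0, 7)]
```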

I had to modify the tricky.zip attached to this issue manually, because the CRC of file two (with file three embedded) was incorrect and would therefore fail testzip(). I'm not actually sure how one would create such an archive, but I think it's valid according to the zip spec. I've included the modified version in the patch for a few of the tests.
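
For reference, ZipFile.testzip() reads every member and returns the name of the first one whose CRC or file header check fails, or None if all pass. The sketch below flips one stored byte to show this. `damage_member` and `first_bad_member` are hypothetical test helpers.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header.
_LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def damage_member(archive_bytes, arcname):
    """Hypothetical test helper: flip the first data byte of one member."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = zf.getinfo(arcname)
    fields = _LOCAL_HEADER.unpack_from(archive_bytes, info.header_offset)
    data_start = (info.header_offset + _LOCAL_HEADER.size
                  + fields[-2] + fields[-1])  # skip name and extra field
    damaged = bytearray(archive_bytes)
    damaged[data_start] ^= 0xFF
    return bytes(damaged)


def first_bad_member(archive_bytes):
    # testzip() returns the first member that fails its checks, else None.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.testzip()


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("one.txt", b"first member")
    zf.writestr("two.txt", b"second member")

print(first_bad_member(buf.getvalue()))                            # None
print(first_bad_member(damage_member(buf.getvalue(), "two.txt")))  # two.txt
```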

I appreciate that this is a largish patch and may take some time to review, but as suggested in the comments, this wasn't as straightforward as it seems!

Look forward to your comments. 

The signatures of the main functions are described below:

remove(self, zinfo_or_arcname):

    Remove a member from the archive.

    Args:
      zinfo_or_arcname (ZipInfo, str): ZipInfo object or filename of the
        member.

    Raises:
      RuntimeError: If attempting to modify a Zip archive that is closed.

rename(self, zinfo_or_arcname, filename):

    Rename a member in the archive.

    Args:
      zinfo_or_arcname (ZipInfo, str): ZipInfo object or filename of the
        member.
      filename (str): the new name for the member.

    Raises:
      RuntimeError: If attempting to modify a Zip archive that is closed.
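
Both remove() and rename() accept either a ZipInfo or an archive name. The standard ZipFile.writestr() uses the same zinfo_or_arcname convention. The sketch below resolves that argument and includes the closed-archive check described above. `resolve_member` is a hypothetical helper, and it relies on the undocumented detail that a closed ZipFile has its `fp` attribute set to None.

```python
import io
import zipfile


def resolve_member(zf, zinfo_or_arcname):
    """Hypothetical helper: return the ZipInfo for a ZipInfo or arcname.

    Raises RuntimeError on a closed archive (as the docstrings above
    describe) and KeyError if the member does not exist.
    """
    # Undocumented detail: ZipFile.close() sets zf.fp to None.
    if zf.fp is None:
        raise RuntimeError("Attempt to modify ZIP archive that was already closed")
    if isinstance(zinfo_or_arcname, zipfile.ZipInfo):
        name = zinfo_or_arcname.filename
    else:
        name = zinfo_or_arcname
    return zf.getinfo(name)


# Usage: both argument forms resolve to the same ZipInfo object.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"A")
with zipfile.ZipFile(buf) as zf:
    info = resolve_member(zf, "a.txt")
    assert resolve_member(zf, info) is info
```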


clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False):

    Clone the zip file to the given file (filename or file object).

    Args:
      file (File, str): file-like object or filename of file to write the
        new zip file to.
      filenames_or_infolist (list(str), list(ZipInfo), optional): list of
        members from this zip file to include in the new zip file.
      ignore_hidden_files (boolean): if True, hidden files (data in between
        managed members of the archive) are not included in the clone.

    Returns:
        A new ZipFile object for the cloned zip file, opened in append mode.

        If hidden files are copied, clone attempts to maintain their
        relative order with respect to the members of the archive.
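
A simplified sketch of clone() built on the public zipfile API follows. `clone_zip` is a hypothetical stand-in. It decompresses and recompresses each member, whereas the patch avoids this with read_compressed/write_compressed. It copies only members listed in the central directory, so it behaves as if ignore_hidden_files=True, and it keeps the original archive order.

```python
import io
import zipfile


def clone_zip(src, file, filenames_or_infolist=None):
    """Hypothetical sketch: copy selected members of `src` into `file`."""
    if filenames_or_infolist is None:
        keep = None  # copy every member
    else:
        keep = {item.filename if isinstance(item, zipfile.ZipInfo) else item
                for item in filenames_or_infolist}
        missing = keep - set(src.namelist())
        if missing:
            raise KeyError(f"not in archive: {sorted(missing)}")
    with zipfile.ZipFile(file, "w") as dst:
        for info in src.infolist():  # preserves the original member order
            if keep is not None and info.filename not in keep:
                continue
            # Fresh ZipInfo: writestr() overwrites offsets and sizes on the
            # object it is given, so reusing `info` would corrupt `src`.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            new_info.comment = info.comment
            dst.writestr(new_info, src.read(info))
    # Like the documented clone(), return the copy opened in append mode.
    return zipfile.ZipFile(file, "a")


# Usage: clone two of three members into an in-memory archive.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name in ("a.txt", "b.txt", "c.txt"):
        zf.writestr(name, name.upper() * 50)
out = io.BytesIO()
with zipfile.ZipFile(src_buf) as src, clone_zip(src, out, ["c.txt", "a.txt"]) as cloned:
    print(cloned.namelist())  # ['a.txt', 'c.txt']
```
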

commit(self):

    Commit any inline modifications (removal and rename) to the zip archive.

    This uses a temporary file to create a new zip archive with the
    required modifications and then replaces the original.

    It therefore requires write access to either the directory where the
    original zip file lives or Python's default temporary file location.
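
The fallback matters because, on POSIX systems, os.replace() needs write access to the archive's directory, and it may fail when source and destination are on different filesystems. The sketch below works under those assumptions. `replace_via_temp` and its `write_new` callback are hypothetical names and not part of the patch.

```python
import os
import shutil
import tempfile


def replace_via_temp(path, write_new):
    """Hypothetical helper: rebuild `path` via a temporary file.

    `write_new` is called with a binary file object to write the new
    contents into; the original is only replaced if it returns normally.
    """
    directory = os.path.dirname(os.path.abspath(path))
    try:
        # Preferred: same directory, so os.replace() is a simple rename.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
    except OSError:
        # No write access to the archive's directory: fall back to the
        # default temporary-file location.
        fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_new(tmp)
        try:
            os.replace(tmp_path, path)
        except OSError:
            # e.g. different filesystem or no directory write access:
            # overwrite the original's contents in place (not atomic).
            shutil.copyfile(tmp_path, path)
            os.remove(tmp_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The copy fallback overwrites the original in place, so unlike os.replace() it is not atomic.
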