
Author serhiy.storchaka
Recipients Arthur.Darcet, christian.heimes, chroipahtz, eric.araujo, lars.gustaebel, loewis, rhettinger, rossmclendon, sandro.tosi, serhiy.storchaka, terry.reedy, ubershmekel, victorlee129
Date 2014-10-23.19:17:10
Message-id <1414091830.79.0.729893893444.issue6818@psf.upfronthosting.co.za>
Content
I agree with Martin and Lars: this issue is not as easy as it looks at first glance.

For ZIP files we should distinguish two different operations.

1. Remove the entry from the central directory (and maybe mark the local file header as invalid, if that is possible). This is easy, fast and safe, but it doesn't reduce the size of the ZIP file: the member's data stays in the file, just unreferenced.
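A minimal sketch of operation (1) with the current zipfile module, assuming CPython implementation details: `unlist_member` is a hypothetical helper, and it touches `ZipFile.filelist`, `ZipFile.NameToInfo` and the private `_didModify` flag, none of which is a supported way to do this. Closing an archive opened in append mode makes zipfile rewrite the central directory at its old position from `filelist`, so the dropped entry disappears from the listing while its local header and data remain in the file.

```python
import zipfile

def unlist_member(path, name):
    """Operation (1): drop *name* from the central directory only.

    Hypothetical helper relying on CPython internals (filelist, NameToInfo
    and the private _didModify flag); illustration only, not a supported API.
    The member's local header and data stay in the file, unreferenced.
    """
    with zipfile.ZipFile(path, "a") as zf:
        info = zf.getinfo(name)       # raises KeyError if the name is absent
        zf.filelist.remove(info)      # close() writes the central directory from filelist
        del zf.NameToInfo[name]       # keep name lookups consistent with filelist
        zf._didModify = True          # private flag: force close() to rewrite the directory
```

After `unlist_member("archive.zip", "old.txt")` the name is gone from `namelist()`, but the bytes of `old.txt` are still on disk, which is why (1) alone does not shrink the archive.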

2. Physically remove the file's content from the ZIP file. This is only as easy as removing a line from a text file, i.e. everything after it has to be moved. In the worst case the cost is linear in the size of the ZIP file.

2a. The safer way is to create a temporary file in the same directory, copy the content of the original ZIP file excluding the deleted file, and then replace the original ZIP file with the modified copy. Be careful about file and parent directory permissions, owners, and disk space.

2b. The faster but less safe way is to "shift" the content of the ZIP file that follows the deleted file, by reading it and writing it back into the same ZIP file at a lower position. This is not safe, because if something goes wrong during writing, we can lose all data. And of course there are crafty ZIP files in which the order of the files doesn't match the order in the central directory, or in which the data of different files even overlaps.
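The byte-moving core of (2b) might look like the sketch below; `shift_tail` is a hypothetical helper, not zipfile API. It copies the tail of the file down over the removed range in fixed-size chunks and then truncates. For a real ZIP file the `header_offset` of every following member and the central directory offset would also have to be rewritten, which is omitted here. If the loop is interrupted, the file is left half-shifted, which is the data-loss risk described above.

```python
import os

def shift_tail(f, start, end, chunk_size=1 << 20):
    """Delete bytes [start, end) of the binary file *f* (opened "r+b") in place.

    Hypothetical helper. Everything after *end* is moved down to *start*
    chunk by chunk, then the file is truncated. Not crash-safe.
    """
    if not 0 <= start <= end:
        raise ValueError("expected 0 <= start <= end")
    size = f.seek(0, os.SEEK_END)   # seek() returns the new absolute position
    read_pos, write_pos = end, start
    while read_pos < size:
        f.seek(read_pos)
        chunk = f.read(min(chunk_size, size - read_pos))
        if not chunk:               # file shrank underneath us; stop instead of looping forever
            break
        f.seek(write_pos)           # write_pos <= read_pos, so unread data is never overwritten
        f.write(chunk)
        read_pos += len(chunk)
        write_pos += len(chunk)
    f.truncate(write_pos)
```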

For performance, maybe we should implement (2) not as a method that removes a single file, but as a method that takes a filter function and keeps in the ZIP file only the files for which it returns true.
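A sketch of (2a) exposed as the filter-style method suggested above; `filter_zip` and its `keep` parameter are hypothetical names. It streams each kept member from the original archive into a temporary file in the same directory and then swaps it in with `os.replace()`, so memory use does not depend on the archive size. Assumptions: members are recompressed with their original compression method rather than copied raw, only the permission bits are preserved (not the owner), and encrypted members are not supported.

```python
import contextlib
import os
import shutil
import tempfile
import zipfile

def filter_zip(path, keep):
    """Rewrite the ZIP file at *path*, keeping only members with keep(name) true."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".zip")
    try:
        # dst is closed (central directory written) before tmp_file is closed.
        with os.fdopen(fd, "wb") as tmp_file, \
             zipfile.ZipFile(path) as src, \
             zipfile.ZipFile(tmp_file, "w") as dst:
            dst.comment = src.comment
            for info in src.infolist():
                if not keep(info.filename):
                    continue
                # Fresh ZipInfo so that writing does not mutate the source entry.
                new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                new_info.compress_type = info.compress_type
                new_info.external_attr = info.external_attr
                new_info.comment = info.comment
                new_info.file_size = info.file_size  # lets zipfile pick ZIP64 up front
                # Stream in chunks: memory use does not grow with member size.
                with src.open(info) as fsrc, dst.open(new_info, "w") as fdst:
                    shutil.copyfileobj(fsrc, fdst)
        shutil.copymode(path, tmp_path)  # permission bits only; owner is not preserved
        os.replace(tmp_path, path)       # atomic on POSIX within one filesystem
    except BaseException:
        with contextlib.suppress(OSError):
            os.unlink(tmp_path)
        raise
```

Removing a single file is then `filter_zip(path, lambda name: name != "old.txt")`, and removing many files costs one pass instead of one pass per file.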

Or maybe implement (1) and then add a method that cleans up the ZIP archive by physically removing the data of all files already removed from the central directory. We should discuss the alternatives.
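If (1) is implemented first, the cleanup method needs to know how much space the unlisted entries waste. A rough sketch (`unreferenced_bytes` is a hypothetical helper) reads each listed member's local header to get its exact size and subtracts the total from the offset where the central directory starts. It assumes members do not overlap, nothing is prepended to the archive, and no data descriptors are used (true for archives zipfile writes to a seekable file); it also reads the undocumented `ZipFile.start_dir` attribute.

```python
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header: signature, five 2-byte fields,
# CRC and both sizes (4 bytes each), then file name length and extra field length.
_LOCAL_HEADER = struct.Struct("<4s5H3L2H")

def unreferenced_bytes(path):
    """Bytes before the central directory not used by any listed member."""
    used = 0
    with zipfile.ZipFile(path) as zf, open(path, "rb") as f:
        for info in zf.infolist():
            f.seek(info.header_offset)
            fields = _LOCAL_HEADER.unpack(f.read(_LOCAL_HEADER.size))
            name_len, extra_len = fields[-2], fields[-1]
            # Local header + name + local extra field + compressed data.
            used += _LOCAL_HEADER.size + name_len + extra_len + info.compress_size
        return zf.start_dir - used  # start_dir: undocumented central directory offset
```

A result of zero means the archive is already compact; anything else is what a cleanup pass would reclaim.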

As for the concrete patch, zipfile.remove.2.patch can read the content of the whole ZIP file into memory. This is not appropriate, because a ZIP file can be very large.
History
Date | User | Action | Args
2014-10-23 19:17:10 | serhiy.storchaka | set | recipients: + serhiy.storchaka, loewis, rhettinger, terry.reedy, lars.gustaebel, christian.heimes, rossmclendon, eric.araujo, ubershmekel, victorlee129, sandro.tosi, chroipahtz, Arthur.Darcet
2014-10-23 19:17:10 | serhiy.storchaka | set | messageid: <1414091830.79.0.729893893444.issue6818@psf.upfronthosting.co.za>
2014-10-23 19:17:10 | serhiy.storchaka | link | issue6818 messages
2014-10-23 19:17:10 | serhiy.storchaka | create |