Author takluyver
Recipients takluyver
Date 2016-01-07.15:10:49
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1452179451.89.0.468945044395.issue26039@psf.upfronthosting.co.za>
In-reply-to
Content
I was working recently on some code writing zip files, and needed a bit more flexibility than the interface of zipfile provides. To do what I wanted, I ended up copying the entirety of ZipFile.write() into my own code with modifications. That's ugly, and to make it worse, the code I copied from the Python 3.4 stdlib didn't work properly with Python 3.5.

Specifically, I want to:

* Override the timestamp, which write() unconditionally takes from the mtime of the file.
* Do extra processing on the data (e.g. calculating a checksum) as it is read. Being able to pass a file-like object in to be read, rather than just a filename, would work for this.
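
To illustrate the second point, here is a minimal sketch of the kind of file-like wrapper I have in mind. The class name ChecksumReader and the choice of SHA-256 are assumptions made for the example; neither is part of zipfile or of the attached patch.

```python
import hashlib
import io


class ChecksumReader:
    """Hypothetical wrapper: hash the data of a binary file object as it is read."""

    def __init__(self, fileobj, algorithm="sha256"):
        self._fileobj = fileobj
        self._hash = hashlib.new(algorithm)

    def read(self, size=-1):
        # Pass the read through and feed whatever came back into the hash.
        chunk = self._fileobj.read(size)
        self._hash.update(chunk)
        return chunk

    def hexdigest(self):
        # Digest of everything read so far.
        return self._hash.hexdigest()


# Usage: read in small chunks, as a streaming writer would.
reader = ChecksumReader(io.BytesIO(b"hello world"))
while reader.read(4):
    pass
digest = reader.hexdigest()
```

Anything that consumes the wrapper through read() gets the checksum as a side effect, without a second pass over the file.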

I could do both by using ZipFile.writestr(), but then the entire file must be loaded into memory at once, which I would much rather avoid.
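
For reference, the writestr() workaround looks roughly like the sketch below. The helper name add_with_timestamp is made up for illustration; the point is the file_obj.read() call, which pulls the whole file into memory before anything is written.

```python
import io
import zipfile


def add_with_timestamp(zf, arcname, file_obj, date_time):
    """Hypothetical helper: add file_obj's contents under arcname with a chosen timestamp."""
    zinfo = zipfile.ZipInfo(arcname, date_time=date_time)
    # A bare ZipInfo defaults to ZIP_STORED, so copy the archive's setting.
    zinfo.compress_type = zf.compression
    data = file_obj.read()  # the entire file is held in memory here
    zf.writestr(zinfo, data)


# Usage with in-memory objects standing in for real files.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    add_with_timestamp(zf, "a.txt", io.BytesIO(b"data"), (2000, 1, 1, 0, 0, 0))
```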

The patch attached is one proposal which would make it easier to do what I want, but it's intended as a starting point for discussion. I'm not particularly attached to the API.

- Should this be a new method, or new functionality for either the write or writestr method? I favour a new method, because the existing methods are already quite long, and I think it's nicer to break write() up into two parts like this.

- If a new method, what should it be called? I opted for writefile().

- What should its signature be? It's currently writefile(file_obj, zinfo, force_zip64=False), but I can see an argument for aligning it with writestr(zinfo_or_arcname, data, compress_type=None).

- What to do about ZIP64: the code has to decide whether to use the ZIP64 format before writing, because that affects the header size, but we may not know the length of the data until we have read it all. I've used a force_zip64 boolean parameter, and documented that people should pass True if a file of unknown size could exceed 2 GiB.

- Are there other points where it could be made more flexible while we're thinking about this? I'd quite like to split out the code for writing a directory entry to a separate method ('writedir'?). I'd also like to allow datetime objects to be passed in for timestamps as well as the 6-tuples currently expected.
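
To make the questions above concrete, here is a rough sketch of how the proposed call could behave from the caller's side. It is not the attached patch: it is a standalone function with the writefile(file_obj, zinfo, force_zip64=False) shape, and it assumes Python 3.6 or later, where ZipFile.open() accepts mode="w" and a force_zip64 argument (the 3.4 and 3.5 versions discussed here do not). The helper as_date_time is likewise hypothetical and only illustrates accepting a datetime in place of a 6-tuple.

```python
import datetime
import io
import shutil
import zipfile


def as_date_time(value):
    """Hypothetical helper: accept a datetime or a 6-tuple, return a 6-tuple."""
    if isinstance(value, datetime.datetime):
        return value.timetuple()[:6]
    return tuple(value)


def writefile(zf, file_obj, zinfo, force_zip64=False):
    """Sketch: stream file_obj into the archive under the metadata in zinfo."""
    # force_zip64 must be decided up front because it changes the header size.
    with zf.open(zinfo, mode="w", force_zip64=force_zip64) as dest:
        shutil.copyfileobj(file_obj, dest)  # copies in chunks, not all at once


# Usage: the timestamp comes from the caller, not from a file's mtime.
stamp = as_date_time(datetime.datetime(2016, 1, 7, 15, 10, 50))
zinfo = zipfile.ZipInfo("x.bin", date_time=stamp)
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    writefile(zf, io.BytesIO(b"abc" * 1000), zinfo)
```

Note that ZIP timestamps are stored with two-second resolution, so an odd number of seconds will not round-trip exactly.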
History
Date User Action Args
2016-01-07 15:10:51 takluyver set recipients: + takluyver
2016-01-07 15:10:51 takluyver set messageid: <1452179451.89.0.468945044395.issue26039@psf.upfronthosting.co.za>
2016-01-07 15:10:51 takluyver link issue26039 messages
2016-01-07 15:10:50 takluyver create