classification
Title: ZipExtFile in zipfile can be seekable
Type: enhancement
Stage: commit review
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: Iridium.Yang, dkessel, gregory.p.smith, jae, jjolly, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-11-20 15:20 by Iridium.Yang, last changed 2018-01-30 08:53 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description
zipfile.diff Iridium.Yang, 2014-11-20 15:20
ziz.py jjolly, 2017-12-22 13:44 zip-in-zip test program
Pull Requests
URL Status Linked
PR 4966 merged python-dev, 2017-12-21 18:29
Messages (7)
msg231438 - Author: Iridium Yang (Iridium.Yang) Date: 2014-11-20 15:20
The ZipExtFile class in the zipfile module does not provide a seek method the way GzipFile does. As a result, it is hard to manipulate files without extracting all of the content.
For example, consider a very large tar file compressed with zip. The tarfile module can operate on a file object, but it needs a seek method, so the ZipExtFile instance returned from ZipFile cannot be passed to TarFile.
This may seem strange, but I encountered it with Samsung firmware.

In fact, the seek method in GzipFile or some other compressed format can be implemented in zipfile very easily. Here is my naive modification (nearly the same as in GzipFile).
msg231446 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-20 18:20
I'm -1 on adding a seek method with linear complexity. This looks like attractive nonsense to me. It would be better to just make TarFile work with non-seekable streams.
msg231472 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-21 14:08
Actually, TarFile already works with non-seekable streams. Use TarFile.open() with mode='r|*' or similar.

On the other hand, I'm not against making non-compressed ZipExtFile seekable. It can be helpful when a ZIP file is used just as a container for other files.
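
A minimal sketch of the stream-mode approach mentioned above, assuming an in-memory ZIP archive that holds a single tar file. The helper names (`build_zip_with_tar`, `read_tar_members_from_zip`) and the member name `archive.tar` are made up for illustration and are not part of the report.

```python
import io
import tarfile
import zipfile


def build_zip_with_tar() -> bytes:
    """Build a small ZIP archive in memory that contains one tar file."""
    payload = b"hello from inside the tar"
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("archive.tar", tar_buf.getvalue())
    return zip_buf.getvalue()


def read_tar_members_from_zip(zip_bytes: bytes) -> dict:
    """Read every regular file from the tar member without calling seek()."""
    contents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open("archive.tar") as member:
            # 'r|*' selects stream mode: the tar data is consumed strictly
            # front to back, so the underlying file object needs only read().
            with tarfile.open(fileobj=member, mode="r|*") as tar:
                for info in tar:
                    if info.isfile():
                        # In stream mode each member has to be read before
                        # the iteration moves on to the next one.
                        contents[info.name] = tar.extractfile(info).read()
    return contents
```

In stream mode the members can only be visited once and in order, which is the trade-off for not needing a seekable source.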
msg256683 - Author: Daniel Kessel (dkessel) Date: 2015-12-18 14:24
It would be great to have the ZipExtFile class seekable.
This would help in implementing features in other projects.

For example, pydicom would gain the ability to read from ZIP files, as mentioned in https://github.com/darcymason/pydicom/issues/219
msg268764 - Author: Jürgen A. Erhard (jae) Date: 2016-06-18 05:02
To add to this (without looking at the patch): I just learned, to my astonishment, that a ZipExtFile doesn't even support tell(). I can understand seek being nontrivial... but tell? It's a bytestream, and there is (isn't there?) a clear definition of which byte the next read(1) would deliver. It should be trivial to keep track of the (only ever increasing) file position.
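
The position tracking described above could look roughly like the following wrapper. This is only a sketch under the assumption that all reads go through `read()`; the class name `PositionTrackingReader` is hypothetical and it is not how zipfile itself is implemented.

```python
import io
import zipfile


class PositionTrackingReader:
    """Wrap a read-only stream and count the bytes handed out so far."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._pos = 0  # number of decompressed bytes returned by read()

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._pos += len(data)  # the position only ever increases
        return data

    def tell(self):
        return self._pos


# Usage: wrap a member opened from an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"abcdef")

archive = zipfile.ZipFile(io.BytesIO(buf.getvalue()))
reader = PositionTrackingReader(archive.open("a.txt"))
first = reader.read(3)
position = reader.tell()
```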
msg308935 - Author: John Jolly (jjolly) * Date: 2017-12-22 13:44
Please be gentle, this is my first submission to Python.

The use case for me was a recursive zip-within-a-zip situation. I wanted to allow the creation of a zipfile.ZipFile from an existing zipfile.ZipExtFile, but the lack of seek prevented this.

I simply treated forward seeks as a read, and backward seeks as a reset-and-read. The reset was the tricky part as it required restoring several original values such as the remaining compressed length, remaining data length, and the running crc32.
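
The idea of "forward seek is a read, backward seek is a reset-and-read" can be sketched outside of zipfile as a wrapper. This is not the submitted patch: instead of restoring the CRC and length counters it simply reopens the member through a caller-supplied `opener` callable, and the class name `RewindingReader` is hypothetical.

```python
import io
import zipfile


class RewindingReader:
    """Emulate seek() on a forward-only stream by reading and reopening."""

    def __init__(self, opener):
        self._opener = opener      # callable returning a fresh stream at offset 0
        self._fileobj = opener()
        self._pos = 0

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        else:
            raise ValueError("this sketch supports only SEEK_SET and SEEK_CUR")
        target = max(target, 0)

        if target < self._pos:
            # Backward seek: start over from the beginning of the member.
            self._fileobj.close()
            self._fileobj = self._opener()
            self._pos = 0

        # Forward seek: read and discard until the target is reached.
        remaining = target - self._pos
        while remaining > 0:
            chunk = self.read(min(remaining, 64 * 1024))
            if not chunk:
                break  # hit end of data before reaching the target
            remaining -= len(chunk)
        return self._pos
```

Every seek costs time proportional to the distance covered, and a backward seek costs as much as reading from the start, which is the linear complexity raised earlier in this issue.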

I pushed this into the latest upstream branch, but as I am testing this in v3.4 it should be easy to backport if necessary (I suspect not).

I based my fix on a little program that I wrote to test the feasibility of this idea. I am attaching that test program here.
msg311254 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-01-30 08:51
New changeset 066df4fd454d6ff9be66e80b2a65995b10af174f by Gregory P. Smith (John Jolly) in branch 'master':
bpo-22908: Add seek and tell functionality to ZipExtFile (GH-4966)
https://github.com/python/cpython/commit/066df4fd454d6ff9be66e80b2a65995b10af174f
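
For illustration, here is a small sketch of the zip-within-a-zip use case from this issue, assuming Python 3.7 or later where ZipExtFile supports seek() and tell(). The helper names `make_nested_zip` and `read_nested` and the member names are invented for the example.

```python
import io
import zipfile


def make_nested_zip() -> bytes:
    """Create an outer ZIP in memory that contains another ZIP as a member."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("note.txt", b"nested")

    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("inner.zip", inner.getvalue())
    return outer.getvalue()


def read_nested(outer_bytes: bytes) -> bytes:
    """Open the inner ZIP directly from the outer member, without extracting it."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        with outer.open("inner.zip") as member:
            # ZipFile needs seek() and tell() on its input to locate the
            # central directory at the end of the inner archive.
            with zipfile.ZipFile(member) as inner:
                return inner.read("note.txt")
```

Because the member is compressed, seeking within it still decompresses the data in between, so this is convenient rather than fast for large inner archives.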
History
Date User Action Args
2018-01-30 08:53:26 gregory.p.smith set status: open -> closed; stage: patch review -> commit review; resolution: fixed; versions: + Python 3.7, - Python 3.5
2018-01-30 08:51:42 gregory.p.smith set messages: + msg311254
2018-01-30 08:45:02 gregory.p.smith set assignee: serhiy.storchaka -> gregory.p.smith; nosy: + gregory.p.smith
2017-12-22 13:44:50 jjolly set files: + ziz.py; nosy: + jjolly; messages: + msg308935
2017-12-21 18:29:06 python-dev set stage: needs patch -> patch review; pull_requests: + pull_request4858
2016-06-18 05:02:59 jae set nosy: + jae; messages: + msg268764
2015-12-18 14:24:16 dkessel set nosy: + dkessel; messages: + msg256683
2014-11-21 14:08:37 serhiy.storchaka set assignee: serhiy.storchaka; stage: needs patch; messages: + msg231472; versions: + Python 3.5, - Python 3.4
2014-11-20 18:20:45 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg231446
2014-11-20 15:20:43 Iridium.Yang create