
Author gregory.p.smith
Recipients aroussel, bckohan, gregory.p.smith, iritkatriel, serhiy.storchaka, vstinner
Date 2020-10-27.07:24:36
Message-id <1603783476.51.0.920154143622.issue42096@roundup.psfhosted.org>
Content
ZipFile.open() is not the code for opening a zip file. :)

That's the code for opening a file embedded within an already constructed mode='r' archive, as done by the ZipFile.__init__() constructor.  By the time you've gotten to the open() method, you've already loaded the entire central directory, which is unbounded in size, into memory as ZipInfo objects [constructor] and are checking the signature of an individual file header you are attempting to read out of the archive.

Follow the ZipFile() constructor: it calls ZipFile._RealGetContents(), which is the true start of parsing the archive.  https://github.com/python/cpython/blob/master/Lib/zipfile.py#L1317
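
To make that order of operations concrete, here is a minimal sketch using only the public zipfile API. The in-memory archive and the member names a.txt and b.txt are made up for illustration. It only shows that the ZipInfo list is already populated once the constructor returns, before open() is ever called.

```python
import io
import zipfile

# Build a tiny archive in memory (illustrative data, not from the issue).
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("a.txt", b"alpha")
    zf.writestr("b.txt", b"beta")
archive_bytes = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(archive_bytes), mode="r") as zf:
    # The constructor has already parsed the central directory:
    # every entry is available as a ZipInfo object before any open() call.
    names = [info.filename for info in zf.infolist()]

    # open() only deals with one member of the already parsed archive.
    with zf.open("a.txt") as member:
        content = member.read()

print(names, content)
```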

Sure, more and more steps can be done.  But if you want to do that, you may as well just get rid of is_zipfile() entirely - a function whose point is to be fast and not consume an amount of memory determined by the input data - and have people just call `zipfile.ZipFile(path_in_question, mode='r')` and live with the consequences of attempting to load and parse the whole thing.  If that doesn't raise an exception, it is more likely to be a zip file.  But opening each of the files inside could still raise an exception, so you'd need to iterate over the members, open each one, and make sure they're valid.
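
As a rough sketch of what that heavier check could look like for callers who want it: looks_like_valid_zip() below is a hypothetical helper, not part of zipfile. It assumes ZipFile.testzip() (which reads every member and checks its CRC) is an acceptable stand-in for "open each file and make sure it's valid", and the list of caught exceptions is a best guess rather than an exhaustive contract. The sample data is invented for the demonstration.

```python
import io
import zipfile
import zlib


def looks_like_valid_zip(source) -> bool:
    """Hypothetical strict check: parse the archive and read every member.

    Unlike is_zipfile(), the cost here grows with the input, both in time
    and in memory used for the central directory.
    """
    try:
        with zipfile.ZipFile(source, mode="r") as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError, EOFError,
            NotImplementedError, RuntimeError):
        # Assumed set of failures: bad structure, corrupt compressed data,
        # truncated input, unsupported compression, encrypted members.
        return False


# Build a small stored (uncompressed) archive, then corrupt one byte of the
# member data while leaving the central directory untouched.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello world")
good = buf.getvalue()
corrupted = good.replace(b"hello world", b"jello world")

quick_guess = zipfile.is_zipfile(io.BytesIO(corrupted))
strict_good = looks_like_valid_zip(io.BytesIO(good))
strict_corrupted = looks_like_valid_zip(io.BytesIO(corrupted))
strict_garbage = looks_like_valid_zip(io.BytesIO(b"not a zip"))
```

Here the quick guess still accepts the corrupted archive because the end-of-archive structures are intact, while the strict helper rejects it only after reading the member data.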

is_zipfile() isn't a verify_zipfile_integrity() routine.  Just a quick best guess.

is_zipfile() cannot be perfect and is not intended to be.  There is always going to be yet another thing it _could_ try.  It isn't worth chasing the impossible goal and making it not be fast.

Just update the is_zipfile() docs.