
Author vstinner
Recipients aroussel, bckohan, gregory.p.smith, iritkatriel, vstinner
Date 2020-10-27.03:07:15
Message-id <1603768036.28.0.40612189058.issue42096@roundup.psfhosted.org>
Content
ZipFile.open() checks the first 4 bytes:

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipFile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipFile("Bad magic number for file header")

But is_zipfile() does not. The signature check could be shared between the two.

.gz and .zip files don't start with the same bytes, so this check should reduce the number of false positives.
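
To illustrate the difference in leading bytes, here is a minimal sketch. The helper names (starts_like_zip, starts_like_gzip, make_zip) are hypothetical and are not part of the zipfile module. It only compares magic numbers and is not how is_zipfile() is implemented. It also assumes the archive has no data prepended before the first local file header, which is not true of every valid ZIP file (for example self-extracting archives).

```python
import gzip
import io
import zipfile

# Magic numbers defined by the ZIP and gzip formats.
ZIP_LOCAL_HEADER = b"PK\x03\x04"   # local file header signature
ZIP_EMPTY_ARCHIVE = b"PK\x05\x06"  # end-of-central-directory signature (empty archive)
GZIP_MAGIC = b"\x1f\x8b"


def starts_like_zip(data: bytes) -> bool:
    """Hypothetical helper: True if data begins with a ZIP signature."""
    return data[:4] in (ZIP_LOCAL_HEADER, ZIP_EMPTY_ARCHIVE)


def starts_like_gzip(data: bytes) -> bool:
    """Hypothetical helper: True if data begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def make_zip() -> bytes:
    """Build a small one-member ZIP archive in memory for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


if __name__ == "__main__":
    zip_bytes = make_zip()
    gz_bytes = gzip.compress(b"hello")
    print(starts_like_zip(zip_bytes), starts_like_gzip(zip_bytes))
    print(starts_like_zip(gz_bytes), starts_like_gzip(gz_bytes))
```

A check like this is cheap because it reads at most four bytes, but whether is_zipfile() can afford to be this strict depends on how it should treat archives with leading data.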

--

You may want to look at the validate() methods of my old Hachoir project. They inspect a few bytes to decide whether a file looks like a valid gzip or ZIP archive.

gzip:

https://github.com/vstinner/hachoir/blob/0f56883d7cea7082e784bfbdd2882e0f2dd2f34b/hachoir/parser/archive/gzip_parser.py#L51-L62

zip:

https://github.com/vstinner/hachoir/blob/0f56883d7cea7082e784bfbdd2882e0f2dd2f34b/hachoir/parser/archive/zip.py#L411-L430
History
Date                 User      Action  Args
2020-10-27 03:07:16  vstinner  set     recipients: + vstinner, gregory.p.smith, iritkatriel, aroussel, bckohan
2020-10-27 03:07:16  vstinner  set     messageid: <1603768036.28.0.40612189058.issue42096@roundup.psfhosted.org>
2020-10-27 03:07:16  vstinner  link    issue42096 messages
2020-10-27 03:07:15  vstinner  create