Title: zipfile.is_zipfile wrongly recognizes non-zip as zip
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3, Python 2.7
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: bkabrda, r.david.murray, serhiy.storchaka
Priority: normal Keywords:

Created on 2012-12-20 08:30 by bkabrda, last changed 2012-12-21 09:47 by serhiy.storchaka. This issue is now closed.

Messages (10)
msg177804 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2012-12-20 08:30
When I use zipfile.is_zipfile on the fastjar file (sample uploaded at [1]) from libgcj, I get True, while I should get False (reproducible with fastjar from libgcj 4.7.2 on Fedora 18).
This is caused by the stringEndArchive string being present in the file, even though the file is not a zip archive. Would it be possible to add some further checks to eliminate this kind of error? I'd like to submit a patch, but I'm not sure what to check for. Maybe some other constants mentioned in the ZIP format definition?

Thanks a lot.

msg177806 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-20 08:59
You can upload a sample file to the bug tracker.

Actually, jar files are just zip files (with some limitations and special files). zipfile.is_zipfile should return True on a jar file.
msg177807 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2012-12-20 09:04
Oh, sorry, I will upload it to the bug tracker next time.

I know that jar files are zip files, but this is not a jar (although it has "jar" in its file name). This is a binary.
msg177830 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-12-20 15:30
I imagine that it creates jar files, and thus contains the signature as a constant.  The is_zipfile check is much more complicated than just looking for that string, though, so what is going on must be even more perverse than that.  It would be interesting to know whether other zip tools have an issue with it.  Be careful when comparing, though: is_zipfile only does the initial check, whereas another unzip tool run against the file may produce an error only later in the process (after the tool has decided it is a zip file and tries to process it).
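
To illustrate the difference between the initial check and actually processing the archive, here is a minimal sketch, assuming Python 3. The helper name is_readable_zip and its check_crc parameter are hypothetical and not part of the zipfile module. It narrows the false positives but is not guaranteed to eliminate them, since a stray end-of-archive record can still describe an empty but parseable central directory.

```python
import zipfile


def is_readable_zip(path_or_file, check_crc=False):
    """Return True only if the central directory can actually be read.

    Hypothetical helper: stricter than zipfile.is_zipfile, which only
    performs the initial detection step.
    """
    if not zipfile.is_zipfile(path_or_file):
        return False
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            zf.namelist()  # central directory was parsed on open
            if check_crc and zf.testzip() is not None:
                return False  # at least one member failed its CRC check
    except (zipfile.BadZipFile, OSError, ValueError):
        # Bogus offsets or sizes in the control structures end up here.
        return False
    return True
```

Calling is_readable_zip("fastjar") would be the stricter counterpart of the is_zipfile call from the original report. Passing check_crc=True reads every member, so it can be slow on large archives.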
msg177831 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-20 16:08
$ zipinfo fastjar
Archive:  fastjar
Zip file size: 47664 bytes, number of entries: 31883

     Zipfile is disk 33807 of a multi-disk archive, and this is not the disk on
     which the central zipfile directory begins (disk 190).

I.e. zipinfo detects fastjar as a zip file, but fails to read its contents (`unzip -l fastjar` and `python -m zipfile -l fastjar` fail too). The file contains obviously incorrect values in the control structures.
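
The control structure in question is the end of central directory record, which starts with the signature PK\x05\x06 (the same bytes as zipfile.stringEndArchive). As a rough sketch of how its fields could be inspected, assuming the standard 22-byte fixed layout and ignoring Zip64 and multi-disk archives: the names find_eocd and eocd_is_plausible are hypothetical, and the plausibility rules are only one possible heuristic, not what zipfile itself does.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # same bytes as zipfile.stringEndArchive
EOCD_FORMAT = "<4s4H2LH"        # fixed part of the record, little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def find_eocd(data):
    """Return the fields of the last EOCD-like record in data, or None."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return None
    (_sig, disk_no, cd_disk, entries_here, entries_total,
     cd_size, cd_offset, comment_len) = struct.unpack_from(
        EOCD_FORMAT, data, pos)
    return {
        "position": pos,
        "disk_no": disk_no,
        "cd_disk": cd_disk,
        "entries_here": entries_here,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
    }


def eocd_is_plausible(rec):
    """Hypothetical sanity checks for an ordinary single-disk archive."""
    return (
        rec["disk_no"] == 0
        and rec["cd_disk"] == 0
        and rec["entries_here"] == rec["entries_total"]
        # The central directory must end at or before the EOCD record.
        and rec["cd_offset"] + rec["cd_size"] <= rec["position"]
    )
```

For a file such as fastjar one would read the bytes with open(path, "rb").read() and pass them to find_eocd. Nonzero disk numbers like the ones zipinfo reports above would make eocd_is_plausible return False.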
msg177834 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-12-20 16:21
So, it looks like this is not a bug in Python, just a weirdness of fastjar.  Or, if you prefer, a bug in fastjar (they could assemble the signature instead of coding it as a single constant).
msg177835 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-20 16:25
It's rather a bug in the ZIP format design.
msg177836 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-12-20 16:51
Well, yes, but that ship has already sunk :)
msg177866 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2012-12-21 07:17
I tried is_zipfile on /usr/bin/zip and it returns True too, so this seems to be a more general problem for zip-handling binaries. Anyway, thank you both; I agree that there is not much that can be done here.
msg177872 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-21 09:47
zipinfo detects /usr/bin/zip as a zip archive too.
Date                 User              Action  Args
2012-12-21 09:47:43  serhiy.storchaka  set     messages: + msg177872
2012-12-21 07:17:55  bkabrda           set     messages: + msg177866
2012-12-20 16:51:53  r.david.murray    set     messages: + msg177836
2012-12-20 16:25:38  serhiy.storchaka  set     messages: + msg177835
2012-12-20 16:21:52  r.david.murray    set     status: open -> closed; type: behavior; messages: + msg177834; resolution: not a bug; stage: resolved
2012-12-20 16:08:08  serhiy.storchaka  set     messages: + msg177831
2012-12-20 15:30:30  r.david.murray    set     nosy: + r.david.murray; messages: + msg177830
2012-12-20 09:04:35  bkabrda           set     messages: + msg177807
2012-12-20 08:59:51  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg177806
2012-12-20 08:30:11  bkabrda           create