Author serhiy.storchaka
Recipients Thomas.Waldmann, alanmcintyre, serhiy.storchaka, twouters
Date 2016-11-27.10:29:35
Message-id <1480242575.86.0.314296144584.issue28494@psf.upfronthosting.co.za>
In-reply-to
Content
No, checking the first bytes of the file is not an appropriate option. zipfile should support the Python zip application format [1], in which arbitrary data such as a shebang line may precede the archive.
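To illustrate why a first-bytes check would be too strict, here is a minimal sketch that builds an archive in memory and prepends a shebang line the way the zipapp format allows. The file name `__main__.py` and the shebang text are only illustrative choices.

```python
import io
import zipfile

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")

# Prepend a shebang line, as the zipapp format permits.
app = b"#!/usr/bin/env python3\n" + buf.getvalue()

# The data no longer starts with a local file header signature...
starts_with_pk = app.startswith(b"PK\x03\x04")

# ...but zipfile still recognises and reads the archive.
is_zip = zipfile.is_zipfile(io.BytesIO(app))
with zipfile.ZipFile(io.BytesIO(app)) as zf:
    names = zf.namelist()

print(starts_with_pk, is_zip, names)
```

A check based on the leading bytes would reject this file even though the ZipFile constructor opens it without trouble.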

I see two options:

1. Make is_zipfile() stricter than the ZipFile constructor. The latter supports ZIP files with data past the comment or with truncated comments, but the former should reject them.

2. Make both is_zipfile() and the ZipFile constructor more robust. They should check not just the EOCD signature, but also the Zip64 end of central directory record (if it exists) and the first central file header signature (if the ZIP file is not empty).
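As a rough sketch of the kind of check option 2 describes (not the zipfile implementation), the hypothetical helper `looks_like_zip()` below locates the last EOCD signature and then verifies that a central file header signature sits where the central directory should start. It is simplified on purpose: it ignores Zip64 records, scans the whole buffer rather than only the tail, and assumes the archive comment does not itself contain the EOCD signature.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
CDFH_SIG = b"PK\x01\x02"   # central file header signature
EOCD_FMT = "<4s4H2LH"      # fixed-size part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def looks_like_zip(data: bytes) -> bool:
    """Hypothetical stricter check; Zip64 is not handled."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return False
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    n_total, cd_size = fields[4], fields[5]
    if n_total == 0:
        # Empty archive: there is no central file header to verify.
        return cd_size == 0
    # Without Zip64 records the central directory ends right before
    # the EOCD record, so this also works when data was prepended.
    cd_start = pos - cd_size
    if cd_start < 0:
        return False
    return data[cd_start:cd_start + 4] == CDFH_SIG


# A real archive, with and without a prepended shebang line.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "data")
real_zip = buf.getvalue()
prefixed_zip = b"#!/usr/bin/env python3\n" + real_zip

# Non-ZIP data that merely ends with an EOCD-shaped record claiming
# one entry and a 46-byte central directory.
fake = b"%PDF-1.4\n" + b"x" * 100 + struct.pack(
    EOCD_FMT, EOCD_SIG, 0, 0, 1, 1, 46, 0, 0)

print(looks_like_zip(real_zip), looks_like_zip(prefixed_zip),
      looks_like_zip(fake))
```

Computing the central directory position from the EOCD position, rather than trusting the stored offset, keeps files with prepended data acceptable while rejecting a stray EOCD signature that has no central directory in front of it.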

It may be that PDF files contain PK\005\006 not accidentally, but because they contain embedded ZIP files (I don't know if this is the case). In that case, is_zipfile() returning True is correct.

[1] https://docs.python.org/3/library/zipapp.html
History
Date User Action Args
2016-11-27 10:29:35 serhiy.storchaka set recipients: + serhiy.storchaka, twouters, alanmcintyre, Thomas.Waldmann
2016-11-27 10:29:35 serhiy.storchaka set messageid: <1480242575.86.0.314296144584.issue28494@psf.upfronthosting.co.za>
2016-11-27 10:29:35 serhiy.storchaka link issue28494 messages
2016-11-27 10:29:35 serhiy.storchaka create