Issue14315
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012-03-15 11:15 by pleed, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
bla.apk | pleed, 2012-03-15 11:15 | APK File (Zip Format) that unzips but crashes when opened with zipfile | ||
fix_zipfile_extra.patch | serhiy.storchaka, 2012-03-17 09:52 | FIX: Ignore extra padding bytes in zip file | review | |
zipfile_extra_test.patch | serhiy.storchaka, 2012-05-13 17:55 | Test for zipfile with padding. | review |
Messages (31) | |||
---|---|---|---|
msg155874 - (view) | Author: pleed (pleed) | Date: 2012-03-15 11:15 | |
zipfile.ZipFile("bla.apk") crashes, but should not since other tools work perfectly with this file. Python 2.7.2+ (default, Oct 4 2011, 20:06:09) [GCC 4.6.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import zipfile >>> zipfile.ZipFile("bla.apk") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python2.7/zipfile.py", line 710, in __init__ self._GetContents() File "/usr/lib/python2.7/zipfile.py", line 744, in _GetContents self._RealGetContents() File "/usr/lib/python2.7/zipfile.py", line 803, in _RealGetContents x._decodeExtra() File "/usr/lib/python2.7/zipfile.py", line 369, in _decodeExtra tp, ln = unpack('<HH', extra[:4]) struct.error: unpack requires a string argument of length 4 |
|||
msg156089 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-03-16 21:53 | |
.apk is primarily used for Android Package files https://en.wikipedia.org/wiki/APK_%28file_format%29 which are zipped JAR-based archives. By changing .apk to .zip, I can open the file in Win7 explorer, which has zip built in. A 'crash' is a segfault or equvalent. A traceback is a graceful shutdown and not a crash. I reproduced the traceback on 3.3.0a1. It appears to say that there are not enough bytes. import struct struct.unpack('<HH', b'abc') #produces the same error struct.error: unpack requires a bytes object of length 4 The 3.x message is clearer in that the second arg is a bytestring, not a char string. It would not hurt it it said how many bytes it did get. However, the behavior that leads to the message baffles me. The _decodeExtra code looks like this, with some prints that I added. def _decodeExtra(self): # Try to decode the extra field. extra = self.extra unpack = struct.unpack while extra: print(extra) tp, ln = unpack('<HH', extra[:4]) print(tp,ln) if tp == 1: pass # substitute for actual code not called extra = extra[ln+4:] print(extra) The output is b'\xfe\xca\x00\x00' 51966 0 b'' b'\x00' Traceback (most recent call last): ... So it looks like extra was properly formatted and properly snipped to 0 bytes, but somehow gained a null byte before being tested again. 1 byte is not 4 bytes and hence the error, but how? |
|||
msg156110 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-17 00:13 | |
This is print from *other* _decodeExtra call. Add print('start') at start of _decodeExtra and you will see it. start b'\xfe\xca\x00\x00' 51966 0 b'' start start b'\x00' |
|||
msg156135 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-03-17 04:18 | |
Thanks. There must be multiple extra fields allowed. This output suggests that a) zipfile mis-parses the extra field in some peculiar case (which I doubt) or b) the file either gives a bad field length or third extra field is mal-formed and that other tools catch the error and ignore the bad extra field whereas zipfile does not. |
|||
msg156147 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-17 09:52 | |
This is padding. Other implimentations just ignore it and Python must. |
|||
msg159584 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-04-29 06:44 | |
Anyone can look at the patch? This same problem reported in http://bugs.python.org/msg73317 . |
|||
msg159602 - (view) | Author: Yuval Greenfield (ubershmekel) * | Date: 2012-04-29 11:45 | |
If we're modifying zipfile, please consider adding the reviewed patch for ZipFile.remove at http://bugs.python.org/issue6818 |
|||
msg160538 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-05-13 17:58 | |
Serhiy: can you please provide a unit test? The OP's test case is unsuitable because of both size and copyright. As for the actual issue: the extra data (type 0xcafe) is apparently added by jar tools to make the jar file executable: http://www.androidadb.com/source/commons-compress-1.1-src/src/main/java/org/apache/commons/compress/archivers/zip/JarMarker.java.html ISTM that the extra 0 byte is in clear violation of the ZIP specification, which says that there always be type+length headers, followed by length data. So if the length is 0, there ought not to be any data. I don't buy the "padding" argument, since a) the spec is silent on any padding, and b) for alignment reasons, it's not plausible to add any padding, since it is aligned fine without this padding, and unaligned with the padding. I can sympathize with the desire to accept the zipfile, anyway (i.e. despite it being broken). At the same time, I also think that Python should not let errors pass silently. So as a way out, I propose that the ZipFile class gains a "strict" attribute, indicating whether "acceptable" violations of the spec are ignored or reported as exceptions. So -1 on the patch as it stands (but thanks for the analysis). |
|||
msg160539 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-05-13 18:06 | |
> Serhiy: can you please provide a unit test? The OP's test case is unsuitable because of both size and copyright. I provided the test several minutes ago. |
|||
msg160541 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-05-13 18:27 | |
> I can sympathize with the desire to accept the zipfile, anyway (i.e. despite it being broken). At the same time, I also think that Python should not let errors pass silently. I do not know other implementation of ZIP, which output an error or a warning on such files. The fact is that such files exist in the wild world. > So as a way out, I propose that the ZipFile class gains a "strict" attribute, indicating whether "acceptable" violations of the spec are ignored or reported as exceptions. It is a not easy task (and unnecessary, I suppose). Now zipfile ignores many errors (for example, it completely ignores local file headers). |
|||
msg160550 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-05-13 19:03 | |
>> So as a way out, I propose that the ZipFile class gains a "strict" >> attribute, indicating whether "acceptable" violations of the spec >> are ignored or reported as exceptions. > > It is a not easy task (and unnecessary, I suppose). Now zipfile > ignores many errors (for example, it completely ignores local file > headers). Why do you consider this difficult? Just add a "strict" flag, make it false by default, and raise an error if there is any unparsable extra data. I'm not asking that you implement a complete validity check for all aspects of the zip spec - just that there is a mode where it continues to raise an exception as it currently does (but with a better exception). |
|||
msg160553 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-05-13 19:18 | |
> Just add a "strict" flag, make > it false by default, and raise an error if there is any unparsable > extra data. If the module does not actually checks the correctness of all (including local file headers, which now it ignores), it will be a false promise. And I don't understand who these checks are needed at all. Note that the parameter "strict" in htmlparser is considered to be obsolete. |
|||
msg160590 - (view) | Author: Meador Inge (meador.inge) * | Date: 2012-05-14 01:20 | |
This is definitely *not* a padding issue. The problem is that one of the records has an extra field of length 1: $ zipinfo -v bla.apk ... Central directory entry #3: --------------------------- There are an extra 16 bytes preceding this file. res/raw/ice.png offset of local header from start of archive: 1190 (000004A6h) bytes file system or operating system of origin: MS-DOS, OS/2 or NT FAT version of encoding software: 1.0 minimum file system compatibility required: MS-DOS, OS/2 or NT FAT minimum software version required to extract: 1.0 compression method: none (stored) file security status: not encrypted extended local header: no file last modified on (DOS date/time): 2012 Jan 13 15:30:08 32-bit CRC value (hex): f969abab compressed size: 26250 bytes uncompressed size: 26250 bytes length of filename: 15 characters length of extra field: 1 bytes length of file comment: 0 characters disk number on which file begins: disk 1 apparent file type: binary non-MSDOS external file attributes: 000000 hex MS-DOS file attributes (00 hex): none The central-directory extra field contains: There is no file comment. Our implementation can't deal with that because of the line that Terry pointed out: tp, ln = unpack('<HH', extra[:4]) As Martin pointed out, the standard says that things must be in multiples of 4-bytes. So the record is non-portable. |
|||
msg160591 - (view) | Author: Meador Inge (meador.inge) * | Date: 2012-05-14 01:52 | |
FWIW, I think it will OK to just ignore extra fields that we can't interpret as according to the standard. The only one we currently care about is the "Zip64 extended information extra field". Also, other programs (including the Info-Zip tools) seem to mostly ignore these fields. |
|||
msg160645 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-05-14 17:31 | |
> This is definitely *not* a padding issue. This is definitely a padding issue. All uncompressed files are located so that the data starts with a 4-byte boundary (1190+30+15+1=1236, 27486 +30+17+3=27536, etc). This is, probably, allows the use of mmap for the resources. > As Martin pointed out, the standard says that things must be in > multiples of 4-bytes. More precisely, the extra field must have at least 4-bytes length to fit a header. The standard is insufficiently defined in terms of what would happen if the rest of the field is less than 4 bytes (this is hidden behind by ellipsis). > So the record is non-portable. De jure the record is non-portable. De facto the record is portable (many other tools supports it). But even if it does not portable, we are dealing with the expansion of the zip format, which is very easy support for reading. |
|||
msg160655 - (view) | Author: Meador Inge (meador.inge) * | Date: 2012-05-14 18:50 | |
On Mon, May 14, 2012 at 12:31 PM, Serhiy Storchaka <report@bugs.python.org> wrote: > Serhiy Storchaka <storchaka@gmail.com> added the comment: > >> This is definitely *not* a padding issue. > > This is definitely a padding issue. All uncompressed files are located > so that the data starts with a 4-byte boundary (1190+30+15+1=1236, 27486 > +30+17+3=27536, etc). This is, probably, allows the use of mmap for the > resources. So? Someone may be using the extra fields to pad things, but for the purpose of this issue that is completely irrelevant. We only care about the proper structure of the file. Besides, without clear reference to source code or a specification any hypothesis of padding is hearsay. Did you look at the decoding I sent? The extra length field length is clearly reported as a size of one and the contents of the extra field are set to '\x00'. The extra field of size one is the actual problem, not padding. >> As Martin pointed out, the standard says that things must be in >> multiples of 4-bytes. > > More precisely, the extra field must have at least 4-bytes length to fit > a header. The standard is insufficiently defined in terms of what would > happen if the rest of the field is less than 4 bytes (this is hidden > behind by ellipsis). How is it insufficiently defined at all? It says [1]: In order to allow different programs and different types of information to be stored in the 'extra' field in .ZIP files, the following structure should be used for all programs storing data in this field: header1+data1 + header2+data2 . . . Each header should consist of: Header ID - 2 bytes Data Size - 2 bytes Note: all fields stored in Intel low-byte/high-byte order. The ellipsis is just a standard convention for indicating a repeating pattern. Extra fields which are not multiples of four bytes are not properly formed. >> So the record is non-portable. > > De jure the record is non-portable. De facto the record is portable > (many other tools supports it). But even if it does not portable, we are > dealing with the expansion of the zip format, which is very easy support > for reading. Like I said before, I am all for dropping extra fields we can not interpret. However, let us be clear that with respect to the standard we are implementing that zip files constructed like this are ill-formed. [1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT |
|||
msg160663 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-05-14 20:36 | |
> This is definitely a padding issue. All uncompressed files are located > so that the data starts with a 4-byte boundary (1190+30+15+1=1236, 27486 > +30+17+3=27536, etc). This is, probably, allows the use of mmap for the > resources. That can't possibly be the reason. mmap requires 4k (4096) alignment (on x86; more than that on SPARC). Perhaps there is a reason for 4-alignment - but it can't be mmap- |
|||
msg160664 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-05-14 20:38 | |
> Like I said before, I am all for dropping extra fields we can not > interpret. However, > let us be clear that with respect to the standard we are implementing > that zip files > constructed like this are ill-formed. I would still like to see the zipfile module to be able to detect ill-formed zipfiles - certainly not by default. This can be useful to check zipfiles that somebody produces. As the Python check wouldn't be complete, one would certainly need to apply other checks as well. |
|||
msg160673 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-05-14 22:22 | |
> That can't possibly be the reason. mmap requires 4k (4096) alignment (on > x86; more than that on SPARC). This may be the reason to mmap the entire file and then read aligned binary data. |
|||
msg160684 - (view) | Author: Meador Inge (meador.inge) * | Date: 2012-05-15 05:24 | |
> I would still like to see the zipfile module to be able to detect > ill-formed zipfiles - certainly not by default. Agreed. We already do produce some informational messages via the documented 'ZipFile.debug' attribute. Just checking that flag and producing a message to stdout like we do in other cases is one option. Another, as you suggested already, is to provide a strict flag that indicates an exception should be thrown upon encountering an ill-formed file. I like the idea of reusing what we already have. Although, when it fails, it doesn't fail loudly. |
|||
msg163743 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-06-24 07:26 | |
Any chance to commit the patch before final feature freeze? |
|||
msg163749 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-06-24 07:41 | |
Strictly speaking, the feature freeze has passed already, so you'll need the time machine. However, this is not the *final* feature freeze - there will be a 3.4 release of Python also. |
|||
msg163813 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-06-24 17:48 | |
This issue is currently marked as a bugfix for current releases also. Since the proposal seems to be to introduce a new flag, should it be re-classified as enhancement (for 3.4)? |
|||
msg163831 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-06-24 19:53 | |
I think this is a bugfix. I also don't think that "strict" attribute makes sense if we can't guarantee strong validity (see issue15114). |
|||
msg163835 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-06-24 20:01 | |
I still don't see the bug; the module is behaving correctly - it is the zipfile that is buggy. Supporting this specific kind of buggy zipfiles is a new feature. For reasons discussed, I also disagree with the proposed patch (file24902). If this helps to advance this issue, I can formally reject it if desired. |
|||
msg163855 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-06-24 20:42 | |
Given that the overall trend in the stdlib seems to be to be more lenient in parsing, as with htmlparser, and that the not-really-strict 'strict' parameter for that is being removed, (#15114) I do not understand the objection to accepting one more malformation in zip files and the desire to add a not-really-strict 'strict' parameter here. Perhaps this philosophical issue should be discussed on pydev. |
|||
msg163857 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-06-24 20:48 | |
> I still don't see the bug; the module is behaving correctly - it is the zipfile that is buggy. Supporting this specific kind of buggy zipfiles is a new feature. I agree. The module is behaving correctly. |
|||
msg187177 - (view) | Author: Void (void.sender) | Date: 2013-04-17 16:14 | |
> FWIW, I think it will OK to just ignore extra fields that we can't > interpret as according to the standard. The only one we currently > care about is the "Zip64 extended information extra field". Also, > other programs (including the Info-Zip tools) seem to mostly ignore > these fields. Yes, please. > The ellipsis is just a standard convention for indicating a repeating > pattern. Extra fields which are not multiples of four bytes are not > properly formed. . . . [;pause;] Totally agree. But at the very least it should check the size and raise a proper exception (e.g. BadZipFile). FWIW, the code is already avoiding proper bounds checking using the built-in behavior of slicing -- tp, ln = unpack('<HH', extra[:4]) ... extra = extra[ln+4:] -- which would otherwise throw exceptions if slice didn't innately and silently fail (due to an index being out-of-bounds). > I still don't see the bug; the module is behaving correctly - it is the zipfile that is buggy. Supporting this specific kind of buggy zipfiles is a new feature. You realize that you are calling user-controlled data buggy, right? |
|||
msg219377 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-05-30 06:43 | |
New changeset 6dd5e9556a60 by Gregory P. Smith in branch '2.7': Fix issue #14315: The zipfile module now ignores extra fields in the central http://hg.python.org/cpython/rev/6dd5e9556a60 New changeset 33843896ce4e by Gregory P. Smith in branch '3.4': Fix issue #14315: The zipfile module now ignores extra fields in the central http://hg.python.org/cpython/rev/33843896ce4e New changeset 89be1419472c by Gregory P. Smith in branch 'default': Fix issue #14315: The zipfile module now ignores extra fields in the central http://hg.python.org/cpython/rev/89be1419472c |
|||
msg219378 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2014-05-30 06:59 | |
This was never an enhancement. zipfile was failing to properly deal with real world data that other zip file tools on the planet were perfectly happy to deal with. That's a bug. Fixed. Practicality beats purity. Be lenient in what you accept. The zipfile module is not meant to be a zip "standard" validation tool. The other discussions in this bug about adding actual features such as a "strict" mode flag could be done but should really go in feature requests of their own. The zipfile module is... not a wonderful body of code (understatement). Example: It is still quite possible to get non zipfile.BadZipFile exceptions such as struct.error based on various arrangements of input data. |
|||
msg323179 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-08-06 06:43 | |
Just for information: the padding in APK files is added by the zipalign utility [1]. [1] https://developer.android.com/studio/command-line/zipalign |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:28 | admin | set | github: 58523 |
2018-08-06 06:43:30 | serhiy.storchaka | set | messages: + msg323179 |
2014-05-30 06:59:38 | gregory.p.smith | set | status: open -> closed resolution: fixed assignee: gregory.p.smith stage: test needed -> resolved title: zipfile.ZipFile() unable to open zip File -> zipfile.ZipFile() unable to open zip File with a short extra header nosy: + gregory.p.smith versions: + Python 2.7, Python 3.4, Python 3.5, - Python 3.3 messages: + msg219378 type: enhancement -> behavior |
2014-05-30 06:43:02 | python-dev | set | nosy:
+ python-dev messages: + msg219377 |
2013-04-17 16:14:53 | void.sender | set | nosy:
+ void.sender messages: + msg187177 |
2012-06-24 20:48:24 | serhiy.storchaka | set | messages: + msg163857 |
2012-06-24 20:42:07 | terry.reedy | set | messages:
+ msg163855 versions: - Python 2.7, Python 3.2 |
2012-06-24 20:01:06 | loewis | set | type: behavior -> enhancement messages: + msg163835 |
2012-06-24 19:53:15 | serhiy.storchaka | set | messages: + msg163831 |
2012-06-24 17:48:35 | terry.reedy | set | messages: + msg163813 |
2012-06-24 07:41:51 | loewis | set | messages: + msg163749 |
2012-06-24 07:26:27 | serhiy.storchaka | set | messages: + msg163743 |
2012-05-15 05:24:05 | meador.inge | set | messages: + msg160684 |
2012-05-14 22:22:14 | serhiy.storchaka | set | messages: + msg160673 |
2012-05-14 20:38:42 | loewis | set | messages: + msg160664 |
2012-05-14 20:36:35 | loewis | set | messages: + msg160663 |
2012-05-14 18:50:40 | meador.inge | set | messages: + msg160655 |
2012-05-14 17:31:14 | serhiy.storchaka | set | messages: + msg160645 |
2012-05-14 01:52:21 | meador.inge | set | messages: + msg160591 |
2012-05-14 01:20:40 | meador.inge | set | messages: + msg160590 |
2012-05-13 19:18:00 | serhiy.storchaka | set | messages: + msg160553 |
2012-05-13 19:03:38 | loewis | set | messages: + msg160550 |
2012-05-13 18:27:06 | serhiy.storchaka | set | messages: + msg160541 |
2012-05-13 18:06:52 | serhiy.storchaka | set | messages: + msg160539 |
2012-05-13 17:58:53 | loewis | set | nosy:
+ loewis messages: + msg160538 |
2012-05-13 17:55:27 | serhiy.storchaka | set | files: + zipfile_extra_test.patch |
2012-05-13 15:30:04 | eric.araujo | set | stage: test needed |
2012-04-29 11:45:32 | ubershmekel | set | nosy:
+ ubershmekel messages: + msg159602 |
2012-04-29 06:44:54 | serhiy.storchaka | set | messages: + msg159584 |
2012-03-17 09:52:43 | serhiy.storchaka | set | files:
+ fix_zipfile_extra.patch keywords: + patch messages: + msg156147 |
2012-03-17 04:18:35 | terry.reedy | set | messages: + msg156135 |
2012-03-17 00:13:47 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg156110 |
2012-03-16 23:35:27 | eric.araujo | set | nosy:
+ eric.araujo |
2012-03-16 21:53:49 | terry.reedy | set | versions:
+ Python 3.2, Python 3.3 nosy: + mark.dickinson, alanmcintyre, terry.reedy, meador.inge messages: + msg156089 components: + Library (Lib), - None type: crash -> behavior |
2012-03-15 11:15:45 | pleed | create |