classification
Title: zipfile is intolerant of extra bytes
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: Devin Fisher, alanmcintyre, ronaldoussoren, serhiy.storchaka, twouters
Priority: normal
Keywords:

Created on 2015-07-22 17:16 by Devin Fisher, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: bad.jar
Uploaded: Devin Fisher, 2015-07-22 17:16
Messages (6)
msg247137 Author: Devin Fisher (Devin Fisher) Date: 2015-07-22 17:16
I'm not sure whether this is a bug. The attached jar file is malformed. Unzip (6.00) reports the following about it:

unzip -tqq bad.jar 
com/pixelmed/apps/DoseUtility$OurSourceDatabaseTreeBrowser$1.class bad extra-field entry:
      EF block length (43230 bytes) exceeds remaining EF data (10 bytes)


But unzip (6.00) and my GNOME Archive Manager (3.16.3) are able to open and extract the file without issue. 

So I'm wondering if zipfile is too strict?

Anyway, when I try to open the attached jar file with zipfile, I get the following error.

Code:
import zipfile
if __name__ == "__main__":
    path = 'bad.jar'
    file = zipfile.ZipFile(path)

Output:
Traceback (most recent call last):
  File "/home/devin.fisher/sandboxes/feeder.v61_release.dev/temp/bug.py", line 4, in <module>
    file = zipfile.ZipFile(path)
  File "/usr/lib64/python3.4/zipfile.py", line 937, in __init__
    self._RealGetContents()
  File "/usr/lib64/python3.4/zipfile.py", line 1034, in _RealGetContents
    x._decodeExtra()
  File "/usr/lib64/python3.4/zipfile.py", line 418, in _decodeExtra
    counts = unpack('<QQQ', extra[4:28])
struct.error: unpack requires a bytes object of length 24
msg247177 Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-23 06:47
The actual exception you're getting is IMHO a bug: it should have been a zipfile.BadZipfile exception.

That said, it might be useful to teach zipfile to optionally be a little more forgiving about errors like this when reading a zipfile. I'm at best -0 on that in general, but in this case we could get away with restructuring the code a little. A number of ZipInfo attributes are set from "extra" data when the extra data is present and the value in the normal header is maxed out. The code could be changed to not even try to decode the "extra" data when the values in the normal header aren't maxed out.

BTW, the RuntimeError that's raised in _decodeExtra should also be a BadZipfile exception.
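
The restructuring suggested above could look roughly like the following. This is only a sketch of the idea, not the code in zipfile.py: the function name resolve_zip64 and the constant MAX32 are made up for illustration, and it assumes the Zip64 field stores its 8-byte values in the order uncompressed size, compressed size, header offset, each present only when the corresponding normal header value is maxed out.

```python
import struct
import zipfile

# Hypothetical name: sentinel meaning "the real value is in the Zip64 extra field".
MAX32 = 0xFFFFFFFF


def resolve_zip64(data, file_size, compress_size, header_offset):
    """Return (file_size, compress_size, header_offset) with sentinels resolved.

    `data` is the payload of the Zip64 extra field (ID 0x0001). It is only
    read for values that are maxed out in the normal header.
    """
    values = [file_size, compress_size, header_offset]
    pos = 0
    for i, value in enumerate(values):
        if value != MAX32:
            continue  # the normal header value is usable; skip the extra data
        if pos + 8 > len(data):
            # Report corrupt input as a zipfile error, not a struct.error.
            raise zipfile.BadZipFile("Corrupt zip64 extra field")
        (values[i],) = struct.unpack_from("<Q", data, pos)
        pos += 8
    return tuple(values)


# Sizes and offset as reported by zipinfo for the broken entry. None of them
# is maxed out, so the 10 garbage bytes are never decoded.
garbage = b"\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01"
print(resolve_zip64(garbage, 1304, 622, 2892))
```

With this shape, the malformed entry in bad.jar would be read using the values from the normal header, while a truncated field that is actually needed would still be rejected.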
msg247181 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-23 10:50
The extra field data is b"UT\x05\x00\x07<\xa3\xaa#\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x00\xeb\x93\x91'\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01". It contains the following fields:

* Extended Timestamp (0x5455), length = 5, data = b'\x07<\xa3\xaa#'. It looks correct.

* NTFS Extra Field (0x000a), length = 32, data = b"\x00\x00\x00\x00\x01\x00\x18\x00\x00\xeb\x93\x91'\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd". Looks incorrect, but is ignored by ZipFile.

* Zip64 Extended Information Extra Field (0x0001), length = 43230, data = b'\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01'. Definitely incorrect.
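
The breakdown above can be reproduced with a short walk over the extra data. This is a minimal sketch, assuming the usual layout of each block (2-byte little-endian ID, 2-byte little-endian length, then the payload); walk_extra is a hypothetical helper, not part of zipfile.

```python
import struct

# The 59 bytes of extra data quoted above.
extra = (
    b"UT\x05\x00\x07<\xa3\xaa#"
    b"\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x00\xeb\x93\x91'"
    b"\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd"
    b"\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01"
)


def walk_extra(extra):
    """Yield (header_id, declared_length, data) for each extra-field block.

    `data` is sliced from what is actually available, so it can be shorter
    than `declared_length` when the block is malformed.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, declared = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + declared]
        yield header_id, declared, data
        pos += 4 + declared


for header_id, declared, data in walk_extra(extra):
    status = "ok" if len(data) == declared else "TRUNCATED"
    print(f"0x{header_id:04x} declared={declared} available={len(data)} {status}")
```

The last block declares 43230 bytes while only 10 remain, which is the same mismatch that unzip and zipinfo report.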
msg247182 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-23 10:50
zipinfo -v emits an error:

Central directory entry #48:
---------------------------

  com/pixelmed/apps/DoseUtility$OurSourceDatabaseTreeBrowser$1.class

  offset of local header from start of archive:   2892
                                                  (0000000000000B4Ch) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          2011 Feb 15 19:46:54
  file last modified on (UT extra field modtime): 1988 Dec 17 21:11:08 local
  file last modified on (UT extra field modtime): 1988 Dec 17 18:11:08 UTC
  32-bit CRC value (hex):                         327bc88b
  compressed size:                                622 bytes
  uncompressed size:                              1304 bytes
  length of filename:                             66 characters
  length of extra field:                          59 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x5455 (universal time) and 5 data bytes.
    The local extra field has UTC/GMT modification/access/creation times.
  - A subfield with ID 0x000a (PKWARE Win32) and 32 data bytes.  The first
    20 are:   00 00 00 00 01 00 18 00 00 eb 93 91 27 ef bf bd ef bf bd 01.

  error: EF data block (type 0x0001) size 43230 exceeds remaining extra field
         space 10; block length has been truncated.

  - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 10 data bytes:
    ef bf bd ef bf bd ef bf bd 01.

  There is no file comment.
msg247190 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-23 12:24
Opened issue24693 about the exception type.
msg355674 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-29 20:16
Interestingly, all of the extra data after the Extended Timestamp field is valid UTF-8 encoded text:

b"\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x00\xeb\x93\x91'\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01" == "\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x00\ub4d1'\ufffd\ufffd\x01\x00\u07a8\ufffd\ufffd\ufffd\x01\x00\u07a8\ufffd\ufffd\ufffd\x01".encode()

The chance that random binary data happens to be valid UTF-8 is small, so it looks like garbage text encoded as UTF-8 was appended to the extra field.

It is definitely not a Python issue.
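
The UTF-8 observation in this message can be checked directly. A small sketch, using only the bytes quoted above (the variable names are illustrative): a strict decode succeeds, round-trips, and yields several U+FFFD replacement characters, which is what text that was decoded with errors replaced and then re-encoded would look like.

```python
# Everything in the extra data after the 9-byte Extended Timestamp block.
tail = (
    b"\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x00\xeb\x93\x91'"
    b"\xef\xbf\xbd\xef\xbf\xbd\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd"
    b"\x01\x00\xde\xa8\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x01"
)

# Strict decoding raises UnicodeDecodeError on any invalid sequence,
# so reaching the next line means the whole tail is valid UTF-8.
text = tail.decode("utf-8")

# Count the U+FFFD replacement characters in the decoded text.
replacement_count = text.count("\ufffd")

print(len(tail), "bytes ->", len(text), "characters,",
      replacement_count, "of them U+FFFD")
```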
History
Date User Action Args
2022-04-11 14:58:19 admin set github: 68874
2019-10-29 20:16:13 serhiy.storchaka set status: open -> closed
resolution: third party
messages: + msg355674
stage: resolved
2015-07-23 12:24:31 serhiy.storchaka set messages: + msg247190
2015-07-23 10:50:55 serhiy.storchaka set messages: + msg247182
2015-07-23 10:50:17 serhiy.storchaka set messages: + msg247181
2015-07-23 06:47:50 ronaldoussoren set nosy: + twouters, ronaldoussoren, alanmcintyre, serhiy.storchaka
messages: + msg247177
2015-07-22 17:16:22 Devin Fisher create