classification
Title: zipfile.ZipInfo objects contain invalid 'extra' fields.
Type: behavior Stage:
Components: IO Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Bram Stolk, dhillier
Priority: normal Keywords:

Created on 2020-01-10 21:57 by Bram Stolk, last changed 2020-02-02 02:50 by dhillier.

Messages (2)
msg359762 - Author: Bram Stolk (Bram Stolk) Date: 2020-01-10 21:57
This has been tested with Python 2.7 and Python 3.8 on Windows.

If you get the ZipInfo objects of a ZIP file that is larger than 2 GiB, all the ZipInfo entries with a header offset above 2 GiB report phantom 'extra' data.

import zipfile

zipname = "reallybig.zip"
z = zipfile.ZipFile(zipname)
zi = z.infolist()
for inf in zi:
    # inf.extra holds the raw extra-field bytes stored for this entry
    print(inf.filename, inf.header_offset, inf.extra)

And observe that:
* All entries with an offset below 2 GiB report no extra field.
* All entries with an offset above 2 GiB report an extra field.

It's hard to package this up as a self-contained test, because it requires a very large zip file.
msg361201 - Author: Daniel Hillier (dhillier) * Date: 2020-02-02 02:50
This looks to be expected behaviour for the zip64 extension in the zip spec (for handling large files or large archives).

Section 4.4.1.4 of the zip spec outlines when the zip64 extra fields are used (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT). In short, they are used when the header offset (the number of bytes to the start of the file in the archive) exceeds the size allotted to that field in the original spec (0xFFFFFFFF, or just under 4 GiB).
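
To check whether the 'extra' bytes are zip64 records, they can be split into (header ID, payload) pairs. The following is a minimal sketch, not taken from the zipfile module: parse_extra and the sample bytes are hypothetical names and data made up for illustration, and it assumes the usual extra-field layout of a 2-byte ID and a 2-byte size (both little-endian) followed by the payload, with ID 0x0001 marking the zip64 extended information field.

```python
import struct

ZIP64_EXTRA_ID = 0x0001  # header ID of the zip64 extended information field


def parse_extra(extra):
    """Split raw extra-field bytes into a list of (header_id, payload) records."""
    records = []
    pos = 0
    while pos + 4 <= len(extra):
        # Each record starts with a 2-byte ID and a 2-byte payload size.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        records.append((header_id, payload))
        pos += 4 + size
    return records


# Synthetic sample (an assumption, not read from a real archive):
# one zip64 record carrying a single 64-bit value, here a header offset.
sample_offset = 3000000000
sample_extra = struct.pack("<HHQ", ZIP64_EXTRA_ID, 8, sample_offset)

for header_id, payload in parse_extra(sample_extra):
    if header_id == ZIP64_EXTRA_ID:
        # Decode as many whole 64-bit values as the payload holds.
        count = len(payload) // 8
        values = struct.unpack("<%dQ" % count, payload[:count * 8])
        print("zip64 record, 64-bit values:", values)
```

On a real archive, passing inf.extra to parse_extra in place of sample_extra should show whether the reported bytes carry header ID 0x0001.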

Let me know if what you're observing is unrelated to this.
History
Date                 User        Action  Args
2020-02-02 02:50:32  dhillier    set     nosy: + dhillier
                                         messages: + msg361201
2020-01-10 21:57:59  Bram Stolk  create