This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Zipfile lib overwrites the extra field during closing when the archive size is more then ZIP64_LIMIT
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alanmcintyre, anadius, serhiy.storchaka, shaanbhaya, twouters
Priority: normal Keywords:

Created on 2021-05-07 08:18 by shaanbhaya, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
zip_bug.zip shaanbhaya, 2021-05-07 08:18
Messages (3)
msg393178 - (view) Author: AMRIT RAI (shaanbhaya) Date: 2021-05-07 08:18
The current zipFile implementation supports the allowZip64,which can make large zip files.
There is a bug in the current implementation of close method ,where the extra field is overwritten .

To reproduce it :
1.Create a directory with more then 4 GB of data(spread over many files).
2.Make the zip of those files using the any rar achiever which adds NTFS metadata(mtime,atime and ctime) of files in the zip.
3.Apped a new file to the zip using the zip library .

When i open the zip again ,the files processed after the ZIP64_LIMIT is reached will have there extra fields overwritten.

I have attached the zip which contained the python used to add the new file, and the images of zip archive before adding new files and after.
msg393179 - (view) Author: AMRIT RAI (shaanbhaya) Date: 2021-05-07 08:41
The issue stems from the following code inside the 
def _write_end_record(self): method ,where the extra fields are trimmed .

if zinfo.header_offset > ZIP64_LIMIT:
    extra.append(zinfo.header_offset)
    header_offset = 0xffffffff
else:
    header_offset = zinfo.header_offset

extra_data = zinfo.extra
min_version = 0
if extra:
    # Append a ZIP64 field to the extra's
    extra_data = _strip_extra(extra_data, (1,))
    extra_data = struct.pack(
        '<HH' + 'Q'*len(extra),
        1, 8*len(extra), *extra) + extra_data

min_version = ZIP64_VERSION
msg405760 - (view) Author: anadius (anadius) Date: 2021-11-05 00:04
I was looking at `zipfile._strip_extra` trying to figure out how it works. It doesn't. It skips extra headers after the last one that matches. That's what causes this issue.

Here's a fixed version:

def _strip_extra(extra, xids):
    # Remove Extra Fields with specified IDs.
    unpack = _EXTRA_FIELD_STRUCT.unpack
    modified = False
    buffer = []
    start = i = 0
    while i + 4 <= len(extra):
        xid, xlen = unpack(extra[i : i + 4])
        j = i + 4 + xlen
        if xid in xids:
            if i != start:
                buffer.append(extra[start : i])
            start = j
            modified = True
        i = j
    if i != start:
        buffer.append(extra[start : i])
    if not modified:
        return extra
    return b''.join(buffer)

Or this one, easier to understand:

def _strip_extra(extra, xids):
    # Remove Extra Fields with specified IDs.
    unpack = _EXTRA_FIELD_STRUCT.unpack
    modified = False
    buffer = []
    i = 0
    while i + 4 <= len(extra):
        xid, xlen = unpack(extra[i : i + 4])
        j = i + 4 + xlen
        if xid in xids:
            modified = True
        else:
            buffer.append(extra[i : j])
        i = j
    if not modified:
        return extra
    return b''.join(buffer)

Not sure which one is better.
History
Date User Action Args
2022-04-11 14:59:45adminsetgithub: 88233
2021-11-07 00:34:59anadiussetnosy: + twouters, alanmcintyre, serhiy.storchaka
2021-11-05 00:04:58anadiussetnosy: + anadius
messages: + msg405760
2021-05-07 08:41:46shaanbhayasetmessages: + msg393179
2021-05-07 08:18:55shaanbhayacreate