Issue 44067: Zipfile lib overwrites the extra field during closing when the archive size is more then ZIP64_LIMIT

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/88233

classification

Title:	Zipfile lib overwrites the extra field during closing when the archive size is more then ZIP64_LIMIT
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.11, Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	alanmcintyre, anadius, serhiy.storchaka, shaanbhaya, twouters
Priority:	normal	Keywords:

Created on 2021-05-07 08:18 by shaanbhaya, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description	Edit
zip_bug.zip	shaanbhaya, 2021-05-07 08:18

Messages (3)
msg393178 - (view)	Author: AMRIT RAI (shaanbhaya)	Date: 2021-05-07 08:18
The current zipFile implementation supports the allowZip64,which can make large zip files. There is a bug in the current implementation of close method ,where the extra field is overwritten . To reproduce it : 1.Create a directory with more then 4 GB of data(spread over many files). 2.Make the zip of those files using the any rar achiever which adds NTFS metadata(mtime,atime and ctime) of files in the zip. 3.Apped a new file to the zip using the zip library . When i open the zip again ,the files processed after the ZIP64_LIMIT is reached will have there extra fields overwritten. I have attached the zip which contained the python used to add the new file, and the images of zip archive before adding new files and after.
msg393179 - (view)	Author: AMRIT RAI (shaanbhaya)	Date: 2021-05-07 08:41
The issue stems from the following code inside the def _write_end_record(self): method ,where the extra fields are trimmed . if zinfo.header_offset > ZIP64_LIMIT: extra.append(zinfo.header_offset) header_offset = 0xffffffff else: header_offset = zinfo.header_offset extra_data = zinfo.extra min_version = 0 if extra: # Append a ZIP64 field to the extra's extra_data = _strip_extra(extra_data, (1,)) extra_data = struct.pack( '<HH' + 'Q'len(extra), 1, 8len(extra), *extra) + extra_data min_version = ZIP64_VERSION
msg405760 - (view)	Author: anadius (anadius)	Date: 2021-11-05 00:04
I was looking at `zipfile._strip_extra` trying to figure out how it works. It doesn't. It skips extra headers after the last one that matches. That's what causes this issue. Here's a fixed version: def _strip_extra(extra, xids): # Remove Extra Fields with specified IDs. unpack = _EXTRA_FIELD_STRUCT.unpack modified = False buffer = [] start = i = 0 while i + 4 <= len(extra): xid, xlen = unpack(extra[i : i + 4]) j = i + 4 + xlen if xid in xids: if i != start: buffer.append(extra[start : i]) start = j modified = True i = j if i != start: buffer.append(extra[start : i]) if not modified: return extra return b''.join(buffer) Or this one, easier to understand: def _strip_extra(extra, xids): # Remove Extra Fields with specified IDs. unpack = _EXTRA_FIELD_STRUCT.unpack modified = False buffer = [] i = 0 while i + 4 <= len(extra): xid, xlen = unpack(extra[i : i + 4]) j = i + 4 + xlen if xid in xids: modified = True else: buffer.append(extra[i : j]) i = j if not modified: return extra return b''.join(buffer) Not sure which one is better.

History
Date	User	Action	Args
2022-04-11 14:59:45	admin	set	github: 88233
2021-11-07 00:34:59	anadius	set	nosy: + twouters, alanmcintyre, serhiy.storchaka
2021-11-05 00:04:58	anadius	set	nosy: + anadius messages: + msg405760
2021-05-07 08:41:46	shaanbhaya	set	messages: + msg393179
2021-05-07 08:18:55	shaanbhaya	create