msg389040 | Author: Hans-Christoph Steiner (eighthave) | Date: 2021-03-18 21:25
It is now standard for Java JARs and Android APKs (both ZIP files) to zero out lots of the fields in the ZIP header. For example:
* each file entry has the date set to zero
* the create_system is always set to zero on all platforms
zipfile currently cannot create such ZIPs because of two small restrictions that it introduced:
* must use a tuple of 6 values to set the date
* forced create_system value based on sys.platform == 'win32'
* maybe other fields?
I lump these together because it might make sense to handle this with a single argument, something like zero_header=True. The use case is for working with ZIP, JAR, APK, AAR files for reproducible builds. The whole build system for F-Droid is built in Python. We need to be able to copy the JAR/APK signatures in order to reproduce signed builds using only the source code and the signature files themselves. Right now, that's not possible because building a ZIP with Python's zipfile cannot zero out the ZIP header like other tools can, including Java.
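To make the create_system point concrete, here is a minimal sketch (not from the original report) that looks at the default zipfile picks and then reassigns the field on a ZipInfo before writing. The entry name "example.txt" is made up for illustration.

```python
import io
import sys
import zipfile

# ZipInfo picks its default from the platform: 0 (MS-DOS/FAT) on Windows,
# 3 (Unix) everywhere else.
default_info = zipfile.ZipInfo("example.txt")  # hypothetical entry name
default_create_system = default_info.create_system

# The attribute can be reassigned before the entry is written.
info = zipfile.ZipInfo("example.txt", date_time=(1980, 0, 0, 0, 0, 0))
info.create_system = 0

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"hello")

# Read the archive back and inspect what was actually stored.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("example.txt")

print(default_create_system, stored.create_system, stored.date_time)
```

This only covers create_system and the date; it says nothing about the other header fields mentioned above.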
|
msg389041 | Author: Hans-Christoph Steiner (eighthave) | Date: 2021-03-18 22:00
I just found another specific example in _open_to_write(). 0 is a valid value for zinfo.external_attr. But this code always forces 0 to something else:
    if not zinfo.external_attr:
        zinfo.external_attr = 0o600 << 16  # permissions: ?rw-------
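A small way to observe this from the outside, as a sketch: write one entry whose external_attr was explicitly set to 0 and read it back. The entry name is hypothetical, and the result depends on whether the installed zipfile still contains the check quoted above, so the code only reports what it finds.

```python
import io
import zipfile

info = zipfile.ZipInfo("example.txt")  # hypothetical entry name
info.external_attr = 0                 # explicitly ask for zero

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"hello")

with zipfile.ZipFile(buf) as zf:
    stored_attr = zf.getinfo("example.txt").external_attr

# If the check is present, the upper 16 bits come back as 0o600
# instead of the requested 0.
print(oct(stored_attr >> 16), hex(stored_attr & 0xFFFF))
```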
|
msg389338 | Author: Felix C. Stegerman (obfusk) | Date: 2021-03-22 20:20
I've created a draft PR; RFC :)
Also:
* setting the date to (1980,0,0,0,0,0) already works;
* the main issue seems to be that external_attr cannot be 0 atm.
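Why (1980,0,0,0,0,0) gives an all-zero date field follows from how the ZIP format packs the date and time into two 16-bit values. The sketch below redoes that arithmetic in a hypothetical helper, dos_date_time(), and then checks the bytes of an archive written by zipfile. The entry name is made up.

```python
import io
import struct
import zipfile


def dos_date_time(date_time):
    """Pack a 6-tuple into the two 16-bit MS-DOS fields used by ZIP headers."""
    year, month, day, hour, minute, second = date_time
    dos_date = (year - 1980) << 9 | month << 5 | day
    dos_time = hour << 11 | minute << 5 | second // 2
    return dos_date, dos_time


zero = dos_date_time((1980, 0, 0, 0, 0, 0))      # month 0, day 0: all bits clear
default = dos_date_time((1980, 1, 1, 0, 0, 0))   # ZipInfo's default date

info = zipfile.ZipInfo("example.txt", date_time=(1980, 0, 0, 0, 0, 0))
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"hello")

# In the local file header, the time field sits at byte offset 10
# and the date field at byte offset 12.
header_time, header_date = struct.unpack_from("<HH", buf.getvalue(), 10)
print(zero, default, header_date, header_time)
```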
|
msg389339 | Author: Christian Heimes (christian.heimes) | Date: 2021-03-22 20:44
Hi,
thanks for looking into reproducible builds. I have a few suggestions:
- since it's a new feature, it cannot go into older releases.
- "zeroed" is not a self-explanatory term. I suggest finding a term that describes the result, not the internal operation.
- I don't think you have to introduce a new argument at all. Instead you can provide a new method that creates a carefully crafted ZipInfo object that results in zeroed fields. That's how I implemented reproducible tar.bz2 files.
- For full reproducible builds you may have to write files to zipfiles in a well-defined order.
|
msg389343 | Author: Christian Heimes (christian.heimes) | Date: 2021-03-22 20:58
zinfo = zipfile.ZipInfo()
zinfo.date_time = (1980, 0, 0, 0, 0, 0)
zinfo.create_system = 0
external_attr == 0 may cause issues with permissions. I do something like this in my reproducible tarfile code:
    if zinfo.is_dir():
        # 0755 in the high 16 bits (Unix mode) + MS-DOS directory flag
        zinfo.external_attr = (0o755 << 16) | 0x10
    else:
        zinfo.external_attr = 0o644 << 16
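For readers unfamiliar with the field: by the usual convention for Unix-created entries, external_attr carries the Unix mode in its upper 16 bits and the MS-DOS attribute byte (0x10 marks a directory) at the bottom. The helpers below are hypothetical names for a rough sketch of that layout, not part of zipfile.

```python
import stat

DOS_DIRECTORY = 0x10  # MS-DOS directory attribute bit


def make_external_attr(mode, is_dir=False):
    """Hypothetical helper: Unix mode in the high 16 bits, DOS flags in the low byte."""
    attr = (mode & 0xFFFF) << 16
    if is_dir:
        attr |= DOS_DIRECTORY
    return attr


def split_external_attr(attr):
    """Return (unix_mode, dos_attributes) from an external_attr value."""
    return attr >> 16, attr & 0xFF


file_attr = make_external_attr(0o644)
dir_attr = make_external_attr(stat.S_IFDIR | 0o755, is_dir=True)

print(hex(file_attr), hex(dir_attr))
```

A value of 0 therefore means "no mode and no DOS attributes at all", which is why extraction tools may fall back to their own defaults for such entries.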
|
msg389348 | Author: Felix C. Stegerman (obfusk) | Date: 2021-03-22 22:58
I've closed the PR for now.
Using a carefully crafted ZipInfo object doesn't work because ZipFile modifies its .external_attr when set to 0.
Using something like this quickly hacked together ZipInfo subclass does work:
    class ZeroedZipInfo(zipfile.ZipInfo):
        def __init__(self, zinfo):
            for k in self.__slots__:
                setattr(self, k, getattr(zinfo, k))

        def __getattribute__(self, name):
            if name == "date_time":
                return (1980,0,0,0,0,0)
            if name == "external_attr":
                return 0
            return object.__getattribute__(self, name)

    ...
    myzipfile.writestr(ZeroedZipInfo(info), data)
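The reason the attribute-override approach survives ZipFile's later assignment is that reads of the pinned fields never reach the stored value. Below is a self-contained sketch of the same idea, under the assumption that a fresh ZipInfo (rather than a copy of an existing one) is good enough for the demonstration. PinnedZipInfo and the entry name are hypothetical.

```python
import io
import zipfile


class PinnedZipInfo(zipfile.ZipInfo):
    """Hypothetical ZipInfo whose pinned fields always read as fixed values."""

    _pinned = {
        "date_time": (1980, 0, 0, 0, 0, 0),
        "create_system": 0,
        "external_attr": 0,
    }

    def __getattribute__(self, name):
        # Look up the class-level table without recursing into this method.
        pinned = object.__getattribute__(self, "_pinned")
        if name in pinned:
            return pinned[name]
        return object.__getattribute__(self, name)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # ZipFile may assign to external_attr while writing, but every read
    # still goes through __getattribute__ and sees the pinned value.
    zf.writestr(PinnedZipInfo("META-INF/MANIFEST.MF"), b"Manifest-Version: 1.0\n")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("META-INF/MANIFEST.MF")

print(info.date_time, info.create_system, info.external_attr)
```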
|
msg389349 | Author: Felix C. Stegerman (obfusk) | Date: 2021-03-22 23:05
> external_attr == 0 may cause issues with permissions.
That may be true in some scenarios, but not being able to set it to 0 means you can't create identical files to those produced by other tools -- like those used to generate APKs -- which do in fact set it to 0.
|
msg389382 | Author: Christian Heimes (christian.heimes) | Date: 2021-03-23 10:30
The __getattribute__ hack is not needed. You can reset the flags in a different, more straightforward way:

    from zipfile import ZipInfo

    class ReproducibleZipInfo(ZipInfo):
        __slots__ = ()

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._reset_flags()

        @classmethod
        def from_file(cls, *args, **kwargs):
            zinfo = super().from_file(*args, **kwargs)
            zinfo._reset_flags()
            return zinfo

        def _reset_flags(self):
            self.date_time = (1980, 0, 0, 0, 0, 0)
            self.create_system = 0
            self.external_attr = 0

    >>> zinfo = ReproducibleZipInfo.from_file("/etc/os-release")
    >>> zinfo.external_attr
    0
    >>> zinfo.create_system
    0
    >>> zinfo.date_time
    (1980, 0, 0, 0, 0, 0)
I think it also makes sense to replace the hard-coded ZipInfo class with a dispatcher attribute on the class:
@@ -1203,6 +1211,7 @@ class ZipFile:
     fp = None                   # Set here since __del__ checks it
     _windows_illegal_name_trans_table = None
+    zipinfo_class = ZipInfo
     def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True,
                  compresslevel=None, *, strict_timestamps=True):
@@ -1362,7 +1371,7 @@ def _RealGetContents(self):
                 # Historical ZIP filename encoding
                 filename = filename.decode('cp437')
             # Create ZipInfo instance to store file information
-            x = ZipInfo(filename)
+            x = self.zipinfo_class(filename)
|
msg389392 | Author: Felix C. Stegerman (obfusk) | Date: 2021-03-23 15:26
> The __getattribute__ hack is not needed. You can reset the flags in a different, more straightforward way
As mentioned, ZipFile._open_to_write() will modify the ZipInfo's .external_attr when it is set to 0.
> I just found another specific example in _open_to_write(). 0 is a valid value for zinfo.external_attr. But this code always forces 0 to something else:
>
>     if not zinfo.external_attr:
>         zinfo.external_attr = 0o600 << 16  # permissions: ?rw-------
Your alternative doesn't seem to take that subsequent modification into account.
|
msg389441 | Author: Hans-Christoph Steiner (eighthave) | Date: 2021-03-24 10:41
> - For full reproducible builds you may have to write files to zipfiles in a well-defined order.
That already works fine now; we've been doing that with Python for years. But that leaves it up to the implementer to do. I suppose zipfile could provide a method to sort entries, but that's out of scope for this issue IMHO.
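For anyone wondering what "leaving it up to the implementer" looks like in practice, here is a minimal sketch, assuming entries are held in a dict of name to bytes. build_archive() and the fixed metadata values are illustrative choices, not something the thread prescribes.

```python
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_archive(entries):
    """Hypothetical helper: write {name: bytes} in sorted-name order with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(entries):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3            # pin it so the platform does not matter
            info.external_attr = 0o644 << 16  # non-zero, so zipfile leaves it alone
            zf.writestr(info, entries[name])
    return buf.getvalue()


# Same content, different insertion order.
first = build_archive({"b.txt": b"two", "a.txt": b"one"})
second = build_archive({"a.txt": b"one", "b.txt": b"two"})
print(first == second)
```

Sorting by name is only one possible "well-defined order"; matching another tool's output would mean reproducing that tool's ordering instead.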
|
msg396332 | Author: Felix C. Stegerman (obfusk) | Date: 2021-06-22 13:49
https://github.com/obfusk/apksigcopier currently produces reproducible ZIP files identical to those produced by apksigner using this code:
    DATETIMEZERO = (1980, 0, 0, 0, 0, 0)

    class ReproducibleZipInfo(zipfile.ZipInfo):
        """Reproducible ZipInfo hack."""

        _override = {}  # type: Dict[str, Any]

        def __init__(self, zinfo, **override):
            if override:
                self._override = {**self._override, **override}
            for k in self.__slots__:
                if hasattr(zinfo, k):
                    setattr(self, k, getattr(zinfo, k))

        def __getattribute__(self, name):
            if name != "_override":
                try:
                    return self._override[name]
                except KeyError:
                    pass
            return object.__getattribute__(self, name)

    class APKZipInfo(ReproducibleZipInfo):
        """Reproducible ZipInfo for APK files."""

        _override = dict(
            compress_type=8,
            create_system=0,
            create_version=20,
            date_time=DATETIMEZERO,
            external_attr=0,
            extract_version=20,
            flag_bits=0x800,
        )

    def patch_meta(...):
        ...
        with zipfile.ZipFile(output_apk, "a") as zf_out:
            info_data = [(APKZipInfo(info, date_time=date_time), data)
                         for info, data in extracted_meta]
            _write_to_zip(info_data, zf_out)

    if sys.version_info >= (3, 7):
        def _write_to_zip(info_data, zf_out):
            for info, data in info_data:
                zf_out.writestr(info, data, compresslevel=9)
    else:
        def _write_to_zip(info_data, zf_out):
            old = zipfile._get_compressor
            zipfile._get_compressor = lambda _: zlib.compressobj(9, 8, -15)
            try:
                for info, data in info_data:
                    zf_out.writestr(info, data)
            finally:
                zipfile._get_compressor = old
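The two _write_to_zip variants are meant to produce the same bytes: zlib.compressobj(9, 8, -15) is level 9, method 8 (deflate), and a negative window size for a raw stream without the zlib header. The sketch below checks that equivalence on Python 3.7+ by pulling the compressed payload out of an archive. The entry name and payload are made up for illustration.

```python
import io
import struct
import zipfile
import zlib

data = b"example payload " * 64

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("classes.dex", data, compresslevel=9)  # hypothetical entry name

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("classes.dex")

# Local file header: 30 fixed bytes, with the name and extra-field lengths
# at offsets 26 and 28, followed by the name, the extra field, and the payload.
raw = buf.getvalue()
name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
start = info.header_offset + 30 + name_len + extra_len
stored = raw[start:start + info.compress_size]

# Raw deflate at level 9, as in the pre-3.7 fallback above.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
expected = compressor.compress(data) + compressor.flush()

print(stored == expected)
```

Byte-for-byte agreement with another tool such as apksigner additionally depends on that tool's deflate implementation, which this sketch does not test.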
|
Date | User | Action | Args
2022-04-11 14:59:43 | admin | set | github: 87713
2021-06-22 13:49:25 | obfusk | set | messages: + msg396332
2021-03-24 10:41:43 | eighthave | set | messages: + msg389441
2021-03-23 15:26:22 | obfusk | set | messages: + msg389392
2021-03-23 10:30:58 | christian.heimes | set | messages: + msg389382
2021-03-22 23:05:05 | obfusk | set | messages: + msg389349
2021-03-22 22:59:47 | obfusk | set | type: enhancement
2021-03-22 22:59:15 | obfusk | set | components: + Library (Lib)
2021-03-22 22:58:57 | obfusk | set | components: - Library (Lib), IO; versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9
2021-03-22 22:58:03 | obfusk | set | type: enhancement -> (no value); messages: + msg389348; components: + IO; versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9
2021-03-22 20:58:58 | christian.heimes | set | messages: + msg389343
2021-03-22 20:44:28 | christian.heimes | set | versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9; nosy: + christian.heimes; messages: + msg389339; components: - IO; type: enhancement
2021-03-22 20:20:47 | obfusk | set | messages: + msg389338
2021-03-22 20:13:07 | obfusk | set | keywords: + patch; stage: patch review; pull_requests: + pull_request23737
2021-03-20 23:14:46 | obfusk | set | nosy: + obfusk
2021-03-20 21:54:24 | jondo | set | nosy: + jondo
2021-03-18 22:00:47 | eighthave | set | messages: + msg389041
2021-03-18 21:25:38 | eighthave | create |