classification
Title: support ZIP files with zeroed out fields (e.g. for reproducible builds)
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, eighthave, jondo, obfusk
Priority: normal
Keywords: patch

Created on 2021-03-18 21:25 by eighthave, last changed 2021-06-22 13:49 by obfusk.

Pull Requests
URL Status Linked Edit
PR 24979 closed obfusk, 2021-03-22 20:13
Messages (11)
msg389040 - (view) Author: Hans-Christoph Steiner (eighthave) Date: 2021-03-18 21:25
It is now standard for Java JARs and Android APKs (both ZIP files) to zero out lots of the fields in the ZIP header.  For example:

* each file entry has the date set to zero
* the create_system is always set to zero on all platforms

zipfile currently cannot create such ZIPs because of two small restrictions that it introduced:

* must use a tuple of 6 values to set the date
* forced create_system value based on sys.platform == 'win32'
* maybe other fields?

I lump these together because it might make sense to handle this with a single argument, something like zero_header=True.  The use case is for working with ZIP, JAR, APK, AAR files for reproducible builds.  The whole build system for F-Droid is built in Python.  We need to be able to copy the JAR/APK signatures in order to reproduce signed builds using only the source code and the signature files themselves.  Right now, that's not possible because building a ZIP with Python's zipfile cannot zero out the ZIP header like other tools can, including Java.
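
To make the goal concrete, here is a minimal sketch of what "reproducible" means for a ZIP: building the same inputs twice should give byte-identical output. This is not F-Droid's build code; `build_archive`, `FIXED_DATE` and the file names are made up for illustration.

```python
import hashlib
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # any constant timestamp works for this check


def build_archive(files):
    """Build a ZIP in memory from a {name: bytes} mapping with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {"META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n", "a.txt": b"hello"}
first = hashlib.sha256(build_archive(files)).hexdigest()
second = hashlib.sha256(build_archive(files)).hexdigest()
print(first == second)
```

This only checks determinism within one Python installation. Matching the bytes written by another tool additionally needs control over the header fields listed above.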
msg389041 - (view) Author: Hans-Christoph Steiner (eighthave) Date: 2021-03-18 22:00
I just found another specific example in _open_to_write().  0 is a valid value for zinfo.external_attr.  But this code always forces 0 to something else:

        if not zinfo.external_attr:
            zinfo.external_attr = 0o600 << 16  # permissions: ?rw-------
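
A small sketch to observe this from the outside: write an entry whose ZipInfo asks for external_attr == 0 and read it back. The entry name is arbitrary. On the Python versions discussed in this issue the last printed value is expected to be 0o600 rather than 0, while the date and create_system survive as given.

```python
import io
import zipfile

info = zipfile.ZipInfo("classes.dex", date_time=(1980, 0, 0, 0, 0, 0))
info.create_system = 0
info.external_attr = 0  # what we would like to end up in the archive

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"payload")

# Re-open the archive and look at what was actually stored.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("classes.dex")

print(stored.date_time, stored.create_system, oct(stored.external_attr >> 16))
```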
msg389338 - (view) Author: Felix C. Stegerman (obfusk) * Date: 2021-03-22 20:20
I've created a draft PR; RFC :)

Also:

* setting the date to (1980,0,0,0,0,0) already works;
* the main issue seems to be that external_attr cannot be 0 atm.
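
For reference, a sketch of why (1980,0,0,0,0,0) gives an all-zero timestamp: the ZIP header stores an MS-DOS date and time as two 16-bit fields, with the year counted from 1980. The helper names `dos_date` and `dos_time` are made up here; the bit layout is the one zipfile uses when it packs date_time.

```python
def dos_date(year, month, day):
    """Pack a date into the 16-bit MS-DOS date field."""
    return (year - 1980) << 9 | month << 5 | day


def dos_time(hour, minute, second):
    """Pack a time into the 16-bit MS-DOS time field (2-second resolution)."""
    return hour << 11 | minute << 5 | second // 2


print(dos_date(1980, 0, 0), dos_time(0, 0, 0))  # 0 0
print(dos_date(1980, 1, 1))                     # 33, the ZipInfo default date
```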
msg389339 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-03-22 20:44
Hi,

thanks for looking into reproducible builds. I have a few suggestions:

- since it's a new feature, it cannot go into older releases.
- zeroed is not a self-explanatory term. I suggest finding a term that describes the result, not the internal operation.
- I don't think you have to introduce a new argument at all. Instead you can provide a new method that creates a carefully crafted ZipInfo object that results in zeroed fields. That's how I implemented reproducible tar.bz2 files.
- For full reproducible builds you may have to write files to zipfiles in a well-defined order.
msg389343 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-03-22 20:58
zinfo = zipfile.ZipInfo()
zinfo.date_time = (1980, 0, 0, 0, 0, 0)
zinfo.create_system = 0

external_attr == 0 may cause issues with permissions. I do something like this in my reproducible tarfile code:

if zinfo.is_dir():
    # 0755 in the upper 16 bits (Unix mode) + MS-DOS directory flag
    zinfo.external_attr = (0o755 << 16) | 0x10
else:
    # 0644 in the upper 16 bits (Unix mode)
    zinfo.external_attr = 0o644 << 16
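
As background for the permission concern, a sketch of how external_attr is laid out when zipfile fills it in itself: ZipInfo.from_file() puts the Unix mode in the upper 16 bits and sets the MS-DOS directory bit (0x10) in the low byte for directories. The helpers `pack_external_attr` and `unpack_external_attr` are hypothetical names for illustration only.

```python
DOS_DIRECTORY = 0x10  # MS-DOS directory attribute bit


def pack_external_attr(unix_mode, is_dir=False):
    """Combine a Unix mode and the DOS directory flag into external_attr."""
    attr = (unix_mode & 0xFFFF) << 16
    if is_dir:
        attr |= DOS_DIRECTORY
    return attr


def unpack_external_attr(attr):
    """Split external_attr into (unix_mode, dos_attributes)."""
    return attr >> 16, attr & 0xFF


mode, dos = unpack_external_attr(pack_external_attr(0o755, is_dir=True))
print(oct(mode), hex(dos))
```

With this layout, external_attr == 0 means "no Unix mode recorded and no DOS attributes", which is what the APK tools mentioned in this issue write.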
msg389348 - (view) Author: Felix C. Stegerman (obfusk) * Date: 2021-03-22 22:58
I've closed the PR for now.

Using a carefully crafted ZipInfo object doesn't work because ZipFile modifies its .external_attr when set to 0.

Using something like this quickly hacked together ZipInfo subclass does work:

class ZeroedZipInfo(zipfile.ZipInfo):
    def __init__(self, zinfo):
        for k in self.__slots__:
            setattr(self, k, getattr(zinfo, k))

    def __getattribute__(self, name):
        if name == "date_time":
            return (1980,0,0,0,0,0)
        if name == "external_attr":
            return 0
        return object.__getattribute__(self, name)

...

myzipfile.writestr(ZeroedZipInfo(info), data)
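
A self-contained way to check that claim, as a sketch: pin the reads of a few fields, write an entry, and read the archive back. `FixedAttrZipInfo` and `_pinned` are hypothetical names for this check; the hasattr() guard is there because some slots are unset on a freshly constructed ZipInfo.

```python
import io
import zipfile


class FixedAttrZipInfo(zipfile.ZipInfo):
    """ZipInfo copy whose selected fields always read as fixed values."""

    _pinned = {
        "date_time": (1980, 0, 0, 0, 0, 0),
        "create_system": 0,
        "external_attr": 0,
    }

    def __init__(self, zinfo):
        # Copy whatever the source ZipInfo has set; skip unset slots.
        for k in zipfile.ZipInfo.__slots__:
            if hasattr(zinfo, k):
                setattr(self, k, getattr(zinfo, k))

    def __getattribute__(self, name):
        pinned = object.__getattribute__(self, "_pinned")
        if name in pinned:
            return pinned[name]  # writes by ZipFile are ignored on read
        return object.__getattribute__(self, name)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(FixedAttrZipInfo(zipfile.ZipInfo("classes.dex")), b"payload")

with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("classes.dex")

print(stored.external_attr, stored.create_system, stored.date_time)
```

ZipFile still assigns a non-zero external_attr to the object, but every later read (including the ones that build the headers) goes through __getattribute__ and sees 0.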
msg389349 - (view) Author: Felix C. Stegerman (obfusk) * Date: 2021-03-22 23:05
> external_attr == 0 may cause issues with permissions.

That may be true in some scenarios, but not being able to set it to 0 means you can't create identical files to those produced by other tools -- like those used to generate APKs -- which do in fact set it to 0.
msg389382 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-03-23 10:30
The __getattribute__ hack is not needed. You can reset the flags in a different, more straightforward way:


from zipfile import ZipInfo


class ReproducibleZipInfo(ZipInfo):
    __slots__ = ()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._reset_flags()

    @classmethod
    def from_file(cls, *args, **kwargs):
        zinfo = super().from_file(*args, **kwargs)
        zinfo._reset_flags()
        return zinfo

    def _reset_flags(self):
        self.date_time = (1980, 0, 0, 0, 0, 0)
        self.create_system = 0
        self.external_attr = 0


>>> zinfo = ReproducibleZipInfo.from_file("/etc/os-release")
>>> zinfo.external_attr
0
>>> zinfo.create_system
0
>>> zinfo.date_time
(1980, 0, 0, 0, 0, 0)


I think it also makes sense to replace the hard-coded ZipInfo class with a dispatcher attribute on the class:


@@ -1203,6 +1211,7 @@ class ZipFile:
 
     fp = None                   # Set here since __del__ checks it
     _windows_illegal_name_trans_table = None
+    zipinfo_class = ZipInfo
 
     def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True,
                  compresslevel=None, *, strict_timestamps=True):
@@ -1362,7 +1371,7 @@ def _RealGetContents(self):
                 # Historical ZIP filename encoding
                 filename = filename.decode('cp437')
             # Create ZipInfo instance to store file information
-            x = ZipInfo(filename)
+            x = self.zipinfo_class(filename)
msg389392 - (view) Author: Felix C. Stegerman (obfusk) * Date: 2021-03-23 15:26
> The __getattribute__ hack is not needed. You can reset the flags in a different, more straightforward way

As mentioned, ZipFile._open_to_write() will modify the ZipInfo's .external_attr when it is set to 0.

> I just found another specific example in _open_to_write().  0 is a valid value for zinfo.external_attr.  But this code always forces 0 to something else:
>
>     if not zinfo.external_attr:
>         zinfo.external_attr = 0o600 << 16  # permissions: ?rw-------

Your alternative doesn't seem to take that subsequent modification into account.
msg389441 - (view) Author: Hans-Christoph Steiner (eighthave) Date: 2021-03-24 10:41
> - For full reproducible builds you may have to write files to zipfiles in a well-defined order.

That already works fine now, we've been doing that with Python for years.  But that leaves it up to the implementer to do.  I suppose zipfile could provide a method to sort entries, but that's out of scope for this issue IMHO.
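
For illustration, a minimal sketch of the "implementer does the ordering" approach. `write_sorted` is a made-up helper, not an existing or proposed zipfile method, and sorting by filename is just one possible well-defined order.

```python
import io
import zipfile


def write_sorted(zf, entries):
    """Write (ZipInfo, bytes) pairs to zf ordered by archive name."""
    for info, data in sorted(entries, key=lambda pair: pair[0].filename):
        zf.writestr(info, data)


entries = [(zipfile.ZipInfo(name), b"") for name in ("b.txt", "META-INF/x", "a.txt")]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    write_sorted(zf, entries)
    names = zf.namelist()

print(names)  # ['META-INF/x', 'a.txt', 'b.txt']
```

Plain string comparison orders by code point (uppercase before lowercase), which does not depend on the locale.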
msg396332 - (view) Author: Felix C. Stegerman (obfusk) * Date: 2021-06-22 13:49
https://github.com/obfusk/apksigcopier currently produces reproducible ZIP files identical to those produced by apksigner using this code:


import sys
import zipfile
import zlib
from typing import Any, Dict


DATETIMEZERO = (1980, 0, 0, 0, 0, 0)


class ReproducibleZipInfo(zipfile.ZipInfo):
    """Reproducible ZipInfo hack."""

    _override = {}  # type: Dict[str, Any]

    def __init__(self, zinfo, **override):
        if override:
            self._override = {**self._override, **override}
        for k in self.__slots__:
            if hasattr(zinfo, k):
                setattr(self, k, getattr(zinfo, k))

    def __getattribute__(self, name):
        if name != "_override":
            try:
                return self._override[name]
            except KeyError:
                pass
        return object.__getattribute__(self, name)


class APKZipInfo(ReproducibleZipInfo):
    """Reproducible ZipInfo for APK files."""

    _override = dict(
        compress_type=8,
        create_system=0,
        create_version=20,
        date_time=DATETIMEZERO,
        external_attr=0,
        extract_version=20,
        flag_bits=0x800,
    )


def patch_meta(...):
    ...
    with zipfile.ZipFile(output_apk, "a") as zf_out:
        info_data = [(APKZipInfo(info, date_time=date_time), data)
                     for info, data in extracted_meta]
        _write_to_zip(info_data, zf_out)


if sys.version_info >= (3, 7):
    def _write_to_zip(info_data, zf_out):
        for info, data in info_data:
            zf_out.writestr(info, data, compresslevel=9)
else:
    def _write_to_zip(info_data, zf_out):
        old = zipfile._get_compressor
        zipfile._get_compressor = lambda _: zlib.compressobj(9, 8, -15)
        try:
            for info, data in info_data:
                zf_out.writestr(info, data)
        finally:
            zipfile._get_compressor = old
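
A note on the zlib.compressobj(9, 8, -15) call in the pre-3.7 branch, as a sketch: the positional arguments are the compression level (9), the method (8, i.e. zlib.DEFLATED) and wbits (-15, which selects a raw deflate stream without zlib header or trailer, the form stored inside ZIP entries). The sample data below is arbitrary.

```python
import zlib

data = b"reproducible " * 100

# Same arguments as above, with the method spelled out by name.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = compressor.compress(data) + compressor.flush()

# A raw deflate stream must also be decompressed with negative wbits.
restored = zlib.decompress(raw, -15)
print(len(data), len(raw), restored == data)
```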
History
Date User Action Args
2021-06-22 13:49:25 obfusk set messages: + msg396332
2021-03-24 10:41:43 eighthave set messages: + msg389441
2021-03-23 15:26:22 obfusk set messages: + msg389392
2021-03-23 10:30:58 christian.heimes set messages: + msg389382
2021-03-22 23:05:05 obfusk set messages: + msg389349
2021-03-22 22:59:47 obfusk set type: enhancement
2021-03-22 22:59:15 obfusk set components: + Library (Lib)
2021-03-22 22:58:57 obfusk set components: - Library (Lib), IO
    versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9
2021-03-22 22:58:03 obfusk set type: enhancement -> (no value)
    messages: + msg389348
    components: + IO
    versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9
2021-03-22 20:58:58 christian.heimes set messages: + msg389343
2021-03-22 20:44:28 christian.heimes set versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9
    nosy: + christian.heimes
    messages: + msg389339
    components: - IO
    type: enhancement
2021-03-22 20:20:47 obfusk set messages: + msg389338
2021-03-22 20:13:07 obfusk set keywords: + patch
    stage: patch review
    pull_requests: + pull_request23737
2021-03-20 23:14:46 obfusk set nosy: + obfusk
2021-03-20 21:54:24 jondo set nosy: + jondo
2021-03-18 22:00:47 eighthave set messages: + msg389041
2021-03-18 21:25:38 eighthave create