classification
Title: Allow setting timestamp in gzip-compressed tarfiles
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jonash, martin.panter, randombit
Priority: normal Keywords:

Created on 2017-09-20 05:11 by randombit, last changed 2017-11-10 23:46 by martin.panter.

Messages (3)
msg302590 - Author: Jack Lloyd (randombit) Date: 2017-09-20 05:11
Context: I have a script which checks out a software release (a tagged git revision) and builds an archive to distribute to end users. One goal of this script is that the archive is reproducible, i.e. if the script is run twice (at different times, on different machines, by different people) it produces bit-for-bit identical output, and thus also has the same SHA-256 hash.

Mostly this works great, using the TarInfo feature of tarfile.py to set the uid/gid/mtime to fixed values. Except I also want to compress the archive, and tarfile calls time.time() to find out the timestamp that will be embedded in the gzip header. This breaks my carefully deterministic output.
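
For reference, the timestamp in question is the 4-byte little-endian MTIME field at offset 4 of the gzip header (RFC 1952). The sketch below is illustrative only: the helper names gzip_header_mtime and gzip_bytes are made up here and are not part of tarfile or of the script discussed in this report.

```python
import gzip
import io
import struct


def gzip_header_mtime(data: bytes) -> int:
    """Return the MTIME field of a gzip stream (bytes 4-7, little-endian)."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    (mtime,) = struct.unpack_from("<I", data, 4)
    return mtime


def gzip_bytes(payload: bytes, mtime=None) -> bytes:
    """Compress payload in memory; mtime=None lets gzip use the current clock."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()
```

With mtime left at None the header carries the time of compression, so two otherwise identical runs differ in those four bytes.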

I would like it if tarfile accepted an additional keyword that allowed overriding the time value for the gzip header. As it is, I just hack around it with:

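import time

def null_time():
    # Pretend the clock always reads 0 so the gzip header timestamp is 0.
    return 0

# Global monkeypatch: every later time.time() call in this process returns 0.
time.time = null_time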

which does work but is also horrible.
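
A slightly less invasive variant of the same hack, sketched here under the assumption that tarfile and gzip look up time.time() at call time (which appears to be the case in current CPython), limits the patch to the archive-building code with unittest.mock. The function name build_tar_gz_with_frozen_clock and its (name, data) input format are hypothetical.

```python
import io
import tarfile
from unittest import mock


def build_tar_gz_with_frozen_clock(members):
    """Build a .tar.gz in memory while time.time() is patched to return 0.

    `members` is an iterable of (name, data) pairs (hypothetical signature).
    """
    buf = io.BytesIO()
    # The patch is undone when the with-block exits, unlike a global assignment.
    with mock.patch("time.time", return_value=0):
        with tarfile.open(fileobj=buf, mode="w:gz") as tar:
            for name, data in members:
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = 0  # fixed member timestamp
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

This is still a workaround that relies on implementation details, not a supported interface.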

Alternatively, tarfile could just always set the timestamp header to 0 and avoid having its output depend on the current clock. I doubt anyone would notice.

The script in question is here:
https://github.com/randombit/botan/blob/master/src/scripts/dist.py

My script uses Python 2 for various reasons, but it seems the same problem affects even the tarfile.py in the latest Python 3. I would be willing to try writing a patch for this, if anything along these lines might be accepted.

Thanks.
msg305915 - Author: Jonas H. (jonash) Date: 2017-11-08 22:32
This affects me too.
msg306065 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-11-10 23:46
Perhaps you can compress the tar file using the “gzip.GzipFile” class. It accepts a custom “mtime” parameter (see Issue 4272, added in 2.7 and 3.1+):

>>> import tarfile
>>> from gzip import GzipFile
>>> from io import BytesIO
>>> gztar = BytesIO()
>>> tar = GzipFile(fileobj=gztar, mode="w", mtime=0)
>>> tarfile.open(fileobj=tar, mode="w|").close()
>>> tar.close()
>>> gztar.getvalue().hex()
'1f8b08000000000002ffedc1010d000000c2a0f74f6d0e37a00000000000000000008037039ade1d2700280000'

However, “tarfile.open” accepts a “compresslevel” argument for two of the compressors, so you could argue it is okay to add another argument to pass to the gzip compressor.
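
The GzipFile approach above could be wrapped up roughly as follows. This is only a sketch: write_reproducible_tar_gz and its (name, data) member format are hypothetical names, and passing filename="" is an extra precaution (not mentioned above) so that GzipFile does not copy the name of a real output file into the header.

```python
import gzip
import io
import tarfile


def write_reproducible_tar_gz(out, members, mtime=0):
    """Write (name, data) pairs to the binary file object `out` as a .tar.gz.

    The gzip header timestamp and all member metadata are fixed, so the
    output depends only on the member names and contents.
    """
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=mtime) as gz:
        # "w|" writes an uncompressed tar stream into the GzipFile wrapper.
        with tarfile.open(fileobj=gz, mode="w|") as tar:
            for name, data in sorted(members):  # stable member order
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = mtime
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
```

For example, calling it twice with the same members and two separate BytesIO objects should yield identical bytes.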
History
Date User Action Args
2017-11-10 23:46:22  martin.panter  set  nosy: + martin.panter; messages: + msg306065
2017-11-08 22:32:41  jonash  set  nosy: + jonash; messages: + msg305915
2017-09-20 05:11:11  randombit  create