Author jfrechet
Recipients jfrechet
Date 2008-11-06.20:46:04
Message-id <1226004430.06.0.832608065072.issue4272@psf.upfronthosting.co.za>
In-reply-to
Content
The gzip header defined in RFC 1952 includes a mandatory "MTIME" field,
originally intended to contain the modification time of the original
uncompressed file.  It is often ignored when decompressing, though
gunzip (for example) uses it to set the modification time of the output
file if applicable.
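
For reference, here is a minimal sketch of where MTIME sits in the fixed 10-byte header laid out in RFC 1952. The helper name read_gzip_header and the dictionary keys are made up for illustration; they are not part of the gzip module or of the attached patch.

```python
import gzip
import io
import struct


def read_gzip_header(data):
    """Unpack the fixed 10-byte gzip header (RFC 1952, section 2.3).

    Hypothetical helper for illustration only.
    """
    # ID1, ID2, CM, FLG are single bytes; MTIME is a 32-bit little-endian
    # unsigned integer; XFL and OS are single bytes.
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack(
        "<BBBBIBB", data[:10]
    )
    return {
        "magic": (id1, id2),
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "xfl": xfl,
        "os": os_code,
    }


# Compress a short payload in memory and inspect the resulting header.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"hello")

header = read_gzip_header(buf.getvalue())
print(header)
```

With the current module behavior, the "mtime" entry should be the Unix time at which the stream was written.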

The Python gzip module always sets the MTIME field to the current time,
and always discards MTIME when decompressing.  As a result, compressing
the same string using gzip produces different output every time.  For
certain applications, especially those involving comparisons or
cryptographic signing of binary files, these spurious changes can be
quite inconvenient.  Aside from the MTIME field, the gzip module already
produces entirely deterministic output.
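
A rough way to see this for yourself: compress the same bytes twice and compare the results with the four MTIME bytes (offsets 4 to 7) cut out. The helper names gzip_bytes and without_mtime are hypothetical. Note that two runs within the same clock second may produce fully identical output, so this sketch compares everything except MTIME rather than relying on a visible difference.

```python
import gzip
import io


def gzip_bytes(payload):
    """Compress payload in memory using the module's default behavior."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(payload)
    return buf.getvalue()


def without_mtime(data):
    """Drop the 4-byte MTIME field, which occupies header offsets 4..7."""
    return data[:4] + data[8:]


first = gzip_bytes(b"same input")
second = gzip_bytes(b"same input")

# Everything other than MTIME is expected to match between the two runs.
same_apart_from_mtime = without_mtime(first) == without_mtime(second)
print(same_apart_from_mtime)
```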

I'm attaching a patch which adds an optional "mtime" argument to the
GzipFile class, giving the caller the option of providing a timestamp
when compressing.  Default behavior is unchanged.  I've included updated
documentation and three new test cases in the patch.
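
To illustrate the intended usage, here is a small sketch. It assumes a gzip module with the patch applied (or an equivalent "mtime" keyword argument on GzipFile), and it assumes the argument takes an integer Unix timestamp. The helper gzip_bytes is just for illustration.

```python
import gzip
import io


def gzip_bytes(payload, mtime=None):
    """Compress payload in memory, optionally pinning the header timestamp.

    Assumes GzipFile accepts an 'mtime' keyword argument as proposed;
    mtime=None keeps the default behavior (current time).
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(payload)
    return buf.getvalue()


# With a fixed timestamp, repeated compression gives identical bytes.
a = gzip_bytes(b"same input", mtime=0)
b = gzip_bytes(b"same input", mtime=0)
print(a == b)
```

Passing the same timestamp each time is what makes the output suitable for byte-wise comparison or signing.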

In order to facilitate testing, the patch also includes code to set the
"mtime" member of the GzipFile instance when decompressing.  The first
test case uses the new member to ensure that the timestamp given to the
GzipFile constructor is preserved correctly.  The second test checks for
specific values in the entire gzip header (not just the MTIME field) by
reading the compressed file directly, examining individual fields in a
(relatively) flexible way.  The third compares the entire compressed
stream against a predetermined sequence of bytes in a relatively
inflexible way.  All tests pass on my AMD64 box, and I expect them all
to pass on all supported platforms without any problems.  However, if
anybody is concerned that any of the tests sound like they might be too
brittle, I'm certainly not overly attached to them.
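
Roughly, the first test has the following shape. This is a simplified sketch rather than the test code from the patch: the constant STAMP and the variable names are invented here, and it assumes both the "mtime" constructor argument and the "mtime" member being populated once the header has been read.

```python
import gzip
import io
import struct

STAMP = 123456789  # arbitrary timestamp chosen for this sketch

# Write a stream with an explicit timestamp.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", mtime=STAMP) as f:
    f.write(b"payload")
compressed = buf.getvalue()

# Read it back; the member is only meaningful after the header is parsed,
# so read the data first.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as f:
    data = f.read()
    recovered = f.mtime

# Cross-check against the raw MTIME field at header offsets 4..7.
(stored,) = struct.unpack("<I", compressed[4:8])
print(recovered, stored)
```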

If anyone has any further suggestions, I'd be delighted to submit a new
patch.

Thanks!

Jacques