set timestamp in gzip stream #48522

Closed
jfrechet mannequin opened this issue Nov 6, 2008 · 7 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@jfrechet
Mannequin

jfrechet mannequin commented Nov 6, 2008

BPO 4272
Nosy @amauryfa, @pitrou
Files
  • gzip-mtime-py3k.patch: gzip mtime patch (vs branches/py3k)
  • gzip-mtime-2.x.patch: gzip mtime patch (vs 2.x trunk)
  • gzip-mtime-revised-py3k.patch: same patch without test_literal_output [py3k]
  • gzip-mtime-revised-2.x.patch: same patch without test_literal_output [2.x trunk]
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-01-04.21:39:42.612>
    created_at = <Date 2008-11-06.20:46:09.590>
    labels = ['type-feature', 'library']
    title = 'set timestamp in gzip stream'
    updated_at = <Date 2009-01-04.21:39:42.599>
    user = 'https://bugs.python.org/jfrechet'

    bugs.python.org fields:

    activity = <Date 2009-01-04.21:39:42.599>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-01-04.21:39:42.612>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2008-11-06.20:46:09.590>
    creator = 'jfrechet'
    dependencies = []
    files = ['11954', '11955', '12528', '12529']
    hgrepos = []
    issue_num = 4272
    keywords = ['patch']
    message_count = 7.0
    messages = ['75580', '75581', '75586', '75588', '78679', '78758', '79086']
    nosy_count = 3.0
    nosy_names = ['amaury.forgeotdarc', 'pitrou', 'jfrechet']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue4272'
    versions = ['Python 3.1', 'Python 2.7']

    @jfrechet
    Mannequin Author

    jfrechet mannequin commented Nov 6, 2008

    The gzip header defined in RFC 1952 includes a mandatory "MTIME" field,
    originally intended to contain the modification time of the original
    uncompressed file. It is often ignored when decompressing, though
    gunzip (for example) uses it to set the modification time of the output
    file if applicable.

    The Python gzip module always sets the MTIME field to the current time,
    and always discards MTIME when decompressing. As a result, compressing
    the same string using gzip produces different output every time. For
    certain applications, especially those involving comparisons or
    cryptographic signing of binary files, these spurious changes can be
    quite inconvenient. Aside from the MTIME field, the gzip module already
    produces entirely deterministic output.

    I'm attaching a patch which adds an optional "mtime" argument to the
    GzipFile class, giving the caller the option of providing a timestamp
    when compressing. Default behavior is unchanged. I've included updated
    documentation and three new test cases in the patch.
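A minimal sketch of how the proposed argument might be used, assuming the `mtime` keyword on `GzipFile` as described in this report. The helper name `gzip_bytes` and the sample payload are illustrative only and are not part of the patch.

```python
import gzip
import io


def gzip_bytes(data: bytes, mtime=None) -> bytes:
    """Compress data into an in-memory gzip stream with an optional fixed MTIME."""
    buf = io.BytesIO()
    # filename="" keeps the optional FNAME field out of the header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


payload = b"the same string, compressed twice"
first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)
print(first == second)  # prints True: both streams carry the same MTIME
```

Leaving `mtime` as `None` keeps the existing behavior of stamping the header with the current time, so two calls made in different seconds would generally differ.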

    In order to facilitate testing, the patch also includes code to set the
    "mtime" member of the GzipFile instance when decompressing. The first
    test case uses the new member to ensure that the timestamp given to the
    GzipFile constructor is preserved correctly. The second test checks for
    specific values in the entire gzip header (not just the MTIME field) by
    reading the compressed file directly, examining individual fields in a
    (relatively) flexible way. The third compares the entire compressed
    stream against a predetermined sequence of bytes in a relatively
    inflexible way. All tests pass on my AMD64 box, and I expect them all
    to pass on all supported platforms without any problems. However, if
    anybody is concerned that any of the tests sound like they might be too
    brittle, I'm certainly not overly attached to them.
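The first two checks described above could look roughly like the following. This is a sketch rather than the test code from the patch: the timestamp value and variable names are made up for illustration, and the header layout is taken from RFC 1952.

```python
import gzip
import io
import struct

buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=123456789) as f:
    f.write(b"payload")
raw = buf.getvalue()

# RFC 1952 fixed header: ID1, ID2, CM, FLG, MTIME (uint32, little-endian), XFL, OS.
id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", raw[:10])
header_ok = (id1, id2, cm) == (0x1F, 0x8B, 8) and mtime == 123456789

# Reading the stream back sets the mtime attribute once the header is parsed.
with gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb") as f:
    data = f.read()
    recovered_mtime = f.mtime
```

Checking individual fields this way leaves the compressed body unconstrained, which is what makes it less brittle than a byte-for-byte comparison.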

    If anyone has any further suggestions, I'd be delighted to submit a new
    patch.

    Thanks!

    Jacques

    @jfrechet jfrechet mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Nov 6, 2008
    @jfrechet
    Mannequin Author

    jfrechet mannequin commented Nov 6, 2008

    This discussion of the problem and possible workarounds might also be of
    interest:

    http://stackoverflow.com/questions/264224/setting-the-gzip-timestamp-from-python

    @amauryfa
    Member

    amauryfa commented Nov 7, 2008

    I considered using a datetime.datetime object instead. But it makes more
    sense to use a time_t number, like os.stat() and time.time().
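For callers who do start from a `datetime`, converting to a numeric timestamp is a one-liner. The sketch below assumes the `mtime` argument accepts seconds since the epoch, as discussed here; the date is an arbitrary example.

```python
import gzip
import io
from datetime import datetime, timezone

# Hypothetical fixed date, chosen only for illustration.
stamp = datetime(2008, 11, 6, tzinfo=timezone.utc)
mtime = int(stamp.timestamp())  # seconds since the epoch, like time.time()

buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as f:
    f.write(b"data")
```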

    About the tests on the gzip format details: I am not an expert of the
    gzip format, but are we sure that the compressed data will always be the
    same?
    Otherwise the patch is fine.

    @jfrechet
    Mannequin Author

    jfrechet mannequin commented Nov 7, 2008

    I'm no expert either. The output certainly seems to be deterministic
    for a given version of zlib, and I'm not aware of any prior versions of
    zlib that produce different compressed output. However, my
    understanding is that there is more than one possible compressed
    representation of a given uncompressed input, so it's entirely possible
    that a past or future version of zlib might produce compressed output
    that is different while remaining interoperable. I have no idea whether
    the zlib people care specifically about producing identical compressed
    output across versions or not. It might be a big deal to them, or they
    might have other priorities.
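The point that one input can have several valid compressed forms is easy to see even within a single zlib version by varying the compression level. This is only an illustration of that general property and says nothing about how output differs between zlib releases.

```python
import zlib

data = b"abcabcabc" * 100

stored = zlib.compress(data, 0)  # level 0: data is stored without compression
packed = zlib.compress(data, 9)  # level 9: maximum compression effort

# Both streams decode to the same input even though their bytes differ.
same_input = zlib.decompress(stored) == zlib.decompress(packed) == data
different_bytes = stored != packed
```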

    I included the third test because I am guessing that the compressed
    output probably won't change very soon, and that if it does, it might be
    interesting to know that it changed. If that sounds to you like it
    might be more trouble than it's worth, then I think the right thing to
    do would be to simply get rid of the third test and keep the first two.

    @pitrou
    Member

    pitrou commented Jan 1, 2009

    test_literal_output looks really too strict to me. At most, you could
    check that the header and trailer are unchanged, but it would probably
    make it equivalent to test_metadata.
    Other than that, I think it's a useful addition.
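A trailer check along the lines suggested here might be sketched as follows. It is not code from the patch; the payload is arbitrary and the trailer layout (CRC32 followed by ISIZE) comes from RFC 1952.

```python
import gzip
import io
import struct
import zlib

payload = b"some uncompressed data"
buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as f:
    f.write(payload)
raw = buf.getvalue()

# RFC 1952 trailer: CRC32 of the uncompressed data, then ISIZE (length mod 2**32).
crc32, isize = struct.unpack("<II", raw[-8:])
trailer_ok = crc32 == zlib.crc32(payload) and isize == len(payload) % 2**32
```

Both trailer fields depend only on the uncompressed input, so this check does not care how zlib chose to encode the body.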

    @jfrechet
    Mannequin Author

    jfrechet mannequin commented Jan 2, 2009

    I am uploading a new patch, identical to the previous patch except that
    it does not contain the ill-advised third test case
    (test_literal_output). The patch still applies cleanly and the tests
    still pass.

    @pitrou
    Member

    pitrou commented Jan 4, 2009

    The patches have been committed, thanks!

    @pitrou pitrou closed this as completed Jan 4, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022