gzip metadata fails to reflect compresslevel #83570

Closed
wchargin mannequin opened this issue Jan 19, 2020 · 12 comments
Labels
3.7 (EOL): end of life; 3.8: only security fixes; 3.9: only security fixes; easy; stdlib: Python modules in the Lib dir; type-bug: An unexpected behavior, bug, or error

Comments

@wchargin
Mannequin

wchargin mannequin commented Jan 19, 2020

BPO 39389
Nosy @ned-deily, @serhiy-storchaka, @miss-islington, @wchargin
PRs
  • bpo-39389: gzip: fix compression level metadata #18077
  • [3.8] bpo-39389: gzip: fix compression level metadata (GH-18077) #18100
  • [3.7] bpo-39389: gzip: fix compression level metadata (GH-18077) #18101
Files
  • repro.py: repro script, as given in initial bug comment

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-04.07:06:55.894>
    created_at = <Date 2020-01-19.20:23:58.000>
    labels = ['easy', 'type-bug', '3.8', '3.9', '3.7', 'library']
    title = 'gzip metadata fails to reflect compresslevel'
    updated_at = <Date 2020-03-04.07:06:55.894>
    user = 'https://github.com/wchargin'

    bugs.python.org fields:

    activity = <Date 2020-03-04.07:06:55.894>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-04.07:06:55.894>
    closer = 'ned.deily'
    components = ['Library (Lib)']
    creation = <Date 2020-01-19.20:23:58.000>
    creator = 'wchargin'
    dependencies = []
    files = ['48853']
    hgrepos = []
    issue_num = 39389
    keywords = ['patch', 'easy']
    message_count = 12.0
    messages = ['360268', '360269', '360299', '360301', '360302', '360390', '360391', '360392', '360524', '363272', '363273', '363331']
    nosy_count = 4.0
    nosy_names = ['ned.deily', 'serhiy.storchaka', 'miss-islington', 'wchargin']
    pr_nums = ['18077', '18100', '18101']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39389'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    The gzip module properly uses the user-specified compression level to
    control the underlying zlib stream compression level, but always writes
    metadata that indicates that the maximum compression level was used.

    Repro:

    import gzip
    
    blob = b"The quick brown fox jumps over the lazy dog." * 32
    
    with gzip.GzipFile("fast.gz", mode="wb", compresslevel=1) as outfile:
        outfile.write(blob)
    
    with gzip.GzipFile("best.gz", mode="wb", compresslevel=9) as outfile:
        outfile.write(blob)
    

    Run this script, then run wc -c *.gz and file *.gz:

    $ wc -c *.gz
     82 best.gz
     84 fast.gz
    166 total
    $ file *.gz
    best.gz: gzip compressed data, was "best", last modified: Sun Jan 19 20:15:23 2020, max compression
    fast.gz: gzip compressed data, was "fast", last modified: Sun Jan 19 20:15:23 2020, max compression
    

    The file sizes correctly reflect the difference, but file thinks that
    both archives are written at max compression.

    The error is that the ninth byte of the header in the output stream is
    hard-coded to \002 at Lib/gzip.py:260 (as of 558f078), which
    indicates maximum compression. The correct value to indicate maximum
    speed is \004. See RFC 1952, section 2.3.1:
    https://tools.ietf.org/html/rfc1952

    Using GNU gzip(1) with --fast creates the same output file as the
    one emitted by the gzip module, except for two bytes: the metadata and
    the OS (the ninth and tenth bytes).
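The header check described above can also be done without file(1). The following is a minimal sketch, not part of the original report: it writes through gzip.GzipFile into an in-memory buffer and unpacks the fixed 10-byte header defined in RFC 1952 with struct. The helper names read_header and compress_in_memory are hypothetical, chosen only for this illustration.

```python
import gzip
import io
import struct

# XFL values that RFC 1952, section 2.3.1, defines for the deflate method.
# 0 is treated here as "no hint set".
XFL_MEANING = {0: "no hint set", 2: "maximum compression", 4: "fastest algorithm"}


def read_header(data: bytes) -> dict:
    """Unpack the fixed 10-byte gzip header into named fields (hypothetical helper)."""
    # Layout: ID1 ID2 | CM | FLG | MTIME (4 bytes, little-endian) | XFL | OS
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_code}


def compress_in_memory(blob: bytes, level: int) -> bytes:
    """Write blob through gzip.GzipFile into a bytes buffer (hypothetical helper)."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as outfile:
        outfile.write(blob)
    return buffer.getvalue()


blob = b"The quick brown fox jumps over the lazy dog." * 32
for level in (1, 6, 9):
    xfl = read_header(compress_in_memory(blob, level))["xfl"]
    # XFL is the ninth byte of the stream (index 8).
    print(level, xfl, XFL_MEANING.get(xfl, "unknown"))
```

On an interpreter affected by this bug, every level should report an XFL of 2. On one that includes the fix from this issue, level 1 should report 4 and level 9 should report 2.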

    wchargin mannequin added the 3.7 (EOL), 3.8, 3.9, stdlib, and type-bug labels Jan 19, 2020
    @wchargin
    Mannequin Author

    wchargin mannequin commented Jan 19, 2020

    (The commit reference above was meant to be git558f07891170, not a
    Mercurial reference. Pardon the churn; I'm new here. :-) )

    @serhiy-storchaka
    Member

    Looks reasonable. gzip should write b'\002' for compresslevel == _COMPRESS_LEVEL_BEST, b'\004' for compresslevel == _COMPRESS_LEVEL_FAST, and b'\000' otherwise. Would you mind creating a PR, William?
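The mapping suggested in this comment can be sketched as a small standalone function. This is an illustration only, not the patch that was merged: the function name xfl_byte is hypothetical, and the two constants are local stand-ins, assumed here to be 1 and 9 (the fastest and best zlib levels).

```python
# Local stand-ins for the constants named in the comment above.
# The values 1 and 9 are assumptions made for this sketch.
_COMPRESS_LEVEL_FAST = 1
_COMPRESS_LEVEL_BEST = 9


def xfl_byte(compresslevel: int) -> bytes:
    """Return the XFL header byte for a compression level (hypothetical helper)."""
    if compresslevel == _COMPRESS_LEVEL_BEST:
        return b"\002"  # maximum compression
    if compresslevel == _COMPRESS_LEVEL_FAST:
        return b"\004"  # fastest algorithm
    return b"\000"  # any other level: no hint


for level in (1, 6, 9):
    print(level, xfl_byte(level))
```

Levels between the two extremes carry no hint, because RFC 1952 defines XFL values only for the fastest and the maximum-compression settings.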

    @wchargin
    Mannequin Author

    wchargin mannequin commented Jan 20, 2020

    Sure, PR sent (pull_request17470).

    @wchargin
    Mannequin Author

    wchargin mannequin commented Jan 20, 2020

    PR URL, for reference:
    <https://github.com/python/cpython/pull/18077>

    @serhiy-storchaka
    Member

    New changeset eab3b3f by Serhiy Storchaka (William Chargin) in branch 'master':
    bpo-39389: gzip: fix compression level metadata (GH-18077)
    eab3b3f

    @serhiy-storchaka
    Member

    Thank you for your contribution William!

    @miss-islington
    Contributor

    New changeset ab0d8e3 by Miss Islington (bot) in branch '3.8':
    bpo-39389: gzip: fix compression level metadata (GH-18077)
    ab0d8e3

    @wchargin
    Mannequin Author

    wchargin mannequin commented Jan 23, 2020

    My pleasure; thanks for the triage and review!

    @ned-deily
    Member

    Ping. The 3.7.x backport (PR 18101) for this issue is still open and neither needs to be fixed or closed.

    @ned-deily ned-deily reopened this Mar 3, 2020
    @ned-deily
    Member

    "either"

    @ned-deily
    Member

    New changeset 12c45ef by Miss Islington (bot) in branch '3.7':
    [3.7] bpo-39389: gzip: fix compression level metadata (GH-18077) (GH-18101)
    12c45ef

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022