classification
Title: gzip metadata fails to reflect compresslevel
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, ned.deily, serhiy.storchaka, wchargin
Priority: normal Keywords: easy, patch

Created on 2020-01-19 20:23 by wchargin, last changed 2020-03-04 07:06 by ned.deily. This issue is now closed.

Files
File name Uploaded Description
repro.py wchargin, 2020-01-19 20:23 repro script, as given in initial bug comment
Pull Requests
URL Status Linked
PR 18077 merged wchargin, 2020-01-20 08:49
PR 18100 merged miss-islington, 2020-01-21 11:25
PR 18101 merged miss-islington, 2020-01-21 11:25
Messages (12)
msg360268 - (view) Author: William Chargin (wchargin) * Date: 2020-01-19 20:23
The `gzip` module properly uses the user-specified compression level to
control the underlying zlib stream compression level, but always writes
metadata that indicates that the maximum compression level was used.

Repro:

```
import gzip

blob = b"The quick brown fox jumps over the lazy dog." * 32

with gzip.GzipFile("fast.gz", mode="wb", compresslevel=1) as outfile:
    outfile.write(blob)

with gzip.GzipFile("best.gz", mode="wb", compresslevel=9) as outfile:
    outfile.write(blob)
```

Run this script, then run `wc -c *.gz` and `file *.gz`:

```
$ wc -c *.gz
 82 best.gz
 84 fast.gz
166 total
$ file *.gz
best.gz: gzip compressed data, was "best", last modified: Sun Jan 19 20:15:23 2020, max compression
fast.gz: gzip compressed data, was "fast", last modified: Sun Jan 19 20:15:23 2020, max compression
```

The file sizes correctly reflect the difference, but `file` thinks that
both archives were written at max compression.

The error is that the ninth byte of the header in the output stream is
hard-coded to `\002` at Lib/gzip.py:260 (as of 558f07891170), which
indicates maximum compression. The correct value to indicate maximum
speed is `\004`. See RFC 1952, section 2.3.1:
<https://tools.ietf.org/html/rfc1952>

Using GNU `gzip(1)` with `--fast` creates the same output file as the
one emitted by the `gzip` module, except for two bytes: the metadata and
the OS (the ninth and tenth bytes).
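
To check the two bytes in question without relying on `file`, the fixed 10-byte member header can be unpacked directly. This is only a sketch: the helper name `read_xfl_and_os` and the `XFL_MEANINGS` table are made up for illustration and are not part of the `gzip` module; the field layout follows RFC 1952, where the ninth byte is XFL (extra flags) and the tenth is OS.

```python
import struct

# XFL values that RFC 1952 defines for the deflate method (CM = 8).
# Hypothetical lookup table, for display purposes only.
XFL_MEANINGS = {0: "unspecified", 2: "max compression", 4: "fastest"}


def read_xfl_and_os(header: bytes):
    """Return the (XFL, OS) bytes from a gzip member header."""
    if len(header) < 10:
        raise ValueError("a gzip member header is at least 10 bytes long")
    # Layout: ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS.
    id1, id2, _cm, _flg, _mtime, xfl, os_code = struct.unpack(
        "<BBBBIBB", header[:10]
    )
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    return xfl, os_code
```

Passing the first ten bytes of `fast.gz` from the repro should, if the report above is right, give an XFL of 2 on an unpatched interpreter and 4 once the metadata is written correctly.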
msg360269 - (view) Author: William Chargin (wchargin) * Date: 2020-01-19 20:27
(The commit reference above was meant to be git558f07891170, not a
Mercurial reference. Pardon the churn; I'm new here. :-) )
msg360299 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-20 08:29
Looks reasonable. gzip should write b'\002' for compresslevel == _COMPRESS_LEVEL_BEST, b'\004' for compresslevel == _COMPRESS_LEVEL_FAST, and b'\000' otherwise. Would you mind creating a PR, William?
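
The mapping suggested here could look roughly like the following. This is a sketch, not the code from the merged PR: `xfl_for_level` is a hypothetical helper name, and the constants are redefined locally with the values 1 and 9 assumed for the fastest and best levels.

```python
# Assumed values; the names mirror the constants mentioned above.
_COMPRESS_LEVEL_FAST = 1
_COMPRESS_LEVEL_BEST = 9


def xfl_for_level(compresslevel: int) -> bytes:
    """Pick the XFL header byte for a given compression level."""
    if compresslevel == _COMPRESS_LEVEL_BEST:
        return b"\002"  # slowest algorithm, maximum compression
    if compresslevel == _COMPRESS_LEVEL_FAST:
        return b"\004"  # fastest algorithm
    return b"\000"  # any other level: no claim made in the header
```

Intermediate levels fall through to `b"\000"` because RFC 1952 only defines XFL values for the two extremes.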
msg360301 - (view) Author: William Chargin (wchargin) * Date: 2020-01-20 08:58
Sure, PR sent (pull_request17470).
msg360302 - (view) Author: William Chargin (wchargin) * Date: 2020-01-20 08:59
PR URL, for reference:
<https://github.com/python/cpython/pull/18077>
msg360390 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-21 11:25
New changeset eab3b3f1c60afecfb4db3c3619109684cb04bd60 by Serhiy Storchaka (William Chargin) in branch 'master':
bpo-39389: gzip: fix compression level metadata (GH-18077)
https://github.com/python/cpython/commit/eab3b3f1c60afecfb4db3c3619109684cb04bd60
msg360391 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-21 11:27
Thank you for your contribution, William!
msg360392 - (view) Author: miss-islington (miss-islington) Date: 2020-01-21 11:42
New changeset ab0d8e356ecd351d55f89519a6a97a1e69c0dfab by Miss Islington (bot) in branch '3.8':
bpo-39389: gzip: fix compression level metadata (GH-18077)
https://github.com/python/cpython/commit/ab0d8e356ecd351d55f89519a6a97a1e69c0dfab
msg360524 - (view) Author: William Chargin (wchargin) * Date: 2020-01-23 00:18
My pleasure; thanks for the triage and review!
msg363272 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-03 16:16
Ping. The 3.7.x backport (PR 18101) for this issue is still open and neither needs to be fixed or closed.
msg363273 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-03 16:16
"either"
msg363331 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-04 07:06
New changeset 12c45efe828a90a2f2f58a1f95c85d792a0d9c0a by Miss Islington (bot) in branch '3.7':
[3.7] bpo-39389: gzip: fix compression level metadata (GH-18077) (GH-18101)
https://github.com/python/cpython/commit/12c45efe828a90a2f2f58a1f95c85d792a0d9c0a
History
Date User Action Args
2020-03-04 07:06:55 ned.deily set status: open -> closed
    resolution: fixed
    stage: backport needed -> resolved
2020-03-04 07:06:23 ned.deily set messages: + msg363331
2020-03-03 16:16:32 ned.deily set messages: + msg363273
2020-03-03 16:16:10 ned.deily set status: closed -> open
    nosy: + ned.deily
    messages: + msg363272
    resolution: fixed -> (no value)
    stage: resolved -> backport needed
2020-01-23 00:18:03 wchargin set messages: + msg360524
2020-01-21 11:42:52 miss-islington set nosy: + miss-islington
    messages: + msg360392
2020-01-21 11:27:14 serhiy.storchaka set status: open -> closed
    resolution: fixed
    messages: + msg360391
    stage: patch review -> resolved
2020-01-21 11:25:47 miss-islington set pull_requests: + pull_request17490
2020-01-21 11:25:40 miss-islington set pull_requests: + pull_request17489
2020-01-21 11:25:31 serhiy.storchaka set messages: + msg360390
2020-01-20 08:59:07 wchargin set messages: + msg360302
2020-01-20 08:58:17 wchargin set messages: + msg360301
2020-01-20 08:49:36 wchargin set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request17470
2020-01-20 08:29:13 serhiy.storchaka set versions: - Python 2.7, Python 3.5, Python 3.6
    nosy: + serhiy.storchaka
    messages: + msg360299
    keywords: + easy
    stage: needs patch
2020-01-19 20:27:07 wchargin set messages: + msg360269
2020-01-19 20:24:33 wchargin set type: behavior
2020-01-19 20:23:58 wchargin create