classification
Title: tarfile in stream mode always set zlib compression level to 9
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Patrik Dufresne, jarondl, lars.gustaebel, martin.panter, wim.glenn, xiang.zhang
Priority: normal Keywords:

Created on 2016-02-01 00:55 by Patrik Dufresne, last changed 2017-08-01 17:03 by jarondl.

Pull Requests
URL | Status | Linked
PR 2962 | open | jarondl, 2017-07-31 17:29
Messages (6)
msg259304 - Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-02-01 00:55
When using tarfile.open(mode='w|gz'), the compression level is hard-coded to 9. See _Stream._init_write_gz():
    self.zlib.compressobj(9,

1. Regarding zlib, I would start by replacing the value of 9 with zlib.Z_DEFAULT_COMPRESSION. This is the default value and zipfile uses it. Why use something different?

2. Then, I would also love to control the compression level when calling tarfile.open()
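
A minimal sketch of what point 1 amounts to at the zlib level, assuming a raw deflate stream (negative wbits) like the one tarfile builds for gzip output. The helper names raw_deflate and raw_inflate and the sample payload are hypothetical, for illustration only; this is not the tarfile implementation.

```python
import zlib


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress data as a raw deflate stream at the given zlib level."""
    # Negative wbits asks zlib for a raw stream with no zlib/gzip header.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(blob: bytes) -> bytes:
    """Decompress a raw deflate stream produced by raw_deflate()."""
    return zlib.decompress(blob, -zlib.MAX_WBITS)


# Hypothetical payload: repetitive text, so both levels compress it well.
sample = b"tarfile stream mode payload " * 2000

for level in (zlib.Z_DEFAULT_COMPRESSION, 9):
    print("level", level, "->", len(raw_deflate(sample, level)), "bytes")
```

Both levels produce streams that decompress to the same data; only the speed and size trade-off changes.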
msg259308 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-01 03:13
It looks like the default has been hard-coded to 9 ever since tarfile was added to Python. The gzip module is also hard-coded to 9 since it was added. If tarfile is changed, maybe gzip should too.

Why would you want to use zlib’s default (apparently 6)? Memory usage or speed perhaps? If we do change the default, maybe it is best to only do it in 3.6. I don’t see it as a bug fix, and there is a chance it could break someone’s code.

To be able to control the compression level, perhaps you can already do it by wrapping the tar stream with GzipFile (untested):

gz_writer = GzipFile(fileobj=raw_writer, mode="wb", compresslevel=...)
tar_writer = tarfile.open(fileobj=gz_writer, mode="w|")
tar_writer.addfile(...)
tar_writer.close()
gz_writer.close()
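
A self-contained version of the snippet above might look like the following sketch. The function name write_tar_gz_stream, the in-memory buffer, and the sample member are assumptions made for illustration; only GzipFile, tarfile.open() with mode "w|", and addfile() come from the snippet.

```python
import gzip
import io
import tarfile


def write_tar_gz_stream(raw_writer, members, compresslevel=1):
    """Write (name, bytes) pairs as a gzip-compressed tar stream.

    The tar stream itself is uncompressed (mode "w|"); GzipFile does the
    compression, so its compresslevel argument controls the level.
    """
    with gzip.GzipFile(fileobj=raw_writer, mode="wb",
                       compresslevel=compresslevel) as gz_writer:
        with tarfile.open(fileobj=gz_writer, mode="w|") as tar_writer:
            for name, payload in members:
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar_writer.addfile(info, io.BytesIO(payload))


# Usage: write to an in-memory buffer, then read it back in stream mode.
buffer = io.BytesIO()
write_tar_gz_stream(buffer, [("hello.txt", b"hello world\n")], compresslevel=1)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r|gz") as tar_reader:
    names = [member.name for member in tar_reader]
print(names)
```

The with blocks close the tar writer before the gzip writer, matching the close order in the snippet.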

If the default is changed, it certainly makes sense to add an easy compression level parameter, to be able to restore the old behaviour.
msg259987 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-10 06:52
Actually it’s not really obvious from the signatures, but in the middle of the tarfile.open() documentation it says “. . . tarfile.open() accepts the keyword argument _compresslevel_”, so it should already be possible.
msg292628 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-30 12:23
*compresslevel* takes effect for modes 'w:gz', 'r:gz', 'w:bz2', 'r:bz2', 'x:gz', 'x:bz2'. For stream modes, 'r|gz', 'w|gz', 'r|bz2', 'w|bz2', *compresslevel* doesn't take effect. It doesn't seem hard to support it, but I'm not sure whether it's worth it or whether there is a reason the level is hard-coded.
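
For the non-stream case, a sketch of passing compresslevel through tarfile.open() with mode 'w:gz' could look like this. The helper names tar_gz_bytes and read_member, the member name data.bin, and the payload are hypothetical.

```python
import io
import tarfile


def tar_gz_bytes(payload: bytes, compresslevel: int) -> bytes:
    """Build a one-member .tar.gz in memory using mode 'w:gz'."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz",
                      compresslevel=compresslevel) as archive:
        info = tarfile.TarInfo(name="data.bin")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member(blob: bytes, name: str) -> bytes:
    """Read one member back from an in-memory .tar.gz."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return archive.extractfile(name).read()


# Hypothetical payload; compare archive sizes at a low and a high level.
payload = b"some fairly repetitive content\n" * 5000
for level in (1, 9):
    print("compresslevel", level, "->", len(tar_gz_bytes(payload, level)), "bytes")
```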
msg297220 - Author: wim glenn (wim.glenn) * Date: 2017-06-28 20:13
This issue also got me. The compresslevel kwarg works fine for tarfile.open(..., mode='w:gz') but raises an exception for tarfile.open(..., mode='w|gz').

I want to use stream compression, and compresslevel=1 is more than enough for my use case; the default of 9 is way too slow.
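
A small probe for the behaviour described above, as a sketch. It assumes the exception is a TypeError caused by an unexpected keyword argument, which the message does not state; the function name stream_mode_accepts_compresslevel is hypothetical. The result depends on the interpreter it runs on.

```python
import io
import tarfile


def stream_mode_accepts_compresslevel(level: int = 1) -> bool:
    """Return True if tarfile.open(mode='w|gz') accepts compresslevel here."""
    try:
        # Assumption: a rejected keyword argument surfaces as TypeError.
        with tarfile.open(fileobj=io.BytesIO(), mode="w|gz",
                          compresslevel=level):
            pass
    except TypeError:
        return False
    return True


supported = stream_mode_accepts_compresslevel()
print("w|gz accepts compresslevel:", supported)
```

Where the probe returns False, wrapping an uncompressed "w|" stream in GzipFile remains an option.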
msg299622 - Author: Yaron de Leeuw (jarondl) * Date: 2017-08-01 17:03
I have submitted a PR on GitHub https://github.com/python/cpython/pull/2962
History
Date | User | Action | Args
2017-08-01 17:03:47 | jarondl | set | nosy: + jarondl; messages: + msg299622; versions: + Python 3.7, - Python 3.6
2017-07-31 17:29:13 | jarondl | set | pull_requests: + pull_request3009
2017-06-28 20:13:01 | wim.glenn | set | nosy: + wim.glenn; messages: + msg297220
2017-04-30 12:23:32 | xiang.zhang | set | nosy: + xiang.zhang; messages: + msg292628
2016-02-12 04:49:38 | ned.deily | set | nosy: + lars.gustaebel
2016-02-10 06:52:33 | martin.panter | set | messages: + msg259987
2016-02-01 03:13:43 | martin.panter | set | versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5; nosy: + martin.panter; messages: + msg259308; type: behavior -> enhancement
2016-02-01 00:55:23 | Patrik Dufresne | create |