This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

Author rhpvorderman
Recipients rhpvorderman
Date 2021-02-24.15:05:59
Message-id <1614179160.12.0.374051475595.issue43317@roundup.psfhosted.org>
Content
python -m gzip reads in chunks of 1024 bytes: https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/gzip.py#L599

This hurts performance somewhat. Using io.DEFAULT_BUFFER_SIZE improves it. Also, 'io.DEFAULT_BUFFER_SIZE' is better than 'ARBITRARY_NUMBER_WITH_NO_COMMENT_EXPLAINING_WHY'.
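For illustration, the kind of read loop in question can be sketched roughly as below. This is a minimal sketch, not the actual code from Lib/gzip.py: the helper name copy_in_chunks, the in-memory payload, and the read counter are assumptions made up for this example. It only shows that a larger chunk size means fewer read()/write() round trips for the same data.

```python
import gzip
import io


def copy_in_chunks(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Copy src to dst, asking for at most chunk_size bytes per read().

    Hypothetical helper for illustration; returns the number of
    non-empty reads that were needed.
    """
    reads = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        reads += 1
    return reads


if __name__ == "__main__":
    # Made-up payload standing in for a FASTQ file.
    payload = b"ACGT" * 50_000
    compressed = gzip.compress(payload, compresslevel=1)

    for size in (1024, io.DEFAULT_BUFFER_SIZE):
        out = io.BytesIO()
        with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as f:
            reads = copy_in_chunks(f, out, size)
        # The decompressed output must be identical regardless of chunk size.
        assert out.getvalue() == payload
        print(f"chunk_size={size}: {reads} read() calls")
```

The value of io.DEFAULT_BUFFER_SIZE depends on the Python build, so the exact number of reads will vary.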

With 1024-byte blocks
Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
  Time (mean ± σ):     926.9 ms ±   7.7 ms    [User: 901.2 ms, System: 59.1 ms]
  Range (min … max):   913.3 ms … 939.4 ms    10 runs

Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
  Time (mean ± σ):      2.514 s ±  0.030 s    [User: 2.469 s, System: 0.125 s]
  Range (min … max):    2.472 s …  2.563 s    10 runs


With io.DEFAULT_BUFFER_SIZE
Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
  Time (mean ± σ):     839.9 ms ±   7.3 ms    [User: 816.0 ms, System: 57.3 ms]
  Range (min … max):   830.1 ms … 851.3 ms    10 runs

Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
  Time (mean ± σ):      2.275 s ±  0.024 s    [User: 2.247 s, System: 0.096 s]
  Range (min … max):    2.254 s …  2.322 s    10 runs


Speedups:
- Decompression: 840 / 927 = 0.906, i.e. a ~9% reduction in runtime.
- Compression: 2.275 / 2.514 = 0.905, i.e. a ~9% reduction in runtime.
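The arithmetic above can be reproduced with a few lines of Python. The mean timings are taken from the hyperfine output in this message; the function name runtime_reduction is just a label chosen for this sketch.

```python
def runtime_reduction(before, after):
    """Return the fractional reduction in runtime going from before to after."""
    return 1 - after / before


# Mean wall-clock times in milliseconds, copied from the benchmarks above:
# (1024-byte chunks, io.DEFAULT_BUFFER_SIZE chunks)
timings_ms = {
    "decompression": (926.9, 839.9),
    "compression": (2514.0, 2275.0),
}

if __name__ == "__main__":
    for name, (before, after) in timings_ms.items():
        ratio = after / before
        print(f"{name}: ratio {ratio:.3f}, "
              f"reduction {runtime_reduction(before, after):.1%}")
```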

It is not stellar, but it is quite a nice improvement for such a tiny change.
History
Date User Action Args
2021-02-24 15:06:00 rhpvorderman set recipients: + rhpvorderman
2021-02-24 15:06:00 rhpvorderman set messageid: <1614179160.12.0.374051475595.issue43317@roundup.psfhosted.org>
2021-02-24 15:06:00 rhpvorderman link issue43317 messages
2021-02-24 15:05:59 rhpvorderman create