Author wolma
Recipients docs@python, wolma
Date 2014-04-03.10:40:05
Message-id <1396521606.53.0.161098821761.issue21146@psf.upfronthosting.co.za>
In-reply-to
Content
The current documentation of the gzip module should have its section "12.2.1. Examples of usage" updated to reflect the changes made to the module in Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        f_out.writelines(f_in)

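which is clearly sub-optimal because it is line-based: iterating over a file opened in binary mode splits the data at every newline byte, so the compressor receives one small write per line.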

An equally simple, but more efficient recipe would be:

import gzip
chunk_size = 1024  # bytes read per iteration
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:
                break  # end of input reached
            f_out.write(c)

Comparing the two examples, I find a >= 2x performance gain (in terms of both CPU time and wall time).
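
For anyone who wants to check the comparison on their own data, here is a minimal timing sketch. The function names, the generated sample file, and its size are assumptions made for illustration (they are not the setup used for the figures above), and the measured ratio will depend on the line lengths in the input.

```python
import gzip
import os
import tempfile
import time


def compress_linewise(src, dst):
    # Recipe from the current docs: one write() per line of input.
    with open(src, 'rb') as f_in:
        with gzip.open(dst, 'wb') as f_out:
            f_out.writelines(f_in)


def compress_chunked(src, dst, chunk_size=1024):
    # Proposed recipe: fixed-size reads, independent of line structure.
    with open(src, 'rb') as f_in:
        with gzip.open(dst, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(c)


def time_recipe(func, src, dst):
    # Return (wall seconds, CPU seconds) for a single run of func.
    wall, cpu = time.perf_counter(), time.process_time()
    func(src, dst)
    return time.perf_counter() - wall, time.process_time() - cpu


def make_sample(path, n_lines=20000):
    # Hypothetical test input: many short text lines.
    with open(path, 'wb') as f:
        for i in range(n_lines):
            f.write(('line %d of some sample text\n' % i).encode('ascii'))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'file.txt')
        make_sample(src)
        for func in (compress_linewise, compress_chunked):
            wall, cpu = time_recipe(func, src, src + '.gz')
            print('%s: wall %.3fs, cpu %.3fs' % (func.__name__, wall, cpu))
```

A single run per recipe is noisy; repeating the measurement a few times and on a realistic input file gives a more trustworthy number.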

In the inverse scenario of file *de*-compression (which is not covered in the docs, though), the performance increase from replacing:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        f_out.writelines(f_in)

with:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:
                break  # end of input reached
            f_out.write(c)

is even higher (4-5x speed-ups).
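
As a possible shorter spelling of the same idea, the explicit read/write loop can be handed to shutil.copyfileobj, which copies between two file objects in fixed-size chunks. This is only a sketch: the helper names gzip_file and gunzip_file and the 64 KiB default are assumptions for illustration, and I have not timed this variant against the loops above.

```python
import gzip
import shutil


def gzip_file(src, dst, chunk_size=64 * 1024):
    # Compress src into dst, copying chunk_size bytes at a time.
    with open(src, 'rb') as f_in:
        with gzip.open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out, chunk_size)


def gunzip_file(src, dst, chunk_size=64 * 1024):
    # Decompress src into dst, copying chunk_size bytes at a time.
    with gzip.open(src, 'rb') as f_in:
        with open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out, chunk_size)
```

For example, gzip_file('/home/joe/file.txt', '/home/joe/file.txt.gz') would correspond to the compression recipe, and gunzip_file with the arguments swapped to the decompression one.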

In the de-compression case, another >= 2x speed-up can be achieved by avoiding the gzip module completely and going through a zlib.decompressobj instead. Of course, this is a bit more complicated and should be documented in the zlib docs rather than the gzip docs (if you're interested, I could provide my code for it, though).
Using the zlib library directly, compression/decompression speed becomes comparable to Linux gzip/gunzip.
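
To give an idea of what the zlib route could look like, here is a minimal sketch. It is an illustration written for this message, not the code referred to above; the function name gunzip_with_zlib and the chunk size are assumptions, and it handles only a single-member gzip file (anything after the first member is ignored).

```python
import zlib


def gunzip_with_zlib(src, dst, chunk_size=64 * 1024):
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(src, 'rb') as f_in:
        with open(dst, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(d.decompress(c))
            # Write out whatever the decompressor still holds in its buffers.
            f_out.write(d.flush())
```

The input file is opened as a plain binary file here, so the gzip module is not involved at all; the decompression object does the header parsing and checksum verification itself.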