Message215440
The gzip module documentation should have its section "12.2.1. Examples of usage" updated to reflect the changes made to the module in Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).
Currently, the recipe given for gz-compressing a file is:
import gzip
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        f_out.writelines(f_in)
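which is clearly sub-optimal because it is line-based: iterating over f_in splits the binary input at every newline byte, so the data is passed to the compressor in pieces of arbitrary size.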
An equally simple, but more efficient recipe would be:
chunk_size = 1024
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:
                break
            f_out.write(c)
Comparing the two examples I find a >= 2x performance gain (both in terms of CPU time and wall time).
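For comparison, the same chunked copy can be written with shutil.copyfileobj from the standard library, which runs an equivalent read/write loop internally. This is only a sketch and not part of the original report: the helper name gzip_file and the 64 KiB default chunk size are assumptions made for illustration.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path, chunk_size=64 * 1024):
    """Compress src_path into dst_path, copying fixed-size binary chunks.

    gzip_file and its default chunk_size are hypothetical; the message
    above uses chunk_size = 1024.
    """
    with open(src_path, 'rb') as f_in:
        with gzip.open(dst_path, 'wb') as f_out:
            # copyfileobj reads chunk_size bytes at a time until EOF,
            # just like the explicit while-loop in the recipe above.
            shutil.copyfileobj(f_in, f_out, chunk_size)
```

Calling gzip_file('/home/joe/file.txt', '/home/joe/file.txt.gz') would then mirror the recipe above. Whether it matches the hand-written loop in speed has not been measured here.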
In the inverse scenario of file *de*-compression (which is not part of the docs though), the performance increase of substituting:
with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        f_out.writelines(f_in)
with:
with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:
                break
            f_out.write(c)
is even higher (4-5x speed-ups).
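To reproduce this kind of comparison, the two decompression variants above can be wrapped in functions and timed. The following is a minimal sketch: the function names and the time_call helper are hypothetical, and the numbers it reports will depend on the input file, the chunk size, and the Python version.

```python
import gzip
import time


def gunzip_by_lines(src_path, dst_path):
    """Line-based variant: iterates over the decompressed stream."""
    with gzip.open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            f_out.writelines(f_in)


def gunzip_by_chunks(src_path, dst_path, chunk_size=1024):
    """Chunk-based variant: reads fixed-size blocks until EOF."""
    with gzip.open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(c)


def time_call(func, *args):
    """Return (wall_seconds, cpu_seconds) for one call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)
```

For example, time_call(gunzip_by_chunks, '/home/joe/file.txt.gz', '/home/joe/file.txt') returns the wall and CPU time of a single run. Repeating each measurement several times gives more reliable numbers.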
In the de-compression case, another >= 2x speed-up can be achieved by avoiding the gzip module completely and going through a zlib.decompressobj instead. This is a bit more complicated, though, and should be documented in the zlib docs rather than the gzip docs (if you're interested, I could provide my code for it).
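The message does not include the zlib-based code it mentions, so the following is not that code. It is one possible sketch of the approach, assuming a gzip file with a single member. The helper name gunzip_with_zlib and the 64 KiB chunk size are hypothetical. Passing 16 + zlib.MAX_WBITS as wbits makes zlib expect a gzip header and trailer.

```python
import zlib


def gunzip_with_zlib(src_path, dst_path, chunk_size=64 * 1024):
    """Decompress a single-member gzip file using zlib directly.

    Assumption: the input holds exactly one gzip member. Files made of
    several concatenated members are not handled by this sketch.
    """
    # 16 + MAX_WBITS selects gzip framing instead of raw zlib framing.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(decomp.decompress(c))
            # Write out anything still buffered in the decompressor.
            f_out.write(decomp.flush())
```

The compressed file is opened with the plain built-in open here, so the gzip module's file object is bypassed entirely.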
Using the zlib library directly, compression/decompression speed becomes comparable to that of Linux gzip/gunzip.
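For the compression direction, a zlib-only variant might look like the sketch below. It is an illustration under assumptions, not code from this report: gzip_with_zlib is a hypothetical name, and the default level of 6 was chosen to resemble the usual default of the gzip command-line tool, whereas gzip.open defaults to level 9.

```python
import zlib


def gzip_with_zlib(src_path, dst_path, level=6, chunk_size=64 * 1024):
    """Write a gzip-format file using zlib.compressobj directly."""
    # wbits = 16 + MAX_WBITS asks zlib to emit a gzip header and trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    with open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                f_out.write(comp.compress(c))
            # flush() emits the remaining compressed data and the trailer.
            f_out.write(comp.flush())
```

The header produced this way carries no original file name or modification time, which is one difference from files written by the gzip module.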
Date | User | Action | Args
2014-04-03 10:40:06 | wolma | set | recipients: + wolma, docs@python
2014-04-03 10:40:06 | wolma | set | messageid: <1396521606.53.0.161098821761.issue21146@psf.upfronthosting.co.za>
2014-04-03 10:40:06 | wolma | link | issue21146 messages
2014-04-03 10:40:05 | wolma | create |