Message244582
This bug was originally raised against Python 3.3, and the speed has improved a lot since then. Perhaps this bug can be closed as it is, or maybe people would like to consider my decomp-optim.patch which squeezes a bit more speed out. I don’t actually have a strong opinion either way.
Python 3.4 was apparently much faster than 3.3, courtesy of Issue 16034. In Python 3.5, all three decompression modules (lzma, gzip and bz2) now use a BufferedReader internally, due to my work in Issue 23529. For backwards compatibility, the modules delegate method calls to the internal BufferedReader rather than returning a BufferedReader instance directly.
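To illustrate the delegation pattern described here, a minimal sketch follows. DelegatingReader is a hypothetical stand-in, not the actual LZMAFile, GzipFile or BZ2File code, and a plain BytesIO takes the place of the decompressing raw reader. It only shows why iterating the wrapper involves an extra Python-level readline() call per line, compared with iterating the BufferedReader itself.

```python
import io

class DelegatingReader(io.BufferedIOBase):
    """Hypothetical wrapper that forwards reads to an internal BufferedReader."""

    def __init__(self, raw):
        # Assumption: any readable binary stream stands in for the
        # decompressing raw reader used by the real modules.
        self._buffer = io.BufferedReader(raw)

    def readable(self):
        return True

    def read(self, size=-1):
        return self._buffer.read(size)

    def readline(self, size=-1):
        # Iteration inherited from io.IOBase calls this method once per
        # line, so every line pays for a Python-level call.
        return self._buffer.readline(size)

data = b"a\nb\nc\n"

# Iterating the wrapper goes through DelegatingReader.readline() each time.
slow = list(DelegatingReader(io.BytesIO(data)))

# Iterating a BufferedReader directly stays inside its C implementation.
fast = list(io.BufferedReader(io.BytesIO(data)))
```

Both lists hold the same lines; the difference is only in how many Python-level calls are made to produce them.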
I found that bypassing the readline() delegation speeds things up significantly, and adding a custom “closed” property on the underlying raw reader class also helps. However, I did not think it would be wise to bypass the locking in the “bz2” module, so I didn’t bypass BZ2File.readline() in the patch. Timing results and the test script I used to investigate different options are below:
                          lzma     gzip      bz2
                         =======  ========  ========
Unpatched                3.2 s    2.513 s   5.180 s
Custom __iter__()        1.31 s   1.317 s   2.433 s
__iter__() and closed    0.53 s*  0.543 s*  1.650 s
closed change only                          4.047 s*
External BufferedReader  0.64 s   0.597 s   1.750 s
Direct from BytesIO      0.33 s   0.370 s   1.280 s
Command-line tool        0.063 s  0.053 s   0.993 s

* Option implemented in decomp-optim.patch
---
import lzma, io

filename = "pacman.log.xz"  # 256206 lines; 389 kB -> 13 MB

# Basic case
reader = lzma.LZMAFile(filename)  # 3.2 s

# Add __iter__() optimization
def lzma_iter(self):
    self._check_can_read()
    return iter(self._buffer)
lzma.LZMAFile.__iter__ = lzma_iter  # 1.31 s

# Add “closed” optimization
def decompressor_closed(self):
    return self._decompressor is None
import _compression
_compression.DecompressReader.closed = property(decompressor_closed)  # 0.53 s

#~ # External BufferedReader baseline
#~ reader = io.BufferedReader(lzma.LZMAFile(filename))  # 0.64 s

#~ # Direct from BytesIO baseline
#~ with open(filename, "rb") as file:
#~     data = file.read()
#~ reader = io.BytesIO(lzma.decompress(data))  # 0.33 s

for line in reader:
    pass
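The script above depends on a local pacman.log.xz file. For anyone wanting to try the comparison without it, here is a rough self-contained variant. It assumes synthetic log lines compressed in memory, so the absolute times will not match the table; the candidates dictionary, count_lines() helper and line contents are made up for illustration. It covers only the unpatched LZMAFile and the two baselines, not the patched variants.

```python
import io
import lzma
import time

# Assumption: a synthetic stand-in for the real log file, built in memory.
LINE_COUNT = 20000
lines = [b"line %d of a made-up log\n" % i for i in range(LINE_COUNT)]
compressed = lzma.compress(b"".join(lines))

def count_lines(reader):
    """Iterate over the reader; return (line count, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in reader)
    return count, time.perf_counter() - start

# Each entry builds a fresh reader over the same compressed data.
candidates = {
    "LZMAFile": lambda: lzma.LZMAFile(io.BytesIO(compressed)),
    "External BufferedReader": lambda: io.BufferedReader(
        lzma.LZMAFile(io.BytesIO(compressed))),
    "Direct from BytesIO": lambda: io.BytesIO(lzma.decompress(compressed)),
}

results = {}
for name, make_reader in candidates.items():
    with make_reader() as reader:
        results[name] = count_lines(reader)
    count, elapsed = results[name]
    print(f"{name}: {count} lines in {elapsed:.3f} s")
```

The "Direct from BytesIO" case decompresses everything before the timer starts, so it measures line splitting only and serves as a lower bound rather than a like-for-like comparison.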
Date | User | Action | Args
2015-06-01 13:51:15 | martin.panter | set | recipients: + martin.panter, rhettinger, pitrou, vstinner, nadeem.vawda, eric.araujo, Arfrever, serhiy.storchaka, Michael.Fox
2015-06-01 13:51:15 | martin.panter | set | messageid: <1433166675.09.0.664536742806.issue18003@psf.upfronthosting.co.za>
2015-06-01 13:51:15 | martin.panter | link | issue18003 messages
2015-06-01 13:51:13 | martin.panter | create |