Author pitrou
Recipients Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner
Date 2013-05-19.14:07:00
Message-id <1368972420.6.0.139242466915.issue18003@psf.upfronthosting.co.za>
Content
I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple solution to the performance problem:

$ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
10 loops, best of 3: 148 msec per loop

$ ./python -m timeit -s "import lzma, io" "f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
10 loops, best of 3: 44.3 msec per loop

$ time xzcat words.xz | wc -l
99156

real	0m0.021s
user	0m0.016s
sys	0m0.004s
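
For reference, the wrapping timed above looks like this outside of timeit. This is only a minimal sketch: the temporary file and its contents are made-up stand-ins for words.xz, and the line count is there just to show that iteration works as before.

```python
import io
import lzma
import os
import tempfile

# Build a small stand-in for words.xz (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "words.xz")
with lzma.open(path, "wb") as out:
    out.write(b"".join(b"word%d\n" % i for i in range(1000)))

# Wrap the LZMAFile so that line iteration is served from the
# BufferedReader's buffer. Closing the wrapper also closes the LZMAFile.
with io.BufferedReader(lzma.LZMAFile(path, "r")) as f:
    line_count = sum(1 for _ in f)

print(line_count)
```

The rest of the calling code does not change: the wrapped object still yields bytes lines.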


Perhaps the top-level lzma.open() should do the wrapping for you, though.
Interestingly, opening in text (i.e. Unicode) mode is almost as fast as with a BufferedReader:

$ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" "for line in f: pass"
10 loops, best of 3: 51.1 msec per loop
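
To make the suggestion concrete, here is a rough sketch of what "doing the wrapping for you" could look like. open_buffered is a hypothetical helper written for this comment, not an existing lzma function, and the sample file is made up. It assumes that only binary reads need the extra BufferedReader, since text mode already goes through a TextIOWrapper.

```python
import io
import lzma
import os
import tempfile


def open_buffered(path, mode="rb", **kwargs):
    """Hypothetical helper: like lzma.open(), but buffers binary reads."""
    f = lzma.open(path, mode, **kwargs)
    # Text mode returns a TextIOWrapper, which does its own buffering;
    # write/append modes are passed through untouched.
    if "t" in mode or "r" not in mode:
        return f
    return io.BufferedReader(f)


# Small made-up sample file standing in for words.xz.
path = os.path.join(tempfile.mkdtemp(), "words.xz")
with lzma.open(path, "wb") as out:
    out.write(b"alpha\nbeta\ngamma\n")

with open_buffered(path, "rb") as f:
    binary_lines = list(f)      # bytes lines, read via BufferedReader

with open_buffered(path, "rt", encoding="utf-8") as f:
    text_lines = list(f)        # str lines, read via TextIOWrapper

print(binary_lines)
print(text_lines)
```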