Message 189592 - Python tracker


Author	pitrou
Recipients	Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner
Date	2013-05-19.14:07:00
Message-id	<1368972420.6.0.139242466915.issue18003@psf.upfronthosting.co.za>

Content

I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple solution to the performance problem:

$ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
10 loops, best of 3: 148 msec per loop

$ ./python -m timeit -s "import lzma, io" "f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
10 loops, best of 3: 44.3 msec per loop

$ time xzcat words.xz | wc -l
99156

real	0m0.021s
user	0m0.016s
sys	0m0.004s


Perhaps the top-level lzma.open() should do the wrapping for you, though.
Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a BufferedReader:

$ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" "for line in f: pass"
10 loops, best of 3: 51.1 msec per loop
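
For reference, the three ways of iterating over lines compared above could be written out roughly as follows. This is only a sketch: the helper names and the generated sample file are made up for illustration (they stand in for words.xz), and the timings above come from the timeit runs, not from this code. The relative speeds may well differ on other Python versions.

```python
import io
import lzma
import os
import tempfile


def make_sample(n_lines=1000):
    # Hypothetical helper: writes a small .xz file standing in for words.xz.
    fd, path = tempfile.mkstemp(suffix=".xz")
    os.close(fd)
    with lzma.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(("word%d\n" % i).encode("ascii"))
    return path


def count_lines_raw(path):
    # Iterate directly over the LZMAFile (the slow case in the first timing).
    with lzma.LZMAFile(path, "r") as f:
        return sum(1 for _ in f)


def count_lines_buffered(path):
    # Wrap the LZMAFile in a BufferedReader (the second timing).
    # Closing the BufferedReader also closes the wrapped LZMAFile.
    with io.BufferedReader(lzma.LZMAFile(path, "r")) as f:
        return sum(1 for _ in f)


def count_lines_text(path):
    # Open in text mode through lzma.open() (the last timing).
    with lzma.open(path, "rt") as f:
        return sum(1 for _ in f)
```

All three variants should see the same lines; only the amount of per-line overhead differs.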

History
Date	User	Action	Args
2013-05-19 14:07:00	pitrou	set	recipients: + pitrou, rhettinger, vstinner, nadeem.vawda, serhiy.storchaka, Michael.Fox
2013-05-19 14:07:00	pitrou	set	messageid: <1368972420.6.0.139242466915.issue18003@psf.upfronthosting.co.za>
2013-05-19 14:07:00	pitrou	link	issue18003 messages
2013-05-19 14:07:00	pitrou	create