Author Michael.Fox
Recipients Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner
Date 2013-05-19.16:49:14
Message-id <CABbL6oaK54inBiiG9KwTGR+bpFZ7JDPApYO5RiWOoLG5__fr4A@mail.gmail.com>
In-reply-to <1368972420.6.0.139242466915.issue18003@psf.upfronthosting.co.za>
Content
io.BufferedReader works well for me. Thanks for the good suggestion.
With it, Python 3.3 and 3.4 perform about the same as each other, and
both are only about 2x slower than pyliblzma.

From my perspective, wrapping with io.BufferedReader by default is a
great idea. I can't think of anyone who would suffer. Maybe someone who
wants to open thousands of simultaneous streams wouldn't appreciate the
memory overhead. If that person exists, they would want an option to
turn it off.
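
For anyone reading along, here is a minimal sketch of the wrapping being discussed. It is not my lzmaperf.py; the helper name count_lines and the sample file are made up for illustration, and the file is generated on the fly so the snippet runs on its own.

```python
import io
import lzma
import os
import tempfile


def count_lines(path, buffered=True):
    """Count lines in an .xz file (count_lines is a hypothetical helper)."""
    raw = lzma.LZMAFile(path, "r")
    # With buffered=True, line iteration is served from BufferedReader's
    # in-memory buffer rather than going to the LZMAFile for each line.
    f = io.BufferedReader(raw) if buffered else raw
    with f:  # closing the BufferedReader also closes the wrapped LZMAFile
        return sum(1 for _ in f)


# Build a small throwaway .xz file; the name "sample.xz" is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.xz")
    with lzma.open(sample, "wb") as out:
        out.write(b"".join(b"line %d\n" % i for i in range(1000)))
    n_buffered = count_lines(sample, buffered=True)
    n_plain = count_lines(sample, buffered=False)
    print(n_buffered, n_plain)
```

Both paths should return the same count; only the time spent per line differs, which is what the timings below compare.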

m@air:~/q/topaz/parse_datalog$ time python2 lzmaperf.py
102368

real    0m0.049s
user    0m0.040s
sys     0m0.008s
m@air:~/q/topaz/parse_datalog$ time python3 lzmaperf.py
102368

real    0m0.109s
user    0m0.092s
sys     0m0.020s
m@air:~/q/topaz/parse_datalog$ time ~/tmp/cpython-23836f17e4a2/bin/python3 lzmaperf.py
102368

real    0m0.101s
user    0m0.084s
sys     0m0.012s

On Sun, May 19, 2013 at 7:07 AM, Antoine Pitrou <report@bugs.python.org> wrote:
>
> Antoine Pitrou added the comment:
>
> I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple solution to the performance problem:
>
> $ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
> 10 loops, best of 3: 148 msec per loop
>
> $ ./python -m timeit -s "import lzma, io" "f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
> 10 loops, best of 3: 44.3 msec per loop
>
> $ time xzcat words.xz | wc -l
> 99156
>
> real    0m0.021s
> user    0m0.016s
> sys     0m0.004s
>
>
> Perhaps the top-level lzma.open() should do the wrapping for you, though.
> Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a BufferedReader:
>
> $ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" "for line in f: pass"
> 10 loops, best of 3: 51.1 msec per loop
>
> ----------
> nosy: +pitrou
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18003>
> _______________________________________

-- 
Michael