Message189605
io.BufferedReader works well for me. Thanks for the good suggestion.
Now Python 3.3 and 3.4 perform about the same as each other, and they
are only 2x slower than pyliblzma.
From my perspective, wrapping with io.BufferedReader by default is a great
idea. I can't think of anyone who would suffer. Maybe someone who wants to
open thousands of simultaneous streams wouldn't appreciate the memory
overhead. If that person exists, they would want an option to turn
it off.
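A minimal sketch of what default wrapping with an opt-out could look like. The helper name open_lzma and its buffered/buffer_size parameters are hypothetical, made up for illustration; they are not part of the lzma module and not a proposed patch. Only lzma.LZMAFile and io.BufferedReader are real APIs here.

```python
import io
import lzma


def open_lzma(path, buffered=True, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Open an .xz file for binary reading.

    Hypothetical helper: wraps the LZMAFile in an io.BufferedReader by
    default, and returns the bare LZMAFile when buffered=False (for
    callers who hold many streams open and want to avoid the extra
    per-stream buffer).
    """
    raw = lzma.LZMAFile(path, "r")
    if not buffered:
        return raw
    return io.BufferedReader(raw, buffer_size=buffer_size)
```

Either return value can be iterated line by line and used as a context manager, so calling code would not need to change when the flag is flipped.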
m@air:~/q/topaz/parse_datalog$ time python2 lzmaperf.py
102368
real 0m0.049s
user 0m0.040s
sys 0m0.008s
m@air:~/q/topaz/parse_datalog$ time python3 lzmaperf.py
102368
real 0m0.109s
user 0m0.092s
sys 0m0.020s
m@air:~/q/topaz/parse_datalog$ time ~/tmp/cpython-23836f17e4a2/bin/python3 lzmaperf.py
102368
real 0m0.101s
user 0m0.084s
sys 0m0.012s
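The contents of lzmaperf.py are not included in this message. As a rough stand-in, here is a sketch of how the buffered and unbuffered line-iteration cases could be timed from inside Python. The function names are hypothetical, and this is not the script that produced the numbers above.

```python
import io
import lzma
import time


def count_lines(fileobj):
    """Iterate over the file line by line and return the number of lines."""
    return sum(1 for _ in fileobj)


def time_line_iteration(path, buffered):
    """Return (line_count, elapsed_seconds) for one pass over an .xz file.

    With buffered=True the LZMAFile is wrapped in io.BufferedReader;
    with buffered=False it is iterated directly.
    """
    raw = lzma.LZMAFile(path, "r")
    f = io.BufferedReader(raw) if buffered else raw
    start = time.perf_counter()
    with f:
        n = count_lines(f)
    return n, time.perf_counter() - start
```

Calling time_line_iteration("words.xz", False) and then time_line_iteration("words.xz", True) on the same file should report the same line count for both, with only the elapsed time differing. A single pass is noisy, so repeated runs (or timeit) give a fairer comparison.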
On Sun, May 19, 2013 at 7:07 AM, Antoine Pitrou <report@bugs.python.org> wrote:
>
> Antoine Pitrou added the comment:
>
> I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple solution to the performance problem:
>
> $ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
> 10 loops, best of 3: 148 msec per loop
>
> $ ./python -m timeit -s "import lzma, io" "f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
> 10 loops, best of 3: 44.3 msec per loop
>
> $ time xzcat words.xz | wc -l
> 99156
>
> real 0m0.021s
> user 0m0.016s
> sys 0m0.004s
>
>
> Perhaps the top-level lzma.open() should do the wrapping for you, though.
> Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a BufferedReader:
>
> $ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" "for line in f: pass"
> 10 loops, best of 3: 51.1 msec per loop
>
> ----------
> nosy: +pitrou
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18003>
> _______________________________________
--
Michael
Date | User | Action | Args
2013-05-19 16:49:14 | Michael.Fox | set | recipients: + Michael.Fox, rhettinger, pitrou, vstinner, nadeem.vawda, serhiy.storchaka
2013-05-19 16:49:14 | Michael.Fox | link | issue18003 messages
2013-05-19 16:49:14 | Michael.Fox | create |