
Author Michael.Fox
Recipients Michael.Fox, nadeem.vawda, vstinner
Date 2013-05-18.21:48:21
Message-id <CABbL6oZnJtD7CE6hUcvBhO2Px1x86g5fPBapm2f9mRCWREPuyw@mail.gmail.com>
In-reply-to <CABbL6oaiyhE_WcXdPd9fOp8XLQ-co5cSiFGfHWEo9YYFdy5erg@mail.gmail.com>
Content
I looked into it a little, and it looks like pyliblzma is a pure C
extension, whereas the new lzma library wraps liblzma but the rest is
Python. In particular, this happens for every line:

        if size < 0:
            end = self._buffer.find(b"\n", self._buffer_offset) + 1
            if end > 0:
                line = self._buffer[self._buffer_offset : end]
                self._buffer_offset = end
                self._pos += len(line)
                return line

And while that doesn't look like a lot of overhead, it's definitely
something. So, unless someone thinks a pure C extension is the right
technical direction, lzma in 3.4 is probably as fast as it's ever
going to be. I will just use the workaround of piping the file through
unxz regardless; a rough sketch of that workaround follows.
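
For what it's worth, here is a minimal sketch of both paths. My
lzmaperf.py isn't reproduced in this message, so the lzma.open()
version is only an approximation of it, and bigfile.xz is just the
file from the timings quoted below:

    # Count lines in an .xz file two ways: through the lzma module
    # (exercising the readline() path shown above) and by piping the
    # output of an external unxz process.
    import lzma
    import subprocess

    def count_lines_lzma(path):
        # Goes through lzma.py's pure-Python readline()/iteration machinery.
        with lzma.open(path, "rb") as f:
            return sum(1 for _ in f)

    def count_lines_unxz(path):
        # Workaround: let the external decompressor do the work and
        # just read lines from its stdout.
        proc = subprocess.Popen(["unxz", "-c", path], stdout=subprocess.PIPE)
        try:
            return sum(1 for _ in proc.stdout)
        finally:
            proc.stdout.close()
            proc.wait()

    if __name__ == "__main__":
        print(count_lines_lzma("bigfile.xz"))
        print(count_lines_unxz("bigfile.xz"))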

On Sat, May 18, 2013 at 2:12 PM, Michael Fox <415fox@gmail.com> wrote:
> 3.4 is much better, but still about 4x slower than 2.7:
>
> m@air:~/q/topaz/parse_datalog$ time python2.7 lzmaperf.py
> 102368
>
> real    0m0.053s
> user    0m0.052s
> sys     0m0.000s
> m@air:~/q/topaz/parse_datalog$ time
> ~/tmp/cpython-23836f17e4a2/bin/python3.4 lzmaperf.py
> 102368
>
> real    0m0.229s
> user    0m0.212s
> sys     0m0.012s
>
> The bottleneck has moved here:
>  102369    0.151    0.000    0.226    0.000 lzma.py:333(readline)
>
> I don't know if this is a strictly fair comparison. The lzma module
> and pyliblzma may not be of the same quality. I've just come across a
> real bug in pyliblzma. It doesn't apply to this test, but who knows
> what shortcuts it's taking.
>
> Finally, here's a baseline:
>
> m@air:~/q/topaz/parse_datalog$ time xzcat bigfile.xz | wc -l
> 102368
>
> real    0m0.034s
> user    0m0.024s
> sys     0m0.016s
>
> On Sat, May 18, 2013 at 12:46 PM, Nadeem Vawda <report@bugs.python.org> wrote:
>>
>> Nadeem Vawda added the comment:
>>
>> Have you tried running the benchmark against the default (3.4) branch?
>> There was some significant optimization work done in issue 16034, but
>> the changes were not backported to 3.3.
>>
>> ----------
>>
>> _______________________________________
>> Python tracker <report@bugs.python.org>
>> <http://bugs.python.org/issue18003>
>> _______________________________________
>
>
>
> --
>
> -
> Michael

-- 

-
Michael