
Author nadeem.vawda
Recipients MizardX, antlong, eric.araujo, nadeem.vawda, niemeyer, pitrou, rhettinger, wrobell, xuanji
Date 2011-01-26.15:04:01
Message-id <1296054241.96.0.184635002178.issue5863@psf.upfronthosting.co.za>
Content
>> * The read*() methods are implemented very inefficiently. Since they
>> have to deal with the bytes objects returned by
>> BZ2Decompressor.decompress(), a large read results in lots of
>> allocations that weren't necessary in the C implementation.
>
> It probably depends on the buffer size. Trying to fix this /might/ be
> premature optimization.

Actually, looking at the code again (and not being half-asleep this time), I
think readline() and readlines() are fine. My worry is about read(), where the
problem isn't the size of the buffer but rather the fact that every byte that is
read gets copied around more than necessary:
* Read into the readahead buffer in _fill_readahead().
* Copy into 'data' in _read_block().
* Copy into a newly allocated bytes object for read()'s return value.
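One way to trim that chain is sketched below. This is a hypothetical ChunkedBZ2Reader class, not BZ2File's actual code, and it assumes a single bzip2 stream. Instead of staging data in an intermediate readahead buffer, the decompressed chunks for one read() call go into a list and are concatenated with a single b"".join(). Any excess beyond the requested size is sliced off and kept for the next call.

```python
import bz2
import io


class ChunkedBZ2Reader:
    """Hypothetical reader: join decompressed chunks once per read() call.

    Assumes the input holds a single bzip2 stream.
    """

    def __init__(self, fileobj, chunk_size=8192):
        self._fp = fileobj
        self._chunk_size = chunk_size
        self._decomp = bz2.BZ2Decompressor()
        self._leftover = b""  # decompressed bytes not yet returned

    def read(self, size=-1):
        parts = []
        have = 0
        if self._leftover:
            parts.append(self._leftover)
            have = len(self._leftover)
            self._leftover = b""
        # Decompress until enough data is collected or the stream ends.
        while size < 0 or have < size:
            if self._decomp.eof:
                break
            raw = self._fp.read(self._chunk_size)
            if not raw:
                break  # truncated input; a real reader would raise here
            out = self._decomp.decompress(raw)
            if out:
                parts.append(out)
                have += len(out)
        data = b"".join(parts)  # single concatenation for the return value
        if size >= 0 and len(data) > size:
            self._leftover = data[size:]  # keep the excess for next time
            data = data[:size]
        return data


payload = b"hello world\n" * 10000
r = ChunkedBZ2Reader(io.BytesIO(bz2.compress(payload)), chunk_size=64)
first = r.read(5)
```

Whether this actually beats the three-copy version is exactly what measurements would have to show; the slicing in the overshoot case still copies.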

But you're right; this is probably premature optimization. I'll do some proper
performance measurements before I jump into rewriting. In the meantime, FWIW,
I noticed that with the Python implementation, test_bz2 took 20% longer than
with my C implementation (~1.5s up from ~1.25s). I don't think this is a very
reliable indicator of real-world performance, though.
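A rough timing harness for such measurements might look like the following sketch. The payload and repeat count are arbitrary assumptions. It times only whichever bz2.BZ2File the running interpreter provides, reading from an in-memory file, so comparing two implementations would mean running it against each in turn.

```python
import bz2
import io
import time


def time_bz2_read(payload, repeat=5):
    """Return the best wall-clock time (seconds) for one full BZ2File read.

    Measures in-memory decompression through the BZ2File API only; disk
    I/O is deliberately excluded by reading from io.BytesIO.
    """
    compressed = bz2.compress(payload)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        with bz2.BZ2File(io.BytesIO(compressed)) as f:
            data = f.read()
        best = min(best, time.perf_counter() - start)
        if data != payload:  # sanity check that the round trip worked
            raise ValueError("decompressed data does not match payload")
    return best
```

Taking the minimum over several runs reduces noise from other processes, which is one reason a whole test-suite runtime is a weak proxy.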

> Also, as with GzipFile, one goal should be for BZ2File to be wrappable in
> a io.BufferedReader, which has its own very fast buffering layer (and
> also a fast readline() if you implement peek() in BZ2File).

Ah, OK. I suppose that is a sensible way of using it. peek() will be quite easy
to implement. How should it interpret its argument, though? PEP 3116 (New I/O)
makes no mention of the function. BufferedReader appears to ignore it and
return however much data is convenient.
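The wrapping arrangement and the peek() behaviour can be sketched together. BZ2RawReader below is a hypothetical raw stream (not the stdlib class) that implements only readinto(), assuming a single bzip2 stream. io.BufferedReader layered on top provides readline() and peek(). As observed above, peek(1) treats its argument as a hint: it returns whatever is buffered, which may be much more than one byte, without advancing the stream.

```python
import bz2
import io


class BZ2RawReader(io.RawIOBase):
    """Hypothetical raw (unbuffered) decompressing stream.

    Only readinto() is implemented; io.BufferedReader supplies buffered
    read(), readline() and peek() on top of it.
    """

    def __init__(self, fileobj, chunk_size=8192):
        self._fp = fileobj
        self._chunk_size = chunk_size
        self._decomp = bz2.BZ2Decompressor()
        self._pending = b""  # decompressed bytes not yet handed out

    def readable(self):
        return True

    def readinto(self, b):
        # Decompress until some output is available or the input ends.
        while not self._pending and not self._decomp.eof:
            raw = self._fp.read(self._chunk_size)
            if not raw:
                break  # truncated input; a real reader would raise here
            self._pending = self._decomp.decompress(raw)
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # returning 0 signals EOF to the buffered layer


payload = b"".join(b"line %d\n" % i for i in range(1000))
f = io.BufferedReader(BZ2RawReader(io.BytesIO(bz2.compress(payload))))

peeked = f.peek(1)  # the size is only a hint; the result may be longer
assert len(peeked) >= 1 and payload.startswith(peeked)
assert f.readline() == b"line 0\n"  # peek() did not consume anything
```

If BZ2File's own peek() follows the same convention, it can simply return its current decompressed buffer, refilling it only when it is empty.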