Message 127551 - Python tracker



Author	nadeem.vawda
Recipients	MizardX, antlong, eric.araujo, nadeem.vawda, niemeyer, pitrou, rhettinger, wrobell, xuanji
Date	2011-01-30.21:12:49
SpamBayes Score	8.54774e-11
Marked as misclassified	No
Message-id	<1296421972.79.0.442339363972.issue5863@psf.upfronthosting.co.za>
In-reply-to

Content

OK, I've rewritten the whole bz2 module (patch attached), and I think it is now ready for review. The BZ2File implementation is a cleaned-up version of the one from my previous patch, with some further additions. I've factored out the common compressor/decompressor stuff into classes Compressor and Decompressor in the _bz2 extension module; with these, BZ2Compressor, BZ2Decompressor, compress() and decompress() are trivial to implement in Python.
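To illustrate why one-shot compress()/decompress() become trivial once incremental compressor/decompressor objects exist, here is a minimal sketch. It uses the public bz2.BZ2Compressor and bz2.BZ2Decompressor classes from the released module rather than the patch's internal _bz2 Compressor/Decompressor classes, whose exact API is not shown here; the function names one_shot_compress and one_shot_decompress are hypothetical.

```python
import bz2


def one_shot_compress(data: bytes, compresslevel: int = 9) -> bytes:
    """Hypothetical helper: compress in one call via an incremental compressor."""
    comp = bz2.BZ2Compressor(compresslevel)
    # compress() may buffer input internally and return b"";
    # flush() finishes the stream and returns the remaining output.
    return comp.compress(data) + comp.flush()


def one_shot_decompress(data: bytes) -> bytes:
    """Hypothetical helper: decompress a single complete bz2 stream.

    Unlike bz2.decompress() in current Python, this sketch does not
    handle multiple concatenated streams.
    """
    decomp = bz2.BZ2Decompressor()
    return decomp.decompress(data)
```

The wrapper only sequences calls on the incremental object, so nearly all of the work stays in C, which is consistent with the lack of a measurable slowdown reported below.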

My earlier efficiency concerns seem to have been unfounded; I ran some quick tests with a 4MB bz2 file, and there wasn't any measurable performance difference from the existing all-C implementation.

I have added a peek() method to BZ2File, in accordance with Antoine's suggestion, but it's not clear how it should interpret its argument. I followed the lead of io.BufferedReader, and simply ignored the arg, returning whatever data is already buffered. The patch also includes tests for peek() in test_bz2, based on test_io's BufferedRWPairTest.
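A small sketch of the "ignore the size argument, return what is buffered" behaviour, shown with io.BufferedReader and with BZ2File as it exists in current Python 3 (which does provide peek()). This demonstrates the released standard library, not necessarily the exact code in this patch; the point is that peek() may return more bytes than requested and does not advance the read position.

```python
import bz2
import io

# io.BufferedReader: the size passed to peek() is only a hint; the call
# returns whatever is in the buffer, filling it from the raw stream if empty.
buffered = io.BufferedReader(io.BytesIO(b"abcdef"))
peeked = buffered.peek(2)
after_peek = buffered.read()  # peek() did not consume anything

# BZ2File.peek() follows the same convention in current Python.
compressed = bz2.compress(b"hello world")
with bz2.BZ2File(io.BytesIO(compressed)) as f:
    head = f.peek(1)   # may return more than 1 byte
    rest = f.read()    # position is unchanged, so this reads everything
```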

Also, while looking at io.BufferedReader's implementation, I noticed that it doesn't actually seem to use raw.peek() at all. If this is correct, then perhaps peek() is unnecessary, and shouldn't be added.

The patch also adds a property 'eof' to BZ2Decompressor, so that the user can test whether EOF has been reached on the compressed stream.
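One use of the eof property is feeding compressed data in chunks and stopping cleanly at the end of the stream, with anything after it treated as trailing data. The sketch below assumes the eof and unused_data attributes as they exist on bz2.BZ2Decompressor in current Python; decompress_first_stream is a hypothetical helper, not part of the patch.

```python
import bz2


def decompress_first_stream(chunks):
    """Hypothetical helper: decompress the first bz2 stream from an iterable
    of byte chunks. Returns (decompressed_bytes, trailing_bytes)."""
    decomp = bz2.BZ2Decompressor()
    out = []
    trailing = b""
    for chunk in chunks:
        if decomp.eof:
            # Stream already ended; calling decompress() again would raise
            # EOFError, so keep the remaining chunks as trailing data.
            trailing += chunk
            continue
        out.append(decomp.decompress(chunk))
    if not decomp.eof:
        raise EOFError("compressed data ended before the end-of-stream marker")
    # unused_data holds bytes that followed the end-of-stream marker
    # within the chunk that completed the stream.
    return b"".join(out), decomp.unused_data + trailing
```

Without eof, a caller could only detect the end of the stream indirectly, for example by checking whether unused_data became non-empty, which fails when the stream ends exactly on a chunk boundary.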

For the new files (Modules/_bz2module.c and Lib/bz2.py), I'm guessing there should be some license boilerplate stuff added at the top of each. I wasn't sure exactly what this should look like, though - some advice would be helpful here.

History
Date	User	Action	Args
2011-01-30 21:12:55	nadeem.vawda	set	recipients: + nadeem.vawda, rhettinger, niemeyer, pitrou, wrobell, eric.araujo, MizardX, antlong, xuanji
2011-01-30 21:12:52	nadeem.vawda	set	messageid: <1296421972.79.0.442339363972.issue5863@psf.upfronthosting.co.za>
2011-01-30 21:12:52	nadeem.vawda	link	issue5863 messages
2011-01-30 21:12:52	nadeem.vawda	create