Message 174860 - Python tracker

Author	nadeem.vawda
Recipients	christian.heimes, eric.araujo, nadeem.vawda, pitrou, serhiy.storchaka
Date	2012-11-05.01:25:13
Message-id	<1352078716.48.0.00642826117575.issue15955@psf.upfronthosting.co.za>

Content

I agree that being able to limit output size is useful and desirable, but
I'm not keen on copying the max_length/unconsumed_tail approach used by
zlib's decompressor class. It feels awkward to use, and it complicates
the implementation of the existing decompress() method, which is already
unwieldy enough.
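
For reference, the existing zlib pattern looks roughly like the sketch below. It uses the real `max_length` argument and `unconsumed_tail` attribute of `zlib.decompressobj()`; the helper name `bounded_decompress` and the chunk size are illustrative assumptions, not part of any library.

```python
import zlib

def bounded_decompress(data, max_length=64):
    """Decompress `data` in pieces of at most `max_length` bytes each.

    Hypothetical helper: it only illustrates the caller-side bookkeeping
    that the max_length/unconsumed_tail approach requires.
    """
    d = zlib.decompressobj()
    chunks = []
    pending = data
    while True:
        # Produce at most max_length bytes of output per call.
        chunk = d.decompress(pending, max_length)
        if not chunk:
            break
        chunks.append(chunk)
        # Input that was not processed because the output limit was hit
        # must be fed back in by the caller on the next call.
        pending = d.unconsumed_tail
    return chunks
```

The caller has to carry `unconsumed_tail` from one call to the next, which is the part that feels awkward.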

As an alternative, I propose a thin wrapper around the underlying C API:

    def decompress_into(self, src, dst, src_start=0, dst_start=0): ...

This would store decompressed data in a caller-provided bytearray, and
return a pair of integers indicating the end points of the consumed and
produced data in the respective buffers.
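
To make the intended semantics concrete, here is a rough pure-Python emulation built on top of zlib. This is only a sketch of the calling convention: the class name `IntoDecompressor` and the helper `decompress_all` are made up for illustration, and the emulation still allocates an intermediate bytes object internally, which the proposed C-level wrapper would avoid.

```python
import zlib

class IntoDecompressor:
    """Hypothetical emulation of the proposed decompress_into() method."""

    def __init__(self):
        self._d = zlib.decompressobj()

    def decompress_into(self, src, dst, src_start=0, dst_start=0):
        room = len(dst) - dst_start
        if room <= 0:
            # zlib treats max_length=0 as "no limit", so bail out early.
            return src_start, dst_start
        view = memoryview(src)[src_start:]
        trailing_before = len(self._d.unused_data)
        out = self._d.decompress(view, room)
        # Copy the output into the caller's buffer without resizing it.
        dst[dst_start:dst_start + len(out)] = out
        # Bytes not consumed: input left over because dst filled up, plus
        # any data found after the end of the compressed stream.
        trailing = len(self._d.unused_data) - trailing_before
        consumed = len(view) - len(self._d.unconsumed_tail) - trailing
        return src_start + consumed, dst_start + len(out)

def decompress_all(data, bufsize=4096):
    """Example driver loop that reuses one fixed-size output buffer."""
    d = IntoDecompressor()
    buf = bytearray(bufsize)
    result = bytearray()
    src_pos = 0
    while True:
        src_pos, dst_end = d.decompress_into(data, buf, src_pos)
        if dst_end == 0:
            # No output produced: input exhausted or end of stream.
            break
        result += buf[:dst_end]
    return bytes(result)
```

With this shape the caller only tracks two integers, and the output size per call is bounded by the size of `dst`.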

The implementation should be extremely simple, since it does not need to
do any memory allocation or reference management.

I think it could also be useful for optimizing the implementation of
BZ2File and LZMAFile. I plan to write a prototype and run some benchmarks
some time in the next few weeks.

(Aside: if implemented for zlib, I think this could also be a nicer
 solution to the problem raised in issue 5804.)
