Message 106664 - Python tracker



Author	lemburg
Recipients	doerwalter, eric.araujo, lemburg, loewis, pitrou, vstinner
Date	2010-05-28.13:23:29
SpamBayes Score	5.6305158e-05
Marked as misclassified	No
Message-id	<4BFFC3CF.8090406@egenix.com>
In-reply-to	<1275051738.3101.7.camel@localhost.localdomain>

Content

Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> class BinaryDataCodec(codecs.Codec):
>>
>>     # Note: Binding these as C functions will result in the class not
>>     # converting them to methods. This is intended.
>>     encode = codecs.readbuffer_encode
>>     decode = codecs.latin_1_decode
> 
> What's the point, though? Creating a non-symmetrical codec doesn't sound
> like a very useful or recommendable thing to do.

Why not? If you're only interested in the binary data and
don't care about the original input object type, that's a
very natural thing to do.

E.g. you could use a memory-mapped file as input to the encoder.
Would you really expect the codec to recreate such a file object when
decoding the binary data?
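
To make the memory-mapped case concrete, here is a minimal sketch of the asymmetric codec quoted above. It assumes a current Python 3 in which codecs.readbuffer_encode is still exposed, and it uses an anonymous mmap as a stand-in for a real memory-mapped file; the names codec, mm, data and text are made up for this illustration.

```python
import codecs
import mmap


class BinaryDataCodec(codecs.Codec):
    # Built-in (C) functions are not descriptors, so the class does not
    # turn them into bound methods: codec.encode(obj) calls
    # codecs.readbuffer_encode(obj) directly, without a self argument.
    encode = codecs.readbuffer_encode
    decode = codecs.latin_1_decode


codec = BinaryDataCodec()

# Anonymous 8-byte memory map standing in for a memory-mapped file.
mm = mmap.mmap(-1, 8)
mm[:5] = b"hello"

# Encoding reads the raw bytes through the buffer interface.
data, consumed = codec.encode(mm)
mm.close()

# Decoding yields a str, not a new mmap object: the codec is asymmetric.
text, _ = codec.decode(data)
```

The encoder takes whatever exposes a buffer, while the decoder always produces a str; nothing tries to rebuild the original input object.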

> Especially in the py3k
> codec model where encode() only works on unicode objects.

That's a common misunderstanding. The codec system does not
mandate a specific type combination. Only the helper methods
str.encode() and bytes.decode() in Python 3 do.
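
The distinction can be sketched with codecs that ship in recent Python 3 releases (an assumption about the reader's interpreter, not about the state of py3k when this message was written). The module-level codecs.encode() and codecs.decode() accept any type combination the codec supports, while the str helper method refuses non-text codecs. The variable names are illustrative only.

```python
import codecs

# bytes -> bytes: the hex codec never involves str at all.
hexed = codecs.encode(b"hi", "hex_codec")
raw = codecs.decode(hexed, "hex_codec")

# str -> str: rot_13 maps text to text.
rotated = codecs.encode("hello", "rot_13")

# The str.encode() helper only allows str -> bytes text encodings,
# so the same codec is rejected when reached through the helper.
try:
    "hello".encode("rot_13")
except LookupError as exc:
    helper_error = exc
else:
    helper_error = None
```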

>> While it's possible to emulate the functions via other methods,
>> these methods always introduce intermediate objects, which isn't
>> necessary and only costs performance.
> 
> The bytes() constructor doesn't (shouldn't) create any more intermediate
> objects than read/charbuffer_encode() do.

Looking at the code of the bytes() constructor, the data takes quite
a long path through the machinery. For non-Unicode objects, it always
tries to create an integer first and only falls back to the buffer
interface if that fails, after a few more function calls.

Furthermore, the bytes() constructor accepts many more kinds of
objects than the "s#" parser marker, e.g. lists of integers,
plain integers and arbitrary iterators, which a codec
that is only interested in the binary representation of an
object via the buffer interface most likely doesn't
want to accept.
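
A small sketch of that difference, assuming a current Python 3: bytes() happily converts lists, integers and iterators, whereas codecs.readbuffer_encode (used here as a stand-in for a buffer-only parser) rejects them with TypeError. The helper buffer_to_bytes is hypothetical and exists only for this example.

```python
import codecs

# bytes() accepts many inputs that do not expose a buffer.
from_list = bytes([104, 105])        # list of integers
from_int = bytes(3)                  # plain integer: three zero bytes
from_iter = bytes(iter([104, 105]))  # arbitrary iterator


def buffer_to_bytes(obj):
    """Hypothetical helper: return the bytes behind obj's buffer."""
    data, _ = codecs.readbuffer_encode(obj)
    return data


# The buffer-based function refuses the same three inputs.
rejected = []
for obj in ([104, 105], 3, iter([104, 105])):
    try:
        buffer_to_bytes(obj)
    except TypeError:
        rejected.append(type(obj).__name__)

# An object that really exposes a buffer is accepted.
accepted = buffer_to_bytes(bytearray(b"hi"))
```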

> And all this doesn't address the fact that these functions have never
> been documented, and don't seem used in the outside world
> (understandably so, since there's no way to know about their existence,
> and their intended use).

That's a documentation bug, and probably a consequence of
none of the exposed encoder/decoder APIs being documented.

History
Date	User	Action	Args
2010-05-28 13:23:31	lemburg	set	recipients: + lemburg, loewis, doerwalter, pitrou, vstinner, eric.araujo
2010-05-28 13:23:29	lemburg	link	issue8838 messages
2010-05-28 13:23:29	lemburg	create