Message 121330 - Python tracker


Author	lemburg
Recipients	belopolsky, lemburg, loewis, vstinner
Date	2010-11-17.00:19:07
Message-id	<4CE31F78.4010705@egenix.com>
In-reply-to	<AANLkTi=y2uaFa1DRinJp__bHdjJskyO6foYJ2DHqF5D0@mail.gmail.com>

Content
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Tue, Nov 16, 2010 at 5:54 PM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>>
>> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>>
>> Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the codec returns for these operations.
>>
>> The codec system is not limited to converting between Unicode and bytes only.
> 
> Not according to the latest reST documentation:
> 
> """
> * Encoding converts a string object to a bytes object using a
> particular character set encoding (e.g., cp1252 or iso-8859-1).
> 
> * Decoding converts a bytes object encoded using a particular
> character set encoding to a string object.
> """
> http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode

That's another documentation bug, then. The codec system has always
supported other type combinations for encoding/decoding as well.

Only certain methods on str and bytes objects in 3.x restrict the
possible types to str or bytes, which is probably where the idea
comes from that Python codecs don't support anything else.
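
A minimal sketch of that distinction, assuming a recent Python 3 (3.4 or later). The codec name "rot_13" and the LookupError raised by the str method reflect that interpreter version and are assumptions here, not something stated in this thread:

```python
import codecs

text = "Hello"

# The module-level function hands back whatever the codec returns,
# here a str produced from a str.
same_type = codecs.encode(text, "rot_13")

# The str method only accepts codecs that turn text into bytes,
# so it refuses a same-type codec.
try:
    text.encode("rot_13")
    method_rejected = False
except LookupError:
    method_rejected = True
```

The restriction lives in the str method, not in the codec machinery underneath it.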

The text from the 2.7 documentation is correct, also for 3.x:

http://docs.python.org/library/codecs.html#codec-objects

>> A typical example is a same-type codec such as rot13 that only transforms Unicode data.
> 
> I thought rot13 would only transform English (or Latin) alphabet.

Right, everything else passes through as-is.
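
As a small illustration of the pass-through behaviour, assuming Python 3 where the codec is registered as "rot_13" (the sample string is made up for this sketch):

```python
import codecs

sample = "Hello, Wörld 123!"

# Only the ASCII letters A-Z and a-z are rotated; the "ö", the digits
# and the punctuation come back unchanged.
rotated = codecs.encode(sample, "rot_13")

# Applying rot13 a second time restores the original text.
restored = codecs.decode(rotated, "rot_13")
```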

Other examples are codecs that escape certain code points using e.g.
XML entity sequences, backslash notations or other such techniques.

For bytes, you have the zip, base64 and hex codecs which work in
a similar way.
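
A hedged sketch of those bytes-to-bytes codecs, assuming Python 3, where they are registered under the names "zlib_codec", "base64_codec" and "hex_codec" rather than the short names used above (the payload is made up for illustration):

```python
import codecs

payload = b"hello hello hello hello"

# Each codec takes bytes and returns bytes.
hex_form = codecs.encode(payload, "hex_codec")
b64_form = codecs.encode(payload, "base64_codec")
zipped = codecs.encode(payload, "zlib_codec")

# Decoding with the same codec recovers the original bytes.
round_trips = all(
    codecs.decode(encoded, name) == payload
    for encoded, name in [
        (hex_form, "hex_codec"),
        (b64_form, "base64_codec"),
        (zipped, "zlib_codec"),
    ]
)
```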

History
Date	User	Action	Args
2010-11-17 00:19:09	lemburg	set	recipients: + lemburg, loewis, belopolsky, vstinner
2010-11-17 00:19:07	lemburg	link	issue10435 messages
2010-11-17 00:19:07	lemburg	create