
Author lemburg
Recipients amaury.forgeotdarc, gvanrossum, ishimoto, lemburg
Date 2008-05-06.08:26:32
Message-id <48201630.9020602@egenix.com>
In-reply-to <ca471dc20805051507w7f4ae799w4965721019b9f6ab@mail.gmail.com>
Content
On 2008-05-06 00:07, Guido van Rossum wrote:
> Guido van Rossum <guido@python.org> added the comment:
> 
> On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>> On 2008-04-18 05:35, atsuo ishimoto wrote:
>>  > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment:
>>  >
>>  > Is a codec whose encode() returns Unicode allowed in Python 3?
>>
>>  Sure, why not?
> 
> Actually, it is not. In Py3k, x.encode() always requires x to be a str
> (i.e. unicode) instance and returns a bytes instance. y.decode()
> requires y to be a bytes instance and returns a str (i.e. unicode)
> instance.
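To make the Py3k rule above concrete, here is a minimal sketch for a current Python 3 interpreter. The codec name "str_identity" and the helper functions _identity and _search are hypothetical, invented only to show what happens when a codec's encoder returns str instead of bytes: str.encode() rejects the result with a TypeError, while codecs.encode() passes it through unchanged.

```python
import codecs

# Py3k contract: str.encode() yields bytes, bytes.decode() yields str.
text = "caf\u00e9"
data = text.encode("utf-8")
assert isinstance(data, bytes) and data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text


# Hypothetical codec "str_identity" whose encoder returns str, not bytes.
def _identity(obj, errors="strict"):
    # Codec functions return a tuple (output, length_consumed).
    return str(obj), len(obj)


def _search(name):
    # Search functions receive a normalized (lower-case) encoding name.
    if name == "str_identity":
        return codecs.CodecInfo(_identity, _identity, name="str_identity")
    return None


codecs.register(_search)

# str.encode() insists on a bytes result and raises TypeError otherwise.
try:
    "abc".encode("str_identity")
    refused = False
except TypeError as exc:
    refused = True
    print("str.encode() refused:", exc)

# codecs.encode() returns whatever type the codec produces.
print(codecs.encode("abc", "str_identity"))  # abc
```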

So you've limited the codec design to just doing Unicode<->bytes
conversions?

The original codec design was to let the codec decide which
types it accepts as input and which it generates as output, e.g.
to escape characters in Unicode (converting Unicode to Unicode),
to work on compressed 8-bit strings (converting 8-bit strings to
8-bit strings), etc.
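In later Python 3 releases (this postdates the 2008 discussion), that "codec decides the types" design survives only through codecs.encode() and codecs.decode(), which accept non-text codecs such as rot_13 (str to str) and zlib_codec (bytes to bytes); str.encode() refuses them with a LookupError. A minimal sketch of both cases:

```python
import codecs
import zlib

# rot_13 is a str -> str codec (Unicode in, Unicode out).
assert codecs.encode("Hello", "rot_13") == "Uryyb"

# zlib_codec is a bytes -> bytes codec (compressed 8-bit data).
raw = b"spam " * 20
packed = codecs.encode(raw, "zlib_codec")
assert isinstance(packed, bytes)
assert zlib.decompress(packed) == raw  # output is a plain zlib stream
assert codecs.decode(packed, "zlib_codec") == raw

# str.encode() only accepts text encodings, so rot_13 is refused here.
try:
    "Hello".encode("rot_13")
    rot13_refused = False
except LookupError as exc:
    rot13_refused = True
    print(exc)
```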

>>  I think you have to ask another question: Is repr() allowed to
>>  return a string (instead of Unicode) in Py3k?
> 
> In Py3k, "strings" *are* unicode. The str data type is Unicode.

By "strings" I always mean 8-bit strings, i.e. 8-bit data that
is encoded in some encoding.

> If you're asking about repr() possibly returning a bytes instance,
> definitely not.
> 
>>  If not, then unicode_repr() will have to check the return value of
>>  the codec and convert it back to Unicode as necessary.
> 
> What codec?

The idea is to have a codec which takes the Unicode object and
converts it to its repr()-value.

Now, since you apparently cannot
go the direct way anymore (i.e. have the codec encode Unicode to
Unicode), you'd have to first use a codec which converts the Unicode
object to its repr()-value represented as a bytes object and then
convert the bytes object back to Unicode in unicode_repr().
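The two-step workaround described above can be sketched with the built-in unicode_escape codec standing in for a hypothetical repr codec; escaped_text is an illustrative helper, not an existing API. unicode_escape only approximates repr(): it adds no surrounding quotes and does not escape quote characters. Note also that repr() in Python 3, as finally adopted via PEP 3138, keeps printable non-ASCII characters, while ascii() escapes them.

```python
def escaped_text(s: str) -> str:
    """Hypothetical helper: escape s via a codec, then return str."""
    # Step 1: str -> bytes. unicode_escape emits pure-ASCII backslash escapes.
    escaped = s.encode("unicode_escape")
    # Step 2: bytes -> str, the extra conversion back to Unicode.
    return escaped.decode("ascii")


sample = "caf\u00e9\n\u20ac"
print(escaped_text(sample))  # caf\xe9\n\u20ac
print(repr(sample))          # 'café\n€'
print(ascii(sample))         # 'caf\xe9\n\u20ac'
```

For this sample (no quotes or backslashes), ascii() gives the same escapes as the two-step codec route, just wrapped in quotes.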

With the original design, this extra step wouldn't have been
necessary.

>>  > I've started to think a codec is not necessary; a Python function is enough.
>>
>>  That's what we currently have with unicode_repr(), but it doesn't
>>  solve the problem.
> 
> I'm lost here.

See my previous replies on this ticket.

> PS. Atsuo's PEP has now been checked in as PEP 3138. Discussion should
> start soon on the python-3000 list.
History
Date                 User     Action  Args
2008-05-06 08:26:36  lemburg  set     spambayes_score: 0.000153176 -> 0.00015317625
                                      recipients: + lemburg, gvanrossum, ishimoto, amaury.forgeotdarc
2008-05-06 08:26:35  lemburg  link    issue2630 messages
2008-05-06 08:26:33  lemburg  create