Message 66307 - Python tracker



Author	lemburg
Recipients	amaury.forgeotdarc, gvanrossum, ishimoto, lemburg
Date	2008-05-06.08:26:32
Message-id	<48201630.9020602@egenix.com>
In-reply-to	<ca471dc20805051507w7f4ae799w4965721019b9f6ab@mail.gmail.com>

Content

On 2008-05-06 00:07, Guido van Rossum wrote:
> Guido van Rossum <guido@python.org> added the comment:
> 
> On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>> On 2008-04-18 05:35, atsuo ishimoto wrote:
>>  > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment:
>>  >
>>  > Is a codec whose encode() returns a Unicode object allowed in Python3?
>>
>>  Sure, why not ?
> 
> Actually, it is not. In Py3k, x.encode() always requires x to be a str
> (i.e. unicode) instance and returns a bytes instance. y.decode()
> requires y to be a bytes instance and returns a str (i.e. unicode)
> instance.

So you've limited the codec design to just doing Unicode<->bytes
conversions ?

The original codec design was to have the codec decide which
types to accept as input and which to produce as output, e.g. to
escape characters in Unicode (converting Unicode to Unicode),
work on compressed 8-bit strings (converting 8-bit strings to
8-bit strings), etc.
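
For readers following this thread today, here is a small sketch of how the split Guido describes ended up behaving in current Python 3 releases (roughly 3.4 onwards, so later than this message): str.encode() and bytes.decode() accept only text encodings, while the generic codecs.encode()/codecs.decode() functions still allow the Unicode-to-Unicode and bytes-to-bytes transformations described above. The sample data is arbitrary.

```python
import codecs

# str.encode() and bytes.decode() are restricted to Unicode <-> bytes.
encoded = "abc".encode("utf-8")
decoded = encoded.decode("utf-8")
print(type(encoded).__name__, type(decoded).__name__)  # bytes str

# The codecs module functions still accept non-text transformations.
rot = codecs.encode("hello", "rot13")          # str -> str
packed = codecs.encode(b"data" * 10, "zlib")   # bytes -> bytes
print(rot)                                     # uryyb
print(codecs.decode(packed, "zlib") == b"data" * 10)  # True

# Using a non-text codec through str.encode() is rejected.
try:
    "hello".encode("rot13")
except LookupError as exc:
    print("LookupError:", exc)
```

In other words, the flexible codec machinery survived, but only behind codecs.encode()/codecs.decode(), not behind the str and bytes methods.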

>>  I think you have to ask another question: Is repr() allowed to
>>  return a string (instead of Unicode) in Py3k ?
> 
> In Py3k, "strings" *are* unicode. The str data type is Unicode.

By "strings" I always mean 8-bit strings, i.e. 8-bit data that
is encoded in some encoding.

> If you're asking about repr() possibly returning a bytes instance,
> definitely not.
> 
>>  If not, then unicode_repr() will have to check the return value of
>>  the codec and convert it back to Unicode as necessary.
> 
> What codec?

The idea is to have a codec which takes the Unicode object and
converts it to its repr()-value.

Now, since you apparently can no longer
go the direct way (i.e. have the codec encode Unicode to
Unicode), you'd first have to use a codec which converts the Unicode
object to its repr()-value represented as a bytes object and then
convert the bytes object back to Unicode in unicode_repr().

With the original design, this extra step wouldn't have been
necessary.
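
The difference between the two routes can be sketched with a pair of hypothetical codecs registered under made-up names: repr_direct (the original design, Unicode in and Unicode out) and repr_bytes (the Py3k-compatible variant, Unicode in and bytes out). Both simply reuse the built-in ascii() to produce escaped, quoted text; this is an illustration under those assumptions, not how unicode_repr() is implemented.

```python
import ast
import codecs

# Hypothetical codec names, chosen only for this illustration.

def _repr_direct_encode(text, errors="strict"):
    # Original design: Unicode -> Unicode (escaped, quoted repr-style text).
    return ascii(text), len(text)

def _repr_direct_decode(text, errors="strict"):
    # Inverse: evaluate the quoted literal back into a str.
    return ast.literal_eval(text), len(text)

def _repr_bytes_encode(text, errors="strict"):
    # Py3k-compatible variant: Unicode -> bytes (the escaped text is pure ASCII).
    return ascii(text).encode("ascii"), len(text)

def _repr_bytes_decode(data, errors="strict"):
    # CPython's bytes.decode() hands custom decoders a memoryview, so copy first.
    raw = bytes(data)
    return ast.literal_eval(raw.decode("ascii")), len(raw)

def _search(name):
    # Codec search function: return CodecInfo for our names, None otherwise.
    if name == "repr_direct":
        return codecs.CodecInfo(_repr_direct_encode, _repr_direct_decode, name=name)
    if name == "repr_bytes":
        return codecs.CodecInfo(_repr_bytes_encode, _repr_bytes_decode, name=name)
    return None

codecs.register(_search)

s = "caf\u00e9\n"

# Direct route: works via codecs.encode(), which does not check the result type.
direct = codecs.encode(s, "repr_direct")

# str.encode() insists on a bytes result and raises TypeError otherwise.
try:
    s.encode("repr_direct")
except TypeError as exc:
    print("TypeError:", exc)

# Two-step route: encode to bytes, then convert the bytes back to str.
as_bytes = s.encode("repr_bytes")
two_step = as_bytes.decode("ascii")

print(direct == two_step)                   # True
print(as_bytes.decode("repr_bytes") == s)   # True
```

Both routes produce the same text; the bytes variant just needs the extra .decode("ascii") at the end, which is the additional step described above.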

>>  > I started to think codec is not necessary, but python function is enough.
>>
>>  That's what we currently have with unicode_repr(), but it doesn't
>>  solve the problem.
> 
> I'm lost here.

See my previous replies on this ticket.

> PS. Atsuo's PEP has now been checked in as PEP 3138. Discussion should
> start soon on the python-3000 list.
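
For context, PEP 3138 as eventually implemented in Python 3.0 kept repr() returning str, stopped escaping printable non-ASCII characters, and added the ascii() built-in for the fully escaped form. A minimal illustration of that later behaviour (current Python 3, not the code under discussion in this message; the sample string is arbitrary):

```python
s = "caf\u00e9\n"

# repr() leaves printable non-ASCII characters alone but still escapes \n.
print(repr(s))   # 'café\n'
# ascii() escapes everything outside ASCII, similar to Python 2's repr()
# of a unicode object (minus the u prefix).
print(ascii(s))  # 'caf\xe9\n'

# Both return str, never bytes.
print(type(repr(s)).__name__, type(ascii(s)).__name__)  # str str
```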

History
Date	User	Action	Args
2008-05-06 08:26:36	lemburg	set	spambayes_score: 0.000153176 -> 0.00015317625 recipients: + lemburg, gvanrossum, ishimoto, amaury.forgeotdarc
2008-05-06 08:26:35	lemburg	link	issue2630 messages
2008-05-06 08:26:33	lemburg	create