Message 81053 - Python tracker



Author	lemburg
Recipients	amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2009-02-03.13:18:19
SpamBayes Score	2.628342e-12
Marked as misclassified	No
Message-id	<4988441A.1010607@egenix.com>
In-reply-to	<200902031413.56264.victor.stinner@haypocalc.com>

Content

On 2009-02-03 14:14, STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> amaury> Since r56395, ord() and chr() accept and return surrogate pairs 
> amaury> even in narrow builds.
> 
> Note: My examples are made with Python 2.x.
> 
>> The goal is to remove most differences between narrow and wide unicode
>> builds (except for string lengths, indices or slices)
> 
> It would be nice to get the same behaviour in Python 2.x and 3.x to help 
> migration from Python2 to Python3 ;-)
> 
> unichr() (in Python 2.x) documentation is correct. But I would appreciate
> support for surrogates in unichr(), which also means changing ord() behaviour.

This is not possible for unichr() in Python 2.x, since applications
always expect len(unichr(x)) == 1.
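
To illustrate why the length invariant would break: on a narrow (UCS2)
build, a code point above 0xFFFF has to be stored as two 16-bit code units.
The following is only a sketch in present-day Python; the helper name
to_utf16_units is hypothetical and merely models narrow-build storage, it is
not part of any Python API.

```python
def to_utf16_units(code_point):
    """Return the 16-bit code units a narrow build would need (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point <= 0xFFFF:
        return [code_point]              # BMP: fits in a single code unit
    offset = code_point - 0x10000        # 20-bit value split across two units
    high = 0xD800 + (offset >> 10)       # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low (trail) surrogate
    return [high, low]


bmp_units = to_utf16_units(0x20AC)       # one unit
astral_units = to_utf16_units(0x10000)   # two units: a surrogate pair
```

A unichr() that returned the second form would yield a string of length 2,
which is exactly what existing applications do not expect.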

Changing ord() in Python 2.x would be easier, since this would
only extend the range of returned values for UCS2 builds.
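
As a rough model of what such an extended ord() would do, the sketch below
accepts either a single 16-bit code unit or a surrogate pair and returns the
code point. The function name narrow_ord and the list-of-integers input are
assumptions made for illustration; this is not the actual implementation.

```python
def narrow_ord(units):
    """Model of an ord() that accepts one code unit or a surrogate pair."""
    if len(units) == 1:
        return units[0]                  # unchanged behaviour for single units
    if len(units) == 2:
        high, low = units
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            # Combine the two 10-bit halves and add the non-BMP offset.
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected a single code unit or a surrogate pair")
```

Existing callers that pass a single unit see the same results as before; only
the new surrogate-pair case produces values above 0xFFFF.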

>> To address this problem, I suggest to change all functions in
>> unicodectype.c so that they accept Py_UCS4 characters (instead of
>> Py_UNICODE).
> 
> Why? Using surrogates, you can use 16-bit Py_UNICODE to store non-BMP
> characters (code > 0xffff).
> 
> --
> 
> I can open a new issue if you agree that we can change unichr() / ord() 
> behaviour on narrow build. We may ask on the mailing list?

History
Date	User	Action	Args
2009-02-03 13:18:21	lemburg	set	recipients: + lemburg, amaury.forgeotdarc, vstinner, ezio.melotti, bupjae
2009-02-03 13:18:20	lemburg	link	issue5127 messages
2009-02-03 13:18:19	lemburg	create