Message 81052 - Python tracker


Author	vstinner
Recipients	amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2009-02-03.13:14:03
Message-id	<200902031413.56264.victor.stinner@haypocalc.com>
In-reply-to	<1233664789.97.0.674612431629.issue5127@psf.upfronthosting.co.za>

Content

amaury> Since r56395, ord() and chr() accept and return surrogate pairs 
amaury> even in narrow builds.

Note: my examples were made with Python 2.x.

> The goal is to remove most differences between narrow and wide unicode
> builds (except for string lengths, indices or slices)

It would be nice to get the same behaviour in Python 2.x and 3.x, to ease
migration from Python 2 to Python 3 ;-)

The unichr() documentation (in Python 2.x) is correct. But I would appreciate
it if unichr() supported surrogates, which also means changing the behaviour
of ord().
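
As a rough illustration of the behaviour being asked for, here is a sketch in Python 3 syntax. The names unichr_narrow() and ord_narrow() are hypothetical helpers, not existing functions, and the sketch only models what a surrogate-aware unichr() / ord() could do on a narrow build: return or accept two UTF-16 code units for a code point above 0xffff.

```python
def unichr_narrow(code):
    """Hypothetical model of a surrogate-aware unichr() on a narrow build.

    Returns a string of one code unit for BMP code points, or a
    high/low surrogate pair for code points above 0xFFFF.
    """
    if not 0 <= code <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (code,))
    if code <= 0xFFFF:
        return chr(code)
    offset = code - 0x10000             # 20-bit value split across the pair
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return chr(high) + chr(low)


def ord_narrow(text):
    """Hypothetical model of an ord() that also accepts a surrogate pair."""
    if len(text) == 1:
        return ord(text)
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected one character or one surrogate pair")
```

With these helpers, ord_narrow(unichr_narrow(n)) gives back n for any code point n, which is the round-trip property that changing both functions together would preserve.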

> To address this problem, I suggest to change all functions in
> unicodectype.c so that they accept Py_UCS4 characters (instead of
> Py_UNICODE).

Why? Using surrogates, you can use a 16-bit Py_UNICODE to store non-BMP
characters (code > 0xffff).
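
To make the storage point concrete, the following Python 3 sketch lists the 16-bit units that UTF-16 uses for a string. The helper name utf16_units() is made up for this example; it relies only on the standard "utf-16-be" codec and the struct module, and it is meant as an analogy for what a narrow build keeps in its Py_UNICODE array, not as a description of the C implementation.

```python
import struct


def utf16_units(text):
    """Return the list of 16-bit UTF-16 code units that encode `text`."""
    data = text.encode("utf-16-be")
    count = len(data) // 2
    # ">%dH": big-endian, `count` unsigned 16-bit integers
    return list(struct.unpack(">%dH" % count, data))


# A BMP character fits in one unit; a non-BMP character needs a surrogate pair.
bmp_units = utf16_units("A")
non_bmp_units = utf16_units("\U0001D11E")
```

Here bmp_units holds a single unit, while non_bmp_units holds two units in the surrogate ranges (0xD800-0xDBFF followed by 0xDC00-0xDFFF).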

--

I can open a new issue if you agree that we can change unichr() / ord()
behaviour on narrow builds. Maybe we should ask on the mailing list?

History
Date	User	Action	Args
2009-02-03 13:14:06	vstinner	set	recipients: + vstinner, lemburg, amaury.forgeotdarc, ezio.melotti, bupjae
2009-02-03 13:14:04	vstinner	link	issue5127 messages
2009-02-03 13:14:03	vstinner	create