Message 81048 - Python tracker



Author	ezio.melotti
Recipients	amaury.forgeotdarc, bupjae, ezio.melotti, vstinner
Date	2009-02-03.12:26:26
Message-id	<1233663990.04.0.590867848095.issue5127@psf.upfronthosting.co.za>

Content

FWIW, on Python3 it seems to work:
>>> import unicodedata
>>> unicodedata.category("\U00010000")
'Lo'
>>> unicodedata.category("\U00011000")
'Cn'
>>> unicodedata.category(chr(0x10000))
'Lo'
>>> unicodedata.category(chr(0x11000))
'Cn'
>>> ord(chr(0x10000)), 0x10000
(65536, 65536)
>>> ord(chr(0x11000)), 0x11000
(69632, 69632)

I'm using a narrow build too:
>>> import sys
>>> sys.maxunicode
65535
>>> len('\U00010000')
2
>>> ord('\U00010000')
65536
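
The length of 2 is consistent with a narrow build storing U+10000 as two UTF-16 code units (a surrogate pair). A minimal sketch that reproduces the count on any Python 3 build by going through the UTF-16 encoding; `utf16_code_units` is a hypothetical helper name used only for illustration, not an existing function:

```python
def utf16_code_units(s):
    """Return how many UTF-16 code units are needed to store s."""
    # Each UTF-16 code unit is 2 bytes; the "-le" variant emits no BOM.
    return len(s.encode("utf-16-le")) // 2


print(utf16_code_units("\U00010000"))  # 2: needs a surrogate pair
print(utf16_code_units("\uffff"))      # 1: fits in a single code unit
```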

On Python 2, unichr() is documented to raise a ValueError on a narrow
build if the value is greater than 0xFFFF [1]. But since characters above
0xFFFF can already be written as u"\Uxxxxxxxx" literals, it should be
possible to fix unichr() so it can return them too. Python 3 already does
this with chr().
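
One way such a fix could work, assuming the narrow build represents these characters as UTF-16 surrogate pairs, is to compute the pair arithmetically. This is only a sketch of the standard UTF-16 split, not the actual chr() implementation; `surrogate_pair` is a hypothetical name:

```python
def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in range(0x10000, 0x110000)")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x10000)
print(hex(high), hex(low))  # 0xd800 0xdc00
```

On a narrow Python 2 build, unichr(high) + unichr(low) should then give the same two-unit string as the u"\U00010000" literal.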

Maybe we should open a new issue for this if it's not present already.

[1]: http://docs.python.org/library/functions.html#unichr

History
Date	User	Action	Args
2009-02-03 12:26:30	ezio.melotti	set	recipients: + ezio.melotti, amaury.forgeotdarc, vstinner, bupjae
2009-02-03 12:26:30	ezio.melotti	set	messageid: <1233663990.04.0.590867848095.issue5127@psf.upfronthosting.co.za>
2009-02-03 12:26:27	ezio.melotti	link	issue5127 messages
2009-02-03 12:26:27	ezio.melotti	create