Author ezio.melotti
Recipients amaury.forgeotdarc, bupjae, ezio.melotti, vstinner
Date 2009-02-03.12:26:26
Message-id <1233663990.04.0.590867848095.issue5127@psf.upfronthosting.co.za>
Content
FWIW, on Python 3 it seems to work:
>>> import unicodedata
>>> unicodedata.category("\U00010000")
'Lo'
>>> unicodedata.category("\U00011000")
'Cn'
>>> unicodedata.category(chr(0x10000))
'Lo'
>>> unicodedata.category(chr(0x11000))
'Cn'
>>> ord(chr(0x10000)), 0x10000
(65536, 65536)
>>> ord(chr(0x11000)), 0x11000
(69632, 69632)

I'm using a narrow build too:
>>> import sys
>>> sys.maxunicode
65535
>>> len('\U00010000')
2
>>> ord('\U00010000')
65536
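
The length of 2 above is the number of UTF-16 code units a narrow build uses to store the character. As a rough sketch, the same count can be reproduced on any build through the standard "utf-16-le" codec; the helper name utf16_units is made up for illustration:

```python
def utf16_units(text):
    """Count the UTF-16 code units needed to store text.

    Hypothetical helper, not part of Python. Each code unit is 2 bytes,
    and the "utf-16-le" codec does not prepend a byte order mark.
    """
    return len(text.encode("utf-16-le")) // 2


print(utf16_units("\U00010000"))  # a character above 0xFFFF
print(utf16_units("a"))           # a character in the Basic Multilingual Plane
```

Characters above 0xFFFF take two code units, everything else takes one, which matches the len() result on a narrow build.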

On Python 2, unichr() is supposed to raise a ValueError on a narrow build
if the value is greater than 0xFFFF [1]. But since characters above
0xFFFF can already be written as u"\Uxxxxxxxx" literals, there should be a
way to fix unichr() so it can return them too. Python 3's chr() already
does this.
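
A minimal sketch of the arithmetic such a fix would probably need on a narrow build, assuming the standard UTF-16 surrogate encoding. The names to_surrogate_pair and from_surrogate_pair are hypothetical and not part of any Python API; this is not how CPython implements chr():

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in range(0x10000, 0x110000)")
    offset = code_point - 0x10000    # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))
```

A narrow-build unichr() could return a two-code-unit string built from such a pair instead of raising ValueError, and ord() could apply the inverse when given a two-unit string.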

Maybe we should open a new issue for this if one doesn't exist already.

[1]: http://docs.python.org/library/functions.html#unichr