Message81048
FWIW, on Python3 it seems to work:
>>> import unicodedata
>>> unicodedata.category("\U00010000")
'Lo'
>>> unicodedata.category("\U00011000")
'Cn'
>>> unicodedata.category(chr(0x10000))
'Lo'
>>> unicodedata.category(chr(0x11000))
'Cn'
>>> ord(chr(0x10000)), 0x10000
(65536, 65536)
>>> ord(chr(0x11000)), 0x11000
(69632, 69632)
I'm using a narrow build too:
>>> import sys
>>> sys.maxunicode
65535
>>> len('\U00010000')
2
>>> ord('\U00010000')
65536
On Python2 unichr() is supposed to raise a ValueError on a narrow build
if the value is greater than 0xFFFF [1], but since the characters above
0xFFFF can already be written as u"\Uxxxxxxxx", there should be a way to
fix unichr() so it can return them too. Python3 already does this with chr().
Maybe we should open a new issue for this if there isn't one already.
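
For illustration, here is a rough sketch of the arithmetic such a fix would need: splitting a code point above 0xFFFF into the two UTF-16 surrogates that a narrow build stores. The helper names surrogate_pair() and combine_surrogates() are made up for this example and are not part of Python; this is only the standard UTF-16 calculation, not a patch for unichr().

```python
def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in range(0x10000, 0x110000)")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def combine_surrogates(high, low):
    """Inverse of surrogate_pair(): rebuild the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = surrogate_pair(0x10000)
print(hex(high), hex(low))                  # 0xd800 0xdc00
print(hex(combine_surrogates(high, low)))   # 0x10000
```

On a narrow build a fixed unichr() could return the two-character string made from these two values, which would match the len('\U00010000') == 2 result shown above.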
[1]: http://docs.python.org/library/functions.html#unichr
Date | User | Action | Args
2009-02-03 12:26:30 | ezio.melotti | set | recipients: + ezio.melotti, amaury.forgeotdarc, vstinner, bupjae
2009-02-03 12:26:30 | ezio.melotti | set | messageid: <1233663990.04.0.590867848095.issue5127@psf.upfronthosting.co.za>
2009-02-03 12:26:27 | ezio.melotti | link | issue5127 messages
2009-02-03 12:26:27 | ezio.melotti | create |