Author vstinner
Recipients amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date 2009-02-03.14:18:25
lemburg> This is not possible for unichr() in Python 2.x, since applications
lemburg> always expect len(unichr(x)) == 1

Oh, ok.

lemburg> Changing ord() in Python 2.x would be easier, since
lemburg> this would only extend the range of returned values for UCS2
lemburg> builds.

ord() in Python 3 (narrow build) rejects surrogate characters:

>>> len(chr(0x10000))
2
>>> ord(0x10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
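
For reference, a surrogate-aware ord() on a narrow build would have to combine the two UTF-16 code units back into a single code point. A minimal sketch of that arithmetic follows; the helper names to_surrogate_pair and from_surrogate_pair are hypothetical and not part of any Python API.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into two UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high and a low surrogate into the code point they encode."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

With these helpers, U+10000 maps to the pair (0xD800, 0xDC00) and back again.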


It looks like narrow builds with surrogates have some more problems...

Test with U+10000: "LINEAR B SYLLABLE B008 A", category: Letter, Other.

Correct result (Python 2.5, wide build):

   $ python
   Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
   >>> unichr(0x10000)
   u'\U00010000'
   >>> unichr(0x10000).isalpha()
   True

Error in Python3 (narrow build):

   marge$ ./python
   Python 3.1a0 (py3k:69105M, Feb  3 2009, 15:04:35)
   >>> chr(0x10000).isalpha()
   False
   >>> list(chr(0x10000))
   ['\ud800', '\udc00']
   >>> chr(0xd800).isalpha()
   False
   >>> chr(0xdc00).isalpha()
   False
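
The same effect can be reproduced without a narrow build. The sketch below assumes a current CPython (3.3 or later), where chr(0x10000) is a single code point, and goes through UTF-16 to recover the two code units that a narrow build would have stored. The variable names are only illustrative.

```python
import struct

s = chr(0x10000)  # a single code point on current CPython

# Reinterpret the string as UTF-16 code units, as a narrow build stored it.
data = s.encode("utf-16-le")
units = struct.unpack("<%dH" % (len(data) // 2), data)

# The whole character is a letter, but neither surrogate is one on its own.
whole_is_alpha = s.isalpha()
unit_is_alpha = [chr(u).isalpha() for u in units]
```

Checking each code unit separately is what makes the result depend on the build type.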

Unicode ranges, all in the category "Other, Surrogate":
 - U+D800..U+DB7F: Non Private Use High Surrogate
 - U+DB80..U+DBFF: Private Use High Surrogate
 - U+DC00..U+DFFF: Low Surrogate
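
These ranges can be checked against the unicodedata module. A small sketch, assuming a current CPython where chr() accepts lone surrogates; surrogate_kind and SURROGATE_RANGES are hypothetical names introduced only for this example.

```python
import unicodedata

# Ranges and labels as listed above.
SURROGATE_RANGES = [
    (0xD800, 0xDB7F, "Non Private Use High Surrogate"),
    (0xDB80, 0xDBFF, "Private Use High Surrogate"),
    (0xDC00, 0xDFFF, "Low Surrogate"),
]


def surrogate_kind(code_point):
    """Return the range label for a surrogate code point, or None otherwise."""
    for first, last, label in SURROGATE_RANGES:
        if first <= code_point <= last:
            return label
    return None


# Check that every code point in the three ranges has general category "Cs".
all_cs = all(
    unicodedata.category(chr(cp)) == "Cs"
    for first, last, _ in SURROGATE_RANGES
    for cp in range(first, last + 1)
)
```
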
Date                 User      Action  Args
2009-02-03 14:18:28  vstinner  set     recipients: + vstinner, lemburg, amaury.forgeotdarc, ezio.melotti, bupjae
2009-02-03 14:18:26  vstinner  link    issue5127 messages
2009-02-03 14:18:25  vstinner  create