Author amaury.forgeotdarc
Recipients amaury.forgeotdarc, ezio.melotti, lemburg
Date 2010-07-08.13:50:00
Message-id <1278597003.49.0.847714220976.issue9200@psf.upfronthosting.co.za>
Content
On narrow unicode builds:
unicodedata.category(chr(0x10000)) == 'Lo'  # correct
Py_UNICODE_ISPRINTABLE(0x10000)    == 1     # correct 
str.isprintable(chr(0x10000))      == False # inconsistent

On narrow unicode builds, code points above 0xFFFF are stored as a surrogate pair.  But str.isprintable() simply loops over the Py_UNICODE array and tests the two surrogates separately, so each half is judged on its own and reported as non-printable.
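
To illustrate the split, here is a minimal Python sketch that reproduces the narrow-build storage on any build by encoding to UTF-16. The helper name to_utf16_units is hypothetical and only for illustration; it is not part of CPython.

```python
import unicodedata

def to_utf16_units(text):
    """Return the 16-bit code units a narrow build would store for text.

    Hypothetical helper: it emulates narrow-build storage via UTF-16.
    """
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+10000 becomes the surrogate pair 0xD800, 0xDC00.
units = to_utf16_units(chr(0x10000))

# Looked up one by one, each half is a surrogate (category 'Cs'),
# although the code point as a whole is a letter (category 'Lo').
categories = [unicodedata.category(chr(u)) for u in units]
```

Surrogates are never printable, which would explain why a per-unit loop returns False for a string whose only character is printable.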

There should be a way to walk a unicode string in C, character by character, and the str methods (str.is*, str.to*) should use it.
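
As a rough sketch of what such a walk could look like, the following Python models the Py_UNICODE array as a list of 16-bit integers and joins each valid surrogate pair into one code point before testing it. The names iter_code_points, isprintable_per_unit and isprintable_per_code_point are assumptions made for this example; the real fix would be written in C and may look different.

```python
def iter_code_points(units):
    """Yield code points from 16-bit units, joining valid surrogate pairs.

    A lone surrogate is yielded unchanged rather than treated as an error.
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        i += 1
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            # Combine high and low surrogate into one code point.
            unit = 0x10000 + ((unit - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        yield unit

def isprintable_per_unit(units):
    """Behaviour described in this report: test every unit separately."""
    return all(chr(u).isprintable() for u in units)

def isprintable_per_code_point(units):
    """Proposed behaviour: test whole code points."""
    return all(chr(cp).isprintable() for cp in iter_code_points(units))

pair = [0xD800, 0xDC00]  # U+10000 as stored on a narrow build
```

With this input, isprintable_per_unit(pair) is False while isprintable_per_code_point(pair) is True, matching the inconsistency reported above. The same iterator could back the other str.is* and str.to* methods.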
History
Date User Action Args
2010-07-08 13:50:03 amaury.forgeotdarc set recipients: + amaury.forgeotdarc, lemburg, ezio.melotti
2010-07-08 13:50:03 amaury.forgeotdarc set messageid: <1278597003.49.0.847714220976.issue9200@psf.upfronthosting.co.za>
2010-07-08 13:50:01 amaury.forgeotdarc link issue9200 messages
2010-07-08 13:50:00 amaury.forgeotdarc create