Message 109542 - Python tracker


Author	amaury.forgeotdarc
Recipients	amaury.forgeotdarc, ezio.melotti, lemburg
Date	2010-07-08.13:50:00
Message-id	<1278597003.49.0.847714220976.issue9200@psf.upfronthosting.co.za>
In-reply-to

Content
On narrow unicode builds:
unicodedata.category(chr(0x10000)) == 'Lo'  # correct
Py_UNICODE_ISPRINTABLE(0x10000)    == 1     # correct 
str.isprintable(chr(0x10000))      == False # inconsistent

On narrow unicode builds, code points above U+FFFF are stored as a surrogate pair.  But str.isprintable() simply loops over the Py_UNICODE array and tests each surrogate separately.
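
To illustrate why the per-unit loop gives the wrong answer, here is a minimal Python sketch of the standard UTF-16 surrogate arithmetic. The helper name `to_surrogate_pair` is hypothetical and only for illustration. It runs on any current Python and does not need a narrow build, because it inspects the two halves explicitly.

```python
import unicodedata


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of CPython's API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x10000)

# Each half, taken alone, is a surrogate (category 'Cs') and is not
# printable, which is what a loop over individual code units ends up testing.
per_unit = [chr(unit).isprintable() for unit in (high, low)]
categories = [unicodedata.category(chr(unit)) for unit in (high, low)]
```

Since both halves are in category 'Cs', any test applied unit by unit reports the string as non-printable, even though the code point they encode is a letter.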

There should be a way to walk a unicode string in C, character by character, and the str methods (str.is*, str.to*) should use it.
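
As a rough sketch of what such a walk could look like, the Python model below iterates over 16-bit code units and joins surrogate pairs before testing. This is only an illustration of the idea, not the C implementation. The names `utf16_units`, `iter_code_points` and `isprintable_units` are hypothetical.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text, as a narrow build would store them."""
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def iter_code_points(units):
    """Yield code points from 16-bit units, joining valid surrogate pairs."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High surrogate followed by low surrogate: one code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            yield unit
            i += 1


def isprintable_units(units):
    """Character-by-character check, in contrast to a unit-by-unit loop."""
    return all(chr(cp).isprintable() for cp in iter_code_points(units))


units = utf16_units("a\U00010000")
naive = all(chr(unit).isprintable() for unit in units)   # unit by unit
by_character = isprintable_units(units)                  # pair-aware
```

Here `naive` reproduces the inconsistent result described above, while `by_character` agrees with Py_UNICODE_ISPRINTABLE(0x10000). A lone surrogate is still reported as non-printable in both cases.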
