Message 122336 - Python tracker



Author	ezio.melotti
Recipients	belopolsky, eric.smith, ezio.melotti, lemburg, pitrou
Date	2010-11-25.06:28:47
Message-id	<1290666530.87.0.824189289576.issue10521@psf.upfronthosting.co.za>

Content
I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert both ordinary characters and surrogate pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA.
The attached patch is a proof of concept of this approach: with it, '\N{OLD ITALIC LETTER A}'.isalpha() returns True on a narrow build.
It still has a number of issues that should be addressed (check for narrow builds, check for lone surrogates, check for a high surrogate at the end of a string, fix compiler warnings ...) but it should be good enough as a PoC.

I would also suggest introducing a set of macros to handle surrogates (e.g. detect, combine) and using them in all the functions that need to work with them.
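
To make the idea concrete, here is a rough sketch of what such macros and the conversion loop could look like. It is not the attached patch: the macro names, the all_code_points helper and the toy_isalpha predicate are made up for illustration, and toy_isalpha only stands in for the real Py_UNICODE_ISALPHA lookup (it knows ASCII letters and U+10300, nothing else). The string is modelled as an array of 16-bit code units, as on a narrow build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro names, used here for illustration only. */
#define IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)
/* Combine a high/low surrogate pair into a single 32-bit code point. */
#define JOIN_SURROGATES(hi, lo) \
    (0x10000 + ((((uint32_t)(hi)) & 0x03FF) << 10) + (((uint32_t)(lo)) & 0x03FF))

typedef int (*codepoint_pred)(uint32_t);

/* Stand-in predicate (an assumption, not the Unicode database):
   ASCII letters and OLD ITALIC LETTER A (U+10300) count as alphabetic. */
static int toy_isalpha(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || cp == 0x10300;
}

/* Return 1 if the string is non-empty and pred holds for every code point.
   A valid surrogate pair is joined before calling pred; a lone surrogate
   (including a high surrogate at the end of the string) is passed as is. */
static int all_code_points(const uint16_t *s, size_t len, codepoint_pred pred)
{
    size_t i = 0;

    if (len == 0)
        return 0;
    while (i < len) {
        uint32_t cp = s[i++];
        if (IS_HIGH_SURROGATE(cp) && i < len && IS_LOW_SURROGATE(s[i])) {
            cp = JOIN_SURROGATES(cp, s[i]);
            i++;
        }
        if (!pred(cp))
            return 0;
    }
    return 1;
}
```

With these definitions the code units 0xD800 0xDF00 are joined into U+10300 before the predicate is called, so the check succeeds, while a lone 0xD800 reaches the predicate unchanged and is rejected. Keeping the bounds check (i < len) inside the loop is what covers the "high surrogate at the end of a string" case mentioned above.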
