
Author ezio.melotti
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, pitrou
Date 2010-11-25.06:28:47
Message-id <1290666530.87.0.824189289576.issue10521@psf.upfronthosting.co.za>
In-reply-to
Content
I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert both ordinary characters and surrogate pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA.
The attached patch is a proof of concept of this approach and returns True for '\N{OLD ITALIC LETTER A}'.isalpha() on a narrow build.
It still has a number of issues that should be addressed (check for narrow builds, check for lone surrogates, check for high surrogate at the end of a string, fix compiler warnings ...) but it should be good enough as a PoC.
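
For illustration only (this is not the attached patch), the approach could be sketched in standalone C as below. The names utf16_isalpha and is_alpha_ucs4 are hypothetical: is_alpha_ucs4 is a toy stand-in for Py_UNICODE_ISALPHA that only knows ASCII letters and U+10300..U+1031E, and plain uint16_t/uint32_t are assumed in place of Py_UNICODE on a narrow build and Py_UCS4. The sketch treats lone surrogates and a trailing high surrogate as non-alphabetic, which is one possible way to handle the open issues listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE_ISALPHA: accepts ASCII letters and
   the Old Italic letters U+10300..U+1031E only. */
static int is_alpha_ucs4(uint32_t ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
        return 1;
    return ch >= 0x10300 && ch <= 0x1031E;
}

/* Return 1 if the UTF-16 string is non-empty and every code point in it is
   alphabetic, 0 otherwise. */
static int utf16_isalpha(const uint16_t *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    if (len == 0)
        return 0;

    while (i < len) {
        uint32_t ch = s[i++];

        if (ch >= 0xD800 && ch <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow immediately. */
            if (i == len || s[i] < 0xDC00 || s[i] > 0xDFFF)
                return 0;   /* lone or trailing high surrogate */
            /* Combine the pair into a single code point. */
            ch = 0x10000 + ((ch - 0xD800) << 10) + ((uint32_t)s[i++] - 0xDC00);
        }
        else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            return 0;       /* lone low surrogate */
        }

        if (!is_alpha_ucs4(ch))
            return 0;
    }
    return 1;
}
```

With this sketch, the code units 0xD800 0xDF00 (the UTF-16 encoding of U+10300, OLD ITALIC LETTER A) are combined into one code point before the alphabetic check, instead of being tested as two separate surrogates.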

I would also suggest introducing a set of macros to handle surrogates (e.g. to detect and combine them) and using them in all the functions that need to work with surrogates.
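
As a rough idea of what such macros might look like, here is a minimal standalone sketch. All the names below are hypothetical and are not taken from any CPython header; uint16_t and uint32_t are assumed in place of a narrow-build Py_UNICODE and Py_UCS4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names, for illustration only. */

/* Detect: is the code unit a high (lead) or low (trail) surrogate? */
#define IS_HIGH_SURROGATE(ch) \
    ((uint32_t)(ch) >= 0xD800 && (uint32_t)(ch) <= 0xDBFF)
#define IS_LOW_SURROGATE(ch) \
    ((uint32_t)(ch) >= 0xDC00 && (uint32_t)(ch) <= 0xDFFF)
#define IS_SURROGATE_PAIR(hi, lo) \
    (IS_HIGH_SURROGATE(hi) && IS_LOW_SURROGATE(lo))

/* Combine: join a valid pair into one code point (U+10000..U+10FFFF). */
#define JOIN_SURROGATES(hi, lo) \
    (0x10000 + (((uint32_t)(hi) - 0xD800) << 10) + ((uint32_t)(lo) - 0xDC00))

/* Split: the reverse direction, for code points above U+FFFF. */
#define HIGH_SURROGATE(cp) \
    ((uint16_t)(0xD800 + (((uint32_t)(cp) - 0x10000) >> 10)))
#define LOW_SURROGATE(cp) \
    ((uint16_t)(0xDC00 + (((uint32_t)(cp) - 0x10000) & 0x3FF)))

/* Checked wrapper: asserts that the two code units really form a pair. */
static uint32_t join_surrogates_checked(uint16_t hi, uint16_t lo)
{
    assert(IS_SURROGATE_PAIR(hi, lo));
    return JOIN_SURROGATES(hi, lo);
}
```

Keeping detection and combination in one place would let every function that walks a narrow-build string share the same range checks rather than repeating the constants.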