Message122336
I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert both ordinary characters and surrogate pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA.
The attached patch is a proof of concept of this approach and returns True for '\N{OLD ITALIC LETTER A}'.isalpha() on a narrow build.
It still has a number of issues that should be addressed (check for narrow builds, check for lone surrogates, check for high surrogate at the end of a string, fix compiler warnings ...) but it should be good enough as a PoC.
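To make the remaining checks concrete, here is a minimal standalone sketch of the loop, not the attached patch. It assumes 16-bit code units as on a narrow build; unit_t, ucs4_t, all_code_points and toy_is_alpha are hypothetical stand-ins for Py_UNICODE, Py_UCS4, the body of unicode_isalpha and _PyUnicode_IsAlpha. A high surrogate is combined only when a low surrogate actually follows it, so lone surrogates and a high surrogate at the end of the string are passed through unchanged and rejected by the predicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t unit_t;   /* hypothetical stand-in for Py_UNICODE (narrow build) */
typedef uint32_t ucs4_t;   /* hypothetical stand-in for Py_UCS4 */

/* Toy predicate standing in for _PyUnicode_IsAlpha: it only knows ASCII
   letters and U+10300 (OLD ITALIC LETTER A). */
static int toy_is_alpha(ucs4_t ch)
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == 0x10300;
}

/* Return 1 if the string is non-empty and every code point satisfies pred,
   otherwise 0. Surrogate pairs are combined before pred is called. */
static int all_code_points(const unit_t *s, size_t len, int (*pred)(ucs4_t))
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    if (len == 0)
        return 0;                       /* ''.isalpha() is False */

    while (i < len) {
        ucs4_t ch = s[i++];

        /* Combine only if a low surrogate really follows the high one. */
        if (ch >= 0xD800 && ch <= 0xDBFF &&
            i < len && s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
            ch = 0x10000 + ((ch - 0xD800) << 10) + ((ucs4_t)s[i] - 0xDC00);
            i++;
        }
        /* A lone surrogate reaches pred unchanged and is rejected there. */
        if (!pred(ch))
            return 0;
    }
    return 1;
}
```

On a wide build the combining branch would simply never be needed, which is why the real patch should guard it with a narrow-build check.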
I would also suggest introducing a set of macros to handle surrogates (e.g. detect, combine) and using them in all the functions that need to work with surrogates.
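As a rough idea of what such a set of macros could look like, here is a sketch. All the names (IS_HIGH_SURROGATE, JOIN_SURROGATES, join_checked and so on) are hypothetical and chosen for illustration; only the UTF-16 arithmetic is fixed by the encoding. Note that the macros evaluate their arguments more than once, so they should not be called with expressions that have side effects.

```c
#include <assert.h>
#include <stdint.h>

/* Detection (hypothetical names). */
#define IS_HIGH_SURROGATE(u) ((u) >= 0xD800 && (u) <= 0xDBFF)
#define IS_LOW_SURROGATE(u)  ((u) >= 0xDC00 && (u) <= 0xDFFF)
#define IS_SURROGATE(u)      ((u) >= 0xD800 && (u) <= 0xDFFF)

/* Combine a high/low pair into one code point above U+FFFF. */
#define JOIN_SURROGATES(hi, lo) \
    (0x10000 + (((uint32_t)(hi) - 0xD800) << 10) + ((uint32_t)(lo) - 0xDC00))

/* Split a code point above U+FFFF back into its two surrogates. */
#define HIGH_SURROGATE(cp) (0xD800 + (((uint32_t)(cp) - 0x10000) >> 10))
#define LOW_SURROGATE(cp)  (0xDC00 + (((uint32_t)(cp) - 0x10000) & 0x3FF))

/* Example caller: combine a pair, checking the precondition in debug builds. */
static uint32_t join_checked(uint16_t hi, uint16_t lo)
{
    assert(IS_HIGH_SURROGATE(hi) && IS_LOW_SURROGATE(lo));
    return JOIN_SURROGATES(hi, lo);
}
```

For example, OLD ITALIC LETTER A (U+10300) is stored as the pair 0xD800 0xDF00 on a narrow build, and join_checked(0xD800, 0xDF00) gives back 0x10300.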

Date | User | Action | Args
2010-11-25 06:28:50 | ezio.melotti | set | recipients: + ezio.melotti, lemburg, belopolsky, pitrou, eric.smith
2010-11-25 06:28:50 | ezio.melotti | set | messageid: <1290666530.87.0.824189289576.issue10521@psf.upfronthosting.co.za>
2010-11-25 06:28:48 | ezio.melotti | link | issue10521 messages
2010-11-25 06:28:48 | ezio.melotti | create |