Message142253
As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and with trailing _ in 2.7/3.2. They should go in unicodeobject.h and be public in 3.3+.
Regarding the name, it would be fine with me to use PyUNICODE_IS_HIGH_SURROGATE. Other IS* macros don't use spaces, but JOIN_SURROGATES and other proposed macros (e.g. PUT_NEXT/WRITE_NEXT) do. Also these macros are not related to any existing API like e.g. isalpha. I think HIGH/LOW are fine, we can mention lead/trail in the doc.
Regarding the implementation, we could use Victor's one if it's faster and it has no other side effects.
Regarding the other macros:
* _Py_UNICODE_NEXT and _Py_UNICODE_PUT_NEXT are useful, so once we have agreed about the name they can go in. They can be private in all the 3 branches and made public in 3.4 if they work well;
* IS_NONBMP doesn't simplify much the code but makes it more readable. ICU has U_IS_BMP, but in most of the cases we want to check for non-BMP, so if we add this macro it might be ok to check for non-BMP;
* I'm not sure HIGH_SURROGATE/LOW_SURROGATE are useful with _Py_UNICODE_NEXT. If they are they should get a better name because the current one is not clear about what they do.
Unless someone disagrees I'll prepare a patch with PyUNICODE_IS_{HIGH_|LOW_|}SURROGATE and Py_UNICODE_JOIN_SURROGATES for unicodeobject.h, using them where necessary, using with Victor implementation and commit it (after a review).
We can think about the rest later. |
|
Date |
User |
Action |
Args |
2011-08-17 05:04:07 | ezio.melotti | set | recipients:
+ ezio.melotti, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, vstinner, eric.smith, stutzbach, tchrist |
2011-08-17 05:04:07 | ezio.melotti | set | messageid: <1313557447.24.0.77234115752.issue10542@psf.upfronthosting.co.za> |
2011-08-17 05:04:06 | ezio.melotti | link | issue10542 messages |
2011-08-17 05:04:06 | ezio.melotti | create | |
|