Author vstinner
Recipients Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date 2011-08-17.09:56:16
SpamBayes Score 6.3696e-11
Marked as misclassified No
Message-id <4E4B903D.8090205@haypocalc.com>
In-reply-to <1313557447.24.0.77234115752.issue10542@psf.upfronthosting.co.za>
Content
Le 17/08/2011 07:04, Ezio Melotti a écrit :
> As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and with trailing _ in 2.7/3.2.  They should go in unicodeobject.h

For Python 2.7 and 3.2, I would prefer to not touch a public header, and 
so add the macros in unicodeobject.c.

> and be public in 3.3+.

If you want to make my HIGH_SURROGATE and LOW_SURROGATE macros public, 
they will use to substract 0x10000 themself (whereas my macros require 
the ordinal to be preproceed).

>   * _Py_UNICODE_NEXT and _Py_UNICODE_PUT_NEXT are useful, so once we have agreed about the name they can go in.  They can be private in all the 3 branches and made public in 3.4 if they work well;

Note: I don't think that _Py_UNICODE*NEXT should go into Python 2.7 or 3.2.

>   * IS_NONBMP doesn't simplify much the code but makes it more readable.  ICU has U_IS_BMP, but in most of the cases we want to check for non-BMP, so if we add this macro it might be ok to check for non-BMP;

If you want to make it public, it's better to call it PyUNICODE_IS_BMP() 
(check if the argument is in U+0000-U+FFFF).

>   * I'm not sure HIGH_SURROGATE/LOW_SURROGATE are useful with _Py_UNICODE_NEXT.  If they are they should get a better name because the current one is not clear about what they do.

They are still useful for UTF-16 encoders (to UTF-16-LE/BE and 16-bit 
wchar_t*). We can keep HIGH_SURROGATE and LOW_SURROGATE private in 
unicodeobject.c.

> Unless someone disagrees I'll prepare a patch with PyUNICODE_IS_{HIGH_|LOW_|}SURROGATE and Py_UNICODE_JOIN_SURROGATES for unicodeobject.h, using them where necessary, using with Victor implementation and commit it (after a review).

Cool. I suppose that you mean PyUNICODE_JOIN_SURROGATES (not 
Py_UNICODE_JOIN_SURROGATES). I used the verb "combine", taken from a 
comment in unicodeobject.c. "combine" is maybe better than "join"?
History
Date User Action Args
2011-08-17 09:56:17vstinnersetrecipients: + vstinner, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-17 09:56:16vstinnerlinkissue10542 messages
2011-08-17 09:56:16vstinnercreate