
Author lemburg
Recipients Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date 2011-08-17.10:07:21
Message-id <4E4B92D3.80303@egenix.com>
In-reply-to <4E4B903D.8090205@haypocalc.com>
Content
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> On 17/08/2011 07:04, Ezio Melotti wrote:
>> As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and with trailing _ in 2.7/3.2.  They should go in unicodeobject.h

Ezio used two different naming schemes in his email. Please always
use Py_UNICODE_... or _Py_UNICODE (not PyUNICODE_ or _PyUNICODE_).

> For Python 2.7 and 3.2, I would prefer to not touch a public header, and 
> so add the macros in unicodeobject.c.

Why would you want to touch Python 2.7 at all ?

>> and be public in 3.3+.
> 
> If you want to make my HIGH_SURROGATE and LOW_SURROGATE macros public, 
> they will have to subtract 0x10000 themselves (whereas my macros require 
> the ordinal to be preprocessed).

This can be done by having two definitions of the macros: one set for
UCS2 builds and one for UCS4.
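
As an illustration of macros that subtract 0x10000 themselves, here is a
minimal sketch. All names (sketch_ucs4, SKETCH_HIGH_SURROGATE,
SKETCH_LOW_SURROGATE, sketch_encode_pair) are hypothetical and are not taken
from the patch. The message does not spell out how the UCS2 and UCS4
definitions would differ, so the sketch shows a single variant that takes a
32-bit ordinal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the macros from the patch. */
typedef uint32_t sketch_ucs4;

/* Both macros expect a code point in U+10000..U+10FFFF and subtract
   0x10000 themselves, so the caller passes the ordinal unmodified. */
#define SKETCH_HIGH_SURROGATE(ch) \
    ((uint16_t)(0xD800 | ((((sketch_ucs4)(ch)) - 0x10000) >> 10)))
#define SKETCH_LOW_SURROGATE(ch) \
    ((uint16_t)(0xDC00 | ((((sketch_ucs4)(ch)) - 0x10000) & 0x3FF)))

/* Encode one non-BMP code point as a UTF-16 surrogate pair. */
static void
sketch_encode_pair(sketch_ucs4 ch, uint16_t out[2])
{
    assert(ch >= 0x10000 && ch <= 0x10FFFF);
    out[0] = SKETCH_HIGH_SURROGATE(ch);
    out[1] = SKETCH_LOW_SURROGATE(ch);
}
```

A UTF-16 encoder could call sketch_encode_pair() for every code point above
U+FFFF and write BMP code points as a single 16-bit unit.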

>>   * _Py_UNICODE_NEXT and _Py_UNICODE_PUT_NEXT are useful, so once we have agreed about the name they can go in.  They can be private in all the 3 branches and made public in 3.4 if they work well;
> 
> Note: I don't think that _Py_UNICODE*NEXT should go into Python 2.7 or 3.2.

Certainly not into Python 2.7. Adding macros in patch level releases is
also not such a good idea.

>>   * IS_NONBMP doesn't simplify the code much but makes it more readable.  ICU has U_IS_BMP, but in most cases we want to check for non-BMP, so if we add this macro it might be ok to check for non-BMP;
> 
> If you want to make it public, it's better to call it PyUNICODE_IS_BMP() 
> (check if the argument is in U+0000-U+FFFF).

Py_UNICODE_IS_BMP() please.
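
A BMP check of the kind discussed here could look roughly like the
following sketch. The names SKETCH_UNICODE_IS_BMP and sketch_utf16_length
are hypothetical placeholders, and the helper function is only there to
show a typical use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; true when ch lies in U+0000..U+FFFF. */
#define SKETCH_UNICODE_IS_BMP(ch) (((uint32_t)(ch)) <= 0xFFFF)

/* Number of 16-bit code units needed to store ch in UTF-16. */
static int
sketch_utf16_length(uint32_t ch)
{
    assert(ch <= 0x10FFFF);
    return SKETCH_UNICODE_IS_BMP(ch) ? 1 : 2;
}
```

Code that mostly cares about the non-BMP case can negate the macro rather
than needing a second one.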

>>   * I'm not sure HIGH_SURROGATE/LOW_SURROGATE are useful with _Py_UNICODE_NEXT.  If they are they should get a better name because the current one is not clear about what they do.
> 
> They are still useful for UTF-16 encoders (to UTF-16-LE/BE and 16-bit 
> wchar_t*). We can keep HIGH_SURROGATE and LOW_SURROGATE private in 
> unicodeobject.c.
>
>> Unless someone disagrees I'll prepare a patch with PyUNICODE_IS_{HIGH_|LOW_|}SURROGATE and Py_UNICODE_JOIN_SURROGATES for unicodeobject.h, using them where necessary and using Victor's implementation, and commit it (after a review).
> 
> Cool. I suppose that you mean PyUNICODE_JOIN_SURROGATES (not 
> Py_UNICODE_JOIN_SURROGATES). I used the verb "combine", taken from a 
> comment in unicodeobject.c. "combine" is maybe better than "join"?

No, Py_UNICODE_... please !
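
For reference, the surrogate tests and the join operation under discussion
can be sketched as below. The SKETCH_ names and sketch_join() are
hypothetical stand-ins chosen for this example; the actual definitions are
whatever the reviewed patch ends up containing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
#define SKETCH_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define SKETCH_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define SKETCH_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high and a low surrogate into one code point. */
#define SKETCH_JOIN_SURROGATES(high, low) \
    (0x10000 + (((((uint32_t)(high)) & 0x03FF) << 10) | \
                (((uint32_t)(low)) & 0x03FF)))

/* Checked wrapper: both arguments must form a valid surrogate pair. */
static uint32_t
sketch_join(uint16_t high, uint16_t low)
{
    assert(SKETCH_IS_HIGH_SURROGATE(high));
    assert(SKETCH_IS_LOW_SURROGATE(low));
    return SKETCH_JOIN_SURROGATES(high, low);
}
```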

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

History
Date User Action Args
2011-08-17 10:07:22  lemburg  set     recipients: + lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, vstinner, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-17 10:07:21  lemburg  link    issue10542 messages
2011-08-17 10:07:21  lemburg  create