Message 142256 - Python tracker


Author	vstinner
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date	2011-08-17.09:56:16
Message-id	<4E4B903D.8090205@haypocalc.com>
In-reply-to	<1313557447.24.0.77234115752.issue10542@psf.upfronthosting.co.za>

Content

On 17/08/2011 07:04, Ezio Melotti wrote:
> As I said in msg142175 I think the Py_UNICODE_IS{HIGH|LOW|}SURROGATE and Py_UNICODE_JOIN_SURROGATES can be committed without trailing _ in 3.3 and with trailing _ in 2.7/3.2.  They should go in unicodeobject.h

For Python 2.7 and 3.2, I would prefer not to touch a public header, and
so to add the macros in unicodeobject.c instead.

> and be public in 3.3+.

If you want to make my HIGH_SURROGATE and LOW_SURROGATE macros public,
they will have to subtract 0x10000 themselves (whereas my macros require
the ordinal to be preprocessed).
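
A minimal sketch of that difference, assuming the usual UTF-16 arithmetic. The macro names below are hypothetical and are not the definitions from the patch:

```c
#include <stdint.h>

/* Hypothetical "private" variants: the caller must already have
   subtracted 0x10000 from the code point before calling them. */
#define PRIV_HIGH_SURROGATE(v) ((uint16_t)(0xD800 | (((v) >> 10) & 0x3FF)))
#define PRIV_LOW_SURROGATE(v)  ((uint16_t)(0xDC00 | ((v) & 0x3FF)))

/* Hypothetical "public" variants: they take the code point itself
   (U+10000..U+10FFFF) and do the subtraction internally. */
#define PUB_HIGH_SURROGATE(ch) PRIV_HIGH_SURROGATE((uint32_t)(ch) - 0x10000)
#define PUB_LOW_SURROGATE(ch)  PRIV_LOW_SURROGATE((uint32_t)(ch) - 0x10000)
```

With the public form a caller can write PUB_HIGH_SURROGATE(ch) directly, at the cost of one subtraction per macro instead of one per character.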

>   * _Py_UNICODE_NEXT and _Py_UNICODE_PUT_NEXT are useful, so once we have agreed about the name they can go in.  They can be private in all the 3 branches and made public in 3.4 if they work well;

Note: I don't think that _Py_UNICODE*NEXT should go into Python 2.7 or 3.2.

>   * IS_NONBMP doesn't simplify much the code but makes it more readable.  ICU has U_IS_BMP, but in most of the cases we want to check for non-BMP, so if we add this macro it might be ok to check for non-BMP;

If you want to make it public, it's better to call it PyUNICODE_IS_BMP() 
(check if the argument is in U+0000-U+FFFF).
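
As an illustration only, such a check could look like the sketch below. The EXAMPLE_ names are hypothetical, and the non-BMP test is simply the negation of the BMP one:

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.
   True if ch is in the Basic Multilingual Plane (U+0000-U+FFFF). */
#define EXAMPLE_IS_BMP(ch)    ((uint32_t)(ch) <= 0xFFFF)

/* True if ch needs a surrogate pair when stored as UTF-16. */
#define EXAMPLE_IS_NONBMP(ch) (!EXAMPLE_IS_BMP(ch))
```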

>   * I'm not sure HIGH_SURROGATE/LOW_SURROGATE are useful with _Py_UNICODE_NEXT.  If they are they should get a better name because the current one is not clear about what they do.

They are still useful for UTF-16 encoders (to UTF-16-LE/BE and 16-bit 
wchar_t*). We can keep HIGH_SURROGATE and LOW_SURROGATE private in 
unicodeobject.c.
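
For example, a UTF-16-LE encoder needs to split each non-BMP character into a high and a low surrogate. The helper below is a hypothetical sketch of that step, not code from unicodeobject.c, and it skips input validation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16-LE into out,
   which must have room for 4 bytes.  Returns the number of bytes
   written (2 or 4).  Lone surrogates and values above U+10FFFF are
   not rejected in this sketch. */
static size_t
example_encode_utf16le(uint32_t ch, unsigned char *out)
{
    if (ch <= 0xFFFF) {
        /* BMP character: a single 16-bit unit, low byte first. */
        out[0] = (unsigned char)(ch & 0xFF);
        out[1] = (unsigned char)(ch >> 8);
        return 2;
    }
    /* Non-BMP character: subtract 0x10000, then split the remaining
       20 bits into a high and a low surrogate. */
    ch -= 0x10000;
    uint16_t high = (uint16_t)(0xD800 | (ch >> 10));
    uint16_t low = (uint16_t)(0xDC00 | (ch & 0x3FF));
    out[0] = (unsigned char)(high & 0xFF);
    out[1] = (unsigned char)(high >> 8);
    out[2] = (unsigned char)(low & 0xFF);
    out[3] = (unsigned char)(low >> 8);
    return 4;
}
```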

> Unless someone disagrees I'll prepare a patch with PyUNICODE_IS_{HIGH_|LOW_|}SURROGATE and Py_UNICODE_JOIN_SURROGATES for unicodeobject.h, using them where necessary, using with Victor implementation and commit it (after a review).

Cool. I suppose that you mean PyUNICODE_JOIN_SURROGATES (not
Py_UNICODE_JOIN_SURROGATES). I used the verb "combine", taken from a
comment in unicodeobject.c. Maybe "combine" is better than "join"?
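
Whatever the final name, the operation under discussion can be sketched as follows. The EXAMPLE_ names are hypothetical and this is not the macro from the patch:

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define EXAMPLE_IS_HIGH_SURROGATE(u) (0xD800 <= (u) && (u) <= 0xDBFF)
#define EXAMPLE_IS_LOW_SURROGATE(u)  (0xDC00 <= (u) && (u) <= 0xDFFF)
#define EXAMPLE_IS_SURROGATE(u)      (0xD800 <= (u) && (u) <= 0xDFFF)

/* Join (or "combine") a high and a low surrogate into one code point.
   The caller is expected to have checked both units beforehand. */
#define EXAMPLE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x3FF) << 10) | \
                ((uint32_t)(low) & 0x3FF)))
```

A decoder would typically test EXAMPLE_IS_HIGH_SURROGATE on one unit and EXAMPLE_IS_LOW_SURROGATE on the next before joining them.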

History
Date	User	Action	Args
2011-08-17 09:56:17	vstinner	set	recipients: + vstinner, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-17 09:56:16	vstinner	link	issue10542 messages
2011-08-17 09:56:16	vstinner	create