Message 142222 - Python tracker

Author	vstinner
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, doerwalter, eric.smith, ezio.melotti, georg.brandl, lemburg, loewis, pitrou, rhettinger, stutzbach, tchrist, vstinner
Date	2011-08-16.20:48:38
Message-id	<1313527719.96.0.686337799751.issue10542@psf.upfronthosting.co.za>
In-reply-to

Content

I'm reposting my patch from #12751. I think that it's simpler than belopolsky's patch: it doesn't add public macros to unicodeobject.h and doesn't add the complex Py_UNICODE_NEXT() macro. My patch only adds private macros to unicodeobject.c to factor out the duplicated code.

I don't want to add public macros because, with the stable API and PEP 393, we are trying to hide the Py_UNICODE type and the PyUnicodeObject internals. In belopolsky's patch, only Py_UNICODE_NEXT() is used outside unicodeobject.c.

Copy/paste of the initial message of my issue #12751 (msg142108):
---------------
A lot of code is duplicated in unicodeobject.c to manipulate ("encode/decode") surrogates. Each operation has between one and three different implementations. The new decode_ucs4() function adds yet another implementation. The attached patch replaces this code with macros.

I think that only the implementations of IS_HIGH_SURROGATE and IS_LOW_SURROGATE are important for speed. ((ch & 0xFFFFFC00UL) == 0xD800) (from decode_ucs4) is *a little bit* faster than (0xD800 <= ch && ch <= 0xDBFF) on my CPU (Atom Z520 @ 1.3 GHz): running test_unicode 4 times takes ~54 sec instead of ~57 sec (-3%).

These three macros need to be checked; I wrote the first one:

#define IS_SURROGATE(ch) (((ch) & 0xFFFFF800UL) == 0xD800)
#define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)
#define IS_LOW_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xDC00)
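
One way to check these three macros is to compare each bitmask test against the equivalent range comparison for every code point up to 0x10FFFF. The following is only a sketch under stated assumptions: Py_UCS4 is replaced by a uint32_t stand-in, and count_mask_mismatches() is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for CPython's Py_UCS4 type. */
typedef uint32_t Py_UCS4;

/* Macros copied from the message above. */
#define IS_SURROGATE(ch)      (((ch) & 0xFFFFF800UL) == 0xD800)
#define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)
#define IS_LOW_SURROGATE(ch)  (((ch) & 0xFFFFFC00UL) == 0xDC00)

/* Hypothetical helper: count the code points for which a mask-based
   macro disagrees with the corresponding range comparison. */
unsigned long count_mask_mismatches(void)
{
    unsigned long mismatches = 0;
    Py_UCS4 ch;

    for (ch = 0; ch <= 0x10FFFF; ch++) {
        int any  = (0xD800 <= ch && ch <= 0xDFFF);
        int high = (0xD800 <= ch && ch <= 0xDBFF);
        int low  = (0xDC00 <= ch && ch <= 0xDFFF);

        if (IS_SURROGATE(ch) != any)
            mismatches++;
        if (IS_HIGH_SURROGATE(ch) != high)
            mismatches++;
        if (IS_LOW_SURROGATE(ch) != low)
            mismatches++;
    }
    return mismatches;
}

/* Hypothetical self-check wrapper. */
void check_surrogate_macros(void)
{
    assert(count_mask_mismatches() == 0);
}
```

The masks work because 0xD800 has its low 11 bits clear (covering 0xD800 to 0xDFFF), while 0xD800 and 0xDC00 each have their low 10 bits clear (covering the high and low halves respectively).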

I added a cast to Py_UCS4 in COMBINE_SURROGATES to avoid integer overflow if Py_UNICODE is 16 bits (narrow build). It may be unnecessary.

#define COMBINE_SURROGATES(ch1, ch2) \
 (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)
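
As a small worked example of COMBINE_SURROGATES, the sketch below combines a surrogate pair stored in 16-bit units. The typedefs are stand-ins chosen for illustration (uint32_t for Py_UCS4, uint16_t for a narrow-build Py_UNICODE), and combine_pair() is a hypothetical wrapper that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: stand-ins for the CPython types on a narrow build. */
typedef uint32_t Py_UCS4;
typedef uint16_t Py_UNICODE;

/* Macro copied from the message above. */
#define COMBINE_SURROGATES(ch1, ch2) \
    (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)

/* Hypothetical wrapper: combine a high and a low surrogate into the
   code point they encode. The asserts document the preconditions. */
Py_UCS4 combine_pair(Py_UNICODE high, Py_UNICODE low)
{
    assert((high & 0xFC00) == 0xD800);  /* high surrogate expected */
    assert((low & 0xFC00) == 0xDC00);   /* low surrogate expected */
    return COMBINE_SURROGATES(high, low);
}
```

For instance, combine_pair(0xD83D, 0xDE00) yields 0x1F600: the high surrogate contributes 0x3D << 10 = 0xF400, the low surrogate contributes 0x200, and adding 0x10000 gives the final code point. On platforms where int is at least 32 bits, the usual integer promotions would already make the shift safe without the cast, which is consistent with the remark that the cast may be unnecessary; the cast makes the width explicit regardless of the size of int.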

HIGH_SURROGATE and LOW_SURROGATE require that their ordinal argument has been preprocessed to fit in [0; 0xFFFF]. I added this requirement to the comments of these macros. It would be better to have a single macro that does both operations, but because "*p++" (dereference and increment) is commonly used, I prefer to avoid a single macro (I don't like passing *p++ to a macro that uses its argument more than once).

Or we may add a third macro using HIGH_SURROGATE and LOW_SURROGATE.
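
The concern about passing *p++ to a macro that uses its argument more than once can be illustrated with a short sketch. Everything here is hypothetical and for illustration only: the Py_UNICODE stand-in type, the range-based macro IS_HIGH_SURROGATE_RANGE, and the two helper functions are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for a narrow-build Py_UNICODE. */
typedef uint16_t Py_UNICODE;

/* Hypothetical range-based test: expands its argument twice. */
#define IS_HIGH_SURROGATE_RANGE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)

/* Mask-based test from the message: expands its argument once. */
#define IS_HIGH_SURROGATE(ch) (((ch) & 0xFFFFFC00UL) == 0xD800)

/* Returns how far the pointer moved after one use of the
   range-based macro with *p++ as the argument. */
ptrdiff_t advance_with_range_macro(const Py_UNICODE *p)
{
    const Py_UNICODE *start = p;

    assert(p != NULL);
    /* Expands to (0xD800 <= (*p++) && (*p++) <= 0xDBFF): the second
       increment happens whenever the first comparison is true. */
    (void)IS_HIGH_SURROGATE_RANGE(*p++);
    return p - start;
}

/* Same measurement with the mask-based macro. */
ptrdiff_t advance_with_mask_macro(const Py_UNICODE *p)
{
    const Py_UNICODE *start = p;

    assert(p != NULL);
    (void)IS_HIGH_SURROGATE(*p++);
    return p - start;
}
```

With a buffer starting at a high surrogate, the range-based macro advances the pointer by two elements and compares two different units, while the mask-based macro advances it by one. Reading the unit into a local variable first (ch = *p++) and then passing ch to the macro avoids the problem for either form.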

I rewrote the main loop of PyUnicode_EncodeUTF16() to avoid a useless test on ch2 on narrow builds.

I also added an IS_NONBMP macro, simply because I prefer macros over hardcoded constants.
---------------

History
Date	User	Action	Args
2011-08-16 20:48:40	vstinner	set	recipients: + vstinner, lemburg, loewis, doerwalter, georg.brandl, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, eric.smith, stutzbach, ezio.melotti, tchrist
2011-08-16 20:48:39	vstinner	set	messageid: <1313527719.96.0.686337799751.issue10542@psf.upfronthosting.co.za>
2011-08-16 20:48:39	vstinner	link	issue10542 messages
2011-08-16 20:48:39	vstinner	create