
Author lemburg
Recipients lemburg, loewis, pitrou, vstinner
Date 2011-10-09.11:23:03
Message-id <4E918414.9030605@egenix.com>
In-reply-to <1318113072.9190.14.camel@localhost.localdomain>
Content
[Posted the reply to the right ticket; see issue13136 for the original
 post to the wrong ticket]

Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
> 
> They are set by the configure script:
> 
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes    -I. -I./Include    -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c

Which gcc version are you using?
Is it possible that you have -fno-builtin enabled?

>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
> 
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.

When using gcc, the glibc functions are usually not used at all,
since gcc comes with a (rather large) set of builtins which are
inlined directly, if you have optimizations enabled and inlining
is found to be more efficient than calling the glibc function:

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

glibc includes the optimized versions because it has to implement
the C library (obviously), and to cover cases where inlining does
not happen.
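
As a small illustration of what "builtin" means here, assuming gcc with
optimizations enabled: a call on a string literal is typically folded to
a constant at compile time, while a call on a runtime pointer is either
expanded inline or left as a call into glibc. The function names below
are made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the argument is a literal, so an optimizing
   gcc can usually replace the strlen() call with the constant 7. */
size_t literal_length(void)
{
    return strlen("unicode");
}

/* Hypothetical example: the argument is only known at run time, so the
   compiler has to emit inline code or a real call to the C library. */
size_t runtime_length(const char *s)
{
    return strlen(s);
}
```

Compiling this with "gcc -O3 -S" and again with "-fno-builtin" added
should show the difference in the assembler output.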

>> I tested using memchr() when writing those "naive" loops.
> 
> memchr() is mentioned in another issue, #13134.
> 
>> memchr()
>> is inlined by the compiler just like the direct loop
> 
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().

See http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and the
assembler output. If it's not inlined, then something must be
preventing this and it would be good to find out why.
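
One way to check this is to compile a direct loop and a memchr()-based
version side by side and compare the assembler output. The sketch below
is only illustrative: find_char_loop and find_char_memchr are names made
up for this example and are not taken from unicodeobject.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: direct loop.
   Returns the index of the first ch in s[0..n), or -1 if not found. */
ptrdiff_t find_char_loop(const unsigned char *s, size_t n, unsigned char ch)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (s[i] == ch)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Hypothetical helper: same contract, implemented with memchr(). */
ptrdiff_t find_char_memchr(const unsigned char *s, size_t n, unsigned char ch)
{
    const unsigned char *p = memchr(s, ch, n);
    return p == NULL ? -1 : (ptrdiff_t)(p - s);
}
```

Running "gcc -O3 -S" on this file and looking for a call to memchr in
the second function shows whether the call was inlined; adding
-fno-builtin gives the comparison case.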

>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
> 
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.

There is: the compiler will have alignment information available and
can also benefit from using registers instead of the stack, knowledge
about processor cache lines, etc. Such information is lost when calling
a function. The function call itself will also create some overhead.

BTW: You should test the optimization not only with long strings, but
also with short ones (e.g. 2-15 chars), which are a much more common
case in practice.
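
A rough timing helper for such a comparison could look like the sketch
below. It is an assumption-laden example, not a proper benchmark:
time_memchr is a hypothetical name, the 4096-byte buffer limit is
arbitrary, and clock() only gives coarse CPU time.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: searches a buffer of `len` bytes `reps` times
   with memchr() and returns the elapsed CPU time in seconds.
   The match sits at the last position, so every search scans the
   whole buffer. Returns -1.0 if len is 0 or larger than the buffer. */
double time_memchr(size_t len, long reps, long *hits)
{
    unsigned char buf[4096];
    /* volatile forces a reload on every iteration, which is meant to
       discourage the compiler from hoisting the search out of the loop */
    volatile unsigned char target = 'x';
    clock_t start, end;
    long i, found = 0;

    if (len == 0 || len > sizeof(buf))
        return -1.0;

    memset(buf, 'a', len);
    buf[len - 1] = 'x';

    start = clock();
    for (i = 0; i < reps; i++) {
        if (memchr(buf, target, len) != NULL)
            found++;
    }
    end = clock();

    *hits = found;  /* returned so the loop result is actually used */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it for lengths 2 to 15 and for one long length (say 4096), with
the same number of repetitions, gives a first impression of how the
per-call overhead compares in the short-string case. A direct-loop
variant can be timed the same way.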
History
Date                 User     Action  Args
2011-10-09 11:23:04  lemburg  set     recipients: + lemburg, loewis, pitrou, vstinner
2011-10-09 11:23:03  lemburg  link    issue13134 messages
2011-10-09 11:23:03  lemburg  create