Message145251
[Posted the reply to the right ticket; see issue13136 for the original
post to the wrong ticket]
Antoine Pitrou wrote:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
>
> They are set by the configure script:
>
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c
Which gcc version are you using?
Is it possible that you have -fno-builtin enabled?
>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
>
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.
When using gcc, the glibc functions are usually not used at all,
since gcc comes with a (rather large) set of builtins which are
inlined directly, if you have optimizations enabled and inlining
is found to be more efficient than calling the glibc function:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
glibc includes the optimized versions since it has to implement
the C library (obviously) and for cases where inlining does not
happen.
>> I tested using memchr() when writing those "naive" loops.
>
> memchr() is mentioned in another issue, #13134.
>
>> memchr()
>> is inlined by the compiler just like the direct loop
>
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().
See http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and the
assembler output. If it's not inlined, then something must be
preventing this and it would be good to find out why.
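A minimal sketch for checking the assembler output, assuming a scratch file (called find.c here, a made-up name) with two hypothetical helpers, find_loop() and find_memchr(), that are not taken from unicodeobject.c. Whether gcc expands the memchr() call inline depends on the gcc version, the target and the arguments, so the generated code is the only reliable answer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the direct ("naive") loop. */
const char *find_loop(const char *s, size_t n, char ch)
{
    const char *end = s + n;
    for (; s < end; s++) {
        if (*s == ch)
            return s;
    }
    return NULL;
}

/* Hypothetical helper: the same search expressed through memchr(). */
const char *find_memchr(const char *s, size_t n, char ch)
{
    return memchr(s, (unsigned char)ch, n);
}
```

Compiling with `gcc -O3 -S find.c -o find.s` and searching find.s for a call to memchr shows whether the library function is still being called. Repeating the run with `-fno-builtin` added gives the comparison case in which gcc's builtins are disabled.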
>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
>
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.
There is: the compiler will have alignment information available and
can also benefit from using registers instead of the stack, knowledge
about processor cache lines, etc. Such information is lost when calling
a function. The function call itself will also create some overhead.
BTW: You should not only test the optimization with long strings, but also
with short ones (e.g. 2-15 chars) - which is a much more common case
in practice.
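A rough sketch of such a short-string test, under stated assumptions: the sample strings, the helper names (find_loop, find_memchr, time_search) and the use of clock() are all illustrative and not taken from the patch. An optimizing compiler may still simplify parts of a micro-benchmark like this, so the numbers should be read together with the assembler output:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Short inputs, 2 to 15 chars (contents are made up for illustration). */
static const char *const samples[] = {
    "ab", "a.b", "spam", "x=1;y", "foo.bar", "hello, world", "0123456789abcde"
};
enum { NSAMPLES = sizeof samples / sizeof samples[0] };

/* Direct loop: returns the index of ch in s, or n if it is absent. */
static size_t find_loop(const char *s, size_t n, char ch)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (s[i] == ch)
            break;
    }
    return i;
}

/* memchr() variant with the same return convention. */
static size_t find_memchr(const char *s, size_t n, char ch)
{
    const char *p = memchr(s, (unsigned char)ch, n);
    return p ? (size_t)(p - s) : n;
}

/* Runs `rounds` passes over the samples and returns the CPU seconds used.
   *sink receives the sum of all results so the searches have a visible
   effect and are harder for the compiler to discard. */
static double time_search(int use_memchr, long rounds, size_t *sink)
{
    size_t total = 0;
    clock_t start;

    assert(sink != NULL);
    start = clock();
    for (long r = 0; r < rounds; r++) {
        for (int i = 0; i < NSAMPLES; i++) {
            const char *s = samples[i];
            size_t n = strlen(s);
            total += use_memchr ? find_memchr(s, n, '.')
                                : find_loop(s, n, '.');
        }
    }
    *sink = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_search(0, rounds, &sink) and time_search(1, rounds, &sink) with the same large rounds value gives one timing per variant; the two sink values should match, which doubles as a correctness check.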
Date | User | Action | Args
2011-10-09 11:23:04 | lemburg | set | recipients: + lemburg, loewis, pitrou, vstinner
2011-10-09 11:23:03 | lemburg | link | issue13134 messages
2011-10-09 11:23:03 | lemburg | create |