Message 145251 - Python tracker

Author	lemburg
Recipients	lemburg, loewis, pitrou, vstinner
Date	2011-10-09.11:23:03
Message-id	<4E918414.9030605@egenix.com>
In-reply-to	<1318113072.9190.14.camel@localhost.localdomain>

Content

[Posted the reply to the right ticket; see issue13136 for the original
 post to the wrong ticket]

Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
> 
> They are set by the configure script:
> 
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes    -I. -I./Include    -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c

Which gcc version are you using?
Is it possible that you have -fno-builtin enabled?

>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
> 
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.

When using gcc, the glibc functions are usually not used at all,
since gcc comes with a (rather large) set of builtins which are
inlined directly, provided optimizations are enabled and inlining
is found to be more efficient than calling the glibc function:

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

glibc includes the optimized versions since it has to implement
the C library (obviously), and to cover cases where inlining does
not happen.

>> I tested using memchr() when writing those "naive" loops.
> 
> memchr() is mentioned in another issue, #13134.
> 
>> memchr()
>> is inlined by the compiler just like the direct loop
> 
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().

See http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and the
assembler output. If it's not inlined, then something must be
preventing this and it would be good to find out why.
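
One way to check this is to put both forms side by side in a small file and compare the assembler output. The sketch below is only an illustration: the function names are hypothetical and are not taken from unicodeobject.c. Compiling it with `gcc -O3 -S`, and again with `-fno-builtin` added, should show whether the memchr() call is expanded inline or emitted as a call on a given gcc version and target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the direct ("naive") loop. */
const unsigned char *
find_byte_loop(const unsigned char *s, size_t n, unsigned char ch)
{
    const unsigned char *end = s + n;

    for (; s < end; s++) {
        if (*s == ch)
            return s;
    }
    return NULL;
}

/* Hypothetical helper: the same search expressed with memchr(). */
const unsigned char *
find_byte_memchr(const unsigned char *s, size_t n, unsigned char ch)
{
    return (const unsigned char *)memchr(s, ch, n);
}

/* Sanity check: both forms must return the same position (or NULL). */
void
check_same_result(const unsigned char *s, size_t n, unsigned char ch)
{
    assert(find_byte_loop(s, n, ch) == find_byte_memchr(s, n, ch));
}
```

If the generated code for find_byte_memchr() still contains a call to memchr, the builtin was not expanded for this case, and that is the point worth investigating.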

>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
> 
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.

There is: the compiler will have alignment information available and
can also benefit from using registers instead of the stack, knowledge
about processor cache lines, etc. Such information is lost when calling
a function. The function call itself will also create some overhead.

BTW: You should test the optimization not only with long strings, but
also with short ones (e.g. 2-15 chars), which are a much more common
case in practice.
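
A rough timing harness for the short-string case could look like the following. This is a minimal sketch under stated assumptions, not the benchmark used in this issue: time_search() is a hypothetical name, the buffer is limited to 16 bytes, and clock() only gives coarse CPU-time figures, so the repetition count has to be large for the numbers to mean anything.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness: run `reps` searches over a `len`-byte buffer
   whose only match is the last byte, using either memchr() or a
   direct loop. Returns elapsed CPU seconds, or -1.0 if `len` is
   outside 1..16. */
double
time_search(size_t len, long reps, int use_memchr)
{
    unsigned char buf[16];
    /* Volatile pointers keep the compiler from hoisting the search
       out of the loop or dropping the result. */
    unsigned char *volatile src = buf;
    const unsigned char *volatile sink = NULL;
    clock_t start;
    long i;

    if (len < 1 || len > sizeof(buf))
        return -1.0;
    memset(buf, 'a', len);
    buf[len - 1] = 'x';

    start = clock();
    if (use_memchr) {
        for (i = 0; i < reps; i++)
            sink = memchr(src, 'x', len);
    }
    else {
        for (i = 0; i < reps; i++) {
            const unsigned char *p = src;
            const unsigned char *end = p + len;

            while (p < end && *p != 'x')
                p++;
            sink = (p < end) ? p : NULL;
        }
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it for each length from 2 to 15, once with use_memchr set to 1 and once with 0, gives a first impression of how the two variants compare on short inputs.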

History
Date	User	Action	Args
2011-10-09 11:23:04	lemburg	set	recipients: + lemburg, loewis, pitrou, vstinner
2011-10-09 11:23:03	lemburg	link	issue13134 messages
2011-10-09 11:23:03	lemburg	create