Author lemburg
Recipients lemburg, pitrou
Date 2011-10-08.22:29:12
Message-id <4E90CEAE.9020702@egenix.com>
In-reply-to <1318112311.49.0.66949742663.issue13136@psf.upfronthosting.co.za>
Content
Antoine Pitrou wrote:
> 
> New submission from Antoine Pitrou <pitrou@free.fr>:
> 
> This patch speeds up _PyUnicode_CONVERT_BYTES by unrolling its loop.
> 
> Example micro-benchmark:
> 
> ./python -m timeit -s "a='x'*10000;b='\u0102'*1000;c='\U00100000'" "a+b+c"
> 
> -> before:
> 100000 loops, best of 3: 14.9 usec per loop
> -> after:
> 100000 loops, best of 3: 9.19 usec per loop

Before going further with this, I'd suggest you have a look at your
compiler settings. Such optimizations are normally performed by the
compiler and don't need to be implemented by hand in C, where they
only make the code harder to maintain.
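
To illustrate the kind of loop in question (a sketch only, with
illustrative names; this is not the actual _PyUnicode_CONVERT_BYTES
code), a plain widening copy like the following is exactly the shape
of loop an optimizing compiler can unroll and vectorize on its own:

    #include <stddef.h>
    #include <stdint.h>

    /* Widening copy, similar in spirit to what
       _PyUnicode_CONVERT_BYTES does for 1-byte to 2-byte kinds;
       names here are illustrative, not the CPython implementation. */
    static void
    widen_1_to_2(const uint8_t *in, size_t n, uint16_t *out)
    {
        /* A straightforward element-by-element copy: the compiler
           is free to unroll and vectorize this loop itself, which
           is why the compiler settings matter. */
        for (size_t i = 0; i < n; i++)
            out[i] = in[i];
    }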

The fact that Windows doesn't exhibit the same performance difference
suggests that the optimizer there is not running at the same
optimization level or with the same feature set as on Linux. MSVC is
at least as good at optimizing code as gcc, and often better.
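
For example (assuming a typical setup; the flags actually used for
the benchmarks above aren't shown here, and convert.c is just a
placeholder file name), the settings to compare would be along the
lines of:

    gcc -O3 -funroll-loops -c convert.c    (gcc: full optimization,
                                            explicit loop unrolling)
    cl /O2 /c convert.c                    (MSVC: maximize speed)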

I tested using memchr() when writing those "naive" loops. It turned
out that memchr() was slower than the direct loops: memchr() is
inlined by the compiler just like the direct loop, but the generated
code for the direct version is often easier for the compiler to
optimize, since it has more knowledge of the data types being used.
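
As a sketch of that comparison (illustrative names only, not the
actual codec code), both variants below scan a buffer for a byte,
but the direct loop exposes the element type and access pattern to
the optimizer directly:

    #include <stddef.h>
    #include <string.h>

    /* Direct loop: the compiler sees the element type and the
       access pattern, so it can unroll and vectorize with full
       type knowledge. */
    static const char *
    find_byte_direct(const char *s, size_t n, char ch)
    {
        for (size_t i = 0; i < n; i++)
            if (s[i] == ch)
                return s + i;
        return NULL;
    }

    /* memchr() variant: usually inlined as a compiler builtin as
       well, but the library-call form gives the optimizer less
       information about the surrounding data types. */
    static const char *
    find_byte_memchr(const char *s, size_t n, char ch)
    {
        return (const char *)memchr(s, (unsigned char)ch, n);
    }

Which variant wins will of course vary with the compiler, its
version and the optimization settings.
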
History
Date                 User     Action  Args
2011-10-08 22:29:13  lemburg  set     recipients: + lemburg, pitrou
2011-10-08 22:29:12  lemburg  link    issue13136 messages
2011-10-08 22:29:12  lemburg  create