Message145193
Antoine Pitrou wrote:
>
> New submission from Antoine Pitrou <pitrou@free.fr>:
>
> This patch speeds up _PyUnicode_CONVERT_BYTES by unrolling its loop.
>
> Example micro-benchmark:
>
> ./python -m timeit -s "a='x'*10000;b='\u0102'*1000;c='\U00100000'" "a+b+c"
>
> -> before:
> 100000 loops, best of 3: 14.9 usec per loop
> -> after:
> 100000 loops, best of 3: 9.19 usec per loop

Before going further with this, I'd suggest you have a look at your
compiler settings. Loop unrolling is normally performed by the
compiler and doesn't need to be done by hand in the C source, where it
only makes maintenance harder.
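
As a minimal sketch of the trade-off (the function names are hypothetical
and this is not the actual _PyUnicode_CONVERT_BYTES macro), here is a
plain widening copy next to a hand-unrolled one. Both produce the same
result; whether the second is faster depends on the compiler and its
settings, for example whether an option such as gcc's -funroll-loops is
in effect.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward widening copy: one element per iteration.
   An optimizing compiler is free to unroll or vectorize this. */
static void convert_simple(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)src[i];
}

/* Hand-unrolled variant: four elements per iteration, followed by a
   tail loop for the remaining 0..3 elements. */
static void convert_unrolled(const uint8_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (uint16_t)src[i];
        dst[i + 1] = (uint16_t)src[i + 1];
        dst[i + 2] = (uint16_t)src[i + 2];
        dst[i + 3] = (uint16_t)src[i + 3];
    }
    for (; i < n; i++)
        dst[i] = (uint16_t)src[i];
}
```

Comparing the assembly generated for both functions at the optimization
level actually used for the build is one way to check whether the manual
version buys anything.
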

The fact that Windows doesn't exhibit the same performance difference
suggests that the optimizers on the two platforms are not running at the
same level or with the same feature set. MSVC is at least as good at
optimizing code as gcc, often better.

I tested using memchr() when writing those "naive" loops. It turned
out that using memchr() was slower than using the direct loops. memchr()
is inlined by the compiler just like the direct loop, and the generated
code for the direct version is often easier for the compiler to optimize
than the memchr() one, since the compiler knows more about the data
types involved.
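
For illustration only, the two approaches being compared look roughly
like this. The function names are hypothetical and the code is a sketch,
not the loops from the CPython source. Both return a pointer to the first
matching byte, or NULL if there is none; which one is faster has to be
measured with the compiler and C library at hand.

```c
#include <stddef.h>
#include <string.h>

/* Search via the C library: the element type is erased to void *. */
static const unsigned char *find_memchr(const unsigned char *s, size_t n,
                                        unsigned char ch)
{
    return memchr(s, ch, n);
}

/* Direct loop: the compiler sees the concrete element type and the
   exact comparison being performed. */
static const unsigned char *find_loop(const unsigned char *s, size_t n,
                                      unsigned char ch)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == ch)
            return s + i;
    }
    return NULL;
}
```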

Date | User | Action | Args
2011-10-08 22:29:13 | lemburg | set | recipients: + lemburg, pitrou
2011-10-08 22:29:12 | lemburg | link | issue13136 messages
2011-10-08 22:29:12 | lemburg | create |