Message160236
> Fill the ascii buffer and then copying can be cheaper than using
> _PyUnicodeWriter with general non-ascii string.
Here is a new patch using _PyUnicodeWriter directly in longobject.c.
According to my benchmark (see below), formatting a small number (5 decimal digits) is 17% faster with version 2 of my patch than with tip ("long x 3": 0.471 vs 0.569 usec), and 38% faster than Python 3.3 before my optimizations of str%tuple and str.format(). Creating a temporary PyUnicode object is not cheap, at least for short strings.
str%tuple and str.format() allocate len(format_string)+100 ASCII characters up front, which is enough for "x={}".format(12345), for example. So no extra allocation is needed, only a resize, and resizing looks cheap.
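The two paragraphs above contrast writing digits straight into an overallocated buffer with building a temporary string and then copying it. The Python sketch below only illustrates that cost model, under loose assumptions: `AsciiWriter`, `write_direct` and `write_via_temp` are hypothetical names, a `bytearray` stands in for the ASCII buffer of `_PyUnicodeWriter`, and nothing here reflects the actual C code in longobject.c. Like str.format(), the writer preallocates len(format_string)+100 bytes, so formatting 12345 three times needs no resize.

```python
class AsciiWriter:
    """Hypothetical stand-in for _PyUnicodeWriter: a growable ASCII buffer."""

    def __init__(self, format_string):
        # Overallocate like str.format(): len(format_string) + 100 bytes.
        self.buf = bytearray(len(format_string) + 100)
        self.pos = 0        # number of bytes written so far
        self.resizes = 0    # how many times the buffer had to grow

    def _reserve(self, n):
        # Grow (at least doubling) only when the pending write does not fit.
        needed = self.pos + n
        if needed > len(self.buf):
            new_size = max(needed, 2 * len(self.buf))
            self.buf.extend(bytes(new_size - len(self.buf)))
            self.resizes += 1

    def write_ascii(self, data):
        self._reserve(len(data))
        self.buf[self.pos:self.pos + len(data)] = data
        self.pos += len(data)

    def write_via_temp(self, value):
        # Old path: build a temporary str (and bytes), then copy it in.
        self.write_ascii(str(value).encode("ascii"))

    def write_direct(self, value):
        # New path: count the digits, reserve space once, then write the
        # decimal digits in place, from the last one backwards.
        negative = value < 0
        value = -value if negative else value
        ndigits, t = 1, value
        while t >= 10:
            t //= 10
            ndigits += 1
        n = ndigits + (1 if negative else 0)
        self._reserve(n)
        end = i = self.pos + n
        while True:
            value, d = divmod(value, 10)
            i -= 1
            self.buf[i] = 48 + d            # ord("0") == 48
            if value == 0:
                break
        if negative:
            self.buf[self.pos] = 45         # ord("-") == 45
        self.pos = end

    def finish(self):
        # Drop the unused, overallocated tail.
        return self.buf[:self.pos].decode("ascii")


w = AsciiWriter("x={0} x={0} x={0}")    # 17 + 100 = 117-byte buffer
for prefix in (b"x=", b" x=", b" x="):
    w.write_ascii(prefix)
    w.write_direct(12345)
print(w.finish(), w.resizes)            # x=12345 x=12345 x=12345 0
```

In pure Python the direct path is not faster, since interpreter overhead dominates; the sketch only shows where the intermediate object disappears: `write_via_temp` creates a str and a bytes object per number, while `write_direct` reserves space once and fills the buffer in place.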
I'm not completely satisfied with the use of Py_LOCAL_INLINE in unicodeobject.c for the _PyUnicodeWriter methods. The same "hacks" (?) should be used in formatter_unicode.c.
Shell script (bench.sh) used to benchmark:
--------
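#!/bin/bash
# Run from a CPython build directory (./python is the freshly built interpreter).
# Each line prints a label, then timeit reports the best of 10 repeats (-r 10);
# the -s setup code is not timed.
echo -n "{0}.{1}.{2}: "; ./python -m timeit -r 10 -s 'fmt="{0}.{1}.{2}"' 'fmt.format("http", "client", "HTTPConnection")'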
echo -n " [line {0:2d}] : "; ./python -m timeit -r 10 -s 'fmt=" [line {0:2d}] "' 'fmt.format(5)'
echo -n "str: "; ./python -m timeit -r 10 -s 'fmt="{0}"*100' 'fmt.format("ABCDEF")'
echo -n "str conv: "; ./python -m timeit -r 10 -s 'fmt="{0:s}"*100' 'fmt.format("ABCDEF")'
echo -n "long x 3: "; ./python -m timeit -r 10 -s 'fmt="x={0} x={0} x={0}"' 'fmt.format(12345)'
echo -n "float x 3: "; ./python -m timeit -r 10 -s 'fmt="x={0} x={0} x={0}"' 'fmt.format(12.345)'
echo -n "complex x 3: "; ./python -m timeit -r 10 -s 'fmt="x={0} x={0} x={0}"' 'fmt.format(12.345+2j)'
echo -n "long, float, complex: "; ./python -m timeit -r 10 -s 'fmt="x={} y={} z={}"' 'fmt.format(12345, 12.345, 12.345+2j)'
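# Python 3.11+ limits int-to-str conversion to 4300 digits by default and 2000! has
# 5736 digits, so on those versions add "-X int_max_str_digits=0" after ./python.
echo -n "huge long: "; ./python -m timeit -r 10 -s 'import math; huge=math.factorial(2000); fmt="x={}"' 'fmt.format(huge)'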
--------
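For readers who would rather drive the same measurements from Python, here is a rough equivalent of bench.sh built on the standard timeit module. It is a sketch, not the script used for the numbers below: `CASES`, `bench` and `run_all` are illustrative names, and the loop count comes from `Timer.autorange()` (Python 3.6+), which is close to, but not exactly, what `python -m timeit` picks.

```python
import timeit

# (label, setup, statement) triples mirroring bench.sh line by line.
CASES = [
    ("{0}.{1}.{2}", 'fmt="{0}.{1}.{2}"',
     'fmt.format("http", "client", "HTTPConnection")'),
    (" [line {0:2d}] ", 'fmt=" [line {0:2d}] "', 'fmt.format(5)'),
    ("str", 'fmt="{0}"*100', 'fmt.format("ABCDEF")'),
    ("str conv", 'fmt="{0:s}"*100', 'fmt.format("ABCDEF")'),
    ("long x 3", 'fmt="x={0} x={0} x={0}"', 'fmt.format(12345)'),
    ("float x 3", 'fmt="x={0} x={0} x={0}"', 'fmt.format(12.345)'),
    ("complex x 3", 'fmt="x={0} x={0} x={0}"', 'fmt.format(12.345+2j)'),
    ("long, float, complex", 'fmt="x={} y={} z={}"',
     'fmt.format(12345, 12.345, 12.345+2j)'),
    # Python 3.11+ refuses int->str conversions above 4300 digits by
    # default and 2000! has 5736 digits, so lift the limit when it exists.
    ("huge long",
     'import math, sys\n'
     'if hasattr(sys, "set_int_max_str_digits"):\n'
     '    sys.set_int_max_str_digits(0)\n'
     'huge = math.factorial(2000); fmt = "x={}"',
     'fmt.format(huge)'),
]


def bench(setup, stmt, repeat=10):
    """Return (loops, best usec per loop), similar to `python -m timeit -r 10`."""
    timer = timeit.Timer(stmt, setup)
    number, _ = timer.autorange()   # loop count taking at least 0.2 s
    best = min(timer.repeat(repeat=repeat, number=number))
    return number, best / number * 1e6


def run_all():
    for label, setup, stmt in CASES:
        number, usec = bench(setup, stmt)
        print(f"{label}: {number} loops, best of 10: {usec:.3g} usec per loop")
```

Calling `run_all()` prints one line per case in the same shape as the shell script's output. Taking the minimum of the repeats, as `-r 10` does, discards runs slowed down by other activity on the machine.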
Results:
--------
3.3:
{0}.{1}.{2}: 1000000 loops, best of 10: 0.394 usec per loop
[line {0:2d}] : 1000000 loops, best of 10: 0.519 usec per loop
str: 100000 loops, best of 10: 7.01 usec per loop
str conv: 100000 loops, best of 10: 13.3 usec per loop
long x 3: 1000000 loops, best of 10: 0.569 usec per loop
float x 3: 1000000 loops, best of 10: 1.62 usec per loop
complex x 3: 100000 loops, best of 10: 3.34 usec per loop
long, float, complex: 100000 loops, best of 10: 2.08 usec per loop
huge long: 1000 loops, best of 10: 666 usec per loop
3.3 + format_writer.patch :
{0}.{1}.{2}: 1000000 loops, best of 10: 0.412 usec per loop (+5%)
[line {0:2d}] : 1000000 loops, best of 10: 0.461 usec per loop (-11%)
str: 100000 loops, best of 10: 6.85 usec per loop (-2%)
str conv: 100000 loops, best of 10: 11.1 usec per loop (-17%)
long x 3: 1000000 loops, best of 10: 0.605 usec per loop (+6%)
float x 3: 1000000 loops, best of 10: 1.57 usec per loop (-3%)
complex x 3: 100000 loops, best of 10: 3.54 usec per loop (+6%)
long, float, complex: 100000 loops, best of 10: 2.19 usec per loop (+5%)
huge long: 1000 loops, best of 10: 665 usec per loop (0%)
3.3 + format_writer-2.patch :
{0}.{1}.{2}: 1000000 loops, best of 10: 0.378 usec per loop (-4%)
[line {0:2d}] : 1000000 loops, best of 10: 0.454 usec per loop (-13%)
str: 100000 loops, best of 10: 6.18 usec per loop (-12%)
str conv: 100000 loops, best of 10: 10.9 usec per loop (-18%)
long x 3: 1000000 loops, best of 10: 0.471 usec per loop (-17%)
float x 3: 1000000 loops, best of 10: 1.37 usec per loop (-15%)
complex x 3: 100000 loops, best of 10: 3.4 usec per loop (+2%)
long, float, complex: 1000000 loops, best of 10: 1.93 usec per loop (-7%)
huge long: 1000 loops, best of 10: 665 usec per loop (0%)
--------
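The bracketed figures are the change in best time relative to the unpatched 3.3 run, rounded to the nearest percent, so negative values mean faster. As a quick arithmetic check, this sketch recomputes them from the timings copied out of the results; `relative_change`, `BASELINE` and `PATCHED` are illustrative names only.

```python
# Best time per loop (usec), copied from the results.
BASELINE = {
    "{0}.{1}.{2}": 0.394, "[line {0:2d}]": 0.519, "str": 7.01,
    "str conv": 13.3, "long x 3": 0.569, "float x 3": 1.62,
    "complex x 3": 3.34, "long, float, complex": 2.08, "huge long": 666,
}
PATCHED = {
    "format_writer.patch": {
        "{0}.{1}.{2}": 0.412, "[line {0:2d}]": 0.461, "str": 6.85,
        "str conv": 11.1, "long x 3": 0.605, "float x 3": 1.57,
        "complex x 3": 3.54, "long, float, complex": 2.19, "huge long": 665,
    },
    "format_writer-2.patch": {
        "{0}.{1}.{2}": 0.378, "[line {0:2d}]": 0.454, "str": 6.18,
        "str conv": 10.9, "long x 3": 0.471, "float x 3": 1.37,
        "complex x 3": 3.4, "long, float, complex": 1.93, "huge long": 665,
    },
}


def relative_change(new, old):
    """Change of `new` relative to `old`, in whole percent (negative = faster)."""
    return round((new - old) / old * 100)


for patch, timings in PATCHED.items():
    print(patch)
    for name, old in BASELINE.items():
        print(f"  {name}: {relative_change(timings[name], old):+d}%")
```

It reproduces every bracketed figure; for example it gives -17 for "long x 3" with format_writer-2.patch, the 17% quoted in the text.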
(Posted by vstinner on 2012-05-08 23:58:41 to issue14744; recipients: vstinner, loewis, mark.dickinson, pitrou, serhiy.storchaka.)