Message 253476 - Python tracker


Author	eric.smith
Recipients	Mark.Shannon, eric.smith, larry
Date	2015-10-26.15:44:56
Message-id	<1445874299.22.0.773155754186.issue25483@psf.upfronthosting.co.za>
In-reply-to

Content

Currently, the f-string f'a{3!r:10}' compiles to bytecode that does the same thing as:

''.join(['a', format(repr(3), '10')])

That is, it literally calls the functions format() and repr(). The same holds true for str() and ascii() with !s and !a, respectively.
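The equivalence described above can be checked from Python itself. This is only a sketch of the observable result, not of the bytecode; the variable names (spelled_out, via_fstring, value) are made up for illustration.

```python
# The hand-written expansion that the f-string is described as matching.
spelled_out = ''.join(['a', format(repr(3), '10')])
via_fstring = f'a{3!r:10}'

# repr(3) is the string '3'; a width of 10 left-aligns it and pads with spaces.
assert spelled_out == via_fstring == 'a3' + ' ' * 9

# The !s and !a conversions correspond to str() and ascii() in the same way.
value = 'caf\u00e9'
assert f'{value!s}' == format(str(value), '')
assert f'{value!a}' == format(ascii(value), '')
```

The results are the same before and after the patch; only the way the interpreter arrives at them changes.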

By redefining format, str, repr, and ascii, you can break or pervert the computation of the f-string's value:

>>> def format(v, fmt=None): return '42'
...
>>> f'{3}'
'42'

It's always been my intention to fix this. This patch adds an opcode FORMAT_VALUE, which instead of looking up format, etc., directly calls PyObject_Format, PyObject_Str, PyObject_Repr, and PyObject_ASCII. Thus, you can no longer modify what an f-string produces merely by overriding the named functions.
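A small sketch of the behaviour with the patch applied (as I understand it, this is what released interpreters with f-strings do). The Loud class and shadowed_demo function are hypothetical names used only for this illustration. Shadowing the names format and repr has no effect on the f-string, while an object's own __format__ and __repr__ methods are still honoured.

```python
class Loud:
    """Hypothetical type with its own formatting hooks."""

    def __format__(self, spec):
        return 'LOUD'

    def __repr__(self):
        return 'Loud()'


def shadowed_demo():
    # Local definitions that shadow the builtins inside this function.
    def format(value, spec=''):
        return '42'

    def repr(value):
        return 'hijacked'

    return (
        f'{3}',         # does not go through the local format()
        f'{3!r}',       # does not go through the local repr()
        format(3),      # an explicit call still uses the shadowing function
        f'{Loud()}',    # the type's __format__ is still used
        f'{Loud()!r}',  # the type's __repr__ is still used
    )


if __name__ == '__main__':
    print(shadowed_demo())
```

Only the explicit format(3) call is affected by the shadowing; the f-strings are not.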


In addition, because I'm now saving the name lookups and function calls, performance is improved.

Here are the times without this patch:

$ ./python -m timeit -s 'x="test"' 'f"{x}"'
1000000 loops, best of 3: 0.3 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!s}"'
1000000 loops, best of 3: 0.511 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!r}"'
1000000 loops, best of 3: 0.497 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!a}"'
1000000 loops, best of 3: 0.461 usec per loop


And with this patch:

$ ./python -m timeit -s 'x="test"' 'f"{x}"'
10000000 loops, best of 3: 0.02 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!s}"'
100000000 loops, best of 3: 0.02 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!r}"'
10000000 loops, best of 3: 0.0896 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x!a}"'
10000000 loops, best of 3: 0.0923 usec per loop


So the time per loop drops by roughly 80% to 96% (about 5x to 25x faster) for these simple cases.

Also, now f-strings are faster than %-formatting, at least for some types:

$ ./python -m timeit -s 'x="test"' '"%s"%x'
10000000 loops, best of 3: 0.0755 usec per loop

$ ./python -m timeit -s 'x="test"' 'f"{x}"'
10000000 loops, best of 3: 0.02 usec per loop
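For anyone who wants to repeat these measurements from a script rather than the command line, here is a rough sketch using the timeit module. The helper name best_usec, the STATEMENTS list, and the loop count are my own choices, not part of the patch. Absolute numbers will differ by machine and interpreter version, so no expected output is shown.

```python
import timeit

# The statements timed above, with x bound to "test" in the setup code.
STATEMENTS = ['f"{x}"', 'f"{x!s}"', 'f"{x!r}"', 'f"{x!a}"', '"%s" % x']


def best_usec(stmt, setup='x = "test"', number=100_000, repeat=3):
    """Return the best per-loop time in microseconds, similar to `python -m timeit`."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


def run():
    """Time every statement and return a mapping of statement to microseconds."""
    return {stmt: best_usec(stmt) for stmt in STATEMENTS}


if __name__ == '__main__':
    for stmt, usec in run().items():
        print(f'{stmt:12} {usec:.4f} usec per loop')
```

Taking the minimum over the repeats mirrors the "best of 3" figure that the command-line tool reports.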


Note that people often "benchmark" %-formatting with code like the following. But the optimizer converts this to a constant string, so it's not a fair comparison:

$ ./python -m timeit '"%s"%"test"'
100000000 loops, best of 3: 0.0161 usec per loop
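One way to check whether a given interpreter folds such an expression is to compile it and look for the finished string among the code object's constants. This is a sketch under assumptions: folded_at_compile_time is a made-up helper, and whether the literal case is folded depends on the interpreter version, so I'm not claiming a particular result for it. The template is "<%s>" rather than "%s" so that the folded result differs from both operands.

```python
def folded_at_compile_time(expr, expected):
    """Return True if `expected` already appears as a constant in the compiled expression."""
    code = compile(expr, '<example>', 'eval')
    return expected in code.co_consts


if __name__ == '__main__':
    # Both operands are literals: the optimizer may or may not fold this,
    # depending on the interpreter version.
    print(folded_at_compile_time('"<%s>" % "test"', '<test>'))

    # The right-hand side is a variable, so the work has to happen at run time.
    print(folded_at_compile_time('"<%s>" % x', '<test>'))
```

A fair benchmark needs the second form, where the value comes from a variable set up outside the timed statement.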


These microbenchmarks aren't the end of the story, since the string concatenation also takes some time. That's another optimization I might implement in the future.

Thanks to Mark and Larry for some advice on this.

History
Date	User	Action	Args
2015-10-26 15:44:59	eric.smith	set	recipients: + eric.smith, larry, Mark.Shannon
2015-10-26 15:44:59	eric.smith	set	messageid: <1445874299.22.0.773155754186.issue25483@psf.upfronthosting.co.za>
2015-10-26 15:44:59	eric.smith	link	issue25483 messages
2015-10-26 15:44:58	eric.smith	create