classification
Title: Accelerate 'string' % (value, ...) by using formatted string literals
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.11
process
Status: open Resolution:
Dependencies: 11549 Superseder:
Assigned To: serhiy.storchaka Nosy List: Mark.Shannon, brandtbucher, eric.smith, serhiy.storchaka, taleinat, ztane
Priority: normal Keywords: patch

Created on 2016-09-29 08:49 by serhiy.storchaka, last changed 2021-05-23 13:40 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 5012 merged serhiy.storchaka, 2017-12-26 00:04
PR 26160 open serhiy.storchaka, 2021-05-16 11:30
PR 26318 merged serhiy.storchaka, 2021-05-23 13:40
Messages (8)
msg277688 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-29 08:49
Currently, formatted string literals (PEP 498) are the fastest way to format strings.

$ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- '"%s = %r" % (k, v)'
Median +- std dev: 2.27 us +- 0.20 us

$ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 'f"{k!s} = {v!r}"'
Median +- std dev: 1.09 us +- 0.08 us

The compiler could translate C-style formatting with a literal format string into the equivalent formatted string literal. For example, the code '%s = %r' % (k, v) could be translated to

    t1 = k; t2 = v; f'{t1!s} = {t2!r}'; del t1, t2

or even simpler if k and v are initialized local variables.

$ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 't1 = k; t2 = v; f"{t1!s} = {t2!r}"; del t1, t2'
Median +- std dev: 1.22 us +- 0.05 us

This is not an easy issue, and it first requires implementing the AST optimizer.
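
The equivalence that the proposed translation relies on can be checked directly. What follows is only a rough sketch: it uses the standard-library timeit module instead of perf, the function names are made up for illustration, and on interpreter versions where the compiler already performs this transformation the two timings may be nearly identical.

```python
import timeit

k = "foo"
v = "bar"


def percent_style():
    # C-style formatting: literal format string, tuple on the right.
    return "%s = %r" % (k, v)


def fstring_style():
    # The f-string the compiler could emit instead: %s -> !s, %r -> !r.
    return f"{k!s} = {v!r}"


# Both spellings must produce the same text for the rewrite to be safe.
assert percent_style() == fstring_style() == "foo = 'bar'"

if __name__ == "__main__":
    # Timings vary by machine and Python version; no numbers are claimed here.
    for func in (percent_style, fstring_style):
        seconds = timeit.timeit(func, number=200_000)
        print(f"{func.__name__}: {seconds:.3f} s for 200000 calls")
```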
msg277694 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-09-29 09:42
There isn't a direct mapping between %-formatting and __format__ format specifiers. Off the top of my head, I can think of at least one difference:

>>> '%i' % 3
'3'
>>> '{:i}'.format(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'i' for object of type 'int'

So you'll need to be careful with edge cases like this.

Also, for all usages of %s, remember to call str() (or add !s):

>>> '%s' % 1
'1'
>>> f'{1:s}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 's' for object of type 'int'
>>> f'{1!s:s}'
'1'

Although that also reminds me of this default alignment difference:
>>> x=0
>>> '%2s' % x
' 0'
>>> f'{x!s:2s}'
'0 '
>>> f'{x!s:>2s}'
' 0'

So, in general, the mapping will be difficult. On the other hand, if you can do it, and provide a function that maps between %-formatting codes and __format__ codes, then that might be a generally useful tool.
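
As a very small illustration of what such a mapping function might look like, here is a sketch limited to %s, %r and %a with an optional width and the "-" flag. The name percent_to_format and its parameters are hypothetical, not an existing API, and numeric codes are deliberately left out.

```python
def percent_to_format(conversion, value, width=0, left=False):
    """Emulate '%[-][width]{s,r,a}' % (value,) using str/repr/ascii and format().

    Hypothetical helper for illustration only.
    """
    converters = {"s": str, "r": repr, "a": ascii}
    # %s, %r and %a convert the value first and never call value.__format__.
    text = converters[conversion](value)
    if not width:
        return text
    # %-formatting right-aligns by default, while format() left-aligns strings,
    # so the alignment has to be spelled out explicitly.
    align = "<" if left else ">"
    return format(text, f"{align}{width}")


x = 0
assert "%2s" % x == " 0"          # old style: right-aligned
assert f"{x!s:2s}" == "0 "        # new style default: left-aligned
assert percent_to_format("s", x, width=2) == "%2s" % (x,)
assert percent_to_format("r", "ab", width=6, left=True) == "%-6r" % ("ab",)
```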
msg277700 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-29 10:44
'%s' % x should be translated to f'{x!s}', not to f'{x:s}'. Only %s, %r and %a can be supported. Formatting with %i should be left untranslated. Or maybe '%r: %i' % (a, x) could be translated to f'{a!r}: {"%i" % x}'.

It is also possible to introduce special opcodes that convert the argument to an exact int or float. Then '%06i' % x could be translated to f'{__exact_int__(x):06}'.
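
A short sketch of both ideas. The Loud class is a made-up int subclass, and int() merely stands in here for the proposed exact-int conversion; it is not equivalent to it for every input type (for example, int() accepts strings while %i does not).

```python
class Loud(int):
    """Hypothetical int subclass that overrides __format__."""

    def __format__(self, spec):
        return "LOUD"


# Partial translation: %r becomes !r, while %i stays a nested %-expression.
a, x = "key", 42
assert "%r: %i" % (a, x) == f'{a!r}: {"%i" % x}' == "'key': 42"

# Why an exact int is needed: %-formatting ignores __format__,
# but an f-string replacement field calls it.
n = Loud(7)
assert "%06i" % n == "000007"
assert f"{n:06}" == "LOUD"
assert f"{int(n):06}" == "000007"  # int(n) yields a plain int
```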
msg277702 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-29 12:01
Serhiy, you actually did make a mistake above; `'%s' % x` cannot be rewritten as `f'{x!s}'`, only `'%s' % (x,)` can be optimized... 

(just try with `x = 1, 2`)
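
To make the difference concrete, here is a minimal sketch (the function names are only illustrative):

```python
def old_style(x):
    # A tuple on the right-hand side is unpacked as the argument list.
    return "%s" % x


def naive_fstring(x):
    # Always formats x as a single value.
    return f"{x!s}"


def tuple_wrapped(x):
    # Wrapping in a 1-tuple makes %-formatting treat x as a single value.
    return "%s" % (x,)


x = 1, 2

try:
    old_style(x)
except TypeError as exc:
    raised = True
    print("old_style raised TypeError:", exc)
else:
    raised = False

assert raised                          # two arguments for one %s
assert naive_fstring(x) == "(1, 2)"    # not equivalent to old_style(x)
assert tuple_wrapped(x) == naive_fstring(x)
```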
msg277703 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-29 12:40
Thanks for the correction, Antti. Yes, this is what I initially meant. This optimization is applicable only if the left argument of % is a literal string and the right argument is a tuple expression. When I wrote `'%s' % x`, I meant x as a component of the tuple.
msg309049 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-26 00:07
PR 5012 implements the transformation of simple format strings containing only %s, %r and %a into f-strings.
msg324795 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-07 21:15
I'm +1 on this optimization.
msg393740 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-05-16 11:50
PR 26160 adds support for %d, %i, %u, %o, %x, %X, %f, %e, %g, %F, %E, %G.

What is not supported:

* Formatting with a single value not wrapped into a 1-tuple (like in "%d bytes" % size). The behavior is non-trivial: it requires checking whether the value is a tuple or a mapping. It would require adding more opcodes and AST nodes and generating complex bytecode.
* Mapping keys (like in "%(name)s"). They are rarely used, and where they are used, the code is not performance critical.
* Precision for integer formatting. Precision is not supported in new-style formatting of integers, and it is not trivial to reproduce this behavior.
* Variable width and precision (like in "%*.*s"). It is possible to support them, but the code would be pretty complex, and the benefit is small, because this feature is rarely used and is not performance critical.
* Format code %c. It is relatively rarely used.
* Length modifiers "h", "l" and "L" (like in "%ld"). They are ignored in Python and I did not see them in real code. While supporting them is easy, it would require adding more than one line of code, so it is not worth it.
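
For reference, the unsupported forms listed above look like this at run time. This is only an illustrative sketch of the %-formatting behavior that a translation would have to reproduce; the variable names are arbitrary.

```python
size = 5

# Single value not wrapped in a 1-tuple.
assert "%d bytes" % size == "5 bytes"

# Mapping key.
assert "%(name)s" % {"name": "spam"} == "spam"

# Precision for integer formatting: pads with zeros in %-formatting...
assert "%.3d" % size == "005"
# ...but is rejected by new-style formatting of integers.
try:
    format(size, ".3d")
except ValueError:
    new_style_rejects_precision = True
else:
    new_style_rejects_precision = False
assert new_style_rejects_precision

# Variable width and precision taken from the argument tuple.
assert "%*.*s" % (5, 2, "abc") == "   ab"

# Format code %c.
assert "%c" % 65 == "A"

# Length modifiers are accepted and ignored.
assert "%ld" % size == "%d" % size == "5"
```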
History
Date User Action Args
2021-05-23 13:40:03 serhiy.storchaka set pull_requests: + pull_request24913
2021-05-19 22:21:33 brandtbucher set nosy: + brandtbucher
2021-05-16 11:50:56 serhiy.storchaka set priority: low -> normal; messages: + msg393740
2021-05-16 11:31:21 serhiy.storchaka set nosy: + Mark.Shannon; versions: + Python 3.11, - Python 3.7
2021-05-16 11:30:17 serhiy.storchaka set pull_requests: + pull_request24794
2018-09-07 21:15:06 taleinat set nosy: + taleinat; messages: + msg324795
2017-12-26 00:07:35 serhiy.storchaka set messages: + msg309049
2017-12-26 00:04:15 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request4901
2017-12-25 16:57:34 serhiy.storchaka set assignee: serhiy.storchaka
2016-09-29 12:40:20 serhiy.storchaka set messages: + msg277703
2016-09-29 12:01:36 ztane set nosy: + ztane; messages: + msg277702
2016-09-29 10:44:19 serhiy.storchaka set messages: + msg277700
2016-09-29 09:42:27 eric.smith set messages: + msg277694
2016-09-29 08:50:19 serhiy.storchaka set dependencies: + Build-out an AST optimizer, moving some functionality out of the peephole optimizer
2016-09-29 08:49:56 serhiy.storchaka create