Accelerate 'string' % (value, ...) by using formatted string literals #72494

Open
serhiy-storchaka opened this issue Sep 29, 2016 · 9 comments
Labels: 3.11 (only security fixes), 3.12 (bugs and security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

@serhiy-storchaka
Member

BPO 28307
Nosy @vstinner, @taleinat, @ericvsmith, @markshannon, @serhiy-storchaka, @ztane, @brandtbucher
PRs
  • bpo-28307: Convert simple C-style formatting with literal format into f-string. #5012
  • bpo-28307: Optimize C-style formatting of numbers #26160
  • bpo-28307: Tests and fixes for optimization of C-style formatting #26318
Dependencies
  • bpo-11549: Build-out an AST optimizer, moving some functionality out of the peephole optimizer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2016-09-29.08:49:56.387>
    labels = ['interpreter-core', '3.11', 'performance']
    title = "Accelerate 'string' % (value, ...) by using formatted string literals"
    updated_at = <Date 2021-09-22.13:56:33.450>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-09-22.13:56:33.450>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2016-09-29.08:49:56.387>
    creator = 'serhiy.storchaka'
    dependencies = ['11549']
    files = []
    hgrepos = []
    issue_num = 28307
    keywords = ['patch']
    message_count = 9.0
    messages = ['277688', '277694', '277700', '277702', '277703', '309049', '324795', '393740', '402436']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'taleinat', 'eric.smith', 'Mark.Shannon', 'serhiy.storchaka', 'ztane', 'brandtbucher']
    pr_nums = ['5012', '26160', '26318']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28307'
    versions = ['Python 3.11']

    @serhiy-storchaka
    Member Author

    For now, using formatted string literals (PEP 498) is the fastest way to format strings.

    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- '"%s = %r" % (k, v)'
    Median +- std dev: 2.27 us +- 0.20 us
    
    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 'f"{k!s} = {v!r}"'
    Median +- std dev: 1.09 us +- 0.08 us

    The compiler could translate C-style formatting with a literal format string into the equivalent formatted string literal. The code '%s = %r' % (k, v) could be translated to

        t1 = k; t2 = v; f'{t1!s} = {t2!r}'; del t1, t2

    or even simpler if k and v are initialized local variables.

    $ ./python -m perf timeit -s 'k = "foo"; v = "bar"' -- 't1 = k; t2 = v; f"{t1!s} = {t2!r}"; del t1, t2'
    Median +- std dev: 1.22 us +- 0.05 us

    This is not an easy issue; it first requires implementing the AST optimizer.
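As a quick sanity check of the proposed equivalence, the sketch below compares the two spellings on a few values and times them with the standard `timeit` module (used here in place of `perf`). The function names and sample values are illustrative only, and absolute timings depend on the build. On an interpreter that already applies this optimization, both functions may compile to the same bytecode and time identically.

```python
import timeit


def percent_style(k, v):
    # C-style formatting with a literal format string and a tuple on the right.
    return "%s = %r" % (k, v)


def fstring_style(k, v):
    # The formatted string literal the compiler could emit instead.
    return f"{k!s} = {v!r}"


# A few argument pairs of different types; both spellings should agree on all of them.
samples = [("foo", "bar"), (1, 2.5), (None, [1, "x"])]
results_match = all(percent_style(k, v) == fstring_style(k, v) for k, v in samples)

if __name__ == "__main__":
    print("results match:", results_match)
    for fn in (percent_style, fstring_style):
        elapsed = timeit.timeit(lambda: fn("foo", "bar"), number=200_000)
        print(f"{fn.__name__}: {elapsed:.3f} s for 200000 calls")
```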

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Sep 29, 2016
    @ericvsmith
    Member

    There isn't a direct mapping between %-formatting and __format__ format specifiers. Off the top of my head, I can think of at least one difference:

    >>> '%i' % 3
    '3'
    >>> '{:i}'.format(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'i' for object of type 'int'

    So you'll need to be careful with edge cases like this.

    Also, for all usages of %s, remember to call str() (or add !s):

    >>> '%s' % 1
    '1'
    >>> f'{1:s}'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 's' for object of type 'int'
    >>> f'{1!s:s}'
    '1'
    
    Although that also reminds me of this default alignment difference:
    >>> x=0
    >>> '%2s' % x
    ' 0'
    >>> f'{x!s:2s}'
    '0 '
    >>> f'{x!s:>2s}'
    ' 0'

    So, in general, the mapping will be difficult. On the other hand, if you can do it, and provide a function that maps between %-formatting codes and __format__ codes, then that might be a generally useful tool.
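To make the mapping problem concrete, here is a minimal sketch of such a helper, restricted to %s, %r and %a with an optional `-` flag, width and precision. The names `convert_spec` and `render` are hypothetical and not part of any patch. Anything outside that subset, such as the `0` flag, is rejected rather than guessed at.

```python
import re

# Only "%[-][width][.precision](s|r|a)" is accepted; a leading 0 in the width
# is excluded on purpose because the "0" flag means different things in the two
# mini-languages.
_SPEC = re.compile(r"%(-?)([1-9]\d*)?(?:\.(\d*))?([sra])")

_CONVERTERS = {"s": str, "r": repr, "a": ascii}


def convert_spec(spec):
    """Return (conversion, format_spec) for a single %-style specifier."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    minus, width, precision, code = m.groups()
    fmt = ""
    if width:
        # %-formatting right-aligns by default, so the alignment is made explicit.
        fmt += ("<" if minus else ">") + width
    if precision is not None:
        # A bare "." means precision 0, as in "%10.s".
        fmt += "." + (precision or "0")
    return code, fmt


def render(spec, value):
    """Format value the new-style way, using the converted specifier."""
    code, fmt = convert_spec(spec)
    return format(_CONVERTERS[code](value), fmt)


if __name__ == "__main__":
    for spec in ("%s", "%2s", "%-5s", "%6.3r", "%10.s"):
        print(spec, repr(render(spec, "ab")), repr(spec % ("ab",)))
```

The explicit `>` is what avoids the default alignment difference shown above: strings are left-aligned by `format()` unless told otherwise.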

    @serhiy-storchaka
    Member Author

    '%s' % x should be translated to f'{x!s}', not to f'{x:s}'. Only %s, %r and %a can be supported. Formatting with %i should be left untranslated. Or maybe '%r: %i' % (a, x) could be translated to f'{a!r}: {"%i" % x}'.

    It is also possible to introduce special opcodes that convert an argument to an exact int or float. Then '%06i' % x could be translated to f'{exact_int(x):06}'.
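A small illustration of why the conversion step matters, assuming a float argument: with no conversion, the format spec is handed to the argument's own `__format__`. Here `int()` is only a stand-in for the hypothetical `exact_int` and does not reproduce every rule of `%i`.

```python
x = 3.7

old = "%06i" % x          # %i converts the float to an int before formatting
naive = f"{x:06}"         # float.__format__ interprets the spec instead
fixed = f"{int(x):06}"    # int() stands in for the hypothetical exact_int

if __name__ == "__main__":
    print(old, naive, fixed)
```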

    @ztane
    Mannequin

    ztane mannequin commented Sep 29, 2016

    Serhiy, you actually did make a mistake above; '%s' % x cannot be rewritten as f'{x!s}', only '%s' % (x,) can be optimized...

    (just try with x = 1, 2)
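The case mentioned above can be reproduced directly. The variable names below are illustrative.

```python
x = 1, 2

# A bare tuple on the right is unpacked as the argument list, so one %s
# receives two arguments and the operation fails.
try:
    "%s" % x
except TypeError as exc:
    error = str(exc)
else:
    error = None

wrapped = "%s" % (x,)   # a 1-tuple formats x itself
direct = f"{x!s}"       # the f-string never unpacks x

if __name__ == "__main__":
    print("error:", error)
    print(wrapped, direct)
```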

    @serhiy-storchaka
    Member Author

    Thanks for the correction, Antti. Yes, this is what I initially meant. This optimization is applicable only if the left argument of % is a literal string and the right argument is a tuple expression. When I wrote '%s' % x, I meant x as a component of the tuple.

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 25, 2017
    @serhiy-storchaka
    Member Author

    PR 5012 implements the transformation of simple format strings containing only %s, %r and %a into f-strings.

    @taleinat
    Contributor

    taleinat commented Sep 7, 2018

    I'm +1 on this optimization.

    @serhiy-storchaka serhiy-storchaka added 3.11 only security fixes and removed 3.7 (EOL) end of life labels May 16, 2021
    @serhiy-storchaka
    Member Author

    PR 26160 adds support for %d, %i, %u, %o, %x, %X, %f, %e, %g, %F, %E, %G.

    What is not supported:

    • Formatting with a single value not wrapped into a 1-tuple (like in "%d bytes" % size). The behavior is non-trivial: it requires checking whether the value is a tuple or a mapping. Supporting it would require adding more opcodes and AST nodes and generating complex bytecode.
    • Mapping keys (like in "%(name)s"). They are rarely used, and when they are used, it is not in performance-critical code.
    • Precision for integer formatting. Precision is not supported in new-style formatting of integers, and it is not trivial to reproduce this behavior.
    • Variable width and precision (like in "%*.*s"). It is possible to support them, but the code would be pretty complex and the benefit small, because this feature is rarely used and is not performance critical.
    • Format code %c. It is relatively rarely used.
    • Length modifiers "h", "l" and "L" (like in "%ld"). They are ignored in Python and I did not see them in real code. While supporting them is easy, it would require adding more than one line of code, so it is not worth it.
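For reference, these are the kinds of expressions the list above describes, together with what ordinary %-formatting produces for them. The values are illustrative, and the snippet shows runtime behavior only, not what the compiler emits.

```python
single = "%d bytes" % 10                 # single value, not wrapped in a 1-tuple
mapping = "%(name)s" % {"name": "x"}     # mapping key
int_precision = "%.3d" % 7               # precision for an integer
variable = "%*.*s" % (5, 1, "ab")        # width and precision taken from arguments
char = "%c" % 65                         # %c format code
length_mod = "%ld" % 5                   # length modifier, ignored by Python

# New-style formatting rejects a precision for integers, which is why
# "%.3d" has no direct f-string equivalent.
try:
    format(7, ".3d")
except ValueError:
    new_style_rejects_precision = True
else:
    new_style_rejects_precision = False

if __name__ == "__main__":
    print(single, mapping, int_precision, repr(variable), char, length_mod)
    print("new-style rejects int precision:", new_style_rejects_precision)
```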

    @vstinner
    Member

    commit a0bd9e9
    Author: Serhiy Storchaka <storchaka@gmail.com>
    Date: Sat May 8 22:33:10 2021 +0300

    bpo-28307: Convert simple C-style formatting with literal format into f-string. (GH-5012)
    
    C-style formatting with literal format containing only format codes
    %s, %r and %a (with optional width, precision and alignment)
    will be converted to an equivalent f-string expression.
    
    It can speed up formatting more than 2 times by eliminating
    runtime parsing of the format string and creating temporary tuple.
    

    commit 8b01067
    Author: Serhiy Storchaka <storchaka@gmail.com>
    Date: Sun May 23 19:06:48 2021 +0300

    bpo-28307: Tests and fixes for optimization of C-style formatting (GH-26318)
    
    Fix errors:
    * "%10.s" should be equal to "%10.0s", not "%10s".
    * Tuples with starred expressions caused a SyntaxError.
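The two fixed cases can be checked against runtime behavior, and `dis` can show what the compiler emits for a given expression. The helper `opnames` is illustrative. The opcode names it returns differ between Python versions, so the sketch prints them without assuming a particular result.

```python
import dis


def opnames(expr):
    """Return the opcode names the compiler emits for an expression."""
    code = compile(expr, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


text = "abc"
pair = ("k", "v")

dot_only = "%10.s" % text       # a bare "." means precision 0
dot_zero = "%10.0s" % text      # explicit precision 0
starred = "%s=%s" % (*pair,)    # tuple display with a starred expression

if __name__ == "__main__":
    print(repr(dot_only), repr(dot_zero), starred)
    # Only compiled, not evaluated, so k and v need not exist.
    print(opnames("'%s = %r' % (k, v)"))
```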
    

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the 3.12 bugs and security fixes label Sep 12, 2022