Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

Closed
vladima mannequin opened this issue Mar 19, 2021 · 7 comments · Fixed by #103809
Closed

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

vladima mannequin opened this issue Mar 19, 2021 · 7 comments · Fixed by #103809
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vladima
Copy link
Mannequin

vladima mannequin commented Mar 19, 2021

BPO 43563
Nosy @gvanrossum, @rhettinger, @markshannon, @vladima, @isidentical
PRs
  • bpo-43563 : Introduce dedicated opcodes for super calls #24936
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-03-19.20:57:57.938>
    labels = ['interpreter-core', '3.10']
    title = 'Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver'
    updated_at = <Date 2021-04-27.18:23:03.781>
    user = 'https://github.com/vladima'

    bugs.python.org fields:

    activity = <Date 2021-04-27.18:23:03.781>
    actor = 'v2m'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-03-19.20:57:57.938>
    creator = 'v2m'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43563
    keywords = ['patch']
    message_count = 7.0
    messages = ['389114', '389156', '389168', '389337', '389340', '389344', '392116']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'rhettinger', 'Mark.Shannon', 'v2m', 'BTaskaya']
    pr_nums = ['24936']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue43563'
    versions = ['Python 3.10']

    Linked PRs

    @vladima
    Copy link
    Mannequin Author

    vladima mannequin commented Mar 19, 2021

    Calling methods and lookup up attributes when receiver is super() has extra cost comparing to regular attribute lookup. It mainly comes from the need to allocate and initialize the instance of the super which for zero argument case also include peeking into frame/code object for the __class__ cell and first argument. In addition because PySuper_Type has custom implementation of tp_getattro - _PyObject_GetMethod would always return bound method.

    import timeit
    
    setup = """
    class A:
        def f(self): pass
    class B(A):
        def f(self): super().f()
        def g(self): A.f(self)
    b = B()
    """
    print(timeit.timeit("b.f()", setup=setup, number=20000000))
    print(timeit.timeit("b.g()", setup=setup, number=20000000))
    
    7.329449548968114
    3.892987059080042
    

    One option to improve it could be to make compiler/interpreter aware of super calls so they can be treated specially. Attached patch introduces two new opcodes LOAD_METHOD_SUPER and LOAD_ATTR_SUPER that are intended to be counterparts for LOAD_METHOD and LOAD_ATTR for cases when receiver is super with either zero or two arguments.

    Immediate argument for both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER is a pair that consist of:
    0: index of method/attribute in co_names
    1: Py_True if super was originally called with 0 arguments and Py_False otherwise.

    Both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER expect 3 elements on the stack:
    TOS3: global_super
    TOS2: type
    TOS1: self/cls

    Result of LOAD_METHOD_SUPER is the same as LOAD_METHOD.
    Result of LOAD_ATTR_SUPER is the same as LOAD_ATTR

    In runtime both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER will check if global_super is PySuper_Type to handle situations when super is patched. If global_super is PySuper_Type then it can use dedicated routine to perform the lookup for provided __class__ and cls/self without allocating new super instance. If global_super is different from PySuper_Type then runtime will fallback to the original logic using global_super and original number of arguments that was captured in immediate.

    Benchmark results with patch:
    4.381768501014449
    3.9492998640052974

    @vladima vladima mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 19, 2021
    @markshannon
    Copy link
    Member

    Why?

    Do you have any evidence that the overhead of super() is significant in real programs, or that the proposed change actually speeds up anything beyond your micro-benchmark?

    @rhettinger
    Copy link
    Contributor

    Currently, super() is decoupled from the core language. It is just a builtin that provides customized attribute lookup. This PR makes super() more tightly integrated with the core language, treating it as if it were a keyword and part of the grammar. Also note, users can currently create their own versions of super(), shadowing the builtin super().

    @vladima
    Copy link
    Mannequin Author

    vladima mannequin commented Mar 22, 2021

    Currently, super() is decoupled from the core language. It is just a builtin that provides customized attribute lookup. This PR makes super() more tightly integrated with the core language, treating it as if it were a keyword and part of the grammar. Also note, users can currently create their own versions of super(), shadowing the builtin super().

    This is true however:

    • this patch does not block people from introducing custom version of super so this scenario still work. The idea was to streamline the common case
    • based on digging into Instagram codebase and its transitive dependencies (which is reasonably large amount of code) all spots where super() appear in sources assume super to be builtin and for a pretty common use-case its cost is noticeable in profiler.
    • zero-argument super() still a bit magical since it requires compiler support to create cell for __class__ and assumes certain shape of the frame object so this patch is a step forward with a better compiler support and removing runtime dependency on the frame

    Do you have any evidence that the overhead of super() is significant in real programs

    I do see the non-negligible cost of allocation/initialization of super object in IG profiling data.

    @gvanrossum
    Copy link
    Member

    This looks like a sensible idea to me. The safeguards to ensure that customized 'super' still works seem reasonable to me. I have to admit that I sometimes refrain from using super() where I should because of the expense, so this would be welcome.

    I do wonder -- is two-arg super() important enough to support it at all? Maybe the code would be somewhat simpler if the special opcodes were only generated for zero-arg super() calls.

    @markshannon
    Copy link
    Member

    Numbers please.

    What is "non-negligible cost of allocation/initialization" mean as a fraction of runtime?
    What sort of speed up are you seeing on whole programs?

    @vladima
    Copy link
    Mannequin Author

    vladima mannequin commented Apr 27, 2021

    Apologies for the delay in reply: in more concrete numbers for IG codebase enabling this optimization resulted in 0.2% CPU win.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    carljm added a commit to carljm/cpython that referenced this issue Apr 13, 2023
    carljm added a commit that referenced this issue Apr 24, 2023
    This speeds up `super()` (by around 85%, for a simple one-level
    `super().meth()` microbenchmark) by avoiding allocation of a new
    single-use `super()` object on each use.
    carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
    * main:
      pythongh-87729: add LOAD_SUPER_ATTR instruction for faster super() (python#103497)
      pythongh-103791: Make contextlib.suppress also act on exceptions within an ExceptionGroup (python#103792)
    carljm added a commit to carljm/cpython that referenced this issue May 7, 2023
    carljm added a commit to carljm/cpython that referenced this issue May 11, 2023
    * main:
      pythongh-104057: Fix direct invocation of test_support (pythonGH-104069)
      pythongh-87729: improve hit rate of LOAD_SUPER_ATTR specialization (python#104270)
      pythongh-101819: Fix inverted debug preprocessor check in winconsoleio.c (python#104388)
    carljm added a commit to carljm/cpython that referenced this issue May 31, 2023
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 31, 2023
    (cherry picked from commit 7fbac51)
    
    Co-authored-by: Carl Meyer <carl@oddbird.net>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    3 participants