Remove C stack use by specializing BINARY_SUBSCR, STORE_SUBSCR, LOAD_ATTR, and STORE_ATTR #89987

Closed
markshannon opened this issue Nov 17, 2021 · 10 comments
Assignees: markshannon
Labels: 3.11 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), pending (The issue will be closed if no feedback is provided), performance (Performance or resource usage)

Comments

@markshannon
Member

markshannon commented Nov 17, 2021

BPO 45829
Nosy @gvanrossum, @pmp-p, @markshannon, @pablogsal, @brandtbucher, @sweeneyde
PRs
  • bpo-45829: Specialize BINARY_SUBSCR for __getitem__ implemented in Python. #29592
  • bpo-45829: Check __getitem__'s version for overflow before specializing #30129
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/markshannon'
    closed_at = None
    created_at = <Date 2021-11-17.09:50:11.128>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'Remove C stack use by specializing BINARY_SUBSCR, STORE_SUBSCR, LOAD_ATTR, and STORE_ATTR'
    updated_at = <Date 2021-12-16.11:08:35.974>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2021-12-16.11:08:35.974>
    actor = 'Mark.Shannon'
    assignee = 'Mark.Shannon'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-11-17.09:50:11.128>
    creator = 'Mark.Shannon'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45829
    keywords = ['patch']
    message_count = 7.0
    messages = ['406461', '406473', '406476', '406477', '406530', '406573', '408687']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'pmpp', 'Mark.Shannon', 'pablogsal', 'brandtbucher', 'Dennis Sweeney']
    pr_nums = ['29592', '30129']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45829'
    versions = ['Python 3.11']

    Linked PRs

    @markshannon
    Member Author

    We can remove the C stack use and general overhead of calling special methods implemented in Python for attribute access and indexing.

    Each operation has a special method that implements it. When that special method is implemented in Python, we should avoid the tp_xxx slot machinery and use the same mechanism we use for normal calls to Python functions.

    • BINARY_SUBSCR: __getitem__
    • STORE_SUBSCR: __setitem__
    • LOAD_ATTR: __getattribute__ (and maybe __getattr__)
    • STORE_ATTR: __setattr__

    It probably isn't worth bothering with the deletion forms.

    The getters (__getitem__ and __getattribute__) are relatively simple, as the call returns the result.

    The setters are a bit more complicated as the return value needs to be discarded, so an additional frame which discards the result of the call needs to be inserted.
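The discarded return value can be seen from Python itself. The following is a small sketch, with `Box` as a hypothetical class name: whatever `__setitem__` returns is dropped by the assignment statement, so a chained assignment sees the assigned value rather than the method's result.

```python
class Box:
    """Hypothetical container whose setter returns a value."""

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value
        return "ignored"  # visible only to an explicit method call


box = Box()
direct = box.__setitem__("k", 1)  # explicit call: the return value is visible
chained = box["k"] = 5            # assignment: the return value is dropped
print(direct, chained)
```

This is why a getter can hand its result straight back to the caller, while a setter needs something that throws the result away before execution continues.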

    @markshannon markshannon added the 3.11 only security fixes label Nov 17, 2021
    @markshannon markshannon self-assigned this Nov 17, 2021
    @markshannon markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage 3.11 only security fixes labels Nov 17, 2021
    @gvanrossum
    Member

    Of these, LOAD_ATTR is presumably by far the most used, so should we try that first?

    @markshannon
    Member Author

    I don't think it matters much which we do first.
    I happened to do BINARY_SUBSCR first.

    @gvanrossum
    Member

    That's a good one too, and perhaps simpler.

    @markshannon
    Member Author

    New changeset 21fa7a3 by Mark Shannon in branch 'main':
    bpo-45829: Specialize BINARY_SUBSCR for __getitem__ implemented in Python. (GH-29592)
    21fa7a3

    @sweeneyde
    Member

    This snippet occurs a couple of times in ceval.c (BINARY_SUBSCR_GETITEM and CALL_FUNCTION_PY_SIMPLE):

            new_frame->previous = frame;
            frame = cframe.current_frame = new_frame;
            new_frame->depth = frame->depth + 1;
    

    Maybe I'm reading it wrong, but I think the last line is just setting new_frame->depth++, leaving new_frame->depth = 1 instead of frame->previous->depth + 1.

    I think the second and third lines should be swapped?
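The aliasing concern can be modelled outside the interpreter. The sketch below is a Python stand-in for the C snippet, not the actual ceval.c code: `Frame`, `push_as_quoted`, and `push_swapped` are hypothetical names, and it assumes a new frame starts with depth 0.

```python
class Frame:
    """Hypothetical stand-in for the interpreter's frame struct."""

    def __init__(self, depth=0):
        self.previous = None
        self.depth = depth  # assumption: a fresh frame starts at 0


def push_as_quoted(frame, new_frame):
    # Same statement order as the quoted snippet.
    new_frame.previous = frame
    frame = new_frame                  # `frame` now aliases new_frame
    new_frame.depth = frame.depth + 1  # reads new_frame.depth, not the caller's
    return frame


def push_swapped(frame, new_frame):
    # Second and third statements swapped, as suggested.
    new_frame.previous = frame
    new_frame.depth = frame.depth + 1  # `frame` is still the caller here
    frame = new_frame
    return frame


caller = Frame(depth=5)
quoted = push_as_quoted(caller, Frame())
swapped = push_swapped(caller, Frame())
print(quoted.depth, swapped.depth)  # prints "1 6"
```

Under these assumptions, the quoted order yields a depth of 1 regardless of the caller's depth, while the swapped order yields the caller's depth plus one.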

    @markshannon
    Member Author

    New changeset 62a8a0c by Brandt Bucher in branch 'main':
    bpo-45829: Check __getitem__'s version for overflow before specializing (GH-30129)
    62a8a0c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Member

    Is there anything left to do here?

    @iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Sep 13, 2022
    @arhadthedev
    Member

    > Is there anything left to do here?

    @markshannon ping?

    @arhadthedev
    Member

    The issue and its associated PRs have had no activity for a year and a half, and there is no checklist or comment describing further work. Closing as completed.
