Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Fix performance issue in keyword extraction #83209

Closed
seberg mannequin opened this issue Dec 12, 2019 · 4 comments
Closed

ENH: Fix performance issue in keyword extraction #83209

seberg mannequin opened this issue Dec 12, 2019 · 4 comments
Labels
3.9 only security fixes topic-C-API

Comments

@seberg
Copy link
Mannequin

seberg mannequin commented Dec 12, 2019

BPO 39028
Nosy @methane, @serhiy-storchaka, @seberg, @pablogsal
PRs
  • bpo-39028: Performance enhancement in keyword extraction #17576
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-12-18.06:51:42.353>
    created_at = <Date 2019-12-12.04:04:55.578>
    labels = ['expert-C-API', '3.9']
    title = 'ENH: Fix performance issue in keyword extraction'
    updated_at = <Date 2019-12-18.06:51:48.340>
    user = 'https://github.com/seberg'

    bugs.python.org fields:

    activity = <Date 2019-12-18.06:51:48.340>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-12-18.06:51:42.353>
    closer = 'methane'
    components = ['C API']
    creation = <Date 2019-12-12.04:04:55.578>
    creator = 'seberg'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39028
    keywords = ['patch']
    message_count = 4.0
    messages = ['358287', '358290', '358303', '358614']
    nosy_count = 4.0
    nosy_names = ['methane', 'serhiy.storchaka', 'seberg', 'pablogsal']
    pr_nums = ['17576']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39028'
    versions = ['Python 3.9']

    @seberg
    Copy link
    Mannequin Author

    seberg mannequin commented Dec 12, 2019

    The keyword argument extraction/finding function seems to have a performance bug/enhancement (unless I am missing something here). It reads:

        for (i=0; i < nkwargs; i++) {
            PyObject *kwname = PyTuple_GET_ITEM(kwnames, i);
    
            /* ptr==ptr should match in most cases since keyword keys
               should be interned strings */
            if (kwname == key) {
                return kwstack[i];
            }
            assert(PyUnicode_Check(kwname));
            if (_PyUnicode_EQ(kwname, key)) {
                return kwstack[i];
            }
        }
    

    However, it should be split into two separate for loops, using the PyUnicode_EQ check only if it failed for all other arguments.

    I will open a PR for this (it seemed like a bpo number is wanted for almost everything.

    @seberg seberg mannequin added 3.8 only security fixes 3.9 only security fixes topic-C-API labels Dec 12, 2019
    @serhiy-storchaka
    Copy link
    Member

    Could you please provide benchmarks which demonstrate the performance issue?

    @seberg
    Copy link
    Mannequin Author

    seberg mannequin commented Dec 12, 2019

    Fair enough, we had code that does it the other way, so it seemed "too obvious" since the current check seems mainly useful with few kwargs. However, a single kwarg is super common in python, while many seem super rare (in any argument clinic function)

    Which is why I had even trouble finding a function where it could make a difference, since it needs many kwargs.

    With changes:

    sebastian@seberg-x1 ~/python-dev/bin
    % ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
    1000000 loops, best of 5: 205 nsec per loop
    sebastian@seberg-x1 ~/python-dev/bin
    % ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
    1000000 loops, best of 5: 207 nsec per loop

    On master:

    sebastian@seberg-x1 ~/python-dev/bin
    % ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
    1000000 loops, best of 5: 221 nsec per loop
    sebastian@seberg-x1 ~/python-dev/bin
    % ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
    1000000 loops, best of 5: 218 nsec per loop

    Which, at close to 5% is barely noticeable, I suppose with more kwargs it could start to make a difference. With less than 3, it just does not matter I guess. Also, I am currently not sure how likely non-interned strings could actually happen. But I am not sure it matters to optimize for them (unless PyArg_From* functions use them).

    In any case, sorry if this was noise, happy to fix things up or just drop it if many kwargs are considered non-existing.

    @methane methane removed the 3.8 only security fixes label Dec 18, 2019
    @methane methane closed this as completed Dec 18, 2019
    @methane methane removed the 3.8 only security fixes label Dec 18, 2019
    @methane methane closed this as completed Dec 18, 2019
    @methane
    Copy link
    Member

    methane commented Dec 18, 2019

    New changeset 75bb07e by Inada Naoki (Sebastian Berg) in branch 'master':
    bpo-39028: Performance enhancement in keyword extraction (GH-17576)
    75bb07e

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants