ENH: Fix performance issue in keyword extraction #83209

seberg · 2019-12-12T04:04:56Z

BPO	39028
Nosy	@methane, @serhiy-storchaka, @seberg, @pablogsal
PRs	bpo-39028: Performance enhancement in keyword extraction #17576

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-12-18.06:51:42.353>
created_at = <Date 2019-12-12.04:04:55.578>
labels = ['expert-C-API', '3.9']
title = 'ENH: Fix performance issue in keyword extraction'
updated_at = <Date 2019-12-18.06:51:48.340>
user = 'https://github.com/seberg'

bugs.python.org fields:

activity = <Date 2019-12-18.06:51:48.340>
actor = 'methane'
assignee = 'none'
closed = True
closed_date = <Date 2019-12-18.06:51:42.353>
closer = 'methane'
components = ['C API']
creation = <Date 2019-12-12.04:04:55.578>
creator = 'seberg'
dependencies = []
files = []
hgrepos = []
issue_num = 39028
keywords = ['patch']
message_count = 4.0
messages = ['358287', '358290', '358303', '358614']
nosy_count = 4.0
nosy_names = ['methane', 'serhiy.storchaka', 'seberg', 'pablogsal']
pr_nums = ['17576']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue39028'
versions = ['Python 3.9']

seberg · 2019-12-12T04:04:55Z

The keyword argument extraction/finding function seems to have a performance bug/enhancement (unless I am missing something here). It reads:

    for (i=0; i < nkwargs; i++) {
        PyObject *kwname = PyTuple_GET_ITEM(kwnames, i);

        /* ptr==ptr should match in most cases since keyword keys
           should be interned strings */
        if (kwname == key) {
            return kwstack[i];
        }
        assert(PyUnicode_Check(kwname));
        if (_PyUnicode_EQ(kwname, key)) {
            return kwstack[i];
        }
    }

However, it should be split into two separate for loops, using the PyUnicode_EQ check only if it failed for all other arguments.

I will open a PR for this (it seemed like a bpo number is wanted for almost everything.

serhiy-storchaka · 2019-12-12T06:17:43Z

Could you please provide benchmarks which demonstrate the performance issue?

seberg · 2019-12-12T16:11:03Z

Fair enough, we had code that does it the other way, so it seemed "too obvious" since the current check seems mainly useful with few kwargs. However, a single kwarg is super common in python, while many seem super rare (in any argument clinic function)

Which is why I had even trouble finding a function where it could make a difference, since it needs many kwargs.

With changes:

sebastian@seberg-x1 ~/python-dev/bin
% ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
1000000 loops, best of 5: 205 nsec per loop
sebastian@seberg-x1 ~/python-dev/bin
% ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
1000000 loops, best of 5: 207 nsec per loop

On master:

sebastian@seberg-x1 ~/python-dev/bin
% ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
1000000 loops, best of 5: 221 nsec per loop
sebastian@seberg-x1 ~/python-dev/bin
% ./python3 -m timeit -s 'i = 4' 'i.to_bytes(length=5, byteorder="big", signed=True)'
1000000 loops, best of 5: 218 nsec per loop

Which, at close to 5% is barely noticeable, I suppose with more kwargs it could start to make a difference. With less than 3, it just does not matter I guess. Also, I am currently not sure how likely non-interned strings could actually happen. But I am not sure it matters to optimize for them (unless PyArg_From* functions use them).

In any case, sorry if this was noise, happy to fix things up or just drop it if many kwargs are considered non-existing.

methane · 2019-12-18T06:51:48Z

New changeset 75bb07e by Inada Naoki (Sebastian Berg) in branch 'master':
bpo-39028: Performance enhancement in keyword extraction (GH-17576)
75bb07e

seberg mannequin added 3.8 only security fixes 3.9 only security fixes topic-C-API labels Dec 12, 2019

methane removed the 3.8 only security fixes label Dec 18, 2019

methane closed this as completed Dec 18, 2019

methane removed the 3.8 only security fixes label Dec 18, 2019

methane closed this as completed Dec 18, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Fix performance issue in keyword extraction #83209

ENH: Fix performance issue in keyword extraction #83209

seberg mannequin commented Dec 12, 2019

seberg mannequin commented Dec 12, 2019

serhiy-storchaka commented Dec 12, 2019

seberg mannequin commented Dec 12, 2019

methane commented Dec 18, 2019

ENH: Fix performance issue in keyword extraction #83209

ENH: Fix performance issue in keyword extraction #83209

Comments

seberg mannequin commented Dec 12, 2019

seberg mannequin commented Dec 12, 2019

serhiy-storchaka commented Dec 12, 2019

seberg mannequin commented Dec 12, 2019

methane commented Dec 18, 2019