Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and speed-up heaqp.nlargest() #65623

Closed
rhettinger opened this issue May 4, 2014 · 3 comments
Closed

Simplify and speed-up heaqp.nlargest() #65623

rhettinger opened this issue May 4, 2014 · 3 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@rhettinger
Copy link
Contributor

BPO 21424
Nosy @tim-one, @rhettinger
Files
  • rid_nlargest.py: Fold all nlargest() work into pure Python
  • rip_nsmallest.diff: Draft patch for nsmallest()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-05-11.21:40:03.236>
    created_at = <Date 2014-05-04.05:57:09.808>
    labels = ['library', 'performance']
    title = 'Simplify and speed-up heaqp.nlargest()'
    updated_at = <Date 2014-05-11.21:40:03.235>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2014-05-11.21:40:03.235>
    actor = 'rhettinger'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-05-11.21:40:03.236>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2014-05-04.05:57:09.808>
    creator = 'rhettinger'
    dependencies = []
    files = ['35148', '35213']
    hgrepos = []
    issue_num = 21424
    keywords = ['patch']
    message_count = 3.0
    messages = ['217859', '218254', '218300']
    nosy_count = 3.0
    nosy_names = ['tim.peters', 'rhettinger', 'python-dev']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue21424'
    versions = ['Python 3.5']

    @rhettinger
    Copy link
    Contributor Author

    Consolidate the logic for nlargest() into a single function. Remove both the C and pure Python base underlying code.

    With all the logic in a single function, it only becomes necessary to create, store, and compare the data tuples when a need item is added to the heap. This means that the rest of the comparisons (checking to see whether a new item needs to be added to the heap) can run faster and not need to create a (key, order, elem) tuple.

    The change reduces the number of tuples created and the number of ordering integers created.

    When rich comparisons were introduced, tuple ordering comparisons became twice as expensive (they are compared elementwise for equality and then there is an additional comparison call to order the first differing element). Under the existing nlargest() code, we pay that price for every lement in the iterable. In the new code, we pay that price only for the small subset of the iterable that actually gets added to the heap.

    After this, another patch for simplifying nsmallest() is forthcoming.

    @rhettinger rhettinger added stdlib Python modules in the Lib dir performance Performance or resource usage labels May 4, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 11, 2014

    New changeset fadc06047314 by Raymond Hettinger in branch 'default':
    Issue bpo-21424: Optimize heaqp.nlargest() to make fewer tuple comparisons.
    http://hg.python.org/cpython/rev/fadc06047314

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 11, 2014

    New changeset 31950174f60f by Raymond Hettinger in branch 'default':
    bpo-21424: Apply the nlargest() optimizations to nsmallest() as well.
    http://hg.python.org/cpython/rev/31950174f60f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant