This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Optimization to heapq module
Type: Stage:
Components: Library (Lib) Versions: Python 3.0, Python 3.1, Python 2.7, Python 2.6, Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: nilton, rhettinger
Priority: normal Keywords: patch

Created on 2008-12-31 01:07 by nilton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
heapq_optimization.diff nilton, 2008-12-31 01:07
Messages (4)
msg78584 - (view) Author: Nilton Volpato (nilton) Date: 2008-12-31 01:07
The wrapper around heapq.nlargest and heapq.nsmallest is much slower
than it's C version.

$ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999;
x=range(n); random.shuffle(x); import _heapq' '_heapq.nlargest(3, x)'
10 loops, best of 3: 142 msec per loop

$ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999;
x=range(n); random.shuffle(x); import heapq' 'heapq.nlargest(3, x)'
10 loops, best of 3: 685 msec per loop


If the key argument is None, there is no need to use the wrapper. This
patch adds an a check to avoid this.
msg78587 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 01:48
Am rejecting the patch because it violates the sort equivalence
guarantee (including sort's promise to maintain stability).

If you need the speed and don't care about sort stability, then just use
_heapq.nsmallest() or _heapq.nlargest() directly.

We could complexify the code a bit to achieve some automatic speed-up in
the case of key==None, but my timings don't show much of an improvement:

    if key is None:
        it = izip(iterable, count())                 # decorate
        result = _nsmallest(n, it)
        return map(itemgetter(0), result)            # undecorate
    else:
        in1, in2 = tee(iterable)
        it = izip(imap(key, in1), count(), in2)      # decorate
        result = _nsmallest(n, it)
        return map(itemgetter(2), result)            # undecorate
msg78590 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 04:34
After looking at the ugly handling of this logic in the 3.0 translation,
I've reconsidered.  It is better to have a separate path for the case
where the key is None.  See r68095 and r68096 .
msg78654 - (view) Author: Nilton Volpato (nilton) Date: 2008-12-31 21:47
Nice!

Maybe we could add the decorate/undecorate step to guarantee stability
to the C implementation. I'll do some experiments and timings on this.
The heapq library has a lot of room for optimization, I think.
History
Date User Action Args
2022-04-11 14:56:43adminsetgithub: 49040
2008-12-31 21:47:38niltonsetmessages: + msg78654
2008-12-31 04:34:02rhettingersetresolution: rejected -> accepted
messages: + msg78590
2008-12-31 01:48:53rhettingersetstatus: open -> closed
assignee: rhettinger
messages: + msg78587
resolution: rejected
nosy: + rhettinger
2008-12-31 01:07:19niltoncreate