This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author jab
Recipients jab
Date 2009-07-31.16:55:59
SpamBayes Score 5.131018e-11
Marked as misclassified No
Message-id <1249059362.21.0.599959669928.issue6614@psf.upfronthosting.co.za>
In-reply-to
Content
From http://docs.python.org/library/heapq.html:
> The latter two functions (nlargest and nsmallest) perform best for
> smaller values of n. For larger values, it is more efficient to use
> the sorted() function. Also, when n==1, it is more efficient to use
> the builtin min() and max() functions.

A quick usability win for the heapq module: Be more specific than "smaller" and 
"larger", e.g. "when n is O(...(len(iterable)))".

From http://groups.google.com/group/comp.lang.python/msg/9556f56ae25ecf3b:
> I find it interesting that the heapq functions tell you in the
> documentation that they aren't suitable for use where n==1 or where
> n is near the total size of the sequence whereas random.sample()
> chooses what it thinks is the best algorithm based on the input. At
> the very least I would have thought the heapq functions could check
> for n==1 and call min/max when it is.

+1. It looks like the pure Python implementation of nsmallest is actually already 
choosing either an insort algorithm or a minheap algorithm based on whether n * 10 <= 
len(iterable), but the C implementation of nsmallest unconditionally uses a minheap 
algorithm. Neither the pure Python nor the C implementation of nlargest chooses its 
algorithm conditionally. For the sake of consistency and usability, all 
implementations of nsmallest and nlargest should choose the most efficient algorithm 
from among min/max, insort, and minheap, and the docs should be updated to describe 
this behavior.

Also, in looking through the heapq.py that came with my Python 2.6.2 distribution, I 
noticed that line 196 seems to have no reason to be there:
    _heappushpop = heappushpop                                      
This is the only occurrence of "_heappushpop" in the file.

I made a patch for heapq.py, but since I'm not a CPython hacker, I thought it might 
be better to leave the changes up to someone who could do both the pure Python and 
the C implementations in tandem. I'd be happy to contribute documentation when the 
time comes if that would help.
History
Date User Action Args
2009-07-31 16:56:02jabsetrecipients: + jab
2009-07-31 16:56:02jabsetmessageid: <1249059362.21.0.599959669928.issue6614@psf.upfronthosting.co.za>
2009-07-31 16:56:00jablinkissue6614 messages
2009-07-31 16:55:59jabcreate