Author jab
Recipients jab, rhettinger
Date 2009-07-31.19:22:32
SpamBayes Score 1.47479e-09
Marked as misclassified No
Message-id <1249068153.91.0.565336800522.issue6614@psf.upfronthosting.co.za>
In-reply-to
Content
> I prefer the docs the way they are.  They help the reader understand
> the relationship between min, max, nsmallest, nlargest, and sorted.

Except that it's no longer true that "when n==1, it is more efficient to use the 
builtin min() and max() functions." Shouldn't this be updated to say "when n==1, 
it is equivalent to using the builtin min() and max() functions"?

> I'm not sure where you got the n * 10 <= len(iterable) switch-over
> point.

It's right in the file you linked to. Search for "n * 10" in 
http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py?
revision=73579&view=markup.

> That is arbitrary.  The correct switchover point depends on the
> cost of the comparison function, whether the length of the input is
> known, whether the input data is partially ordered, memory constraints,
> whether a key function is used, and on other factors.   

So should that be removed, then?

> FWIW, I also wrote the logic for random.sample().  The switchover logic
> was straight-forward because performance depended on factors that were
> fully known (length of input, sample size, memory utilization, and
> average number of probes for each strategy).
> One other thought:  When memory is tight, the programmer needs to be
> able to select the heap algorithm in favor of sorted() even for
> relatively large values of n.  I do not want an automatic switchover
> point that takes away a programmer's choice between speed and space
> efficiency.

Agreed, and thanks for the additional info.
History
Date User Action Args
2009-07-31 19:22:33jabsetrecipients: + jab, rhettinger
2009-07-31 19:22:33jabsetmessageid: <1249068153.91.0.565336800522.issue6614@psf.upfronthosting.co.za>
2009-07-31 19:22:32jablinkissue6614 messages
2009-07-31 19:22:32jabcreate