Message 91142 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jab
Recipients	jab, rhettinger
Date	2009-07-31.19:22:32
SpamBayes Score	1.4747886e-09
Marked as misclassified	No
Message-id	<1249068153.91.0.565336800522.issue6614@psf.upfronthosting.co.za>
In-reply-to

Content
> I prefer the docs the way they are. They help the reader understand > the relationship between min, max, nsmallest, nlargest, and sorted. Except that it's no longer true that "when n==1, it is more efficient to use the builtin min() and max() functions." Shouldn't this be updated to say "when n==1, it is equivalent to using the builtin min() and max() functions"? > I'm not sure where you got the n * 10 <= len(iterable) switch-over > point. It's right in the file you linked to. Search for "n * 10" in http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py? revision=73579&view=markup. > That is arbitrary. The correct switchover point depends on the > cost of the comparison function, whether the length of the input is > known, whether the input data is partially ordered, memory constraints, > whether a key function is used, and on other factors. So should that be removed, then? > FWIW, I also wrote the logic for random.sample(). The switchover logic > was straight-forward because performance depended on factors that were > fully known (length of input, sample size, memory utilization, and > average number of probes for each strategy). > One other thought: When memory is tight, the programmer needs to be > able to select the heap algorithm in favor of sorted() even for > relatively large values of n. I do not want an automatic switchover > point that takes away a programmer's choice between speed and space > efficiency. Agreed, and thanks for the additional info.

> I prefer the docs the way they are.  They help the reader understand
> the relationship between min, max, nsmallest, nlargest, and sorted.

Except that it's no longer true that "when n==1, it is more efficient to use the 
builtin min() and max() functions." Shouldn't this be updated to say "when n==1, 
it is equivalent to using the builtin min() and max() functions"?

> I'm not sure where you got the n * 10 <= len(iterable) switch-over
> point.

It's right in the file you linked to. Search for "n * 10" in 
http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py?
revision=73579&view=markup.

> That is arbitrary.  The correct switchover point depends on the
> cost of the comparison function, whether the length of the input is
> known, whether the input data is partially ordered, memory constraints,
> whether a key function is used, and on other factors.   

So should that be removed, then?

> FWIW, I also wrote the logic for random.sample().  The switchover logic
> was straight-forward because performance depended on factors that were
> fully known (length of input, sample size, memory utilization, and
> average number of probes for each strategy).
> One other thought:  When memory is tight, the programmer needs to be
> able to select the heap algorithm in favor of sorted() even for
> relatively large values of n.  I do not want an automatic switchover
> point that takes away a programmer's choice between speed and space
> efficiency.

Agreed, and thanks for the additional info.

History
Date	User	Action	Args
2009-07-31 19:22:33	jab	set	recipients: + jab, rhettinger
2009-07-31 19:22:33	jab	set	messageid: <1249068153.91.0.565336800522.issue6614@psf.upfronthosting.co.za>
2009-07-31 19:22:32	jab	link	issue6614 messages
2009-07-31 19:22:32	jab	create