Message 219448 - Python tracker



Author	thomasahle
Recipients	steven.daprano, terry.reedy, thomasahle, tim.peters, vajrasky
Date	2014-05-31.09:15:25
Message-id	<1401527727.26.0.430936232383.issue21592@psf.upfronthosting.co.za>
In-reply-to

Content
I think "minimize expected-case time" is a good goal. If we wanted "minimize worst-case time" we would have to use median of medians rather than plain quickselect.

My trials on random data, where sorting arguably has a disadvantage, suggest that sorting is about twice as fast as quickselect for most input sizes. With PyPy, quickselect is easily 5-10 times faster, which I take as a suggestion that a C implementation might be worth a try.
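For reference, here is a minimal pure-Python sketch of the kind of quickselect-based median being compared against sorting. It is not the code used in the trials above; the names `quickselect` and `median_quickselect` are made up for illustration, and the list-comprehension partitioning is an assumption chosen for clarity rather than speed.

```python
import random
import statistics


def quickselect(data, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(data)  # copy, so the caller's sequence is left untouched
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)  # random pivot gives expected linear time
        lows = [x for x in items if x < pivot]
        n_equal = sum(1 for x in items if x == pivot)
        if k < len(lows):
            items = lows  # answer lies strictly below the pivot
        elif k < len(lows) + n_equal:
            return pivot  # answer is the pivot itself
        else:
            k -= len(lows) + n_equal  # skip everything <= pivot
            items = [x for x in items if x > pivot]


def median_quickselect(data):
    """Median via quickselect; averages the two middle values for even n."""
    items = list(data)
    n = len(items)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    if n % 2 == 1:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2
```

Timing this against `statistics.median` with `timeit` on lists of random floats is one way to reproduce the comparison. The worst case of this sketch is still quadratic, which is the trade-off behind the "expected-case" goal.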

For designing a realistic test suite, I suppose we need to look at what tasks medians are commonly used for. I'm thinking of median filters in image processing, k-medians clustering, and robust regression. Anything else?
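As one possible shape for such a workload, a 1-D sliding-window median filter calls the median on many small, overlapping windows. The sketch below is only an illustration: the name `median_filter`, the default window size, and the shrinking-window edge handling are all assumptions, not anything proposed in this issue.

```python
import statistics


def median_filter(signal, window=3):
    """1-D sliding-window median filter; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        # Each output sample is the median of the neighbours around index i.
        statistics.median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]
```

A benchmark built on this would stress many calls on tiny inputs, which is a rather different regime from a single median over one large list.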

History
Date	User	Action	Args
2014-05-31 09:15:27	thomasahle	set	recipients: + thomasahle, tim.peters, terry.reedy, steven.daprano, vajrasky
2014-05-31 09:15:27	thomasahle	set	messageid: <1401527727.26.0.430936232383.issue21592@psf.upfronthosting.co.za>
2014-05-31 09:15:27	thomasahle	link	issue21592 messages
2014-05-31 09:15:25	thomasahle	create