Message 219498 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jtaylor
Recipients	alex, ezio.melotti, jtaylor, scoder, steven.daprano, terry.reedy, thomasahle, tim.peters, vajrasky
Date	2014-06-01.12:20:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1401625230.27.0.693622087854.issue21592@psf.upfronthosting.co.za>
In-reply-to

Content
in the case of the median you can archive similar performance to a multiselect by simply calling min([len(data) // 2 + 1]) for the second order statistic which you need for the averaging of even number of elements. maybe an interesting datapoint would be to compare with numpys selection algorithm which is a intromultiselect (implemented in C for native datattypes). It uses a standard median of 3 quickselect with a cutoff in recursion depth to median of median of group of 5. the multiselect is implemented using a sorted list of kth order statistics and reducing the search space for each kth by maintaining a stack of all visited pivots. E.g. if you search for 30 and 100, when during the search for 30 one has visited pivot 70 and 110, the search for 100 only needs to select in l[70:110]. The not particularly readable implementation is in: ./numpy/core/src/npysort/selection.c.src unfortunately for object types it currently falls back to quicksort so you can't directly compare performance with the pure python variants.

in the case of the median you can archive similar performance to a multiselect by simply calling min([len(data) // 2 + 1]) for the second order statistic which you need for the averaging of even number of elements.

maybe an interesting datapoint would be to compare with numpys selection algorithm which is a intromultiselect (implemented in C for native datattypes).
It uses a standard median of 3 quickselect with a cutoff in recursion depth to median of median of group of 5.
the multiselect is implemented using a sorted list of kth order statistics and reducing the search space for each kth by maintaining a stack of all visited pivots.
E.g. if you search for 30 and 100, when during the search for 30 one has visited pivot 70 and 110, the search for 100 only needs to select in l[70:110].
The not particularly readable implementation is in: ./numpy/core/src/npysort/selection.c.src
unfortunately for object types it currently falls back to quicksort so you can't directly compare performance with the pure python variants.

History
Date	User	Action	Args
2014-06-01 12:20:30	jtaylor	set	recipients: + jtaylor, tim.peters, terry.reedy, scoder, ezio.melotti, steven.daprano, alex, thomasahle, vajrasky
2014-06-01 12:20:30	jtaylor	set	messageid: <1401625230.27.0.693622087854.issue21592@psf.upfronthosting.co.za>
2014-06-01 12:20:30	jtaylor	link	issue21592 messages
2014-06-01 12:20:29	jtaylor	create