Author jtaylor
Recipients alex, ezio.melotti, jtaylor, scoder, steven.daprano, terry.reedy, thomasahle, tim.peters, vajrasky
Date 2014-06-01.12:20:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1401625230.27.0.693622087854.issue21592@psf.upfronthosting.co.za>
In-reply-to
Content
in the case of the median you can archive similar performance to a multiselect by simply calling min([len(data) // 2 + 1]) for the second order statistic which you need for the averaging of even number of elements.

maybe an interesting datapoint would be to compare with numpys selection algorithm which is a intromultiselect (implemented in C for native datattypes).
It uses a standard median of 3 quickselect with a cutoff in recursion depth to median of median of group of 5.
the multiselect is implemented using a sorted list of kth order statistics and reducing the search space for each kth by maintaining a stack of all visited pivots.
E.g. if you search for 30 and 100, when during the search for 30 one has visited pivot 70 and 110, the search for 100 only needs to select in l[70:110].
The not particularly readable implementation is in: ./numpy/core/src/npysort/selection.c.src
unfortunately for object types it currently falls back to quicksort so you can't directly compare performance with the pure python variants.
History
Date User Action Args
2014-06-01 12:20:30jtaylorsetrecipients: + jtaylor, tim.peters, terry.reedy, scoder, ezio.melotti, steven.daprano, alex, thomasahle, vajrasky
2014-06-01 12:20:30jtaylorsetmessageid: <1401625230.27.0.693622087854.issue21592@psf.upfronthosting.co.za>
2014-06-01 12:20:30jtaylorlinkissue21592 messages
2014-06-01 12:20:29jtaylorcreate