Author steven.daprano
Recipients mark.dickinson, remi.lapeyre, rhettinger, steven.daprano
Date 2019-01-19.23:27:32
Message-id <1547940452.3.0.168782150517.issue35775@roundup.psfhosted.org>
Content
Rémi, I've read over your patch and have some comments:

(1) You call sorted() to produce a list, but then instead of retrieving the item using ``data[i-1]`` you use ``itertools.islice``. That seems unnecessary to me. Do you have a reason for using ``islice``?
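
To make the comparison concrete, here is a minimal sketch of the two approaches. The function names ``select_by_index`` and ``select_by_islice`` are hypothetical and are not taken from the patch; they only illustrate that, once ``sorted()`` has already built a list, direct indexing and ``islice`` return the same item.

```python
from itertools import islice


def select_by_index(data, i):
    """Return the i-th smallest item (1-based) by indexing the sorted list."""
    return sorted(data)[i - 1]


def select_by_islice(data, i):
    """Return the same item, but by stepping through the sorted list with islice."""
    return next(islice(sorted(data), i - 1, None))


sample = [7, 1, 5, 3, 9]
# Both calls pick the second-smallest value from the same sorted copy.
print(select_by_index(sample, 2), select_by_islice(sample, 2))
```

Since the list already exists in memory, indexing is a constant-time lookup, whereas ``islice`` has to step over the first i-1 items.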

(2) select is not very useful on its own; we actually want it so that we can calculate quantiles, e.g. percentiles, deciles, quartiles. If we want the k-quantiles (e.g. k=100 for percentiles), then there are k+1 boundary values in total, including the minimum and maximum. E.g. quartiles divide the data set into four equal sections, so there are five boundary values including the min and max.

So the caller is likely to call select repeatedly on the same data set, copying and sorting that data on every call. If the data set is small, repeatedly making sorted copies is still cheap enough, but for large data sets it will be expensive.

Do you have any thoughts on how to deal with that?
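
As one possible direction, and only a sketch: the data could be sorted once and all the requested ranks read from that single sorted copy. The names ``select_many`` and ``quantile_boundaries`` are hypothetical, and the rank rule used here (floor of an evenly spaced position) is a deliberately naive assumption, not a proposal for how the statistics module should define quantiles.

```python
def select_many(data, ranks):
    """Return the items at the given 1-based ranks, sorting the data only once."""
    ordered = sorted(data)
    return [ordered[r - 1] for r in ranks]


def quantile_boundaries(data, k):
    """Return the k+1 boundary values of the k-quantiles, min and max included.

    Assumption: a naive rule that takes the item at 0-based position
    floor(j * (n - 1) / k) for j = 0..k. Real quantile definitions usually
    interpolate between neighbouring items.
    """
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    ranks = [j * (n - 1) // k + 1 for j in range(k + 1)]
    return select_many(data, ranks)


data = [9, 2, 7, 4, 1, 8, 3, 6, 5]
# Quartiles: five boundary values from one sort instead of five.
print(quantile_boundaries(data, 4))
```

With this shape the cost is one O(n log n) sort plus a constant-time lookup per requested rank, however many quantiles the caller asks for.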