Message 343398 - Python tracker


Author	remi.lapeyre
Recipients	mark.dickinson, remi.lapeyre, rhettinger, steven.daprano
Date	2019-05-24.15:35:28
Message-id	<1558712128.33.0.512889885052.issue35775@roundup.psfhosted.org>
In-reply-to

Content

Hi Steven, thanks for taking the time to review my patch.

Regarding the relevance of adding select(), I was looking for work to do in the bug tracker and found some references to it (https://bugs.python.org/issue21592#msg219934 for example).

I knew that there are multiple definitions of percentiles, but I got sloppy in my previous response because I wanted to answer quickly. I will try not to do this again.


Regarding the use of sorting, I thought that sorting would be quicker than a linear-time selection algorithm written in Python, given the general performance of Timsort; some tests in https://bugs.python.org/issue21592 agreed with that.
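
A minimal sketch of the two approaches being compared, assuming the linear-time alternative is a quickselect-style selection (the message does not name the algorithm, and quickselect is only linear on average). The function names `select_sorted` and `select_quick` are hypothetical and are not taken from the patch:

```python
import random

def select_sorted(data, i):
    """Return the i-th smallest item (1-based) by sorting the whole input."""
    return sorted(data)[i - 1]

def select_quick(data, i):
    """Return the i-th smallest item (1-based) with a quickselect-style loop.

    Assumes 1 <= i <= len(data); runs in linear time on average.
    """
    items = list(data)
    k = i - 1  # 0-based rank still to be located within items
    while True:
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(lower):
            # The target is among the items smaller than the pivot.
            items = lower
        elif k < len(lower) + len(equal):
            # The target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # Skip everything up to and including the pivot's duplicates.
            k -= len(lower) + len(equal)
            items = [x for x in items if x > pivot]
```

Both functions return the same value for any valid rank. Timing them on representative data, for example with `timeit.timeit(lambda: select_sorted(values, rank), number=10)`, is the way to check which one is faster in practice.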

For the iterator, I was thinking about how to implement percentiles when writing select() and thought of writing:


from itertools import islice
from statistics import StatisticsError

def _select(data, i, key=None):
    # i is a 1-based rank; the returned iterator starts at the i-th smallest item.
    if not len(data):
        raise StatisticsError("select requires at least one data point")
    if not (1 <= i <= len(data)):
        raise StatisticsError(f"The index looked for must be between 1 and {len(data)}")
    data = sorted(data, key=key)
    return islice(data, i-1, None)

def select(data, i, key=None):
    return next(_select(data, i, key=key))


and then doing some variant of:

    it = _select(data, i, key=key)
    left, right = next(it), next(it)
    # compute percentile with left and right

to implement the quantiles without sorting the list multiple times. Now that quantiles() has been implemented by Raymond Hettinger, this is moot anyway.
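
A minimal sketch of how the "left and right" step could look, assuming one particular percentile definition (linear interpolation between the two closest ranks). Other definitions exist, and neither the name `percentile_sketch` nor this formula comes from the patch:

```python
import math
from itertools import islice
from statistics import StatisticsError

def percentile_sketch(data, p):
    """Return the p-th percentile (0 <= p <= 1) using a single sort.

    Hypothetical helper: interpolates linearly between the two
    neighbouring order statistics around position (n - 1) * p.
    """
    if not len(data):
        raise StatisticsError("percentile requires at least one data point")
    if not (0 <= p <= 1):
        raise StatisticsError("p must be between 0 and 1")
    ordered = sorted(data)          # the only sort performed
    position = (len(ordered) - 1) * p
    lower_index = math.floor(position)
    fraction = position - lower_index
    # Walk the sorted data from the lower neighbour onwards.
    it = islice(ordered, lower_index, None)
    left = next(it)
    right = next(it, left)          # fall back to left at the last item
    return left + (right - left) * fraction
```

For example, `percentile_sketch([1, 2, 3, 4], 0.5)` returns `2.5`. The default passed to the second `next()` covers the case where the lower neighbour is already the largest item, which would otherwise raise `StopIteration`.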

Since it's probably not useful, feel free to disregard my PR.
