Message 298992 - Python tracker

Author	steven.daprano
Recipients	gerion, mark.dickinson, steven.daprano
Date	2017-07-24.16:54:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<20170724165404.GR3149@ando.pearwood.info>
In-reply-to	<1500860763.43.0.847214535342.issue30999@psf.upfronthosting.co.za>

Content
Thanks for explaining your use-case.

Although the median_* functions don't perform arithmetic on their data,
they are still conceptually mathematical functions that operate on
numbers, and I'm reluctant to support arbitrary objects with a key
function without a solid reason. In your example, I think there are
existing ways to get the result you want:

(1) Use a dict:

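from statistics import median_low

# Map each number to its associated names; median_low() iterates the keys.
data = {1: ['Anna'], 3: ['Paul', 'Henry'], 4: ['Kate']}
people = data[median_low(data)]  # ['Paul', 'Henry']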

(2) Use a custom numeric type with associated data:

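# An int subclass sorts and compares as a number but carries extra data.
class MyNum(int):
    # int is immutable, so the value has to be set in __new__.
    def __new__(cls, num, data):
        instance = super().__new__(cls, num)
        instance.data = data
        return instance

data = [MyNum(1, ['Anna']), MyNum(3, ['Paul', 'Henry']),
        MyNum(4, ['Kate'])]

# median_low returns one of the original MyNum objects, data attached.
people = median_low(data).data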
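
For comparison, here is a rough sketch of what a key-based low median
could look like if written outside the module. The name
median_low_by_key and the records layout are hypothetical and are not
part of the statistics module. It only assumes that the low median is
the lower of the two middle items after sorting.

```python
def median_low_by_key(items, key):
    """Hypothetical helper, NOT part of the statistics module."""
    ordered = sorted(items, key=key)
    if not ordered:
        raise ValueError("no median for empty data")
    # For an odd count this is the middle item; for an even count it is
    # the lower of the two middle items.
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical (number, names) records, deliberately unsorted.
records = [(3, ['Paul', 'Henry']), (1, ['Anna']), (4, ['Kate'])]
score, people = median_low_by_key(records, key=lambda rec: rec[0])
print(score, people)
```

Since this is a few lines on top of sorted(), it may be easier to keep
such a helper in user code than to change the median_* signatures.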

As for your second example, do you have a use-case for wanting to know 
the position of the median in the original, unsorted list? When would 
that be useful?

One other reason for my reluctance: although median_low and median_high
are guaranteed to return an actual data point, that's a fairly special
case. There are other order statistics (such as quartiles, quantiles,
etc.) which are conceptually related to the median but don't necessarily
return a data value. Indeed, the regular median() function doesn't
always do so. I would be reluctant for median() and median_low() to have
different signatures without an excellent reason.
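
A small illustration of that difference, using made-up data with an
even number of points so that there is no single middle value:

```python
from statistics import median, median_low, median_high

data = [1, 3, 5, 7]  # made-up sample with an even number of points

mid = median(data)        # mean of the two middle values
low = median_low(data)    # lower of the two middle values
high = median_high(data)  # higher of the two middle values

# Only the low and high medians are members of the original data.
print(mid in data, low in data, high in data)  # False True True
```

Here median() returns 4.0, which is not one of the data points, so a
key function that maps a result back to an original object would have
nothing to map back to.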

I'm not completely ruling this out. One thing which might sway me is if 
there are other languages or statistics libraries which offer this 
feature. (I say *might*, not that it definitely will.)

History
Date	User	Action	Args
2017-07-24 16:54:09	steven.daprano	set	recipients: + steven.daprano, mark.dickinson, gerion
2017-07-24 16:54:09	steven.daprano	link	issue30999 messages
2017-07-24 16:54:09	steven.daprano	create