Message 334039 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	steven.daprano
Recipients	gregory.p.smith, ncoghlan, oscarbenjamin, remi.lapeyre, rhettinger, steven.daprano, wolma
Date	2019-01-19.06:30:43
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<20190119063036.GR13616@ando.pearwood.info>
In-reply-to	<1547825225.26.0.147780198807.issue20479@roundup.psfhosted.org>

Content
> Is this proposal still relevant? Yes. As Raymond says, deciding on a good API is the hard part. Its relatively simple to change a poor implementation for a better one, but backwards compatibility means that changing the API is very difficult. I would find it very helpful if somebody has time to do a survey of other statistics libraries or languages (e.g. numpy, R, Octave, Matlab, SAS etc) and see how they handle data with weights. - what APIs do they provide? - do they require weights to be positive integers, or do they support arbitrary float weights? - including negative weights? (what physical meaning does a negative weight have?) At the moment, a simple helper function seems to do the trick for non-negative integer weights: def flatten(items): for item in items: yield from item py> data = [1, 2, 3, 4] py> weights = [1, 4, 1, 2] py> statistics.mean(flatten([x]w for x, w in zip(data, weights))) 2.5 In principle, the implementation could be as simple as a single recursive call: def mean(data, weights=None): if weights is not None: return mean(flatten([x]w for x, w in zip(data, weights))) # base case without weights is unchanged or perhaps it could be just a recipe in the docs.

> Is this proposal still relevant?

Yes.

As Raymond says, deciding on a good API is the hard part. Its relatively 
simple to change a poor implementation for a better one, but backwards 
compatibility means that changing the API is very difficult.

I would find it very helpful if somebody has time to do a survey of 
other statistics libraries or languages (e.g. numpy, R, Octave, Matlab, 
SAS etc) and see how they handle data with weights.

- what APIs do they provide?
- do they require weights to be positive integers, or do they 
  support arbitrary float weights?
- including negative weights? 
  (what physical meaning does a negative weight have?)

At the moment, a simple helper function seems to do the trick for 
non-negative integer weights:

def flatten(items):
    for item in items:
        yield from item

py> data = [1, 2, 3, 4]
py> weights = [1, 4, 1, 2]
py> statistics.mean(flatten([x]*w for x, w in zip(data, weights)))
2.5

In principle, the implementation could be as simple as a single 
recursive call:

def mean(data, weights=None):
    if weights is not None:
        return mean(flatten([x]*w for x, w in zip(data, weights)))
    # base case without weights is unchanged

or perhaps it could be just a recipe in the docs.

History
Date	User	Action	Args
2019-01-19 06:30:45	steven.daprano	set	recipients: + steven.daprano, rhettinger, gregory.p.smith, ncoghlan, oscarbenjamin, wolma, remi.lapeyre
2019-01-19 06:30:43	steven.daprano	link	issue20479 messages
2019-01-19 06:30:43	steven.daprano	create