Message 334101 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	oscarbenjamin
Recipients	gregory.p.smith, ncoghlan, oscarbenjamin, remi.lapeyre, rhettinger, steven.daprano, wolma
Date	2019-01-20.21:11:48
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAHVvXxQFLpafZ+u5uyw-mN3OK=zfV7Y_CcgB9prOsrTDHZadcA@mail.gmail.com>
In-reply-to	<20190119063036.GR13616@ando.pearwood.info>

Content
> I would find it very helpful if somebody has time to do a survey of > other statistics libraries or languages (e.g. numpy, R, Octave, Matlab, > SAS etc) and see how they handle data with weights. Numpy has only sporadic support for this. The standard mean function does not have any way to provide weights but there is an alternative called average that computes the mean and has an optional weights argument. I've never heard of average before searching for "numpy weighted mean" just now. Numpy's API often has bits of old cruft from where various numerical packages were joined together so I'm not sure they would recommend their current approach. I don't think there are any other numpy functions for providing weighted statistics. Statsmodels does provide an API for this as explained here: https://stackoverflow.com/a/36464881/9450991 Their API is that you create an object with data and weights and can then call methods/attributes for statistics. Matlab doesn't support even weighted mean as far as I can tell. There is wmean on the matlab file exchange: > > - what APIs do they provide? > - do they require weights to be positive integers, or do they > support arbitrary float weights? > - including negative weights? > (what physical meaning does a negative weight have?) > > At the moment, a simple helper function seems to do the trick for > non-negative integer weights: > > def flatten(items): > for item in items: > yield from item > > py> data = [1, 2, 3, 4] > py> weights = [1, 4, 1, 2] > py> statistics.mean(flatten([x]w for x, w in zip(data, weights))) > 2.5 > > In principle, the implementation could be as simple as a single > recursive call: > > def mean(data, weights=None): > if weights is not None: > return mean(flatten([x]w for x, w in zip(data, weights))) > # base case without weights is unchanged > > or perhaps it could be just a recipe in the docs. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue20479> > _______________________________________

> I would find it very helpful if somebody has time to do a survey of
> other statistics libraries or languages (e.g. numpy, R, Octave, Matlab,
> SAS etc) and see how they handle data with weights.

Numpy has only sporadic support for this. The standard mean function
does not have any way to provide weights but there is an alternative
called average that computes the mean and has an optional weights
argument. I've never heard of average before searching for "numpy
weighted mean" just now. Numpy's API often has bits of old cruft from
where various numerical packages were joined together so I'm not sure
they would recommend their current approach. I don't think there are
any other numpy functions for providing weighted statistics.

Statsmodels does provide an API for this as explained here:
https://stackoverflow.com/a/36464881/9450991
Their API is that you create an object with data and weights and can
then call methods/attributes for statistics.

Matlab doesn't support even weighted mean as far as I can tell. There
is wmean on the matlab file exchange:

>
> - what APIs do they provide?
> - do they require weights to be positive integers, or do they
>   support arbitrary float weights?
> - including negative weights?
>   (what physical meaning does a negative weight have?)
>
> At the moment, a simple helper function seems to do the trick for
> non-negative integer weights:
>
> def flatten(items):
>     for item in items:
>         yield from item
>
> py> data = [1, 2, 3, 4]
> py> weights = [1, 4, 1, 2]
> py> statistics.mean(flatten([x]*w for x, w in zip(data, weights)))
> 2.5
>
> In principle, the implementation could be as simple as a single
> recursive call:
>
> def mean(data, weights=None):
>     if weights is not None:
>         return mean(flatten([x]*w for x, w in zip(data, weights)))
>     # base case without weights is unchanged
>
> or perhaps it could be just a recipe in the docs.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue20479>
> _______________________________________

History
Date	User	Action	Args
2019-01-20 21:11:49	oscarbenjamin	set	recipients: + oscarbenjamin, rhettinger, gregory.p.smith, ncoghlan, steven.daprano, wolma, remi.lapeyre
2019-01-20 21:11:48	oscarbenjamin	link	issue20479 messages
2019-01-20 21:11:48	oscarbenjamin	create