Author oscarbenjamin
Recipients gregory.p.smith, ncoghlan, oscarbenjamin, remi.lapeyre, rhettinger, steven.daprano, wolma
Date 2019-01-20.21:11:48
Message-id <CAHVvXxQFLpafZ+u5uyw-mN3OK=zfV7Y_CcgB9prOsrTDHZadcA@mail.gmail.com>
In-reply-to <20190119063036.GR13616@ando.pearwood.info>
Content
> I would find it very helpful if somebody has time to do a survey of
> other statistics libraries or languages (e.g. numpy, R, Octave, Matlab,
> SAS etc) and see how they handle data with weights.

Numpy has only sporadic support for this. The standard mean function
does not have any way to provide weights, but there is an alternative
called average that computes the mean and has an optional weights
argument. I had never heard of average before searching for "numpy
weighted mean" just now. Numpy's API often has bits of old cruft left
over from when various numerical packages were joined together, so I'm
not sure they would recommend their current approach. I don't think
there are any other numpy functions for providing weighted statistics.
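
For illustration, here is a minimal sketch of the numpy.average call
described above. The data and weights are example values only (they
match the integer-weight example in the quoted message), and the float
weights are my own assumption to show that non-integer weights are
accepted:

```python
import numpy as np

data = [1, 2, 3, 4]
weights = [1, 4, 1, 2]

# numpy.mean has no weights parameter; numpy.average does.
weighted = np.average(data, weights=weights)

# Float weights are accepted too. Only the ratios between weights matter,
# so halving every weight leaves the result unchanged.
scaled = np.average(data, weights=[0.5, 2.0, 0.5, 1.0])

# returned=True additionally gives back the sum of the weights.
avg, total_weight = np.average(data, weights=weights, returned=True)

print(weighted, scaled, avg, total_weight)
```
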

Statsmodels does provide an API for this as explained here:
https://stackoverflow.com/a/36464881/9450991
With their API you create an object from the data and weights and then
read the statistics from its methods and attributes.
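
To make that API shape concrete, here is a rough stdlib-only sketch.
The class name WeightedStats and its attributes are hypothetical and
are not the statsmodels class; the variance shown is the weighted mean
of squared deviations, which is just one of several possible
conventions:

```python
import math


class WeightedStats:
    """Hypothetical container pairing data with weights (not statsmodels)."""

    def __init__(self, data, weights):
        self.data = list(data)
        self.weights = list(weights)
        if len(self.data) != len(self.weights):
            raise ValueError("data and weights must have the same length")
        self.total = math.fsum(self.weights)
        if self.total <= 0:
            raise ValueError("weights must sum to a positive number")

    @property
    def mean(self):
        # Weighted mean: sum(x * w) / sum(w).
        pairs = zip(self.data, self.weights)
        return math.fsum(x * w for x, w in pairs) / self.total

    @property
    def var(self):
        # Weighted mean of squared deviations from the weighted mean.
        m = self.mean
        pairs = zip(self.data, self.weights)
        return math.fsum(w * (x - m) ** 2 for x, w in pairs) / self.total


stats = WeightedStats([1, 2, 3, 4], [1, 4, 1, 2])
print(stats.mean, stats.var)
```

The point of the object-based design is that the weights are supplied
once and every statistic read afterwards uses them consistently.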

Matlab doesn't even support a weighted mean as far as I can tell. There
is wmean on the Matlab File Exchange:

>
> - what APIs do they provide?
> - do they require weights to be positive integers, or do they
>   support arbitrary float weights?
> - including negative weights?
>   (what physical meaning does a negative weight have?)
>
> At the moment, a simple helper function seems to do the trick for
> non-negative integer weights:
>
> def flatten(items):
>     for item in items:
>         yield from item
>
> py> data = [1, 2, 3, 4]
> py> weights = [1, 4, 1, 2]
> py> statistics.mean(flatten([x]*w for x, w in zip(data, weights)))
> 2.5
>
> In principle, the implementation could be as simple as a single
> recursive call:
>
> def mean(data, weights=None):
>     if weights is not None:
>         return mean(flatten([x]*w for x, w in zip(data, weights)))
>     # base case without weights is unchanged
>
> or perhaps it could be just a recipe in the docs.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue20479>
> _______________________________________
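
The quoted flatten helper only works for non-negative integer weights,
since it repeats each value w times. As a hedged sketch of one
alternative, the function below computes sum(x*w)/sum(w) directly so
that float weights work as well. The name weighted_mean is
hypothetical, and rejecting negative weights is an assumption of this
sketch rather than a settled answer to the question about them:

```python
import math
import statistics


def flatten(items):
    # Helper from the quoted message: chain the sub-lists together.
    for item in items:
        yield from item


def weighted_mean(data, weights):
    """Hypothetical helper: weighted mean for non-negative real weights."""
    data, weights = list(data), list(weights)
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not supported in this sketch")
    total = math.fsum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return math.fsum(x * w for x, w in zip(data, weights)) / total


data = [1, 2, 3, 4]
weights = [1, 4, 1, 2]

# For integer weights the direct formula agrees with the repetition approach.
by_repetition = statistics.mean(flatten([x] * w for x, w in zip(data, weights)))
by_formula = weighted_mean(data, weights)

# Fractional weights, which the repetition approach cannot express.
fractional = weighted_mean(data, [0.1, 0.4, 0.1, 0.2])

print(by_repetition, by_formula, fractional)
```
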
History
Date                 User           Action  Args
2019-01-20 21:11:49  oscarbenjamin  set     recipients: + oscarbenjamin, rhettinger, gregory.p.smith, ncoghlan, steven.daprano, wolma, remi.lapeyre
2019-01-20 21:11:48  oscarbenjamin  link    issue20479 messages
2019-01-20 21:11:48  oscarbenjamin  create