Message210038
> -----Original Message-----
> From: Steven D'Aprano [mailto:report@bugs.python.org]
> Sent: Sunday, 2 February 2014 12:55
> To: wolfgang.maier@biologie.uni-freiburg.de
> Subject: [issue20479] Efficiently support weight/frequency mappings in the
> statistics module
>
>
> Steven D'Aprano added the comment:
>
> Off the top of my head, I can think of three APIs:
>
> (1) separate functions, as Nick suggests:
> mean vs weighted_mean, stdev vs weighted_stdev
>
> (2) treat mappings as an implied (value, frequency) pairs
>
(2) is clearly my favourite. (1) may work well if you have a module in which only a small fraction of the functions need an alternate API.
In the statistics module, however, almost all of the current functions could profit from a way to treat mappings specially.
In such a case, (1) is prone to create a lot of redundancy.
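To make the difference concrete, here is a minimal sketch of what API (2) could look like from the caller's side. The mean() below is a hypothetical stand-in written for this comment, not the statistics module's implementation, and it assumes the frequencies are non-negative integers:

```python
from collections.abc import Mapping
from fractions import Fraction


def mean(data):
    """Hypothetical mean() that reads a mapping as {value: frequency}."""
    if isinstance(data, Mapping):
        # API (2): every key counts as often as its frequency says.
        total = sum(Fraction(x) * m for x, m in data.items())
        count = sum(data.values())
    else:
        # Plain iterable: every item counts exactly once.
        values = list(data)
        total = sum(Fraction(x) for x in values)
        count = len(values)
    if count == 0:
        raise ValueError("mean requires at least one data point")
    return total / count


# The same data, once as an iterable and once as a frequency mapping.
print(mean([1, 1, 1, 2]))
print(mean({1: 3, 2: 1}))
```

With (1), the same behaviour would need a second public function (weighted_mean) next to mean, and likewise for every other function in the module.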
I do not share Oscar's opinion that
> apart from mode() the implementation of each function on
> map-format data will be completely different from the iterable version
> so you'd want to have it as a separate function at least internally
> anyway.
Consider _sum's current code (docstring omitted for brevity):
def _sum(data, start=0):
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)
All you would have to do to treat mappings as proposed here is add a check for whether we are dealing with a mapping and, in that case, replace the for loop:
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
with this:
    for x, m in data.items():
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n*m
No other changes should be needed (though I haven't tested this carefully).
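For anyone who wants to try the idea without patching the module, here is a self-contained sketch. weighted_sum is a hypothetical, simplified stand-in for _sum: it keeps the {denominator: sum of numerators} bookkeeping but always returns a Fraction, so the type coercion and the inf/nan handling of the real function are left out. It also assumes the frequencies are integers.

```python
from collections.abc import Mapping
from fractions import Fraction


def weighted_sum(data, start=0):
    """Simplified stand-in for _sum(): exact sum, mappings read as {value: frequency}."""
    start = Fraction(start)
    partials = {start.denominator: start.numerator}  # {denominator: sum of numerators}
    if isinstance(data, Mapping):
        pairs = data.items()            # (value, frequency) pairs
    else:
        pairs = ((x, 1) for x in data)  # plain iterable: every value has frequency 1
    for x, m in pairs:
        frac = Fraction(x)              # exact for int, finite float, Decimal, Fraction
        n, d = frac.numerator, frac.denominator
        # The only mapping-specific step: scale the numerator by the frequency.
        partials[d] = partials.get(d, 0) + n * m
    return sum(Fraction(n, d) for d, n in sorted(partials.items()))


# Same data in both formats gives the same exact result.
print(weighted_sum([0.5, 0.5, 1]))
print(weighted_sum({0.5: 2, 1: 1}))
```

Both input formats share everything except the construction of the (value, frequency) pairs, which is the point: a separate internal implementation for mappings does not seem necessary here.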
Wolfgang

Date | User | Action | Args
2014-02-02 22:27:09 | wolma | set | recipients: + wolma, gregory.p.smith, ncoghlan, steven.daprano, serhiy.storchaka, oscarbenjamin
2014-02-02 22:27:09 | wolma | set | messageid: <1391380029.88.0.0457298939895.issue20479@psf.upfronthosting.co.za>
2014-02-02 22:27:09 | wolma | link | issue20479 messages
2014-02-02 22:27:09 | wolma | create |