This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author wolma
Recipients gregory.p.smith, ncoghlan, oscarbenjamin, serhiy.storchaka, steven.daprano, wolma
Date 2014-02-02.22:27:09
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1391380029.88.0.0457298939895.issue20479@psf.upfronthosting.co.za>
In-reply-to
Content
> -----Ursprüngliche Nachricht-----
> Von: Steven D'Aprano [mailto:report@bugs.python.org]
> Gesendet: Sonntag, 2. Februar 2014 12:55
> An: wolfgang.maier@biologie.uni-freiburg.de
> Betreff: [issue20479] Efficiently support weight/frequency mappings in the
> statistics module
> 
> 
> Steven D'Aprano added the comment:
> 
> Off the top of my head, I can think of three APIs:
> 
> (1) separate functions, as Nick suggests:
> mean vs weighted_mean, stdev vs weighted_stdev
> 
> (2) treat mappings as an implied (value, frequency) pairs
> 

(2) is clearly my favourite. (1) may work well, if you have a module with a small fraction of functions, for which you need an alternate API.
In the statistics module, however, almost all of its current functions could profit from having a way to treat mappings specially.
In such a case, (1) is prone to create lots of redundancies.

I do not share Oscar's opinion that

> apart from mode() the implementation of each function on
> map-format data will be completely different from the iterable version
> so you'd want to have it as a separate function at least internally
> anyway.

Consider _sum's current code (docstring omitted for brevity):
def _sum(data, start=0):
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)

all you'd have to do to treat mappings as proposed here is to add a check whether we are dealing with a mapping, then in this case, instead of the for loop:

    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n

use this:

    for x,m in data.items():
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n*m

and no other changes (though I haven't tested this carefully).

Wolfgang
History
Date User Action Args
2014-02-02 22:27:09wolmasetrecipients: + wolma, gregory.p.smith, ncoghlan, steven.daprano, serhiy.storchaka, oscarbenjamin
2014-02-02 22:27:09wolmasetmessageid: <1391380029.88.0.0457298939895.issue20479@psf.upfronthosting.co.za>
2014-02-02 22:27:09wolmalinkissue20479 messages
2014-02-02 22:27:09wolmacreate