Message 210038 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	wolma
Recipients	gregory.p.smith, ncoghlan, oscarbenjamin, serhiy.storchaka, steven.daprano, wolma
Date	2014-02-02.22:27:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1391380029.88.0.0457298939895.issue20479@psf.upfronthosting.co.za>
In-reply-to

Content
> -----Ursprüngliche Nachricht----- > Von: Steven D'Aprano [mailto:report@bugs.python.org] > Gesendet: Sonntag, 2. Februar 2014 12:55 > An: wolfgang.maier@biologie.uni-freiburg.de > Betreff: [issue20479] Efficiently support weight/frequency mappings in the > statistics module > > > Steven D'Aprano added the comment: > > Off the top of my head, I can think of three APIs: > > (1) separate functions, as Nick suggests: > mean vs weighted_mean, stdev vs weighted_stdev > > (2) treat mappings as an implied (value, frequency) pairs > (2) is clearly my favourite. (1) may work well, if you have a module with a small fraction of functions, for which you need an alternate API. In the statistics module, however, almost all of its current functions could profit from having a way to treat mappings specially. In such a case, (1) is prone to create lots of redundancies. I do not share Oscar's opinion that > apart from mode() the implementation of each function on > map-format data will be completely different from the iterable version > so you'd want to have it as a separate function at least internally > anyway. Consider _sum's current code (docstring omitted for brevity): def _sum(data, start=0): n, d = _exact_ratio(start) T = type(start) partials = {d: n} # map {denominator: sum of numerators} # Micro-optimizations. coerce_types = _coerce_types exact_ratio = _exact_ratio partials_get = partials.get # Add numerators for each denominator, and track the "current" type. for x in data: T = _coerce_types(T, type(x)) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n if None in partials: assert issubclass(T, (float, Decimal)) assert not math.isfinite(partials[None]) return T(partials[None]) total = Fraction() for d, n in sorted(partials.items()): total += Fraction(n, d) if issubclass(T, int): assert total.denominator == 1 return T(total.numerator) if issubclass(T, Decimal): return T(total.numerator)/total.denominator return T(total) all you'd have to do to treat mappings as proposed here is to add a check whether we are dealing with a mapping, then in this case, instead of the for loop: for x in data: T = _coerce_types(T, type(x)) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n use this: for x,m in data.items(): T = _coerce_types(T, type(x)) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n*m and no other changes (though I haven't tested this carefully). Wolfgang

> -----Ursprüngliche Nachricht-----
> Von: Steven D'Aprano [mailto:report@bugs.python.org]
> Gesendet: Sonntag, 2. Februar 2014 12:55
> An: wolfgang.maier@biologie.uni-freiburg.de
> Betreff: [issue20479] Efficiently support weight/frequency mappings in the
> statistics module
> 
> 
> Steven D'Aprano added the comment:
> 
> Off the top of my head, I can think of three APIs:
> 
> (1) separate functions, as Nick suggests:
> mean vs weighted_mean, stdev vs weighted_stdev
> 
> (2) treat mappings as an implied (value, frequency) pairs
> 

(2) is clearly my favourite. (1) may work well, if you have a module with a small fraction of functions, for which you need an alternate API.
In the statistics module, however, almost all of its current functions could profit from having a way to treat mappings specially.
In such a case, (1) is prone to create lots of redundancies.

I do not share Oscar's opinion that

> apart from mode() the implementation of each function on
> map-format data will be completely different from the iterable version
> so you'd want to have it as a separate function at least internally
> anyway.

Consider _sum's current code (docstring omitted for brevity):
def _sum(data, start=0):
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)

all you'd have to do to treat mappings as proposed here is to add a check whether we are dealing with a mapping, then in this case, instead of the for loop:

    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n

use this:

    for x,m in data.items():
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n*m

and no other changes (though I haven't tested this carefully).

Wolfgang

History
Date	User	Action	Args
2014-02-02 22:27:09	wolma	set	recipients: + wolma, gregory.p.smith, ncoghlan, steven.daprano, serhiy.storchaka, oscarbenjamin
2014-02-02 22:27:09	wolma	set	messageid: <1391380029.88.0.0457298939895.issue20479@psf.upfronthosting.co.za>
2014-02-02 22:27:09	wolma	link	issue20479 messages
2014-02-02 22:27:09	wolma	create