Author oscarbenjamin
Recipients agthorr, belopolsky, christian.heimes, ethan.furman, gregory.p.smith, mark.dickinson, oscarbenjamin, pitrou, ronaldoussoren, sjt, steven.daprano, stutzbach, terry.reedy, tshepang, vajrasky
Date 2013-08-19.13:15:55
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CAHVvXxSoDxb2MWeQfzRdBhFHGj-f3CAfPs0BHdGSVLwbrsZH9A@mail.gmail.com>
In-reply-to <1376884746.55.0.0278802992198.issue18606@psf.upfronthosting.co.za>
Content
I've just checked over the new patch and it all looks good to me apart
from one quibble.

It is documented that statistics.sum() will respect rounding errors
due to the decimal context (returning the same result that the
built-in sum() would). I would prefer it if statistics.sum used
compensated summation with Decimals, since in my view they are a
floating point number representation and are subject to arithmetic
rounding error in the same way as floats. I expect that the
implementation of sum() will change, but it would be good at least to
avoid documenting this (IMO undesirable) behaviour.

So with the current implementation I can do:

>>> from decimal import Decimal as D, localcontext, Context, ROUND_DOWN
>>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
>>> print(statistics.variance(data))
0.01252909583333333333333333333
>>> with localcontext() as ctx:
...     ctx.prec = 2
...     ctx.rounding = ROUND_DOWN
...     print(statistics.variance(data))
...
0.010

The final result is not accurate to 2 significant digits rounded down
(that would be 0.012). This is because the decimal context has
affected all the intermediate computations, not just the final result.
Why would anyone prefer this behaviour over an implementation that
could compensate for rounding errors and return a more accurate
result?
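Keeping the intermediate arithmetic at full precision and rounding
only once at the end gives the accurate answer. A workaround sketch
(not part of the patch) using the example data above:

```python
from decimal import Decimal as D, localcontext, ROUND_DOWN
import statistics

data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]

# Do the intermediate computations at the default 28-digit precision...
with localcontext() as ctx:
    ctx.prec = 28
    v = statistics.variance(data)

# ...and apply the user's rounding only to the final result.
result = v.quantize(D("0.001"), rounding=ROUND_DOWN)
print(result)  # 0.012
```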

If statistics.sum and statistics.add_partial are modified so that they
use the same compensated algorithm for Decimals as they do for floats,
then you can have the following:

>>> statistics.sum([D('-1e50'), D('1'), D('1e50')])
Decimal('1')

whereas it currently does:

>>> statistics.sum([D('-1e50'), D('1'), D('1e50')])
Decimal('0E+23')
>>> statistics.sum([D('-1e50'), D('1'), D('1e50')]) == 0
True
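For illustration, one compensated scheme that does recover the
cancelled 1 in this example is Neumaier's variant of Kahan summation.
This is only a sketch, not the algorithm used by the patch (note that
plain Kahan summation actually loses the correction term on this
particular input, which is why the variant is shown):

```python
from decimal import Decimal as D

def neumaier_sum(values):
    # Neumaier's variant of Kahan compensated summation, applied to
    # Decimals.  Each addition is still rounded by the active decimal
    # context, but the low-order part lost to that rounding is captured
    # in a separate correction term and added back at the end.
    total = D(0)
    correction = D(0)
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            correction += (total - t) + x  # low-order digits of x were lost
        else:
            correction += (x - t) + total  # low-order digits of total were lost
        total = t
    return total + correction
```

With the default 28-digit context, neumaier_sum([D('-1e50'), D('1'),
D('1e50')]) returns Decimal('1') rather than a context-rounded zero.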

That still doesn't fix the variance calculation, and I'm not sure
exactly how to do better than the current implementation there. Either
way, though, I don't think the current behaviour should be a
documented guarantee: "honouring the context" implies using one
specific sum algorithm, since an alternative algorithm would give a
different result, and I don't think you should constrain yourself in
that way.
History
Date User Action Args
2013-08-19 13:15:55oscarbenjaminsetrecipients: + oscarbenjamin, terry.reedy, gregory.p.smith, ronaldoussoren, mark.dickinson, belopolsky, pitrou, agthorr, christian.heimes, stutzbach, steven.daprano, sjt, ethan.furman, tshepang, vajrasky
2013-08-19 13:15:55oscarbenjaminlinkissue18606 messages
2013-08-19 13:15:55oscarbenjamincreate