This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author oscarbenjamin
Recipients oscarbenjamin, steven.daprano
Date 2014-02-03.12:46:18
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
The mean/variance functions in the statistics module don't quite round correctly.

The reasons for this are that although exact rational arithmetic is used internally in the _sum function it is not used throughout the module. In particular the _sum function should be changed to return an exact result and exact arithmetic should be used right up to the point before returning to the user at which point a rounding/coercion should be used to give the user their answer in the appropriate type correctly rounded once.

Using exact arithmetic everywhere makes it possible to replace all of the variance* functions with single-pass algorithms based on the computational formula for variance which should be more efficient as well.

For example the following implements pvariance so that it returns a perfectly rounded result for float input and output in a single pass:

def pvariance(data):
    sx = 0
    sx2 = 0
    for n, x in enumerate(map(Fraction, data), 1):
        sx += x
        sx2 += x ** 2
    Ex = sx / n
    Ex2 = sx2 / n
    var = Ex2 - Ex ** 2
    return float(var)

Comparing the above with the statistics module:

>>> pvariance([0, 0, 1])
>>> statistics.pvariance([0, 0, 1])

The true answer is:

>>> from fractions import Fraction as F
>>> float(statistics.pvariance([F(0), F(0), F(1)]))

The logic in the _sum function for computing exact integer ratios and coercing back to the output type could be moved into utility functions so that it does not need to be duplicated.

Some examples of rounding issues:

>>> from statistics import variance, mean
>>> from decimal import Decimal as D, getcontext
>>> from fractions import Fraction as F

Variance with ints or floats returns a float but the float is not quite the nearest possible float:

>>> variance([0, 0, 2])
>>> float(variance([F(0), F(0), F(2)]))  # true result rounded once

Another example with Decimal:

>>> getcontext().prec = 5
>>> getcontext()
Context(prec=5, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, clamp=0, flags=[Rounded, Inexact], traps=[DivisionByZero, Overflow, InvalidOperation])

>>> variance([D(0), D(0), D(2)] * 2)  # Rounded down instead of up
>>> r = (variance([F(0), F(0), F(2)] * 2))
>>> D(r.numerator) / r.denominator  # Correctly rounded

The mean function may also not be correctly rounded:

>>> getcontext().prec = 2
>>> r = mean((F('1.2'), F('1.3'), F('1.55')))
>>> r
Fraction(27, 20)
>>> D(r.numerator) / r.denominator # Correctly rounded
>>> mean([D('1.2'), D('1.3'), D('1.55')])
Date User Action Args
2014-02-03 12:46:19oscarbenjaminsetrecipients: + oscarbenjamin, steven.daprano
2014-02-03 12:46:19oscarbenjaminsetmessageid: <>
2014-02-03 12:46:19oscarbenjaminlinkissue20499 messages
2014-02-03 12:46:18oscarbenjamincreate