Message 359323 - Python tracker


Author	reed
Recipients	reed
Date	2020-01-05.05:34:51
Message-id	<1578202492.29.0.981939629191.issue39218@roundup.psfhosted.org>

Content

If a float32 NumPy array is passed to statistics.variance(), an assertion failure occurs. For example:

    import statistics
    import numpy as np
    x = np.array([1, 2], dtype=np.float32)
    statistics.variance(x)

The assertion error is:

    assert T == U and count == count2

Even if you convert x to a list with `x = list(x)`, the issue still occurs, because the list elements are still np.float32 scalars. The issue is caused by the following lines in statistics.py (https://github.com/python/cpython/blob/ec007cb43faf5f33d06efbc28152c7fdcb2edb9c/Lib/statistics.py#L687-L691):

    T, total, count = _sum((x-c)**2 for x in data)
    # The following sum should mathematically equal zero, but due to rounding
    # error may not.
    U, total2, count2 = _sum((x-c) for x in data)
    assert T == U and count == count2

When a float32 NumPy value is squared in the term (x-c)**2, the result is a float64 value, while (x-c) stays float32. The two `_sum` calls therefore report different types, and the `T == U` assertion fails. I think the best way to fix this would be to replace (x-c)**2 with (x-c)*(x-c), so that the code no longer assumes the input's ** operator returns the same type as its operands.
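
To illustrate the type mismatch without depending on a particular NumPy version, here is a minimal sketch. `Narrow` is a hypothetical stand-in for a float32-like scalar (it is not a NumPy type): its `-` and `*` operators preserve the type, but `**` falls back to plain `float`. This only mimics the behaviour described above and is not the actual statistics.py code:

    class Narrow(float):
        """Hypothetical float32-like scalar; an assumption for illustration."""

        def __sub__(self, other):
            # Subtraction preserves the narrow type.
            return Narrow(float(self) - float(other))

        def __mul__(self, other):
            # Multiplication preserves the narrow type.
            return Narrow(float(self) * float(other))

        # __pow__ is deliberately not overridden, so ** uses float.__pow__
        # and returns a plain float, i.e. a different type than the operands.


    data = [Narrow(1.0), Narrow(2.0)]
    c = Narrow(1.5)  # plays the role of the mean

    # Types seen by the two sums in the quoted code.
    diff_types = {type(x - c) for x in data}
    pow_types = {type((x - c) ** 2) for x in data}

    # Types seen when the square is written as a product instead.
    mul_types = {type((x - c) * (x - c)) for x in data}

    print(pow_types == diff_types)  # False: the T == U check would fail
    print(mul_types == diff_types)  # True: the proposed rewrite keeps the types equal

With the `**` form the squared terms have a different type than the plain differences, which is the condition that trips the assertion. With the product form both generators yield the same type.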

History
Date	User	Action	Args
2020-01-05 05:34:52	reed	set	recipients: + reed
2020-01-05 05:34:52	reed	set	messageid: <1578202492.29.0.981939629191.issue39218@roundup.psfhosted.org>
2020-01-05 05:34:52	reed	link	issue39218 messages
2020-01-05 05:34:51	reed	create