statistics module - incorrect results with boolean input #68256
Comments
The mean function in the statistics module gives nonsensical results when the input contains boolean values, e.g.:

>>> mean([True, True, False, False])
0.25
>>> mean([True, 1027])
0.5

This is an issue with the module's internal _sum function, which mean relies on. Other functions relying on _sum are affected more subtly, e.g.:

>>> variance([1, 1027, 0])
351234.3333333333
>>> variance([True, 1027, 0])
351234.3333333334

The problem with _sum is that it tries to coerce its result to any non-int type found in the input (bool in the examples above), but bool(1028) is just True, so information gets lost. I've attached a patch that prevents the type cast when the target type would be bool. |
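The coercion described above can be reproduced with a minimal sketch. The helper `coerced_sum` and its `skip_bool` flag are hypothetical names invented for illustration; this is a simplification of the described behaviour, not the module's actual _sum and not the attached patch.

```python
def coerced_sum(data, skip_bool=False):
    """Sum `data`, then cast the total to the first non-int type seen.

    Hypothetical simplification for illustration only; it handles ints and
    bools and is not the statistics module's real _sum.
    """
    total = sum(data)  # exact for ints and bools
    # First element type that is not exactly int; fall back to int.
    target = next((type(x) for x in data if type(x) is not int), int)
    if skip_bool and target is bool:
        target = int  # the idea behind the patch: never cast the total to bool
    return target(total)


data = [True, 1027]
print(coerced_sum(data) / len(data))                  # 0.5, since bool(1028) is True
print(coerced_sum(data, skip_bool=True) / len(data))  # 514.0
```

With the cast to bool skipped, the total stays at 1028 and the mean comes out as expected.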
I wonder if it would be better to reject bool data in this context? bool is only a numeric type for historical reasons. |
The patch seems simple and straightforward enough. It just needs some tests, and a Round Tuit. |
Uploading an alternative, possibly slightly clearer, version of the patch. |
It's not uncommon (and quite useful) in NumPy world to compute basic statistics on arrays of boolean dtype: the sum of such an array gives a count of the |
Fixed as part of resolving bpo-25177. |
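For anyone wanting to confirm the behaviour locally, here is a quick check. It assumes an interpreter that already includes the fix; the expected values in the comments reflect bools being treated as the ints 0 and 1, and will differ on an affected version.

```python
from statistics import mean

flags = [True, True, False, False]

print(sum(flags))           # 2: each True counts as 1
print(mean(flags))          # 0.5 on an interpreter that includes the fix
print(mean([True, 1027]))   # 514 on an interpreter that includes the fix
```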