statistics module - incorrect results with boolean input #68256

wm75 · 2015-04-28T08:53:27Z

BPO	24068
Nosy	@rhettinger, @mdickinson, @stevendaprano, @bitdancer, @wm75
Files	statistics._sum.patch statistics._sum.v2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/stevendaprano'
closed_at = <Date 2018-04-08.20:03:14.377>
created_at = <Date 2015-04-28.08:53:26.540>
labels = ['type-bug', 'library']
title = 'statistics module - incorrect results with boolean input'
updated_at = <Date 2018-04-08.20:03:14.376>
user = 'https://github.com/wm75'

bugs.python.org fields:

activity = <Date 2018-04-08.20:03:14.376>
actor = 'wolma'
assignee = 'steven.daprano'
closed = True
closed_date = <Date 2018-04-08.20:03:14.377>
closer = 'wolma'
components = ['Library (Lib)']
creation = <Date 2015-04-28.08:53:26.540>
creator = 'wolma'
dependencies = []
files = ['39221', '39269']
hgrepos = []
issue_num = 24068
keywords = ['patch']
message_count = 6.0
messages = ['242169', '242362', '242370', '242428', '242451', '315095']
nosy_count = 6.0
nosy_names = ['rhettinger', 'mark.dickinson', 'steven.daprano', 'r.david.murray', 'della', 'wolma']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24068'
versions = ['Python 3.4', 'Python 3.5']

wm75 · 2015-04-28T08:53:25Z

The mean function in the statistics module gives nonsensical results when the input contains boolean values, e.g.:

>>> mean([True, True, False, False])
0.25

>>> mean([True, 1027])
0.5

This is an issue with the module's internal _sum function that mean relies on. Other functions relying on _sum are affected more subtly, e.g.:

>>> variance([1, 1027, 0])
351234.3333333333

>>> variance([True, 1027, 0])
351234.3333333334

The problem with _sum is that it tries to coerce its result to any non-int type found in the input (bool in the examples above), but bool(1028) is just True, so information gets lost.

I've attached a patch preventing the type cast when it would be to bool.
I don't have time to write a separate test though, so if somebody wants to take over ... :)
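
For readers following along, here is a minimal sketch of the failure mode described above. It is not the actual statistics._sum code and not the attached patch: the names lossy_mean and guarded_mean are hypothetical, and the logic only imitates "coerce the exact total to the first non-int type found in the input".

```python
from fractions import Fraction


def _first_non_int_type(data):
    # Hypothetical helper: first type in the input that is not exactly int
    # (bool counts as non-int here because type(True) is bool).
    return next((type(x) for x in data if type(x) is not int), int)


def lossy_mean(data):
    # Imitates the reported behaviour: the exact total is cast back to the
    # type found in the input, so bool(2) or bool(1028) collapses to True.
    data = list(data)
    total = sum(Fraction(x) for x in data)
    T = _first_non_int_type(data)
    return T(total) / len(data)


def guarded_mean(data):
    # Same idea, but skip the cast when the target type would be bool
    # (an assumption about what "preventing the type cast" could look like).
    data = list(data)
    total = sum(Fraction(x) for x in data)
    T = _first_non_int_type(data)
    if T is bool:
        T = int
    return T(total) / len(data)


print(lossy_mean([True, True, False, False]))    # 0.25
print(lossy_mean([True, 1027]))                  # 0.5
print(guarded_mean([True, True, False, False]))  # 0.5
print(guarded_mean([True, 1027]))                # 514.0
```

In the lossy version the total of 2 (or 1028) is turned into True, which then behaves as 1 in the division; the guarded version keeps the integer total.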

bitdancer · 2015-05-02T01:00:16Z

I wonder if it would be better to reject Bool data in this context? Bool is only a numeric type for historical reasons.

stevendaprano · 2015-05-02T02:20:27Z

The patch seems simple and straightforward enough. It just needs some tests, and a Round Tuit.

wm75 · 2015-05-02T20:05:30Z

Uploading an alternate, possibly slightly clearer, version of the patch.

mdickinson · 2015-05-03T06:09:27Z

> I wonder if it would be better to reject Bool data in this context?

It's not uncommon (and quite useful) in NumPy world to compute basic statistics on arrays of boolean dtype: the sum of such an array gives a count of the Trues, and the mean gives the proportion of True entries. I think it would be handy to allow the statistics module to work with lists of bools, if possible.
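
The counting and proportion idiom mentioned here can be sketched with builtins alone; this illustration deliberately avoids both NumPy and the statistics module, and the helper names are hypothetical.

```python
def count_true(flags):
    # bool is a subclass of int, so summing bools counts the True entries.
    return sum(flags)


def proportion_true(flags):
    # The mean of a list of bools is the fraction of entries that are True.
    flags = list(flags)
    return sum(flags) / len(flags)


flags = [True, False, True, True]
print(count_true(flags))       # 3
print(proportion_true(flags))  # 0.75
```

This is the result one would want statistics.mean to agree with if lists of bools are accepted.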

wm75 · 2018-04-08T20:03:14Z

Fixed as part of resolving bpo-25177.

wm75 mannequin added the stdlib (Python modules in the Lib dir) and type-bug (An unexpected behavior, bug, or error) labels Apr 28, 2015

stevendaprano self-assigned this May 2, 2015

wm75 mannequin closed this as completed Apr 8, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022
