Issue 24068: statistics module - incorrect results with boolean input


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68256

classification

Title:	statistics module - incorrect results with boolean input
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.4, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	steven.daprano	Nosy List:	della, mark.dickinson, r.david.murray, rhettinger, steven.daprano, wolma
Priority:	normal	Keywords:	patch

Created on 2015-04-28 08:53 by wolma, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
statistics._sum.patch	wolma, 2015-04-28 08:56	
statistics._sum.v2.patch	wolma, 2015-05-02 20:05	

Messages (6)
msg242169	Author: Wolfgang Maier (wolma) *	Date: 2015-04-28 08:53
The mean function in the statistics module gives nonsensical results when the input contains boolean values, e.g.:

>>> mean([True, True, False, False])
0.25
>>> mean([True, 1027])
0.5

This is an issue with the module's internal _sum function, which mean relies on. Other functions relying on _sum are affected more subtly, e.g.:

>>> variance([1, 1027, 0])
351234.3333333333
>>> variance([True, 1027, 0])
351234.3333333334

The problem with _sum is that it tries to coerce its result to any non-int type found in the input (bool in these examples), but bool(1028) is just True, so information gets lost.

I've attached a patch that prevents the type cast when the target type would be bool. I don't have time to write a separate test, though, so if somebody wants to take over... :)
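The coercion problem described above can be reproduced with a deliberately simplified model. This is a sketch, not the statistics module's actual _sum: the names sketch_sum and sketch_mean and the skip_bool flag are hypothetical, and the type tracking is cruder than the module's (it just remembers the last non-int type it sees). It sums exactly with fractions.Fraction and then converts the total to the tracked type, which is where bool(Fraction(1028)) collapses to True. Passing skip_bool=True mimics the idea behind the patch by keeping an integer result instead.

```python
from fractions import Fraction


def sketch_sum(data, skip_bool=False):
    """Hypothetical, simplified model of the coercion in _sum (not the real code).

    Sums exactly with Fraction, remembers the last non-int type seen, and
    converts the total back to that type. With skip_bool=True it refuses
    to convert to bool, which is the idea behind the attached patch.
    """
    total = Fraction(0)
    T = int
    for x in data:
        total += Fraction(x)  # Fraction(True) == 1, Fraction(False) == 0
        if type(x) is not int:
            T = type(x)  # crude stand-in for the module's type coercion
    if skip_bool and T is bool:
        T = int  # keep the integer total instead of collapsing it to True/False
    return T(total)  # bool(Fraction(1028)) is just True: information is lost


def sketch_mean(data, skip_bool=False):
    """Hypothetical mean built on sketch_sum; assumes non-empty data."""
    data = list(data)
    return sketch_sum(data, skip_bool) / len(data)


print(sketch_mean([True, True, False, False]))                  # 0.25
print(sketch_mean([True, 1027]))                                # 0.5
print(sketch_mean([True, True, False, False], skip_bool=True))  # 0.5
print(sketch_mean([True, 1027], skip_bool=True))                # 514.0
```

With skip_bool=False the sketch reproduces the 0.25 and 0.5 results quoted above; with skip_bool=True the total is kept as an int, giving 0.5 and 514.0.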
msg242362	Author: R. David Murray (r.david.murray) *	Date: 2015-05-02 01:00
I wonder if it would be better to reject Bool data in this context? Bool is only a numeric type for historical reasons.
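If bool data were rejected, as suggested here, the check would have to be explicit. bool is a subclass of int, so isinstance(True, int) is True and an int check alone lets booleans through. Below is a minimal sketch of such a guard. The helper name reject_bools is hypothetical and is not part of the statistics module.

```python
def reject_bools(data):
    """Hypothetical guard: raise TypeError if any element is a bool.

    bool is a subclass of int, so isinstance(x, int) cannot tell them
    apart; the check has to look for bool explicitly.
    """
    data = list(data)
    for x in data:
        if isinstance(x, bool):
            raise TypeError("bool values are not accepted: %r" % (x,))
    return data


print(isinstance(True, int))   # True: an int check alone would accept bools
print(reject_bools([1, 1027, 0]))
```

Calling reject_bools([True, 1027]) would raise TypeError, while purely numeric input is returned unchanged as a list.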
msg242370	Author: Steven D'Aprano (steven.daprano) *	Date: 2015-05-02 02:20
The patch seems simple and straightforward enough. It just needs some tests, and a Round Tuit.
msg242428	Author: Wolfgang Maier (wolma) *	Date: 2015-05-02 20:05
uploading an alternate, possibly slightly clearer version of the patch
msg242451	Author: Mark Dickinson (mark.dickinson) *	Date: 2015-05-03 06:09
> I wonder if it would be better to reject Bool data in this context?

It's not uncommon (and quite useful) in the NumPy world to compute basic statistics on arrays of boolean dtype: the sum of such an array gives a count of the `True`s, and the mean gives the proportion of `True` entries. I think it would be handy to allow the statistics module to work with lists of bools, if possible.
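The NumPy behaviour described here has a plain-Python analogue. Because True == 1 and False == 0, summing a list of bools counts the True entries, and dividing by the length gives their proportion. The sketch below assumes a Python release that already includes the fix recorded in this issue. On the affected 3.4/3.5 versions, the statistics.mean call would instead show the bug from msg242169.

```python
import statistics

flags = [True, False, True, True, False]

# True == 1 and False == 0, so sum() counts the True entries.
count_true = sum(flags)
print(count_true)   # 3

# Dividing the count by the number of entries gives the proportion of True.
proportion = count_true / len(flags)
print(proportion)   # 0.6

# Assumption: running on a Python release that includes the fix, where
# statistics.mean treats bools as ints and agrees with the proportion.
print(statistics.mean(flags) == proportion)   # True (with the fix)
```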
msg315095	Author: Wolfgang Maier (wolma) *	Date: 2018-04-08 20:03
Fixed as part of resolving issue 25177.

History
Date	User	Action	Args
2022-04-11 14:58:16	admin	set	github: 68256
2018-04-08 20:03:14	wolma	set	status: open -> closed resolution: fixed messages: + msg315095 stage: test needed -> resolved
2016-05-02 21:41:14	r.david.murray	link	issue26913 superseder
2015-05-20 12:48:52	della	set	nosy: + della
2015-05-11 06:25:01	rhettinger	set	nosy: + rhettinger
2015-05-03 06:09:27	mark.dickinson	set	nosy: + mark.dickinson messages: + msg242451
2015-05-02 20:05:30	wolma	set	files: + statistics._sum.v2.patch messages: + msg242428
2015-05-02 02:20:49	steven.daprano	set	stage: test needed
2015-05-02 02:20:27	steven.daprano	set	assignee: steven.daprano messages: + msg242370
2015-05-02 01:00:15	r.david.murray	set	nosy: + r.david.murray messages: + msg242362
2015-04-28 08:56:15	wolma	set	files: + statistics._sum.patch
2015-04-28 08:54:54	wolma	set	files: - statistics._sum.patch
2015-04-28 08:53:26	wolma	create