Issue 40855: statistics.stdev ignore xbar argument

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/85032

classification

Title:	statistics.stdev ignore xbar argument
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	steven.daprano	Nosy List:	Folket, miss-islington, rhettinger, steven.daprano
Priority:	normal	Keywords:	patch

Created on 2020-06-03 12:31 by Folket, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 20835	merged	rhettinger, 2020-06-12 18:30
PR 20862	merged	miss-islington, 2020-06-13 22:56
PR 20863	merged	miss-islington, 2020-06-13 22:56

Messages (13)
msg370659 - (view)	Author: Matti (Folket)	Date: 2020-06-03 12:31
statistics.variance also has the same problem.

>>> import statistics
>>> statistics.stdev([1,2])
0.7071067811865476
>>> statistics.stdev([1,2], 3)
0.7071067811865476
>>> statistics.stdev([1,2], 1.5)
0.7071067811865476

should be
0.7071067811865476
2.23606797749979
0.5
msg370680 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-03 16:45
The relevant code is in the _ss() helper function:

    # The following sum should mathematically equal zero, but due to rounding
    # error may not.
    U, total2, count2 = _sum((x-c) for x in data)
    assert T == U and count == count2
    total -= total2**2/len(data)

The intent was to correct for small rounding errors, but the effect is to undo any xbar value that differs from the true mean.

From a user point of view, the xbar parameter should have two effects: saving the computation time for the mean, and giving the ability to recenter the stdev/variance around a different point.

It does save a call to mean; however, that effort is mostly thrown away by the rounding adjustment code, which does even more work than computing the mean.

Likely, the fix for this is to skip the rounding adjustment code if the user supplies an xbar value.
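A minimal sketch of why the adjustment cancels any supplied center, written with exact fractions rather than the module's internal _sum(). The function name ss_with_adjustment is hypothetical and is not part of Lib/statistics.py; it only mirrors the two steps quoted above.

```python
from fractions import Fraction

def ss_with_adjustment(data, c):
    """Sum of squared deviations about c, then the rounding adjustment."""
    n = len(data)
    # Sum of squares about the supplied center c.
    total = sum((Fraction(x) - c) ** 2 for x in data)
    # Sum of deviations about c: zero only when c is the true mean.
    total2 = sum(Fraction(x) - c for x in data)
    # Subtracting total2**2 / n shifts the result back to the true mean.
    return total - total2 ** 2 / n

data = [1, 2]
true_mean = Fraction(sum(data), len(data))
for c in (true_mean, Fraction(3), Fraction(0)):
    # Every center yields 1/2, the sum of squares about the true mean 3/2.
    print(c, ss_with_adjustment(data, c))
```

Algebraically, sum((x-c)**2) - (sum(x-c))**2/n equals sum((x-mean)**2) for every c, which is why the value passed as xbar had no visible effect.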
msg370684 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-03 20:17
Perhaps this would work:

diff --git a/Lib/statistics.py b/Lib/statistics.py
index c76a6ca519..93a4633464 100644
--- a/Lib/statistics.py
+++ b/Lib/statistics.py
@@ -682,8 +682,10 @@ def _ss(data, c=None):
     calculated from ``c`` as given. Use the second case with care, as it can
     lead to garbage results.
     """
-    if c is None:
-        c = mean(data)
+    if c is not None:
+        T, total, count = _sum((x-c)**2 for x in data)
+        return (T, total)
+    c = mean(data)
     T, total, count = _sum((x-c)**2 for x in data)
     # The following sum should mathematically equal zero, but due to rounding
     # error may not.

Matti, where do you get 0.5 as the expected outcome for the third example? The actual mean is 1.5, so I would expect the third case to give sqrt(2)/2 or 0.707.
msg370687 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-06-03 21:55
Thanks Raymond, that is the intended effect, and your analysis seems plausible.
msg370701 - (view)	Author: Matti (Folket)	Date: 2020-06-04 08:59
If we estimate the mean using a sample we lose one degree of freedom, so the sum of squares is divided by N-1, while if we have the mean independent of the sample it should be divided by N to be unbiased, i.e.

example 1: sqrt(((1-1.5)²+(2-1.5)²)/(2-1)) = 0.7...
example 3: sqrt(((1-1.5)²+(2-1.5)²)/(2)) = 0.5
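The two divisors in the examples above can be compared with a short sketch. The helper stdev_about is hypothetical and takes the divisor explicitly; it is not how the statistics module is organized.

```python
import math

def stdev_about(data, center, divisor):
    """Square root of the sum of squared deviations about center, over divisor."""
    ss = sum((x - center) ** 2 for x in data)
    return math.sqrt(ss / divisor)

data = [1, 2]
n = len(data)

# Example 1: mean estimated from the sample, so divide by N-1.
example_1 = stdev_about(data, 1.5, n - 1)
# Example 3: mean treated as known independently, so divide by N.
example_3 = stdev_about(data, 1.5, n)

print(example_1)  # about 0.7071
print(example_3)  # 0.5
```

The N-divisor figure is the one statistics.pstdev(data, mu=1.5) returns, since pstdev divides by N rather than N-1.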
msg371262 - (view)	Author: Matti (Folket)	Date: 2020-06-11 11:22
Hi Raymond and Steven! I'm happy that you are solving this issue but do you have any comment on my previous answer?
msg371313 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-11 17:17
> do you have any comment on my previous answer?

I see what you're trying to do but think that interpretation is surprising and is at odds with the existing and intended uses of the xbar argument.

The goals were to allow the mean to be precomputed (common case) or to be recentered (uncommon). Neither case should have the effect of changing the divisor.

We can't break existing code that assumes that stdev(data) is equal to stdev(data, xbar=mean(data)).

>>> data = [1, 2]
>>> stdev(data)
0.7071067811865476
>>> stdev(data, xbar=mean(data))
0.7071067811865476
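A small check of the two behaviors described above, assuming an interpreter that already includes the fix for this issue. On an unpatched 3.8 or 3.9 the recentered value would instead match the plain variance.

```python
import math
from statistics import mean, stdev, variance

data = [1, 2]

# Precomputed mean (common case): the result must not change.
plain = stdev(data)
precomputed = stdev(data, xbar=mean(data))

# Recentered (uncommon case): deviations are taken about 3, and the
# divisor stays N-1, so the expected variance is (4 + 1) / (2 - 1) = 5.
recentered = variance(data, xbar=3)

print(math.isclose(plain, precomputed))
print(recentered)
```

Keeping the N-1 divisor in both cases is what preserves the equality between stdev(data) and stdev(data, xbar=mean(data)).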
msg371471 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-13 22:55
New changeset d71ab4f73887a6e2b380ddbbfe35b600d236fd4a by Raymond Hettinger in branch 'master': bpo-40855: Fix ignored mu and xbar parameters (GH-20835) https://github.com/python/cpython/commit/d71ab4f73887a6e2b380ddbbfe35b600d236fd4a
msg371475 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-13 23:56
New changeset 55c1d21761e2e5feda5665065ea9e2280fa76113 by Miss Islington (bot) in branch '3.9': bpo-40855: Fix ignored mu and xbar parameters (GH-20835) (#GH-20862) https://github.com/python/cpython/commit/55c1d21761e2e5feda5665065ea9e2280fa76113
msg371476 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-13 23:57
New changeset 811e040b6e0241339545c2f055db8259b408802f by Miss Islington (bot) in branch '3.8': bpo-40855: Fix ignored mu and xbar parameters (GH-20835) (GH-20863) https://github.com/python/cpython/commit/811e040b6e0241339545c2f055db8259b408802f
msg371477 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-06-13 23:58
Thanks for the bug report 😊
msg371723 - (view)	Author: Matti (Folket)	Date: 2020-06-17 09:37
> I see what you're trying to do but think that interpretation is surprising
> and is at odds with the existing and intended uses of the xbar argument.
>
> The goals were to allow the mean to be precomputed (common case) or to be recentered (uncommon). Neither case should have the effect of changing the divisor.
>
> We can't break existing code that assumes that stdev(data) is equal to stdev(data, xbar=mean(data)).

Maybe the requirements are bugged?

It seems to me that recalculating the mean is a very niche use case. You will save very little time on a call you do once.

But what good is it to supply a re-centered mean if you get a wrong estimation of the standard deviation? If the mean is not the mean of the sample, it was not calculated using the sample, so there is no loss of degrees of freedom.
msg371724 - (view)	Author: Matti (Folket)	Date: 2020-06-17 09:38
I meant to write "pre-calculate".

History
Date	User	Action	Args
2022-04-11 14:59:31	admin	set	github: 85032
2020-06-17 09:38:49	Folket	set	messages: + msg371724
2020-06-17 09:37:31	Folket	set	messages: + msg371723
2020-06-13 23:58:57	rhettinger	set	status: open -> closed resolution: fixed messages: + msg371477 stage: patch review -> resolved
2020-06-13 23:57:24	rhettinger	set	messages: + msg371476
2020-06-13 23:56:23	rhettinger	set	messages: + msg371475
2020-06-13 22:56:19	miss-islington	set	pull_requests: + pull_request20054
2020-06-13 22:56:12	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request20053
2020-06-13 22:55:56	rhettinger	set	messages: + msg371471
2020-06-12 18:30:10	rhettinger	set	keywords: + patch stage: patch review pull_requests: + pull_request20029
2020-06-11 17:17:39	rhettinger	set	messages: + msg371313
2020-06-11 11:22:07	Folket	set	messages: + msg371262
2020-06-04 08:59:42	Folket	set	messages: + msg370701
2020-06-03 21:55:09	steven.daprano	set	messages: + msg370687
2020-06-03 20:17:43	rhettinger	set	messages: + msg370684 components: + Library (Lib) versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.6
2020-06-03 16:45:19	rhettinger	set	assignee: steven.daprano messages: + msg370680 nosy: + steven.daprano
2020-06-03 16:26:38	rhettinger	set	nosy: + rhettinger
2020-06-03 12:31:21	Folket	create