Issue 39094: Add a default to statistics.mean and related functions


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/83275

classification

Title:	Add a default to statistics.mean and related functions
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.9

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	Yoni Lavi, mark.dickinson, rhettinger, steven.daprano, taleinat, vstinner
Priority:	normal	Keywords:	patch

Created on 2019-12-19 03:06 by Yoni Lavi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 17657	closed	Yoni Lavi, 2019-12-19 03:10

Messages (9)
msg358653	Author: Yoni Lavi (Yoni Lavi) *	Date: 2019-12-19 03:06
I would like to put forward an argument in favour of a `default` parameter for the statistics.mean function and related functions. What motivated me to open this is that my code more often than not includes a check (or try/except) whenever I calculate a mean, to substitute a default/sentinel value, and I felt that there should be a better way. Please also note that there is a precedent for this: a similar parameter was added to min and max in 3.4 (https://bugs.python.org/issue18111).
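For reference, this is a minimal sketch of the check/try-except workaround described above, next to the min()/max() precedent. The helper name `mean_or_default` is hypothetical; statistics.mean itself has no `default` parameter and raises StatisticsError on empty input.

```python
import statistics

def mean_or_default(data, default):
    """Hypothetical helper: return statistics.mean(data), or `default`
    when data has no points. Not part of the statistics module."""
    try:
        return statistics.mean(data)
    except statistics.StatisticsError:
        # mean() raises StatisticsError when given no data points
        return default

# The precedent cited in the issue: min() and max() accept default= (3.4+)
print(min([], default=0))             # 0
print(mean_or_default([], 0))         # 0
print(mean_or_default([1, 2, 3], 0))  # 2
```

The try/except form also works for iterators, since mean() consumes its argument and only then detects that it was empty.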
msg358658	Author: Raymond Hettinger (rhettinger) *	Date: 2019-12-19 06:55
I vote -1. We don't have defaults for stdev() or median() or mode(). And it isn't clear what one would use for a meaningful default value in most cases. Also, I'm not seeing anything like this in Pandas, Excel, etc. So, I recommend keeping the current simple and clean APIs.
msg358659	Author: Tal Einat (taleinat) *	Date: 2019-12-19 07:49
It seems to me that this would follow the same argument as in issue #18111: the real issue is that there is no good way to check whether an arbitrary iterable is empty, unlike with sequences. Currently, callers need to wrap the call in try/except to handle empty iterators properly, or do non-trivial iterator "magic" to check whether the iterator is empty before passing it in. I've tried to think of other solutions, such as a generic wrapper for such functions or a helper to check whether an iterable is empty, and they all turn out to be very clunky to use and un-Pythonic. Since we provide first-class support for iterators, and many builtins return iterators, providing the tools to handle the case where they are empty simply and elegantly seems prudent.
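As an illustration of the iterator "magic" mentioned above, here is one common way to test an arbitrary iterable for emptiness without losing its first element: pull one item with next() and a sentinel, then put it back with itertools.chain. The helper `peek_empty` is hypothetical, a sketch rather than anything proposed in the issue.

```python
import itertools
import statistics

_MISSING = object()  # private sentinel used to detect an exhausted iterator

def peek_empty(iterable):
    """Hypothetical helper: return (is_empty, iterator) where the iterator
    still yields every original element."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return True, iter(())
    # Re-attach the consumed element in front of the remaining items.
    return False, itertools.chain([first], it)

empty, data = peek_empty(x * 2 for x in range(0))
print(empty)                         # True
empty, data = peek_empty(x * 2 for x in range(3))
print(empty, statistics.mean(data))  # False 2
```

The extra bookkeeping here is the kind of clunkiness the comment argues a `default` parameter would avoid.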
msg358671	Author: Mark Dickinson (mark.dickinson) *	Date: 2019-12-19 10:54
What would the proposal look like for `statistics.stdev`? There you need at least two data points to compute a result, and a user might want to do different things for an empty dataset versus a single data point.
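To make the stdev() concern concrete: statistics.stdev raises the same StatisticsError for an empty dataset and for a single data point, so one `default` value cannot express different handling of the two cases. The helper `describe_spread` and its return strings are hypothetical, just a sketch of what a caller might want instead.

```python
import math
import statistics

def describe_spread(data):
    """Hypothetical helper: distinguish zero vs. one data point, which a
    single `default` argument could not do."""
    data = list(data)  # materialize so len() also works for iterators
    if not data:
        return "no data"
    if len(data) == 1:
        return "single observation, spread undefined"
    return statistics.stdev(data)

for sample in ([], [5.0], [1.0, 3.0, 5.0]):
    try:
        spread = statistics.stdev(sample)
    except statistics.StatisticsError as exc:
        # Empty and single-point inputs raise the same exception type,
        # so the exception alone does not tell the two cases apart.
        spread = f"error: {exc}"
    print(len(sample), spread, "|", describe_spread(sample))
```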
msg358672	Author: STINNER Victor (vstinner) *	Date: 2019-12-19 11:12
> I've tried to think of other solutions, such as a generic wrapper for such functions or a helper to check whether an iterable is empty, and they all turn out to be very clunky to use and un-Pythonic.

So the main use case would be to detect an empty iterable efficiently? Something like the following code?

    sentinel = object()
    avg = mean(data, default=sentinel)
    if avg is sentinel:
        ...  # special code path

Why not add a statistics.StatisticsError subclass for an empty dataset (ex: StatisticsEmptyError)? Something like:

    try:
        avg = mean(data)
    except statistics.StatisticsEmptyError:
        ...  # special code path, ex: avg = default

Or is there another use case for the proposed default parameter?
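A rough sketch of the StatisticsEmptyError alternative, assuming it were defined as a subclass of statistics.StatisticsError. Neither the class nor the `mean_strict` wrapper exists in the standard library; both are hypothetical here.

```python
import statistics

class StatisticsEmptyError(statistics.StatisticsError):
    """Hypothetical subclass from the discussion; not part of the stdlib."""

def mean_strict(data):
    """Hypothetical wrapper that raises StatisticsEmptyError on empty input."""
    data = list(data)
    if not data:
        raise StatisticsEmptyError("mean requires at least one data point")
    return statistics.mean(data)

try:
    avg = mean_strict([])
except StatisticsEmptyError:
    avg = 0  # special code path, e.g. fall back to a default
print(avg)  # 0
```

Because the new class subclasses StatisticsError, existing `except statistics.StatisticsError` handlers would keep working unchanged.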
msg358674	Author: Steven D'Aprano (steven.daprano) *	Date: 2019-12-19 11:21
TL;DR: I'm not likely to accept this feature request without at least one of (1) a practical use-case, (2) prior art in other statistics software, or (3) a strong mathematical justification for why this is meaningful and useful.

I'm not categorically against this idea, but it seems a bit fishy to me. If you have no data, how do you know what default value to give that would be appropriate for your (non-existent) observations?

It might help if you could show a real-life example of how, and why, you would use this, and how you would choose the default.

Another possibility would be to find prior art: another language, library or stats calculator which already offers this feature.

Alternatively, you could offer a mathematical/statistical justification for a default. For example, the empty sum is normally taken as 0 and the empty product as 1. R returns either a NAN or NA for the empty mean (depending on precisely how you calculate it).

While I'm personally sympathetic to the nuisance factor of having to wrap code in try...except blocks (my personal preference would have been for mean to return NAN on empty input), I think you will need to make a stronger case than just the analogy with min and max.
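The empty-sum and empty-product conventions mentioned above can be checked directly in Python, and the "return NAN on empty input" preference can be sketched as a small wrapper. `nan_mean` is hypothetical and not how statistics.mean behaves (it raises StatisticsError); math.prod and statistics.fmean require Python 3.8+.

```python
import math
import statistics

# Conventional values for empty reductions
print(sum([]))        # 0  (empty sum)
print(math.prod([]))  # 1  (empty product)

def nan_mean(data):
    """Hypothetical variant: return NaN instead of raising on empty input.
    Not the standard library's behaviour."""
    data = list(data)
    return statistics.fmean(data) if data else math.nan

print(nan_mean([]))         # nan
print(nan_mean([1, 2, 3]))  # 2.0
```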
msg358696	Author: Raymond Hettinger (rhettinger) *	Date: 2019-12-20 05:48
Thought experiment: suppose someone proposed, "math.log(x) should take an optional default argument because it is inconvenient to catch a ValueError if the input is non-positive". Or, more generally, suppose someone proposed, "every function in Python that can raise a ValueError should offer a default argument." One could imagine a use case for both of these proposals, but that doesn't mean that the API extensions would be warranted.

Also, ISTM the analogy to min() and max() is imperfect. Those aren't descriptive statistics. For min() and max(), there can be a natural bound known a priori to serve as a default: for example, we know that a probability is never lower than 0.0 or greater than 1.0.

Lastly, in common cases where the input is a sequence (rather than just an iterator), we already have a ternary operator that does the job nicely:

    central_value = mean(data) if data else 'unknown'

For the less common case, a try/except is not an undue burden; after all, it is a basic core language feature.
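A short sketch of why the ternary is limited to sequences: it relies on len()-based truthiness, and a generator object is always truthy even when it yields nothing, so the iterator case falls back to try/except as suggested.

```python
import statistics

data = []
central_value = statistics.mean(data) if data else 'unknown'
print(central_value)  # unknown

# A generator defines neither __len__ nor __bool__, so it is always truthy,
# even when empty; the ternary above would call mean() and raise.
gen = (x for x in [])
print(bool(gen))  # True
try:
    central_value = statistics.mean(gen)
except statistics.StatisticsError:
    central_value = 'unknown'
print(central_value)  # unknown
```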
msg358700	Author: Steven D'Aprano (steven.daprano) *	Date: 2019-12-20 10:02
I agree with Raymond's comments, except that, because I'm sometimes a bit of a pedant, I have to make one minor correction: max and min can be descriptive statistics. The sample minimum is the 1st order statistic, and the sample maximum is the N-th order statistic:

https://www2.stat.duke.edu/courses/Spring12/sta104.1/Lectures/Lec15.pdf

This doesn't invalidate the rest of what Raymond says.

Yoni Lavi, thank you for the suggestion, but I'm going to close this ticket. If you think you have a really strong argument for the feature, please feel free to make it here, and we will rethink the closure. But I don't want to give you false hope: it would have to be a very strong argument.
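A tiny illustration of the order-statistic point: sorting a sample gives x_(1) <= x_(2) <= ... <= x_(N), and the first and last of these are exactly min() and max(). The sample values are arbitrary.

```python
data = [7, 3, 9, 1, 4]
order_stats = sorted(data)  # x_(1) <= x_(2) <= ... <= x_(N)
n = len(order_stats)

assert order_stats[0] == min(data)      # 1st order statistic
assert order_stats[n - 1] == max(data)  # N-th order statistic
print(order_stats[0], order_stats[n - 1])  # 1 9
```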
msg358708	Author: Yoni Lavi (Yoni Lavi) *	Date: 2019-12-20 13:32
Thanks for the good feedback, everyone, and apologies for the unresponsiveness over the past day. I understand that my use cases may not reflect wider usage patterns, and I am not looking to argue against the closing. But for future reference, I'll add two real-life usage examples, which I should have included originally (again, apologies for the delay; things have been hectic).

The context is that I'm involved in running a coding bootcamp, and these are two recent cases where I needed a default of zero:

1. (Separately from the final grade calculations) We are interested in students' average grades on their projects as an indicator of the skills they have gained and their striving for excellence. When calculating this indicator, we use an average of 0 for a student who hasn't yet submitted anything.

2. When providing tutoring support, we classify the "complexity" of each student issue, and one of our indicators is the average complexity of questions in a particular slice of time and of the programme (this is particularly interesting around changes to the content). Here too, a slice of time/programme/tutor during which there were no issues is considered to have a complexity of 0.

Again, not disputing the decision to close, just adding these examples for future reference. Thanks
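For the first use case, this is roughly how the "average of 0 when nothing was submitted" rule can be written with today's API, using the sequence ternary from earlier in the thread. The student names, grades, and the `average_or_zero` helper are all made up for illustration.

```python
import statistics

# Hypothetical data: project grades per student; "dana" has not submitted yet.
grades = {"alex": [80, 90, 100], "dana": []}

def average_or_zero(values):
    """Hypothetical helper: mean of values, or 0.0 when there are none,
    matching the convention described for students with no submissions."""
    return statistics.fmean(values) if values else 0.0

indicators = {name: average_or_zero(vals) for name, vals in grades.items()}
print(indicators)  # {'alex': 90.0, 'dana': 0.0}
```

The second use case (average complexity per time slice, 0 for slices with no issues) follows the same pattern with a list of complexity scores per slice.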

History
Date	User	Action	Args
2022-04-11 14:59:24	admin	set	github: 83275
2019-12-20 13:32:13	Yoni Lavi	set	messages: + msg358708
2019-12-20 10:02:55	steven.daprano	set	status: open -> closed resolution: rejected messages: + msg358700 stage: patch review -> resolved
2019-12-20 05:48:45	rhettinger	set	messages: + msg358696
2019-12-19 11:21:33	steven.daprano	set	messages: + msg358674
2019-12-19 11:12:06	vstinner	set	nosy: + vstinner messages: + msg358672
2019-12-19 10:54:46	mark.dickinson	set	nosy: + mark.dickinson messages: + msg358671
2019-12-19 07:49:18	taleinat	set	messages: + msg358659
2019-12-19 06:55:44	rhettinger	set	messages: + msg358658
2019-12-19 03:10:37	Yoni Lavi	set	keywords: + patch stage: patch review pull_requests: + pull_request17124
2019-12-19 03:07:43	xtreak	set	nosy: + rhettinger, taleinat, steven.daprano
2019-12-19 03:06:41	Yoni Lavi	create