Message 327281 - Python tracker

Author	steven.daprano
Recipients	dcasmr, eric.smith, maheshwark97, mark.dickinson, steven.daprano
Date	2018-10-07.14:35:30
Message-id	<1538922930.86.0.545547206417.issue33084@psf.upfronthosting.co.za>

Content
I want to revisit this for 3.8.

I agree that the current implementation-dependent behaviour when there are NANs in the data is troublesome. But I don't think that there is a single right answer.
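To make the implementation-dependent behaviour concrete, here is a small sketch that feeds the same three values to statistics.median in every possible order. It uses only the standard library; the helper name median_by_ordering is made up for this illustration. No particular output is promised, since the result depends on how the underlying sort happens to treat NANs.

```python
import itertools
import statistics


def median_by_ordering(values):
    """Return (ordering, median) pairs for every permutation of values.

    Hypothetical helper, used only to show that the answer can depend
    on where a NAN sits in the input.
    """
    return [
        (perm, statistics.median(perm))
        for perm in itertools.permutations(values)
    ]


if __name__ == "__main__":
    # Same data each time; only the position of the NAN changes.
    for ordering, result in median_by_ordering([1.0, 2.0, float("nan")]):
        print(ordering, "->", result)
```

If the printed medians are not all the same, that is the inconsistency under discussion: the data set has not changed, only its order.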

I also agree with Mark that if we change median, we ought to change the other functions so that people can get consistent behaviour. It wouldn't be good for median to ignore NANs and mean to process them.

I'm inclined to add a parameter to the statistics functions for dealing with NANs, one that allows the caller to select from:

- implementation-dependent, i.e. what happens now;
  (for speed, and backwards compatibility, this would be the default)

- raise an exception;

- return a NAN;

- skip any NANs (treat them as missing values to be ignored).

I think that raise/return/ignore will cover most use-cases for NANs, and the default will be suitable for the "easy cases" where there are no NANs, without paying any performance penalty if you already know your data has no NANs.
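The four options above could be sketched roughly as follows. This is only an illustration, not the statistics module's API: the wrapper name median_with_nan_policy, the parameter name nan_policy, and all four option strings (including "unchecked" for the first option, whose name is still an open question) are assumptions. For simplicity the sketch only detects float NANs, not Decimal ones.

```python
import math
import statistics

# Hypothetical option names; none of these are settled.
_POLICIES = ("unchecked", "raise", "return", "ignore")


def _is_nan(x):
    # Assumption: only float NANs are detected in this sketch.
    return isinstance(x, float) and math.isnan(x)


def median_with_nan_policy(data, nan_policy="unchecked"):
    """Median with an explicit choice of NAN handling (illustrative only)."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    data = list(data)
    if nan_policy == "unchecked":
        # Default: no scan for NANs, so no extra cost, and the result
        # is whatever statistics.median does today.
        return statistics.median(data)
    if nan_policy == "ignore":
        # Treat NANs as missing values and drop them.
        return statistics.median([x for x in data if not _is_nan(x)])
    if any(_is_nan(x) for x in data):
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        return float("nan")  # nan_policy == "return"
    return statistics.median(data)
```

Only the three non-default options pay for a pass over the data to look for NANs; the default goes straight to the existing function, which is the point of keeping it as the default.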

Thoughts?

I'm especially looking for ideas on what to call the first option.
