Author steven.daprano
Recipients dcasmr, eric.smith, maheshwark97, mark.dickinson, steven.daprano
Date 2018-10-07.14:35:30
Message-id <1538922930.86.0.545547206417.issue33084@psf.upfronthosting.co.za>
Content
I want to revisit this for 3.8.

I agree that the current implementation-dependent behaviour when there are NANs in the data is troublesome. But I don't think that there is a single right answer.
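To illustrate the problem, here is a small sketch. It relies only on the fact that every ordering comparison involving a NAN is false, which is what confuses the sort inside median. The particular input lists are made up for illustration, and the two median results are deliberately not stated, since they depend on the implementation's sorting algorithm.

```python
import statistics

nan = float("nan")

# Ordering comparisons involving a NAN are always false, so a sort
# cannot place the NAN consistently relative to the other values.
print(nan < 1.0, 1.0 < nan)  # False False

# The same values in a different order; the two results may differ,
# and either may be a NAN, depending on how the sort treats the NAN.
a = statistics.median([nan, 1, 2, 3, 4])
b = statistics.median([1, 2, nan, 3, 4])
print(a, b)
```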

I also agree with Mark that if we change median, we ought to change the other functions so that people can get consistent behaviour. It wouldn't be good for median to ignore NANs and mean to process them.

I'm inclined to add a parameter to the statistics functions for dealing with NANs, which allows the caller to select from:

- implementation-dependent, i.e. what happens now;
  (for speed, and backwards compatibility, this would be the default)

- raise an exception;

- return a NAN;

- skip any NANs (treat them as missing values to be ignored).

I think that raise/return/ignore will cover most use-cases for NANs, and the default will be suitable for the "easy cases" where there are no NANs, without paying any performance penalty if you already know your data has no NANs.
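A minimal sketch of how the four options could behave, written as a wrapper around the existing functions rather than as a change to them. Everything new here is an assumption for illustration only: the keyword name nan_policy, the policy names (in particular "unchecked" for the first option, which is just a placeholder), and the helpers _is_nan and with_nan_policy. Only float NANs are detected; Decimal NANs would need extra handling.

```python
import math
import statistics

# Hypothetical policy names; "unchecked" stands in for the first option.
POLICIES = ("unchecked", "raise", "return", "ignore")

def _is_nan(x):
    # Detects float NANs only.
    return isinstance(x, float) and math.isnan(x)

def with_nan_policy(func):
    """Wrap a statistics function with a hypothetical nan_policy keyword."""
    def wrapper(data, *, nan_policy="unchecked"):
        if nan_policy not in POLICIES:
            raise ValueError(f"unknown nan_policy: {nan_policy!r}")
        if nan_policy == "unchecked":
            # Default: no scan of the data, so no extra cost.
            return func(data)
        values = list(data)
        clean = [x for x in values if not _is_nan(x)]
        if len(clean) != len(values):
            if nan_policy == "raise":
                raise ValueError("NAN found in data")
            if nan_policy == "return":
                return math.nan
        # "ignore", or no NANs were present: compute on the cleaned data.
        return func(clean)
    return wrapper

# The same wrapper applied to several functions gives consistent behaviour.
median = with_nan_policy(statistics.median)
mean = with_nan_policy(statistics.mean)

data = [1.0, math.nan, 2.0, 4.0]
print(median(data, nan_policy="ignore"))  # 2.0
print(median(data, nan_policy="return"))  # nan
```

With this shape, the default path calls the underlying function directly, so callers who know their data has no NANs pay nothing for the feature.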

Thoughts?

I'm especially looking for ideas on what to call the first option.