Author josh.r
Recipients jfine2358, josh.r, remi.lapeyre, vstinner
Date 2019-01-09.18:05:03
Message-id <1547057104.04.0.306130813383.issue35698@roundup.psfhosted.org>
In-reply-to
Content
vstinner: The problem isn't the averaging, it's the type inconsistency. In both examples (median([1]), median([1, 1])), the median is unambiguously 1 (no actual average is needed; the values are identical), yet it gets converted to 1.0 only in the latter case.

I'm not sure it's possible to fix this though; right now, the behavior is at least consistent within each of two cases:

1. When the length is odd, you get the median by identity (and therefore type and value are unchanged)
2. When the length is even, you get the median by adding and dividing by 2 (so for ints, the result is always float).
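For reference, a quick check of the two cases above against the current statistics.median (the variable names are just for illustration):

```python
from statistics import median

# Odd length: the middle element is returned as-is, so an int stays an int.
odd_result = median([1])

# Even length: the two middle elements are added and divided by 2,
# so int inputs come back as a float even when the values are identical.
even_result = median([1, 1])

print(type(odd_result).__name__, type(even_result).__name__)
```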

A fix that changed that would add yet another layer of complexity:

1. When the length is odd, you get the median by identity (and therefore type and value are unchanged)
2. When the length is even:
  a. If the two middle values are equal (possibly only if they have equal types as well, to resolve the issue with [1, 1.0] or [1, True]), return the first of the two middle values (median by identity as in #1)
  b. Otherwise, you get the median by adding and dividing by 2
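A rough sketch of what that scheme might look like, purely to show the extra branching. This is not a proposed patch; the function name is made up, and it raises a plain ValueError on empty input instead of statistics.StatisticsError:

```python
def median_identity_when_equal(data):
    """Hypothetical median that skips the averaging step when it isn't needed."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        # Case 1: odd length, median by identity.
        return data[mid]
    low, high = data[mid - 1], data[mid]
    if low == high and type(low) is type(high):
        # Case 2a: equal values of the same type, median by identity.
        return low
    # Case 2b: everything else, add and divide by 2.
    return (low + high) / 2
```

With this sketch, [1, 1] would give the int 1, while [1, 1.0], [1, True] and [3, 5] would all still go through the division and come back as floats.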

And note the type checking that 2a needs just to make it that consistent. Even if we accepted that, we'd pretty quickly get into a debate over whether median([3, 5]) should try to return 4 instead of 4.0, given that the median is representable in the source type (which would further damage consistency).

If anything, I think the best design would have been to *always* include a division step (so odd length cases performed middle_elem / 1, while even did (middle_elem1 + middle_elem2) / 2) so the behavior was consistent regardless of odd vs. even input length, but that ship has probably sailed, given that the documented behavior specifically notes that the precise middle data point is itself returned for the odd case.
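To make the always-divide idea concrete, a minimal sketch (hypothetical function name; this is not how statistics.median behaves today, and it raises ValueError on empty input for simplicity):

```python
def median_always_divide(data):
    """Hypothetical median that applies a division in both branches."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: divide by 1 so the result type matches the even case.
        return data[mid] / 1
    # Even length: average the two middle elements.
    return (data[mid - 1] + data[mid]) / 2
```

Under this sketch int inputs would give a float for any input length, while types whose true division preserves the type (such as fractions.Fraction) would be returned unchanged in type.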

I think the solution for people concerned about this is to explicitly convert the int values to fractions.Fraction (or decimal.Decimal) before passing them to median, so floating point math never gets involved, and the return type is consistent regardless of length.
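Something along these lines, as a minimal illustration of the workaround (the sample values are arbitrary):

```python
from fractions import Fraction
from statistics import median

values = [3, 5, 1, 4]

# Convert up front so the add-and-divide step is done in exact rational math.
odd_median = median([Fraction(v) for v in values[:3]])  # odd length
even_median = median([Fraction(v) for v in values])     # even length

# Both results are Fraction instances, regardless of input length.
print(repr(odd_median), repr(even_median))
```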