Author dcasmr
Recipients dcasmr, maheshwark97, mark.dickinson, steven.daprano
Date 2018-03-16.16:45:41
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1521218741.48.0.467229070634.issue33084@psf.upfronthosting.co.za>
In-reply-to
Content
Just to make sure we are focused on the issue: the reported bug is in the statistics library (not in numpy). It occurs when there is at least one missing value (NaN) in the data, and it affects the computation of median, median_low and median_high from the statistics library.
The test was performed on Python 3.6.4.

When there are no missing values (NaNs) in the data, median, median_high and median_low from the statistics library work fine.
So, yes, removing the NaNs (or imputing values for them) before computing the median(s) resolves the issue.
Also, just as statistics.mean(data) returns nan when the data contain a missing value, median, median_high and median_low should behave the same way.
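A short sketch of why the medians come out wrong (this is standard IEEE-754 behavior, not specific to numpy): every comparison with NaN evaluates to False, so sorting data that contains a NaN has no well-defined order, and statistics.median, which sorts its input, ends up picking an arbitrary element.

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so sort order is undefined:
print(nan < 90)    # False
print(nan > 90)    # False
print(nan == nan)  # False

# sorted() therefore cannot place NaN consistently, and
# statistics.median takes the middle of a mis-sorted list.
data = [75, 90, 85, 92, 95, 80, nan]
print(sorted(data))  # NaN may land anywhere in the result
```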

import numpy as np
import statistics as stats

data = [75, 90, 85, 92, 95, 80, np.nan]

Median = stats.median(data)
Median_high = stats.median_high(data)
Median_low = stats.median_low(data)
print("The incorrect median is", Median)
The incorrect median is 90
print("The incorrect median high is", Median_high)
The incorrect median high is 90
print("The incorrect median low is", Median_low)
The incorrect median low is 90

## Mean returns nan
Mean = stats.mean(data)
print("The mean is", Mean)
The mean is nan

Now, when we drop the missing values, we have:
data2 = [75, 90, 85, 92, 95, 80]
stats.median(data2)
87.5
stats.median_high(data2)
90
stats.median_low(data2)
85
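For reference, a minimal workaround along the lines suggested above: drop the NaNs with math.isnan before calling the statistics functions. This reproduces the correct results shown for data2.

```python
import math
import statistics as stats

data = [75, 90, 85, 92, 95, 80, float("nan")]

# Filter out missing values before computing the medians:
clean = [x for x in data if not math.isnan(x)]

print(stats.median(clean))       # 87.5
print(stats.median_high(clean))  # 90
print(stats.median_low(clean))   # 85
```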