Author dcasmr
Recipients dcasmr, maheshwark97, mark.dickinson, steven.daprano
Date 2018-03-16.16:45:41
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1521218741.48.0.467229070634.issue33084@psf.upfronthosting.co.za>
In-reply-to
Content
Just to make sure we are focused on the issue: the reported bug is in the statistics library (not in numpy). It occurs when there is at least one missing value (NaN) in the data, and it affects the computation of median, median_low and median_high from the statistics library.
The test was performed on Python 3.6.4.

When there are no missing values (NaNs) in the data, median, median_high and median_low from the statistics library work fine.
So, yes, removing the NaNs (or imputing values for them) before computing the median(s) resolves the issue.
Also, just as statistics.mean(data) returns nan when the data contain a missing value, median, median_high and median_low should behave the same way.
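A short sketch of why the medians come out wrong (this is standard IEEE-754 behavior, not specific to numpy): every comparison with NaN evaluates to False, so sorting data that contains a NaN has no well-defined order, and statistics.median, which sorts its input, ends up picking an arbitrary element.

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so sort order is undefined:
print(nan < 90)    # False
print(nan > 90)    # False
print(nan == nan)  # False

# sorted() therefore cannot place NaN consistently, and
# statistics.median takes the middle of a mis-sorted list.
data = [75, 90, 85, 92, 95, 80, nan]
print(sorted(data))  # NaN may land anywhere in the result
```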

import numpy as np
import statistics as stats

data = [75, 90, 85, 92, 95, 80, np.nan]

Median = stats.median(data)
Median_high = stats.median_high(data)
Median_low = stats.median_low(data)
print("The incorrect median is", Median)
The incorrect median is 90
print("The incorrect median high is", Median_high)
The incorrect median high is 90
print("The incorrect median low is", Median_low)
The incorrect median low is 90

## Mean returns nan
Mean = stats.mean(data)
print("The mean is", Mean)
The mean is nan

Now, when we drop the missing values, we have:
data2 = [75, 90, 85, 92, 95, 80]
stats.median(data2)
87.5
stats.median_high(data2)
90
stats.median_low(data2)
85
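For reference, a minimal workaround along the lines suggested above: drop the NaNs with math.isnan before calling the statistics functions. This reproduces the correct results shown for data2.

```python
import math
import statistics as stats

data = [75, 90, 85, 92, 95, 80, float("nan")]

# Filter out missing values before computing the medians:
clean = [x for x in data if not math.isnan(x)]

print(stats.median(clean))       # 87.5
print(stats.median_high(clean))  # 90
print(stats.median_low(clean))   # 85
```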