Author dcasmr
Recipients dcasmr, maheshwark97, mark.dickinson, steven.daprano
Date 2018-03-16.21:14:39
Message-id <1521234879.72.0.467229070634.issue33084@psf.upfronthosting.co.za>
In-reply-to
Content
If we are trying to fix this, median should behave the way mean and harmonic_mean already do when there are missing values in the data. That way it is consistent with how the rest of the statistics library handles NaNs. In any case, the behavior should be mentioned somewhere in the docs.

import statistics as stats
import numpy as np
import pandas as pd
data = [75, 90, 85, 92, 95, 80, np.nan]
stats.mean(data)
nan
stats.harmonic_mean(data)
nan
stats.stdev(data)
nan
As you can see, when there is a missing value, computing the mean, harmonic mean and sample standard deviation with the statistics library returns nan.
However, median, median_high and median_low compute their results incorrectly when missing values are present in the data.
It would be better to return nan and let the user drop (or resolve) any missing values before computing.
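To make the suggestion concrete, here is a minimal sketch of a median that propagates NaN instead of silently sorting it. The helper name median_propagating_nan is hypothetical and is not part of the statistics module; it only illustrates the proposed behavior and checks float values (including numpy floats) for NaN.

```python
import math
import statistics


def median_propagating_nan(values):
    """Hypothetical helper: return NaN if any float in `values` is NaN,
    otherwise defer to statistics.median."""
    values = list(values)
    # NaN is only possible for float-like values; ints are skipped.
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        return float("nan")
    return statistics.median(values)


with_nan = median_propagating_nan([75, 90, 85, 92, 95, 80, float("nan")])
without_nan = median_propagating_nan([75, 90, 85, 92, 95, 80])
print(with_nan, without_nan)
```

With this behavior the caller sees nan and has to decide how to handle the missing value, the same as with mean, harmonic_mean and stdev above.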
## Another example using a pandas Series
df = pd.DataFrame(data, columns=['data'])
df.head()
        data
0	75.0
1	90.0
2	85.0
3	92.0
4	95.0
5	80.0
6	NaN

### Use the statistics library to compute the median of the Series
stats.median(df['data'])
90
 
## Pandas returns the correct median by dropping the missing values
## Now use pandas to compute the median of the Series with the missing value
df['data'].median()
87.5
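For comparison, the same result can be reproduced with the statistics library alone by dropping the NaN first. This is only a sketch of the "drop before computing" workaround, using a plain list rather than the DataFrame above; the variable names are my own.

```python
import math
import statistics

data = [75, 90, 85, 92, 95, 80, float("nan")]

# Drop missing values explicitly before computing the median.
clean = [x for x in data if not math.isnan(x)]

result = statistics.median(clean)
print(result)  # 87.5, the mean of the two middle values 85 and 90
```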

I did not test median_grouped in the statistics library, but I will let you know afterwards if it's affected as well.