Message339596
Thanks for taking a detailed look. I'll explore the links you provided shortly.
The API is designed to be extendable so that we don't get trapped by the choice of computation method. If needed, any or all of the following extensions are possible without breaking backward compatibility:
quantiles(data, n=4, already_sorted=True) # Skip resorting
quantiles(data, cut_points=[0.02, 0.25, 0.50, 0.75, 0.98]) # box-and-whiskers
quantiles(data, interp_method='nearest') # also: "low", "high", "midpoint"
quantiles(data, inclusive=True) # For description of a complete population
The default approach used in the PR matches what is used by MS Excel's PERCENTILE.EXC function¹. That has several virtues. It is easy to explain. It allows two unequally sized datasets to be compared (perhaps with a QQ plot) to explore whether they are drawn from the same distribution. For sampled data, the quantiles tend to remain stable as more samples are added. For samples from a known distribution (e.g., normal variates), it tends to give the same results as inv_cdf():
>>> iq = NormalDist(100, 15)
>>> cohort = iq.samples(10_000)
>>> for ref, est in zip(quantiles(iq, n=10), quantiles(cohort, n=10)):
...     print(f'{ref:5.1f}\t{est:5.1f}')
...
80.8 81.0
87.4 87.8
92.1 92.3
96.2 96.3
100.0 100.1
103.8 104.0
107.9 108.0
112.6 112.9
119.2 119.3
My thought was to start with something like this and only add options if they get requested (the most likely request is an inclusive=True option to emulate MS Excel's PERCENTILE.INC).
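To make the inclusive=True idea concrete, here is a minimal sketch of the PERCENTILE.INC rule, in which the p-th quantile sits at 1-based rank p*(N-1)+1 and the minimum and maximum are treated as the 0th and 100th percentiles. The function name inclusive_quantiles is hypothetical and used only for illustration; it is not code from the PR.

```python
def inclusive_quantiles(data, n=4):
    """Hypothetical sketch of the PERCENTILE.INC rule (not the PR's code).

    Returns the n - 1 cut points that divide data into n intervals,
    treating data as a complete population.
    """
    xs = sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError('need at least two data points')
    if n < 1:
        raise ValueError('n must be at least 1')
    m = size - 1                      # number of gaps between sorted points
    cuts = []
    for i in range(1, n):
        # 0-based position i*m/n, split into whole and fractional parts
        # using exact integer arithmetic (fraction is delta/n).
        j, delta = divmod(i * m, n)
        # Linear interpolation between neighbors xs[j] and xs[j + 1].
        cuts.append((xs[j] * (n - delta) + xs[j + 1] * delta) / n)
    return cuts


print(inclusive_quantiles([10, 20, 30, 40]))   # [17.5, 25.0, 32.5]
```

Because the extremes are pinned to the 0th and 100th percentiles, the cut points never fall outside the observed range, which suits a complete population better than a sample.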
If we need to leave the exact method unguaranteed, that's fine. But I think it would be better to guarantee the match to PERCENTILE.EXC and then handle other requests through API extensions rather than revisions.
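For reference, the PERCENTILE.EXC rule that would be guaranteed can be sketched in a few lines: the p-th quantile sits at 1-based rank p*(N+1), with linear interpolation between neighboring sorted values. The name exclusive_quantiles is hypothetical, and clamping out-of-range ranks to the outermost pair of points is an assumption of this sketch rather than something stated above.

```python
def exclusive_quantiles(data, n=4):
    """Hypothetical sketch of the PERCENTILE.EXC rule (not the PR's code).

    Returns the n - 1 cut points that divide data into n intervals,
    treating data as a sample from a larger population.
    """
    xs = sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError('need at least two data points')
    if n < 1:
        raise ValueError('n must be at least 1')
    m = size + 1
    cuts = []
    for i in range(1, n):
        j = i * m // n                    # whole part of the 1-based rank i*m/n
        j = min(max(j, 1), size - 1)      # assumption: clamp to the outermost pair
        delta = i * m - j * n             # fractional part of the rank, times n
        # Linear interpolation between xs[j - 1] and xs[j]
        # (an extrapolation when the rank was clamped).
        cuts.append((xs[j - 1] * (n - delta) + xs[j] * delta) / n)
    return cuts


print(exclusive_quantiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # [2.5, 5.0, 7.5]
```

With nine data points and n=4, the first quartile lands at rank 0.25 * 10 = 2.5, halfway between the second and third values, which is the figure PERCENTILE.EXC reports for the same input.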
¹ https://exceljet.net/excel-functions/excel-percentile.exc-function
Date | User | Action | Args
2019-04-08 07:42:33 | rhettinger | set | recipients: + rhettinger, mark.dickinson, steven.daprano
2019-04-08 07:42:33 | rhettinger | set | messageid: <1554709353.07.0.925020172524.issue36546@roundup.psfhosted.org>
2019-04-08 07:42:33 | rhettinger | link | issue36546 messages
2019-04-08 07:42:32 | rhettinger | create |