Author rhettinger
Recipients mark.dickinson, rhettinger, steven.daprano
Date 2019-04-08.07:42:32
Message-id <1554709353.07.0.925020172524.issue36546@roundup.psfhosted.org>
In-reply-to
Content
Thanks for taking a detailed look.  I'll explore the links you provided shortly.

The API is designed to be extendable so that we don't get trapped by the choice of computation method.  If needed, any or all of the following extensions are possible without breaking backward compatibility:

  quantiles(data, n=4, already_sorted=True) # Skip resorting
  quantiles(data, cut_points=[0.02, 0.25, 0.50, 0.75, 0.98]) # box-and-whiskers
  quantiles(data, interp_method='nearest') # also: "low", "high", "midpoint"
  quantiles(data, inclusive=True)    # For description of a complete population
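
As a rough illustration of why such extensions need not break existing callers, here is a minimal sketch that uses keyword-only parameters with defaults. The name quantiles_sketch is hypothetical, the body is not the PR's code, and extrapolating from the outermost pair of points when a position falls outside the data is a choice made for this sketch only.

```python
def quantiles_sketch(data, *, n=4, already_sorted=False):
    """Return the n-1 cut points that split data into n intervals.

    Hypothetical sketch of the exclusive method with one opt-in extension
    (already_sorted). Callers that never pass the new keyword are unaffected.
    """
    if n < 1:
        raise ValueError('n must be at least 1')
    # The new keyword only skips the sort; the result is otherwise unchanged.
    xs = list(data) if already_sorted else sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError('must have at least two data points')
    m = size + 1
    cuts = []
    for i in range(1, n):
        # 1-based position of the i-th cut point is i * (size + 1) / n.
        j = i * m // n
        # Assumption of this sketch: clamp so xs[j - 1] and xs[j] both exist.
        j = min(max(j, 1), size - 1)
        delta = i * m - j * n          # exact integer remainder
        cuts.append((xs[j - 1] * (n - delta) + xs[j] * delta) / n)
    return cuts


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(quantiles_sketch(data))                              # original call style
print(quantiles_sketch(sorted(data), already_sorted=True)) # opt-in extension
```

Because the added parameter is keyword-only and defaults to the old behavior, both calls return the same cut points.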

The default approach used in the PR matches what is used by MS Excel's PERCENTILE.EXC function¹.  That has several virtues.  It is easy to explain.  It allows two unequally sized datasets to be compared (perhaps with a QQ plot) to explore whether they are drawn from the same distribution.  For sampled data, the quantiles tend to remain stable as more samples are added.  For samples from a known distribution (e.g. normal variates), it tends to give the same results as inv_cdf():

    >>> iq = NormalDist(100, 15)
    >>> cohort = iq.samples(10_000)
    >>> for ref, est in zip(quantiles(iq, n=10), quantiles(cohort, n=10)):
    ...     print(f'{ref:5.1f}\t{est:5.1f}')
    ...
     80.8	 81.0
     87.4	 87.8
     92.1	 92.3
     96.2	 96.3
    100.0	100.1
    103.8	104.0
    107.9	108.0
    112.6	112.9
    119.2	119.3
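
If quantiles() does not accept a distribution object in the version at hand, the same comparison can be sketched by computing the reference deciles with NormalDist.inv_cdf() directly. This assumes the statistics module as released in Python 3.8 or later. The seed is an arbitrary value chosen only to make reruns repeatable.

```python
from statistics import NormalDist, quantiles

iq = NormalDist(100, 15)
cohort = iq.samples(10_000, seed=8675309)   # arbitrary seed for repeatability

n = 10
# Theoretical deciles from the inverse cumulative distribution function.
reference = [iq.inv_cdf(i / n) for i in range(1, n)]
# Sample deciles using the default (exclusive) method.
estimate = quantiles(cohort, n=n)

for ref, est in zip(reference, estimate):
    print(f'{ref:5.1f}\t{est:5.1f}')
```

The two columns should differ only by sampling noise, which shrinks as the cohort grows.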

My thought was to start with something like this and only add options if they get requested (the most likely request is an inclusive=True option to emulate MS Excel's PERCENTILE.INC).  
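
To make the difference concrete: for sorted data of size m, the exclusive method places the p-th quantile at the 1-based position p*(m+1), while the inclusive method uses p*(m-1)+1. The sketch below assumes the statistics module as released in Python 3.8 or later, where this choice is spelled method='inclusive' rather than inclusive=True.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Exclusive (default): positions 0.25*10, 0.50*10, 0.75*10 = 2.5, 5, 7.5
exclusive = quantiles(data, n=4)

# Inclusive: positions 0.25*8+1, 0.50*8+1, 0.75*8+1 = 3, 5, 7
inclusive = quantiles(data, n=4, method='inclusive')

print(exclusive)   # [2.5, 5.0, 7.5]
print(inclusive)   # [3.0, 5.0, 7.0]
```

The inclusive quartiles sit closer to the median because that method treats the minimum and maximum of the data as the 0th and 100th percentiles.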

If we need to leave the exact method unguaranteed, that's fine.  But I think it would be better to guarantee the match to PERCENTILE.EXC and then handle other requests through API extensions rather than revisions.


¹ https://exceljet.net/excel-functions/excel-percentile.exc-function