Message334027
On Fri, Jan 18, 2019 at 11:13:41PM +0000, Rémi Lapeyre wrote:
> Wouldn't the 5-th percentile be select(data, round(len(data)/20))?
Oh if only it were that simple!
Using the method you suggest, the 50th percentile is not the same as the
median unless the length of the list is three more than a multiple of
four. It also runs into problems for small lists where the index rounds
down to zero.
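A minimal sketch of both problems follows. It assumes that select(data, i) returns the i-th smallest item counting from one, and that round() is Python 3's built-in, which rounds halves to the nearest even integer. The suggestion spells out neither detail, so `select` and `naive_percentile` here are hypothetical helpers, not an existing API.

```python
from statistics import median


def select(data, i):
    """Hypothetical helper: return the i-th smallest item, counting from 1."""
    if not 1 <= i <= len(data):
        raise IndexError(f"rank {i} is out of range for {len(data)} items")
    return sorted(data)[i - 1]


def naive_percentile(data, p):
    """The suggested rule: take the item whose rank is round(len(data) * p / 100)."""
    return select(data, round(len(data) * p / 100))


# Collect the list lengths for which the "50th percentile" equals the median.
agreeing_lengths = []
for n in range(1, 13):
    data = list(range(1, n + 1))
    try:
        if naive_percentile(data, 50) == median(data):
            agreeing_lengths.append(n)
    except IndexError:
        pass  # n == 1: round(0.5) == 0, so there is no such rank

print(agreeing_lengths)  # [3, 7, 11]

# Small lists: the 5th percentile of five items asks for rank round(0.25) == 0.
try:
    naive_percentile([10, 20, 30, 40, 50], 5)
except IndexError as exc:
    print("fails:", exc)
```

Under these assumptions, only lengths of the form 4k + 3 agree with the median. A different indexing or rounding convention would give a different set of lengths.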
Langford (2006) does a literature review and finds fifteen methods for
calculating the quartiles (Q1, Q2, Q3), of which twelve are distinct and
incompatible; Hyndman & Fan (1996) did a similar review for general quantiles and
came up with nine, of which seven match Langford's.
I know of at least six other methods, which gives a total of 20 distinct
ways of calculating quartiles or quantiles.
http://jse.amstat.org/v14n3/langford.html
https://robjhyndman.com/publications/quantiles/
I stress that these are not merely different algorithms which give the
same answer, but different methods which sometimes disagree on their
answers. So whichever method you use, some people are going to be
annoyed or confused or both.
http://mathforum.org/library/drmath/view/60969.html
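As a small illustration of two methods disagreeing, the sketch below uses statistics.quantiles(), which exists only in Python 3.8 and later and so postdates this message. Its "exclusive" and "inclusive" methods appear to correspond to H&F's methods 6 and 7; treat that mapping as an assumption. The sample data is made up.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]

# Quartiles (n=4) of the same data under two different definitions.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [2.25, 4.5, 6.75]
print(inclusive)  # [2.75, 4.5, 6.25]

# Both agree on the median (Q2) but not on Q1 or Q3.
print(exclusive[1] == inclusive[1] == median(data))  # True
```

Neither answer is wrong; each follows from a different convention for where the cut points fall between data points.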
Other statistics libraries provide a choice, e.g.:
- R and Octave provide the same 9 as H&F.
- Maple provides 6 of those, plus 2 others.
- Wessa provides 5 that match H&F, plus another 3.
- SAS provides 5.
- even Excel provides 2 different ways.
Statisticians don't even agree on which is the "best" method. H&F
recommend their method number 8. Langford recommends his method 4. I
think that your suggestion matches Langford's method 14, which is H&F's
method 3.
Selecting the i-th item from a list is the easy part. Turning that into
meaningful quantiles, percentiles, etc. is where it gets really hairy. My
favourite quote on this comes from J Nash on the Gnumeric mailing list:
    Ultimately, this question boils down to where to cut to
    divide 4 candies among 5 children. No matter what you do,
    things get ugly.
Date | User | Action | Args
2019-01-19 03:22:07 | steven.daprano | set | recipients: + steven.daprano, rhettinger, mark.dickinson, remi.lapeyre
2019-01-19 03:22:05 | steven.daprano | link | issue35775 messages
2019-01-19 03:22:05 | steven.daprano | create |