Message283090
On Tue, Dec 13, 2016 at 09:35:22AM +0000, Srikanth Anantharam wrote:
>
> Srikanth Anantharam added the comment:
>
> A better choice would be to return a tuple of values (sliced from the
> table). And let the user decide which one to use.
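The quoted suggestion could look roughly like this. `modes_tuple` is a hypothetical helper, not part of the `statistics` module. It counts values with `collections.Counter` and returns every value tied for the highest count, in first-seen order, so the caller decides which one to use.

```python
from collections import Counter

def modes_tuple(data):
    """Return every value tied for the highest frequency, as a tuple.

    Hypothetical helper illustrating the suggestion; not part of the
    statistics module. Values appear in the order they were first seen.
    """
    counts = Counter(data)
    if not counts:
        raise ValueError("modes_tuple() requires at least one data point")
    top = max(counts.values())
    # Keep every value whose count equals the maximum.
    return tuple(value for value, n in counts.items() if n == top)

print(modes_tuple([1, 1, 2, 2, 3]))  # (1, 2)
```

For reference, Python 3.8 later changed `statistics.mode()` to return the first mode encountered on ties (earlier versions raise `statistics.StatisticsError`) and added `statistics.multimode()`, which returns a list of all tied values.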
The current mode() function is designed for a very basic use-case, where
you have an obvious single mode from discrete data.
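A minimal sketch of that basic use-case: discrete data where one value clearly occurs more often than any other, so `statistics.mode()` has an unambiguous answer. The data values are illustrative only.

```python
import statistics

# 3 occurs three times; every other value occurs fewer times,
# so there is a single, obvious mode.
data = [1, 2, 2, 3, 3, 3, 4]
print(statistics.mode(data))  # 3

# mode() also works on non-numeric discrete data.
print(statistics.mode(["red", "blue", "red"]))  # red
```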
The problem with dealing with multiple modes is that it's not easy to
tell the difference between a genuinely multi-modal sample and one which
just happens to have a few samples with the same value:
data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]
Assuming the sampling is fair, 8 is clearly the mode; but is it bimodal
with 4 as the second mode? Or does it perhaps have four modes: 8, 4, 7 and 9?
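Tallying the example data makes the ambiguity concrete: 8 occurs seven times, 4 three times, and 7 and 9 twice each, but the counts alone do not say which of these are "real" modes. This sketch uses `collections.Counter.most_common()`, which lists values from most to least frequent (ties in first-seen order).

```python
from collections import Counter

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]

# Print each value with its frequency, most frequent first.
# The first four lines are: 8 7, 4 3, 7 2, 9 2
for value, count in Counter(data).most_common():
    print(value, count)
```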
I have plans for introducing a binning function to collect data into
bins and run statistics on the bins. That might be a better way to deal
with multi-modal samples: if you bin the data (for discrete data, use a
bin size of 1) and then look at the frequencies, you can decide how many
modes there are.
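A rough sketch of the binning idea, assuming equal-width bins identified by their lower edge. The binning function mentioned here had no published API at the time, so `bin_counts` and its `width` and `start` parameters are hypothetical. With `width=1` and integer data, each bin holds exactly one value, so the result matches a plain frequency count.

```python
import math
from collections import Counter

def bin_counts(data, width=1, start=0):
    """Count numeric data in equal-width bins [edge, edge + width).

    Hypothetical helper; name and parameters are assumptions.
    Returns a dict mapping each bin's lower edge to its count,
    sorted by edge.
    """
    counts = Counter()
    for x in data:
        index = math.floor((x - start) / width)  # which bin x falls in
        counts[start + index * width] += 1
    return dict(sorted(counts.items()))

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]
print(bin_counts(data))           # bin size 1: one bin per distinct value
print(bin_counts(data, width=2))  # {0: 1, 2: 2, 4: 4, 6: 3, 8: 9}
```

Looking at the binned frequencies, rather than asking `mode()` for a single answer, leaves the decision about how many modes there are to the person reading them.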
Thanks for the suggestion.

Date                | User           | Action | Args
2016-12-13 09:54:04 | steven.daprano | set    | recipients: + steven.daprano, wolma, sria91
2016-12-13 09:54:04 | steven.daprano | link   | issue28956 messages
2016-12-13 09:54:04 | steven.daprano | create |