Author steven.daprano
Recipients sria91, steven.daprano, wolma
Date 2016-12-13.09:54:04
Message-id <20161213095356.GB3365@ando.pearwood.info>
In-reply-to <CAN3Ck4Aw12W1iFaR3ddvak+UaNAewVyP0GgPKseBzvNCpDKFUQ@mail.gmail.com>
Content
On Tue, Dec 13, 2016 at 09:35:22AM +0000, Srikanth Anantharam wrote:
> 
> Srikanth Anantharam added the comment:
> 
> A better choice would be to return a tuple of values (sliced from the
> table). And let the user decide which one to use.

The current mode() function is designed for a very basic use-case, where 
you have an obvious single mode from discrete data.

The problem with dealing with multiple modes is that it's not easy to 
tell the difference between a genuinely multi-modal sample and one which 
just happens to have a few samples with the same value:

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]

Assuming the sampling is fair, 8 is clearly the mode; but is it bimodal 
with 4 the second mode? Or perhaps even four modes, 8, 4, 7 and 9?
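
To make the ambiguity concrete, here is a rough sketch that tallies the 
sample above with collections.Counter. It is only an illustration and not 
part of the statistics module; the name "candidates" is made up for this 
example.

```python
from collections import Counter

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]

# Tally how often each value occurs, most frequent first.
counts = Counter(data).most_common()

# Every value that occurs more than once is a candidate mode.
candidates = [(value, n) for value, n in counts if n > 1]
print(candidates)
```

The tally gives 8 seven times, 4 three times, and 7 and 9 twice each. The 
counts alone cannot say where the cut-off between "a mode" and "a repeated 
value" should be.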

I have plans for introducing a binning function to collect data into 
bins and run statistics on the bins. That might be a better way to deal 
with multi-modal samples: if you bin the data (for discrete data, use a 
bin size of 1) and then look at the frequencies, you can decide how many 
modes there are.
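
A minimal sketch of what such binning might look like, assuming equal-width 
bins anchored at zero. The helper name bin_counts and its width parameter 
are hypothetical; no such function exists in the statistics module.

```python
import math
from collections import Counter

def bin_counts(data, width=1):
    """Hypothetical helper: count how many values fall in each bin.

    Each bin is keyed by its lower edge and covers the half-open
    interval [edge, edge + width).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    counts = Counter(math.floor(x / width) * width for x in data)
    # Return the bins in ascending order of their lower edge.
    return dict(sorted(counts.items()))

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]

# For discrete data a bin size of 1 gives one bin per distinct value.
freqs = bin_counts(data, width=1)

# Crude text histogram: one asterisk per observation in the bin.
for edge, n in freqs.items():
    print(f"{edge:>3} {'*' * n}")
```

Looking at the printed frequencies, or re-running with a wider bin, leaves 
the decision about how many modes there are to the user rather than to the 
function.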

Thanks for the suggestion.
History
Date                 User            Action  Args
2016-12-13 09:54:04  steven.daprano  set     recipients: + steven.daprano, wolma, sria91
2016-12-13 09:54:04  steven.daprano  link    issue28956 messages
2016-12-13 09:54:04  steven.daprano  create