Author rhettinger
Recipients Windson Yang, francismb, rhettinger, steven.daprano
Date 2019-02-17.19:27:53
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1550431673.6.0.733016047475.issue35892@roundup.psfhosted.org>
In-reply-to
Content
The attraction to having a first_tie=False flag is that it is backwards compatible with the current API.

However on further reflection, people would almost never want the existing behavior of raising an exception rather than returning a useful result.

So, we could make first_tie the default behavior and not have a flag at all.  We would also add a separate multimode() function with a clean signature, always returning a list.  FWIW, MS Excel evolved to this solution as well (it has MODE.SGNL and MODE.MULT).

This would give us a clean, fast API with no flags:

    mode(Iterable) -> scalar
    multimode(Iterable) -> list  

ISTM this is an all round win:

* Fixes the severe usability problems with mode().  Currently, it can't be used without a try/except. The except-clause can't distinguish between an empty data condition and no unique mode condition, nor can the except clause access the dataset counts to get to a useful answer.

* It makes mode() memory friendly and faster.  Currently, mode() has to convert the counter items into a list() and run a full sort.  However, if we only need the first mode, we can feed the items iterator to max() and make only a single pass, never having to create a list.

However, there would be an incompatibility in the API change.  It is possible that a person really does want an exception instead of a value when there are multiple modes.  In that case, this would break their code so that they have to switch to multimode() as a remediation.  OTOH, it is also possible that people have existing code that uses mode() but their code has a latent bug because they weren't using a try/except and haven't yet encountered a multi-modal dataset. 

So here are some options:

* Change mode() to return the first tie instead of raising an exception.  This is a behavior change but leaves you with the cleanest API going forward.

* Add a Deprecation warning to the current behavior of mode() when it finds multimodal data.  Then change the behavior in Python 3.9.  This is uncomfortable for one release, but does get us to the cleanest API and best performance.

* Change mode() to have a flag where first_tie defaults to False.  This is backwards compatible, but it doesn't fix latent bugs, it leaves us with on-going API complexities, and the default case remains the least usable option.

* Leave mode() as-is and just document its limitations.

For any of those options, we should still add a separate multimode() function.
History
Date User Action Args
2019-02-17 19:27:53rhettingersetrecipients: + rhettinger, steven.daprano, francismb, Windson Yang
2019-02-17 19:27:53rhettingersetmessageid: <1550431673.6.0.733016047475.issue35892@roundup.psfhosted.org>
2019-02-17 19:27:53rhettingerlinkissue35892 messages
2019-02-17 19:27:53rhettingercreate