Message 331293 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	kellerfuchs
Recipients	kellerfuchs, mark.dickinson, rhettinger, steven.daprano, tim.peters
Date	2018-12-07.11:25:42
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1544181942.88.0.788709270274.issue35431@psf.upfronthosting.co.za>
In-reply-to

Content
@Raymond Hettinger: > it's worth considering whether the function should be called "binomial", "choose", "combinations", or "comb" Here goes the bike shed! :) Kidding, this is a pretty important ergonomics/discoverability concern, and I hadn't given it much thought yet. I'd rather not call it comb, because it collides with a completely-unrelated English word, and it's not obvious what it stands for unless one already knows. > The word "combinations" fits nicely with the related counting functions, "combinations, permutations, and factorial". That's a good point; math.permutations doesn't exist, but itertools.permutations does, so I would expect (by analogy) a “combinations” functions to produce all possible k-sized subsets (rather than counting them), and that's exactly what itertools.combinations does. On the other hand, combinations and permutations are names that make it perhaps more obvious what is being counted, so perhaps math.{combinations,permutations} should be aliases for math.{binomial,factorial} ? Is the name collision with itertools a problem? TL;DR: “binomial” is more consistent with the current naming in math and itertools, but perhaps it makes sense to introduce math.{combination,permutations} as aliases? (Note that I used math.binomial as the name in the PR so far, but that's only because I needed a name, not because I consider the discussion ended.) @Mark Dickinson: > The current factorial implementation is significantly optimised, and using it directly may be faster than using an iterative solution. Yes, I avoided pushing specifically for a given algorithm (rather than getting upfront agreement on the functional behaviour) because the performance characteristics will likely be quite different once implemented in C in the math module. (Unless I'm mistaken and there is a way to add pure-Python functions to the math module?) > `binomial(n, k)` for `k > n` should either return 0 or be a `ValueError`, but which? From a mathematical standpoint, (n choose k) is defined for all non-negative k, n, with (n chooze k) = 0 when k>n or k=0. It's necessary behaviour for the usual equations to hold (like (n + 1 choose k + 1) = (n choose k) + (n choose k + 1)). As such, I'd argue that returning 0 is both more likely to be the thing the user wants (as in, it's necessary behaviour for combinatorics) and easier to reason about. Negative k and n, on the other hand, should clearly be a ValueError, and so should non-integers inputs; this is consistent with factorial's behaviour. I started a pull request and (for now) only added tests which document that (proposed) behaviour, so we can more easily discuss whether that's what we want. > Note that this needs fixing with both of the code snippets shown so far: they both return 1 for k > n. Yes :) I noticed last night, as I wrote Hypothesis tests for the snippet, but didn't think it was super important to send an updated version.

@Raymond Hettinger:
> it's worth considering whether the function should be called "binomial", "choose", "combinations", or "comb"

Here goes the bike shed!  :)
Kidding, this is a pretty important ergonomics/discoverability concern, and I hadn't given it much thought yet.

I'd rather not call it comb, because it collides with a completely-unrelated English word, and it's not obvious what it stands for unless one already knows.


> The word "combinations" fits nicely with the related counting functions, "combinations, permutations, and factorial".

That's a good point; math.permutations doesn't exist, but itertools.permutations does, so I would expect (by analogy) a “combinations” functions to produce all possible k-sized subsets (rather than counting them), and that's exactly what itertools.combinations does.

On the other hand, combinations and permutations are names that make it perhaps more obvious what is being counted, so perhaps math.{combinations,permutations} should be aliases for math.{binomial,factorial} ?  Is the name collision with itertools a problem?

TL;DR: “binomial” is more consistent with the current naming in math and itertools, but perhaps it makes sense to introduce math.{combination,permutations} as aliases?

(Note that I used math.binomial as the name in the PR so far, but that's only because I needed a name, not because I consider the discussion ended.)


@Mark Dickinson:
> The current factorial implementation is significantly optimised, and using it directly may be faster than using an iterative solution.

Yes, I avoided pushing specifically for a given algorithm (rather than getting upfront agreement on the functional behaviour) because the performance characteristics will likely be quite different once implemented in C in the math module.

(Unless I'm mistaken and there is a way to add pure-Python functions to the math module?)


> `binomial(n, k)` for `k > n` should either return 0 or be a `ValueError`, but which?

From a mathematical standpoint, (n choose k) is defined for all non-negative k, n, with (n chooze k) = 0 when k>n or k=0.

It's necessary behaviour for the usual equations to hold (like (n + 1 choose k + 1) = (n choose k) + (n choose k + 1)).
As such, I'd argue that returning 0 is both more likely to be the thing the user wants (as in, it's necessary behaviour for combinatorics) and easier to reason about.

Negative k and n, on the other hand, should clearly be a ValueError, and so should non-integers inputs; this is consistent with factorial's behaviour.

I started a pull request and (for now) only added tests which document that (proposed) behaviour, so we can more easily discuss whether that's what we want.


> Note that this needs fixing with both of the code snippets shown so far: they both return 1 for k > n.

Yes :)
I noticed last night, as I wrote Hypothesis tests for the snippet, but didn't think it was super important to send an updated version.

History
Date	User	Action	Args
2018-12-07 11:25:42	kellerfuchs	set	recipients: + kellerfuchs, tim.peters, rhettinger, mark.dickinson, steven.daprano
2018-12-07 11:25:42	kellerfuchs	set	messageid: <1544181942.88.0.788709270274.issue35431@psf.upfronthosting.co.za>
2018-12-07 11:25:42	kellerfuchs	link	issue35431 messages
2018-12-07 11:25:42	kellerfuchs	create