New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
random.choices ought to check that cumulative weights are in ascending order #77675
Comments
As mentioned on the Python-List: https://mail.python.org/pipermail/python-list/2018-May/733061.html random.choices() silently returns the wrong values when cumulative weights are not given, i.e. if the user misreads the documentation and provides non-cumulative weights, expecting that cumulative weights will be constructed for them. I think that the documentation should be more clear, and preferably the choices() function ought to fail early when given invalid weights. |
Supplying cum_weights allows the code to use bisection to locate the correct value to return. This is O(log n), and is significantly faster for large populations than supplying weights (which need to be totalled for the calculation). Requiring a pre-check on cum_weights (for example, the obvious check that the sequence is nondecreasing) would add an O(n) step, and so significantly impact performance for that case. |
On Mon, May 14, 2018 at 11:54:32AM +0000, Paul Moore wrote:
You may very well be right, but we should at least think about ways to If an ahead-of-time check is too slow, can we make it just-in-time? |
This will just add an overhead for all correct use of bisect. We are adult here. If the user want to break bisect or random.choices, let him to do this. |
I concur with Serhiy. The proposed check would cripple the intended use of cumulative weights which is provided as a way to avoid the cost of rebuilding the weights on every call. FWIW, bisect() has a long history of requiring sorted inputs and has a similar possibility of misuse, but it too would cripple its core use case if it were to have to validate that the user input was sorted. I'm open to a docs change but am dubious that that would have helped the OP on python-ideas. The docs already have a specific example showing the relationship between weights and cum_weights, as well as indicating that the purpose of cum_weights is to save the work of computing the weights accumulation. There is an additional example in the recipes section. |
btw, this was my suggestion. Steven opened the issue on my behalf (I'm new).
The following sentence be added: [BTW, in the current documentation, when I read this sentence:
thanks |
oops, if "k=400" |
or, for a minimal doc change, change this sentence: to: |
Sorry, that doesn't seem like an improvement at all to me. Adding "call" just makes the sentence read awkwardly. Also, this week I did some user testing on the existing docs and didn't find a single case of misreading what "cumulative weights" meant. I'm marking this as closed. The suggestion was appreciated but adding additional input checks would defeat the entire purpose of the feature. The user testing suggest that the docs are okay as-is (and there are additional examples in the recipes section below). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: