classification
Title: random.choices has unexpected behavior with negative weights
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Ted Whalen, aldwinaldwin, docs@python, mark.dickinson, rhettinger
Priority: low Keywords: patch

Created on 2019-07-18 20:51 by Ted Whalen, last changed 2019-07-19 09:27 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 14854 closed aldwinaldwin, 2019-07-19 03:36
PR 14855 merged rhettinger, 2019-07-19 07:47
PR 14858 merged miss-islington, 2019-07-19 08:57
Messages (6)
msg348128 - (view) Author: Ted Whalen (Ted Whalen) Date: 2019-07-18 20:51
The behavior of random.choices when negative weights are provided is unexpected. While giving a negative weight for one value is probably bad, it's really unfortunate that providing a negative weight for one value affects the probability of selecting an adjacent value.

Throwing a ValueError exception when there was a use of negative weights was considered in #31689, but at that time, there wasn't an example provided that failed silently.

Note below that providing a weight of -1 for 'c' causes both 'c' and 'd' to drop out of the results.

Python 3.7.2 (default, Jan 13 2019, 12:50:01)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import Counter
>>> from random import choices
>>> Counter(choices("abcdefg", weights=(1,1,-1,1,1,1,1), k=10000))
Counter({'f': 2040, 'a': 2019, 'e': 2017, 'g': 2009, 'b': 1915})
msg348140 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-19 03:18
This is what happens with your weights:

>>> list(itertools.accumulate(weights))
[1, 2, 1, 2, 3, 3, 4]

using bisect.bisect certain amount of times, will distribute on this:
a<1<b<2<c<1<d<2<e<3<f<4<g

random numbers between 1 and 2 will go to 1<'b'<2 and not 'd' that is also 1<'d'<2

As discussed in issue31689, if no negative weights should be used, then a ValueError exception should be re-concidered.
msg348144 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-07-19 06:51
We can add a note the the docs that weights are assumed to be non-negative, but I don't want to add an extra O(n) step to check for unusual inputs with undefined meaning -- that would just impair the normal use cases for near zero benefit.
msg348154 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-07-19 08:56
New changeset 8dbe563aa6bd18b8c581951537dfb406d36e8063 by Raymond Hettinger in branch 'master':
bpo-37624: Document weight assumptions for random.choices() (GH-14855)
https://github.com/python/cpython/commit/8dbe563aa6bd18b8c581951537dfb406d36e8063
msg348156 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-07-19 09:17
New changeset a50a6225a06e5a83ce2a880a7eb4496043fdbb55 by Raymond Hettinger (Miss Islington (bot)) in branch '3.8':
bpo-37624: Document weight assumptions for random.choices() (GH-14855) (GH-14858)
https://github.com/python/cpython/commit/a50a6225a06e5a83ce2a880a7eb4496043fdbb55
msg348158 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-07-19 09:27
Misc side notes:

* There is no expected behavior for negative a negative weight.  Arguably, the only reasonable interpretation is what it already does (reduce the cumulative total).

* Running simulations is a primary use case for choices().  Generally, running time there is important. 

* During the design phase, none of the other implementations studied had incorporated a scan of the inputs for negative weights.

*  bisect() does not check to make sure its inputs are sorted.  The results for unsorted data are undefined.  It is a documented precondition.
History
Date User Action Args
2019-07-19 09:27:50rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg348158

stage: patch review -> resolved
2019-07-19 09:17:38rhettingersetmessages: + msg348156
2019-07-19 08:57:01miss-islingtonsetpull_requests: + pull_request14648
2019-07-19 08:56:56rhettingersetmessages: + msg348154
2019-07-19 07:47:33rhettingersetpull_requests: + pull_request14645
2019-07-19 06:51:49rhettingersetpriority: normal -> low

nosy: + docs@python
messages: + msg348144

assignee: docs@python
components: + Documentation, - Library (Lib)
2019-07-19 03:36:12aldwinaldwinsetkeywords: + patch
stage: patch review
pull_requests: + pull_request14644
2019-07-19 03:19:23xtreaksetnosy: + rhettinger, mark.dickinson
2019-07-19 03:18:54aldwinaldwinsetnosy: + aldwinaldwin
messages: + msg348140
2019-07-18 20:51:27Ted Whalencreate