unexpected behaviour of random.choices with zero weights #83062

izaromanowska · 2019-11-21T17:35:23Z

BPO	38881
Nosy	@tim-one, @rhettinger, @mdickinson, @tirkarthi, @izaromanowska
PRs	bpo-38881: choices() raises ValueError when all weights are zero #17362

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2019-11-23.10:23:11.502>
created_at = <Date 2019-11-21.17:35:22.751>
labels = ['type-bug', '3.9', 'docs']
title = 'unexpected behaviour of random.choices with zero weights'
updated_at = <Date 2019-11-25.10:40:56.684>
user = 'https://github.com/izaromanowska'

bugs.python.org fields:

activity = <Date 2019-11-25.10:40:56.684>
actor = 'IRomanowska'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2019-11-23.10:23:11.502>
closer = 'rhettinger'
components = ['Documentation']
creation = <Date 2019-11-21.17:35:22.751>
creator = 'IRomanowska'
dependencies = []
files = []
hgrepos = []
issue_num = 38881
keywords = ['patch']
message_count = 12.0
messages = ['357185', '357242', '357243', '357257', '357259', '357265', '357267', '357358', '357370', '357372', '357373', '357436']
nosy_count = 5.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'xtreak', 'IRomanowska']
pr_nums = ['17362']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38881'
versions = ['Python 3.9']

izaromanowska · 2019-11-21T17:35:22Z

Hi,
When zero weights are given, the last element of a sequence is always chosen. Example:

hits= []
for i in range(100):
    hits.append(random.choices(["A","B","C","D"], [0, 0, 0, 0])[0])
print (set(hits))

> {'D'}

I guess that most users would expect that in case of zero weights it will default into a random.choice behaviour and select one option at random since this is what happens in cases when all weights are equal. Alternatively, it should return an empty array if the assumption was that all choices have a zero probability of being selected. Either way, if it is consistently choosing one option, this may be potentially difficult to spot in situations when a sequence of weights all equal to zero only happen sporadically.

rhettinger · 2019-11-22T04:25:29Z

When zero weights are given, the last element of a sequence
is always chosen.

Given non-sensical input, that behavior is as reasonable as any other (fwiw, the same is also observed with all negative weights, even if the negative weights are unequal).

The documentation currently says, "weights are assumed to be non-negative." Perhaps it should say, "weights are assumed to be non-negative and have at least one positive weight."

rhettinger · 2019-11-22T04:37:23Z

Alternatively, we could raise an exception if the weight total isn't positive. Am not sure that has any real worth, but it could be done with only a small additional cost to the critical path.

izaromanowska · 2019-11-22T09:15:04Z

Dear Raymond,

I understand that passing all zero weights may look nonsensical but random.choices is an implementation of the roulette wheel which is widely used across different scientific disciplines and the situation of passing all zeros is completely plausible.

In genetics:
A genome may consist of a set of genes none of which increases fitness thus their relative probability of being copied over other genes is all zero.

In political sciences or cultural evolution:
A voter may hate all parties (ie. their individual preference for any one party is zero). An agent may happen to have no preference for either of the options.

In engineering:
All solutions may carry zero increase in performance.

You are absolutely right that negative weights make no sense (how can you choose option A with a -10% chance. But a 0% chance is entirely possible.

I consulted with colleagues working in other languages and it looks that the default for roulette wheel with zero weights is choosing at random.
This should probably be consulted with a mathematician who knows the definition of the algorithm.

tirkarthi · 2019-11-22T09:28:38Z

IIUC, there was a similar discussion on negative weights and all weights being zero to raise ValueError at https://bugs.python.org/issue18844#msg196252

rhettinger · 2019-11-22T10:35:37Z

xtreak, the other discussion isn't similar at all. The OP is proposing that weights all equal to zero be a well-defined way to specify an equiprobable selection. That is a completely new proposal which is easy to implement, but doesn't make much sense to me.

The current concept is that the weights express an odds ratio where a zero weight means that an event has no chance of being selected. This view implies that if all weights are zero, the result is undefined (or an error).

Iza, it seems to me that the provided examples are conflating a zero incremental preference with a zero chance of occurrence.

izaromanowska · 2019-11-22T11:18:30Z

Hi,

Many thanks for engaging with this.
I agree that we should very clearly separate negative weights from zero weights. A negative number is illegal and that's the end of it. However, a zero weight is not illegal, e.g., [0, 0, 0, 0.1] is a legal sequence to pass as weight.

Raymond, I agree with you that this is conflating incremental preference with zero chance of occurring. From a standard user perspective, if the [0, 0, 0, 0.1] sequence is passed as weights the first three options have a zero probability of selection thus that interpretation (even if in your opinion erroneous) is very likely to happen for most of the users.

I think we all agree that an output that always chooses the last element of the sequence is not ok. We differ in opinion as to what should happen instead: raising an error or returning a value at random. My arguments for the latter are:

this seems to be the standard for other programming languages (I've checked for R and NetLogo but this should be confirmed by others);
a weight sequence [1, 1, 1, 1] is equivalent to [10, 10, 10, 10] so if we don't want to make [0, 0, 0, 0] 'a special case' it should give the same behaviour (equal probability);
when a weight sequence is not provided (i.e., there are no odds given) a random selection is made. One can argue that the odds [,,,,] are similar to [0, 0, 0, 0 ]. Perhaps the zero weights option could be pushed into the if-loop of no weights?

I see the logic of the second solution, i.e., raising an error. It may make it more difficult to catch the issue for those doing simulations but at least it's not giving a wrong result.

As mentioned this is a key algorithm for many scientific applications with predominantly non-computer science users like myself. So please do take into consideration that it will be often used naively.

Many thanks.

tim-one · 2019-11-23T02:54:54Z

There are a number of "obvious" properties that should obtain, given the intended meaning of weights:

An input with weight 0 should never be returned.
The distribution shouldn't be affected by adding a new input with weight 0.
The distribution shouldn't be affected by removing an input with weight 0.

Especially because of the 3rd, the only sensible thing to do is to treat an input with weights all 0 much like an empty population - although, in context, ValueError would make more immediate sense for "all weights are 0" than the IndexError raised for an empty population.

Anything other than that is "too clever" by half, and I just don't believe would be of real use often enough to be worth violating the "obvious" properties above.

mdickinson · 2019-11-23T09:56:18Z

Either raising, or treating a zero-weight-sum as undefined behaviour and documenting that the sum of the weights should be positive sounds fine to me.

-1 on the suggestion to (deliberately, by documented design) choose at random in this case. Mathematically, this situation doesn't make sense: as Tim said, it's analogous to choosing from an empty population.

Ex: you have nred red balls and nblue blue balls in a bag. If you want to simulate drawing a single ball from the bag, then

   random.choices(["red", "blue"], [nred, nblue])

does the job. But in the case where nred = nblue = 0, the bag is empty and it's not possible to draw anything; in that case I'd expect an error. I definitely wouldn't expect to get either a red ball or a blue ball (with equal probability).

rhettinger · 2019-11-23T10:22:23Z

New changeset 041d8b4 by Raymond Hettinger in branch 'master':
bpo-38881: choices() raises ValueError when all weights are zero (GH-17362)
041d8b4

rhettinger · 2019-11-23T10:23:11Z

Thanks for the suggestions.

izaromanowska · 2019-11-25T10:40:57Z

Many thanks for patching it!
Much appreciated.

izaromanowska mannequin added stdlib Python modules in the Lib dir 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Nov 21, 2019

rhettinger added docs Documentation in the Doc dir 3.8 only security fixes 3.9 only security fixes and removed stdlib Python modules in the Lib dir labels Nov 22, 2019

rhettinger self-assigned this Nov 22, 2019

rhettinger removed 3.7 (EOL) end of life 3.8 only security fixes labels Nov 23, 2019

rhettinger closed this as completed Nov 23, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unexpected behaviour of random.choices with zero weights #83062

unexpected behaviour of random.choices with zero weights #83062

izaromanowska mannequin commented Nov 21, 2019

izaromanowska mannequin commented Nov 21, 2019

rhettinger commented Nov 22, 2019

rhettinger commented Nov 22, 2019

izaromanowska mannequin commented Nov 22, 2019

tirkarthi commented Nov 22, 2019

rhettinger commented Nov 22, 2019

izaromanowska mannequin commented Nov 22, 2019

tim-one commented Nov 23, 2019

mdickinson commented Nov 23, 2019

rhettinger commented Nov 23, 2019

rhettinger commented Nov 23, 2019

izaromanowska mannequin commented Nov 25, 2019

unexpected behaviour of random.choices with zero weights #83062

unexpected behaviour of random.choices with zero weights #83062

Comments

izaromanowska mannequin commented Nov 21, 2019

izaromanowska mannequin commented Nov 21, 2019

rhettinger commented Nov 22, 2019

rhettinger commented Nov 22, 2019

izaromanowska mannequin commented Nov 22, 2019

tirkarthi commented Nov 22, 2019

rhettinger commented Nov 22, 2019

izaromanowska mannequin commented Nov 22, 2019

tim-one commented Nov 23, 2019

mdickinson commented Nov 23, 2019

rhettinger commented Nov 23, 2019

rhettinger commented Nov 23, 2019

izaromanowska mannequin commented Nov 25, 2019