In general NaNs wreak havoc on any function that uses comparisons:
>>> max(10, nan)
10
>>> max(nan, 10)
nan
It isn't the responsibility of the functions to test for NaNs. Likewise, it would not make sense to document the NaN behavior in every function that uses a comparison. That would clutter the docs and it puts the responsibility in the wrong place.
It is the NaN itself that is responsible for the behavior you're seeing. It is up to the NaN documentation to discuss its effects on downstream code. For example, sorting isn't consistent when NaNs are present:
>>> sorted([10, nan, 5])
[10, nan, 5]
>>> sorted([5, nan, 10])
[ 5, nan, 10]
Also, a NaN is just one of many possible objects that create weird downstream effects. For examples, sets have a partial ordering and would also create odd results with sorted(), bisect(), max(), min(), etc.
The situation with Infinity is similar. It is a special object that has the unusual property that inf+1==inf, so bisection of cumulative sums will give the rightmost infinity.
Consider a population 'ABC' and weights of [5, 7, 2]. We have
Element Weight Cumulative Weight Range (half-open interval)
------- ------ ----------------- --------------------------
A 5 5 0.0 <= X < 5.0
B 7 12 5.0 <= X < 12.0
C 2 14 12.0 <= X < 14.0
------
14
The selector X comes from: random() * 14.0 which gives a range of: 0.0 <= X < 14.0.
Now consider a population 'ABC' and weights of [5, 0, 2]. We have
Element Weight Cumulative Weight Range (half-open interval)
------- ------ ----------------- --------------------------
A 5 5 0.0 <= X < 5.0
B 0 5 5.0 <= X < 5.0
C 2 7 5.0 <= X < 7.0
------
7
If X is 5.0, we have to choose C because B has a zero selection probabliity. So have to pick the rightmost (bottommost) range that has 5.0, giving the C.
Now, replace B's weight with float('Inf'):
Element Weight Cumulative Weight Range (half-open interval)
------- ------ ----------------- --------------------------
A 5 5 0.0 <= X < 5.0
B Inf Inf 5.0 <= X < Inf
C 2 Inf Inf <= X < Inf
------
Inf
Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated. The selector itself in always Inf because "X = random() * inf" always gives inf. Using the previous rule, we have to choose the rightmost Inf which is C. This is in fact what choices() does:
>>> choices('ABC', [5, float('Inf'), 2])
['C']
It isn't an error. That is the same behavior that bisect() has when searching for a infinite value (or any other object that universally compares larger than anything except itself):
>>> bisect([10, 20, inf], inf)
3
>>> bisect([10, 20, 30], inf)
3
When searching for infinity, you always get the rightmost insertion point even if the cuts points are finite. Accordingly, it makes sense that if one of the weights is infinite, then the total infinite, and the selector is infinite, so you always get the rightmost value. |