Message 376816 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	rhettinger
Recipients	cool-RR, rhettinger
Date	2020-09-13.01:17:11
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1599959832.35.0.767013061693.issue41773@roundup.psfhosted.org>
In-reply-to

Content
In general NaNs wreak havoc on any function that uses comparisons: >>> max(10, nan) 10 >>> max(nan, 10) nan It isn't the responsibility of the functions to test for NaNs. Likewise, it would not make sense to document the NaN behavior in every function that uses a comparison. That would clutter the docs and it puts the responsibility in the wrong place. It is the NaN itself that is responsible for the behavior you're seeing. It is up to the NaN documentation to discuss its effects on downstream code. For example, sorting isn't consistent when NaNs are present: >>> sorted([10, nan, 5]) [10, nan, 5] >>> sorted([5, nan, 10]) [ 5, nan, 10] Also, a NaN is just one of many possible objects that create weird downstream effects. For examples, sets have a partial ordering and would also create odd results with sorted(), bisect(), max(), min(), etc. The situation with Infinity is similar. It is a special object that has the unusual property that inf+1==inf, so bisection of cumulative sums will give the rightmost infinity. Consider a population 'ABC' and weights of [5, 7, 2]. We have Element Weight Cumulative Weight Range (half-open interval) ------- ------ ----------------- -------------------------- A 5 5 0.0 <= X < 5.0 B 7 12 5.0 <= X < 12.0 C 2 14 12.0 <= X < 14.0 ------ 14 The selector X comes from: random() * 14.0 which gives a range of: 0.0 <= X < 14.0. Now consider a population 'ABC' and weights of [5, 0, 2]. We have Element Weight Cumulative Weight Range (half-open interval) ------- ------ ----------------- -------------------------- A 5 5 0.0 <= X < 5.0 B 0 5 5.0 <= X < 5.0 C 2 7 5.0 <= X < 7.0 ------ 7 If X is 5.0, we have to choose C because B has a zero selection probabliity. So have to pick the rightmost (bottommost) range that has 5.0, giving the C. Now, replace B's weight with float('Inf'): Element Weight Cumulative Weight Range (half-open interval) ------- ------ ----------------- -------------------------- A 5 5 0.0 <= X < 5.0 B Inf Inf 5.0 <= X < Inf C 2 Inf Inf <= X < Inf ------ Inf Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated. The selector itself in always Inf because "X = random() * inf" always gives inf. Using the previous rule, we have to choose the rightmost Inf which is C. This is in fact what choices() does: >>> choices('ABC', [5, float('Inf'), 2]) ['C'] It isn't an error. That is the same behavior that bisect() has when searching for a infinite value (or any other object that universally compares larger than anything except itself): >>> bisect([10, 20, inf], inf) 3 >>> bisect([10, 20, 30], inf) 3 When searching for infinity, you always get the rightmost insertion point even if the cuts points are finite. Accordingly, it makes sense that if one of the weights is infinite, then the total infinite, and the selector is infinite, so you always get the rightmost value.

In general NaNs wreak havoc on any function that uses comparisons:

    >>> max(10, nan)
    10
    >>> max(nan, 10)
    nan

It isn't the responsibility of the functions to test for NaNs.  Likewise, it would not make sense to document the NaN behavior in every function that uses a comparison.  That would clutter the docs and it puts the responsibility in the wrong place.

It is the NaN itself that is responsible for the behavior you're seeing.  It is up to the NaN documentation to discuss its effects on downstream code.  For example, sorting isn't consistent when NaNs are present:

    >>> sorted([10, nan, 5])
    [10, nan, 5]
    >>> sorted([5, nan, 10])
[   5, nan, 10]

Also, a NaN is just one of many possible objects that create weird downstream effects.  For examples, sets have a partial ordering and would also create odd results with sorted(), bisect(), max(), min(), etc.

The situation with Infinity is similar.  It is a special object that has the unusual property that inf+1==inf, so bisection of cumulative sums will give the rightmost infinity.

Consider a population 'ABC' and weights of [5, 7, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         7      12                 5.0  <= X < 12.0
      C         2      14                 12.0 <= X < 14.0
              ------
               14

The selector X comes from:  random() * 14.0 which gives a range of: 0.0 <= X < 14.0.

Now consider a population 'ABC' and weights of [5, 0, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         0      5                  5.0  <= X < 5.0
      C         2      7                  5.0  <= X < 7.0
              ------
                7

If X is 5.0, we have to choose C because B has a zero selection probabliity.  So have to pick the rightmost (bottommost) range that has 5.0, giving the C.

Now, replace B's weight with float('Inf'):

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B        Inf     Inf                5.0  <= X < Inf
      C         2      Inf                Inf  <= X < Inf
              ------
               Inf

Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated.  The selector itself in always Inf because "X = random() * inf" always gives inf.  Using the previous rule, we have to choose the rightmost Inf which is C.  This is in fact what choices() does:

    >>> choices('ABC', [5, float('Inf'), 2])
    ['C']

It isn't an error.  That is the same behavior that bisect() has when searching for a infinite value (or any other object that universally compares larger than anything except itself):

    >>> bisect([10, 20, inf], inf)
    3
    >>> bisect([10, 20, 30], inf)
    3

When searching for infinity, you always get the rightmost insertion point even if the cuts points are finite.  Accordingly, it makes sense that if one of the weights is infinite, then the total infinite, and the selector is infinite, so you always get the rightmost value.

History
Date	User	Action	Args
2020-09-13 01:17:12	rhettinger	set	recipients: + rhettinger, cool-RR
2020-09-13 01:17:12	rhettinger	set	messageid: <1599959832.35.0.767013061693.issue41773@roundup.psfhosted.org>
2020-09-13 01:17:12	rhettinger	link	issue41773 messages
2020-09-13 01:17:11	rhettinger	create