This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author rhettinger
Recipients cool-RR, rhettinger
Date 2020-09-13.01:17:11
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1599959832.35.0.767013061693.issue41773@roundup.psfhosted.org>
In-reply-to
Content
In general NaNs wreak havoc on any function that uses comparisons:

    >>> max(10, nan)
    10
    >>> max(nan, 10)
    nan

It isn't the responsibility of the functions to test for NaNs.  Likewise, it would not make sense to document the NaN behavior in every function that uses a comparison.  That would clutter the docs and it puts the responsibility in the wrong place.

It is the NaN itself that is responsible for the behavior you're seeing.  It is up to the NaN documentation to discuss its effects on downstream code.  For example, sorting isn't consistent when NaNs are present:

    >>> sorted([10, nan, 5])
    [10, nan, 5]
    >>> sorted([5, nan, 10])
[   5, nan, 10]

Also, a NaN is just one of many possible objects that create weird downstream effects.  For examples, sets have a partial ordering and would also create odd results with sorted(), bisect(), max(), min(), etc.

The situation with Infinity is similar.  It is a special object that has the unusual property that inf+1==inf, so bisection of cumulative sums will give the rightmost infinity.

Consider a population 'ABC' and weights of [5, 7, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         7      12                 5.0  <= X < 12.0
      C         2      14                 12.0 <= X < 14.0
              ------
               14

The selector X comes from:  random() * 14.0 which gives a range of: 0.0 <= X < 14.0.

Now consider a population 'ABC' and weights of [5, 0, 2].  We have

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B         0      5                  5.0  <= X < 5.0
      C         2      7                  5.0  <= X < 7.0
              ------
                7

If X is 5.0, we have to choose C because B has a zero selection probabliity.  So have to pick the rightmost (bottommost) range that has 5.0, giving the C.

Now, replace B's weight with float('Inf'):

    Element   Weight   Cumulative Weight  Range (half-open interval)
    -------   ------   -----------------  --------------------------
      A         5      5                  0.0  <= X < 5.0
      B        Inf     Inf                5.0  <= X < Inf
      C         2      Inf                Inf  <= X < Inf
              ------
               Inf

Since Inf+2 is Inf and Inf==Inf, the latter two ranges are undifferentiated.  The selector itself in always Inf because "X = random() * inf" always gives inf.  Using the previous rule, we have to choose the rightmost Inf which is C.  This is in fact what choices() does:

    >>> choices('ABC', [5, float('Inf'), 2])
    ['C']

It isn't an error.  That is the same behavior that bisect() has when searching for a infinite value (or any other object that universally compares larger than anything except itself):

    >>> bisect([10, 20, inf], inf)
    3
    >>> bisect([10, 20, 30], inf)
    3

When searching for infinity, you always get the rightmost insertion point even if the cuts points are finite.  Accordingly, it makes sense that if one of the weights is infinite, then the total infinite, and the selector is infinite, so you always get the rightmost value.
History
Date User Action Args
2020-09-13 01:17:12rhettingersetrecipients: + rhettinger, cool-RR
2020-09-13 01:17:12rhettingersetmessageid: <1599959832.35.0.767013061693.issue41773@roundup.psfhosted.org>
2020-09-13 01:17:12rhettingerlinkissue41773 messages
2020-09-13 01:17:11rhettingercreate