Random.paretovariate sometimes raises ZeroDivisionError for small alpha #85593

DavidMacIver · 2020-07-28T17:59:09Z

BPO	41421
Nosy	@tim-one, @rhettinger, @mdickinson, @serhiy-storchaka
PRs	bpo-41421: Algebraic simplification for random.paretovariate() #21695

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-08-02.00:40:14.827>
created_at = <Date 2020-07-28.17:59:09.066>
labels = ['3.8', 'type-bug', 'library', '3.9', '3.10']
title = 'Random.paretovariate sometimes raises ZeroDivisionError for small alpha'
updated_at = <Date 2020-08-02.00:40:14.823>
user = 'https://bugs.python.org/DavidMacIver'

bugs.python.org fields:

activity = <Date 2020-08-02.00:40:14.823>
actor = 'rhettinger'
assignee = 'none'
closed = True
closed_date = <Date 2020-08-02.00:40:14.827>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2020-07-28.17:59:09.066>
creator = 'David MacIver'
dependencies = []
files = []
hgrepos = []
issue_num = 41421
keywords = ['patch']
message_count = 12.0
messages = ['374516', '374517', '374537', '374538', '374546', '374548', '374564', '374565', '374659', '374666', '374672', '374674']
nosy_count = 5.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'serhiy.storchaka', 'David MacIver']
pr_nums = ['21695']
priority = 'normal'
resolution = 'works for me'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue41421'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

DavidMacIver · 2020-07-28T17:59:09Z

The following code raises a ZeroDivisionError:

from random import Random

Random(14064741636871487939).paretovariate(0.01)

This raises:

random.py, line 692, in paretovariate
return 1.0 / u ** (1.0/alpha)
ZeroDivisionError: float division by zero

That specific stack trace is from 3.8.5 but I've tried this on 3.6 and 3.7 as well.

It's a little hard to tell what the intended correct range of parameter values is for paretovariate, and this may just be lack of validation for the alpha < 1 case - perhaps that was never intended to be supported?

Based on some very informal inspection, what seems to happen is that the probability of getting a ZeroDivisionError approaches one as you approach zero. They rarely occur at this level of alpha (0.01), but for alpha=0.001 they seem to occur just under half the time, and for alpha=0.0001 they occur more than 90% of the time.

(For the interested, this bug was found by Hypothesis as part of its own testing of our integration with the Random API)

DavidMacIver · 2020-07-28T18:03:11Z

I guess on actual inspection of the code (which I should have done before, sorry) it's obvious why this happens: u ** (1.0 / alpha) will round to 0.0 for small values of u, and the smaller alpha is the higher the probability of that happening, in that it happens with probability c ** alpha for some c.

tim-one · 2020-07-28T23:35:48Z

I'm inclined to ignore this. No actual user has complained about this, and I doubt any ever will: it's got to be rare as hen's teeth to use a parameter outside of, say, [0.1, 10.0], in real life. The division error can't happen for those. For "small" alpha, the distribution simply does contain values too large to represent as binary doubles. So the only other defensible thing to do be done in this case is to return math.inf (or raise OverflowError) - but why slow it down for something no actual user cares about?

rhettinger · 2020-07-28T23:36:08Z

The point of the "u = 1.0 - self.random()" line was to prevent the case where *u* was exactly equal to zero; however, as you noted, raising a very small number to a small can round down to zero.

We could wrap the calculation in a try/except to catch the ZeroDivisionError but that is probably insufficient. The codomain can easily spew beyond the range of a C double:

   >>> alpha = 0.01
   >>> u = 0.005
   >>> 1.0 / u ** (1.0 / alpha)
   1.2676506002282268e+230             <== Huge!

We could clamp the values as is done in gammavariate(). Or we can raise a clean Overflow or Underflow exception but that would be unpleasant for users if it were to arise unexpectedly.

tim-one · 2020-07-28T23:54:42Z

BTW, if we have to "do something", how about changing

return 1.0 / u ** (1.0/alpha)

to the mathematically equivalent

return (1.0 / u) ** (1.0/alpha)

? Not sure about Linux-y boxes, but on Windows that would raise OverflowError instead of ZeroDivisionError. Which makes more sense, because the Pareto variable we're _trying_ to produce is too large to represent. If it returns math.inf on some box instead, same thing.

Or, also faster, and suffering fewer rounding errors,

return u ** (-1.0 / alpha)

rhettinger · 2020-07-29T00:05:17Z

I don't feel a need to change anything, my post hit a few seconds after yours :-)

That said, "return u ** (-1.0 / alpha)" would be a slight, but nice improvement.

It might also be reasonable to document a safe range of alpha values.

DavidMacIver · 2020-07-29T08:34:00Z

I should say, I don't actually care about this bug at all: I only ran into it because of trying to recreate the random API and testing that my recreation worked sensibly. I thought I'd do the good citizen thing and report it, but I'm totally fine with any resolution and given that that code has been unchanged for at least 18 years I doubt anyone else has ever run into this problem.

I do think it would be beneficial to document the range of reasonable alpha on paretovariate - the lack of such documentation is the only reason I hit this at all - but I think leaving the behaviour as is would be fine.

serhiy-storchaka · 2020-07-29T09:01:58Z

+1 for "u ** (-1.0 / alpha)"!

rhettinger · 2020-08-01T08:18:50Z

New changeset 5c32709 by Raymond Hettinger in branch 'master':
bpo-41421: Algebraic simplification for random.paretovariate() (GH-21695)
5c32709

rhettinger · 2020-08-01T19:36:19Z

How about this wording?

If alpha is smaller than 0.1, an occasional OverflowError can
arise when the variate exceeds the range of Python float.

I like using 0.1 because it's easy and gives us some wiggle room. The actual cutoff is lower:

>>> alpha = 0.05079
>>> sys.float_info.epsilon ** (-1.0 / alpha)

    >>> alpha = 0.05078
    >>> sys.float_info.epsilon ** (-1.0 / alpha)
    Traceback (most recent call last):
       File "<pyshell#48>", line 1, in <module>
         sys.float_info.epsilon ** (-1.0 / alpha)
    1.590779741838475e+308

tim-one · 2020-08-01T21:07:18Z

That text is fine, if you feel something needs to be said at all. I really don't. A Pareto distribution of this kind with parameter <= 1.0 has infinite expected value - VERY long tail. Anyone who knows what they're doing already knows that. The reason the implementation can't "blow up" for parameters >= (roughly) 0.1 isn't that IEEE doubles have such a large dynamic range but rather that they can't represent a number < 1.0 larger than 1 - 2**-53 (so u = 1 - random.random() is always at least 2**-53). The actual distribution has infinite expected value nonetheless, but the implementation is incapable of generating any of its very large values (which, while very rare, are also very large).

rhettinger · 2020-08-02T00:40:15Z

Okay, marking this as closed.

DavidMacIver mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir labels Jul 28, 2020

rhettinger added 3.9 only security fixes 3.10 only security fixes type-bug An unexpected behavior, bug, or error and removed 3.7 (EOL) end of life labels Jul 28, 2020

rhettinger closed this as completed Aug 2, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random.paretovariate sometimes raises ZeroDivisionError for small alpha #85593

Random.paretovariate sometimes raises ZeroDivisionError for small alpha #85593

DavidMacIver mannequin commented Jul 28, 2020

DavidMacIver mannequin commented Jul 28, 2020

DavidMacIver mannequin commented Jul 28, 2020

tim-one commented Jul 28, 2020

rhettinger commented Jul 28, 2020

tim-one commented Jul 28, 2020

rhettinger commented Jul 29, 2020

DavidMacIver mannequin commented Jul 29, 2020

serhiy-storchaka commented Jul 29, 2020

rhettinger commented Aug 1, 2020

rhettinger commented Aug 1, 2020

tim-one commented Aug 1, 2020

rhettinger commented Aug 2, 2020

Random.paretovariate sometimes raises ZeroDivisionError for small alpha #85593

Random.paretovariate sometimes raises ZeroDivisionError for small alpha #85593

Comments

DavidMacIver mannequin commented Jul 28, 2020

DavidMacIver mannequin commented Jul 28, 2020

DavidMacIver mannequin commented Jul 28, 2020

tim-one commented Jul 28, 2020

rhettinger commented Jul 28, 2020

tim-one commented Jul 28, 2020

rhettinger commented Jul 29, 2020

DavidMacIver mannequin commented Jul 29, 2020

serhiy-storchaka commented Jul 29, 2020

rhettinger commented Aug 1, 2020

rhettinger commented Aug 1, 2020

tim-one commented Aug 1, 2020

rhettinger commented Aug 2, 2020