This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author madison.may
Recipients aisaac, madison.may, mark.dickinson, pitrou, rhettinger, serhiy.storchaka, tim.peters, westley.martinez
Date 2013-09-01.23:08:54
Message-id <1378076934.77.0.274572144087.issue18844@psf.upfronthosting.co.za>
In-reply-to
Content
> What do R, SciPy, Fortran, Matlab or other statistical packages already do? 

NumPy avoids recalculating the cumulative distribution by providing a 'size' argument to numpy.random.choice(): the cumulative distribution is calculated once, and then 'size' random choices are generated and returned.

Their overall implementation is quite similar to the method suggested in the Python docs:

>>> import bisect, itertools, random
>>> weighted_choices = [('red', 3), ('blue', 2), ('green', 1)]  # example data
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
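A caching variant in the spirit of numpy's approach, where the cumulative distribution is built once and reused for every draw, could be sketched like this (the helper name `weighted_sample` and its signature are hypothetical, not part of the stdlib):

```python
import bisect
import itertools
import random

def weighted_sample(weighted_choices, size):
    """Draw `size` weighted choices, computing the cumulative
    distribution only once (mirrors numpy's 'size' argument)."""
    choices, weights = zip(*weighted_choices)
    cumdist = list(itertools.accumulate(weights))
    total = cumdist[-1]
    # Each draw reuses the precomputed cumdist instead of rebuilding it.
    return [choices[bisect.bisect(cumdist, random.random() * total)]
            for _ in range(size)]
```

This is just the docs recipe with the setup hoisted out of the loop; for k draws over n choices it does the O(n) accumulation once and O(log n) bisection per draw.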

The addition of a 'size' argument to random.choice() has already been discussed (and rejected) in Issue18414, on the grounds that the standard idiom for generating a list of random choices, [random.choice(seq) for i in range(k)], is both obvious and efficient.
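For completeness, the idiom cited there looks like this in use (the sequence and sample count are illustrative, not from the issue):

```python
import random

seq = ['red', 'green', 'blue']  # illustrative sequence, not from the issue
k = 10                          # number of samples wanted
# The standard idiom for a list of k unweighted random choices:
samples = [random.choice(seq) for i in range(k)]
```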