Message 108095 - Python tracker



Author	mark.dickinson
Recipients	mark.dickinson, rhettinger
Date	2010-06-18.10:20:48
Message-id	<1276856452.03.0.825027672834.issue9025@psf.upfronthosting.co.za>

Content

Not a serious bug, but worth noting:

The result of randrange(n) is not even close to uniform for large n.  Witness the obvious skew in the following (this takes a minute or two to run, so you might want to reduce the range argument):

Python 3.2a0 (py3k:81980, Jun 14 2010, 11:23:36)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import randrange
>>> from collections import Counter
>>> Counter(randrange(6755399441055744) % 3 for _ in range(100000000))
Counter({1: 37508130, 0: 33323818, 2: 29168052})

(The actual probabilities here are, as you might guess from the above numbers:  {0: 1/3, 1: 3/8, 2: 7/24}.)
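As a quick consistency check (not part of the original report), the counts from the session can be compared with the stated probabilities using exact fractions. The variable names are illustrative; the numbers are copied from the session and the sentence above.

```python
from fractions import Fraction

# Counts reported in the session above (100,000,000 samples).
observed = {1: 37508130, 0: 33323818, 2: 29168052}
# Exact probabilities stated in the message.
claimed = {0: Fraction(1, 3), 1: Fraction(3, 8), 2: Fraction(7, 24)}

total = sum(observed.values())
assert total == 100_000_000
assert sum(claimed.values()) == 1  # 8/24 + 9/24 + 7/24 == 1

for r in sorted(claimed):
    freq = observed[r] / total
    print(f"{r}: observed {freq:.5f}, claimed {float(claimed[r]):.5f}")

# Largest gap between observed frequency and claimed probability.
max_dev = max(abs(observed[r] / total - float(claimed[r])) for r in claimed)
print(f"largest deviation: {max_dev:.6f}")
```

With 10**8 samples the observed frequencies agree with the stated probabilities to within about 1e-4, while the gap between 3/8 and 1/3 is more than 0.04.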

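The cause:  for n < 2**53, randrange(n) is effectively computed as int(random() * n).  Since random() returns one of 2**53 equally likely values, this maps 2**53 equally likely inputs onto n outputs; unless n divides 2**53, some outputs necessarily receive more inputs than others.  For small n the resulting bias is tiny, and this is still an effective method.  However, as n increases towards 2**53, the bias increases significantly.  (For n >= 2**53, the random module uses a different strategy that *does* produce uniformly distributed results.)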
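The pigeonhole effect can be seen in a scaled-down toy model (an assumption for illustration, not the random module's code): treat random() as k / 2**bits for k uniform in [0, 2**bits), use exact integer arithmetic, and count how many inputs land on each output. Because the real float product random() * n is rounded, the toy numbers are not expected to match the session above; they only show how the imbalance grows as n approaches 2**bits. The function names are hypothetical.

```python
from collections import Counter


def toy_output_counts(n: int, bits: int) -> Counter:
    """Count how many of the 2**bits equally likely inputs k map to each
    output floor(k * n / 2**bits).

    Toy model (assumption): random() is modelled as k / 2**bits with exact
    integer arithmetic, ignoring the rounding of the real float product.
    Requires 0 < n <= 2**bits, in which case every output is hit at least once.
    """
    if not 0 < n <= 2 ** bits:
        raise ValueError("need 0 < n <= 2**bits")
    counts = Counter()
    for k in range(2 ** bits):
        counts[(k * n) >> bits] += 1  # exact floor(k * n / 2**bits)
    return counts


def skew(counts: Counter) -> float:
    """Ratio of the most-hit to the least-hit output (1.0 means uniform)."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    bits = 8
    small = toy_output_counts(3, bits)         # n tiny compared with 2**bits
    large = toy_output_counts(3 * 2**6, bits)  # n = 192, close to 2**bits = 256
    print(small, skew(small))  # outputs get 86, 85, 85 inputs: skew about 1.012
    print(skew(large))         # every output gets 1 or 2 inputs: skew 2.0
    # Residue 0 mod 3 receives half of all inputs in this toy model.
    print(Counter(v % 3 for v in large.elements()))
```

For n = 3 the outputs are almost equally likely; for n = 192 some outputs are exactly twice as likely as others, which is the same kind of skew the report observes near 2**53.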

A solution would be to lower the cutoff point at which randrange() switches from int(random() * n) to the _randbelow method.
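A minimal sketch of the uniform alternative, assuming rejection sampling on getrandbits in the spirit of the private _randbelow (not its actual source): draw k = n.bit_length() random bits and retry until the value is below n. The names randbelow_uniform and mod3_frequencies, and the explicit rng parameter, are hypothetical; n = 6755399441055744 (= 3 * 2**51) is taken from the session above.

```python
import random
from collections import Counter


def randbelow_uniform(n: int, rng: random.Random) -> int:
    """Return an integer in [0, n) with exactly uniform probability.

    Hypothetical helper: rejection sampling with rng.getrandbits, similar in
    spirit to random's private _randbelow.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()      # 2**(k-1) <= n < 2**k
    r = rng.getrandbits(k)  # uniform over [0, 2**k)
    while r >= n:           # reject out-of-range values; each try is
        r = rng.getrandbits(k)  # accepted with probability at least 1/2
    return r


def mod3_frequencies(sample, count: int) -> dict:
    """Empirical distribution of sample() % 3 over `count` draws."""
    c = Counter(sample() % 3 for _ in range(count))
    return {r: c[r] / count for r in range(3)}


if __name__ == "__main__":
    n = 6755399441055744  # the n from the session above (= 3 * 2**51)
    rng = random.Random(12345)
    print("int(random() * n):",
          mod3_frequencies(lambda: int(rng.random() * n), 100_000))
    print("rejection sampling:",
          mod3_frequencies(lambda: randbelow_uniform(n, rng), 100_000))
```

Rejection sampling trades a variable number of getrandbits calls (fewer than two on average) for exact uniformity, whereas int(random() * n) always uses one call but is limited by the 2**53 possible values of random().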

History
Date	User	Action	Args
2010-06-18 10:20:52	mark.dickinson	set	recipients: + mark.dickinson, rhettinger
2010-06-18 10:20:52	mark.dickinson	set	messageid: <1276856452.03.0.825027672834.issue9025@psf.upfronthosting.co.za>
2010-06-18 10:20:49	mark.dickinson	link	issue9025 messages
2010-06-18 10:20:48	mark.dickinson	create