classification
Title: Remove misleading documentation for random.shuffle
Type: Stage: resolved
Components: Documentation Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: dbenbenn, docs@python, r.david.murray, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2013-09-05 00:18 by dbenbenn, last changed 2013-09-05 05:40 by tim.peters. This issue is now closed.

Files
File name Uploaded Description Edit
mywork.patch dbenbenn, 2013-09-05 00:18 Remove incorrect documentation comment review
Messages (5)
msg196971 - (view) Author: David Benbennick (dbenbenn) Date: 2013-09-05 00:18
Since Python 2.1 [1], when random.shuffle was added, the documentation has said:

"""Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated."""

This comment is incorrect and misleading.  In fact, I claim that shuffle can produce "all" permutations for any representable sequence.

To shuffle a sequence of length N requires log(N!) ~ N * log(N/e) bits of randomness [2].  The random module provides a generator with "a period of 2**19937-1", meaning you can get 2**19937 bits of randomness out of it before it starts repeating.

All of which is to say that any representable sequence, say N < 2**50, will need no more than 2**60 bits of randomness to shuffle.  That is well within the period of the random number generator.

Attached is a patch that deletes the comment.

An illustration of this misconception is at [3].


1: http://docs.python.org/release/2.1/lib/module-random.html
2: https://en.wikipedia.org/wiki/Factorial#Rate_of_growth_and_approximations_for_large_n
3: http://stackoverflow.com/questions/3062741/maximal-length-of-list-to-shuffle-with-python-random-shuffle
msg196974 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-05 03:11
It seems to me that 2080 (per the accepted answer to your [3]) is indeed "a rather small len(x)", and that the docs are correct as written.

I wonder if it would be worth adding a footnote that explains how to calculate that example 2080 number from the documented period of the random number generator, to make the concepts involved a little bit clearer?
msg196975 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-05 03:20
Alternatively, you would have to supply (or supply a pointer to) a mathematical proof of your thesis.
msg196978 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-09-05 04:20
When the comment was introduced, Python's Wichmann-Hill generator had a much shorter period, and we couldn't even generate all the permutations of a deck of cards.

The period is astronomically larger now, but the stackoverflow answer (2080) is correct for the current upper bound.  The docs could be beefed up to say more about that - but they'd get awfully tedious ;-)

To the OP, this is your error:  it takes lg(N!) "bits of randomness" (lg is the logarithm to base 2) to generate _one_ random permutation.  That says nothing about what's needed to generate _all_ permutations.

With an RNG of period P, there are P possible starting states.  The starting state wholly determines the permutation produced.  Therefore an RNG of period P cannot generate more than P distinct permutations (and that's an absolute upper bound - it may not be achieved).  That's why comparing N! to P is the correct computation here, and indeed:

>>> math.factorial(2080) > 2**19937 - 1
False
>>> math.factorial(2081) > 2**19937 - 1
True

If you still don't believe it, here's a challenge:  take a toy RNG with a small period, and use it to generate permutations (via any method you like).  You'll find that you never get more distinct permutations than the period of your RNG.  Then reread the above ;-)
msg196979 - (view) Author: David Benbennick (dbenbenn) Date: 2013-09-05 05:14
Okay, I see what the comment is saying now.  I was mistaken.

It might make the statement clearer if it is made more precise.  "rather small" is vague.  That vagueness is intentional, but it makes it more confusing.
History
Date User Action Args
2013-09-05 05:40:10tim.peterssetresolution: not a bug
stage: resolved
2013-09-05 05:14:16dbenbennsetstatus: open -> closed

messages: + msg196979
2013-09-05 04:20:34tim.peterssetnosy: + tim.peters
messages: + msg196978
2013-09-05 03:20:43r.david.murraysetmessages: + msg196975
2013-09-05 03:11:00r.david.murraysetnosy: + rhettinger, r.david.murray
messages: + msg196974
2013-09-05 00:18:34dbenbenncreate