Author rhettinger
Recipients Scott Eilerman, docs@python, rhettinger
Date 2018-03-23.21:22:28
> Something along the lines of: "For a fixed seed, random.sample(population, k)
> is not guaranteed to return the same samples for different values of k."

In a way, the proposed wording succinctly and directly addresses the problem you had, so it seems like a reasonable suggestion.  On the other hand, readers who haven't run into this problem could have a hard time figuring out what it means: when they should be worried, what should be avoided, why it is a concern at all, and what to do about it.

In general, the docs are worded in an affirmative manner (here's what something does, here's what it is for, and here is how to use it correctly).  In this case, the docs already indicate the intended way to address this use case: "the resulting list is in selection order so that all sub-slices will also be valid random samples.  This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices)."
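
As a minimal sketch of the documented approach quoted above: draw one sample of the total size needed and slice it, rather than calling sample() twice with different values of k. The function name draw_winners, the ticket labels, and the seed value are hypothetical and used only for illustration.

```python
import random


def draw_winners(population, n_grand, n_second, seed=None):
    """Draw all winners in one sample() call, then partition by slicing."""
    rng = random.Random(seed)  # private generator; leaves the global state alone
    winners = rng.sample(population, n_grand + n_second)
    # The result is in selection order, so each sub-slice is itself
    # a valid random sample of the population.
    return winners[:n_grand], winners[n_grand:]


entrants = [f"ticket-{i}" for i in range(100)]  # hypothetical population
grand, second = draw_winners(entrants, n_grand=2, n_second=5, seed=12345)
print("grand prize:", grand)
print("second place:", second)
```

Because both groups come from a single call, the grand prize winners do not depend on how many second place winners are requested separately.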

Perhaps there could be an algorithmic note, "internally, sample() shifts selection algorithms depending on the proportion of the population being sampled".  However, this would be unusual -- we don't usually document implementation details.  NumPy[1] and R[2] make no mention of the internals.  Julia[3] does discuss the algorithms, but primarily from an efficiency point of view rather than as a usage note.
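
For anyone who wants to see the behavior under discussion, the following is a rough sketch of a check, not a statement about what any particular Python version will print. The helper name prefix_matches and the population, sizes, and seed are hypothetical. The outcome for different values of k depends on the implementation, so no expected output is shown.

```python
import random


def prefix_matches(population, small_k, large_k, seed):
    """Return True if, for the same seed, the smaller sample equals
    the first small_k items of the larger sample."""
    small = random.Random(seed).sample(population, small_k)
    large = random.Random(seed).sample(population, large_k)
    return large[:small_k] == small


population = list(range(100))  # hypothetical population
# Same seed, different k: this is not guaranteed to be True.
print(prefix_matches(population, small_k=5, large_k=50, seed=42))
# Same seed, same k: the two samples are identical.
print(prefix_matches(population, small_k=5, large_k=5, seed=42))
```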

It may be best to leave this alone rather than adding a note that may itself create confusion and worry.  AFAICT, this hasn't come up before in the 15-year history of random.sample(), not even as a StackOverflow question.

Date                 User        Action  Args
2018-03-23 21:22:28  rhettinger  set     recipients: + rhettinger, docs@python, Scott Eilerman
2018-03-23 21:22:28  rhettinger  set     messageid: <>
2018-03-23 21:22:28  rhettinger  link    issue33114 messages
2018-03-23 21:22:28  rhettinger  create