classification
Title: Omit k in random.sample()
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.11
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Dennis Sweeney, TilmanKrummeck, rhettinger
Priority: normal
Keywords:

Created on 2021-12-29 05:36 by TilmanKrummeck, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (11)
msg409283 - Author: Tilman Krummeck (TilmanKrummeck) * Date: 2021-12-29 05:36
random.sample can be used to choose k items from a given sequence. Currently, k is a mandatory parameter.

I suggest making k optional and, if it is omitted, picking a random value between 0 and the length of the sequence.

Of course, this must also take into account any value passed for 'counts'.
msg409284 - Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2021-12-29 06:52
Can you describe more about your use-case for this?

You can already get a random subset today with something like the following:

    import random

    def random_subset(sequence):
        # One random byte per element; keep the element if the byte's low bit is set.
        source = random.randbytes(len(sequence))
        return [x for x, r in zip(sequence, source) if r & 1]

You could add a random.shuffle() call at the end if your application needs it.

For the case with counts, you could do getrandbits(i).bit_count() to get a binomial distribution to choose how many of each element to include.
msg409285 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-29 07:04
If all you want is a sample where k==1, then use choice().  That is clearer and more efficient.  

The sample() function is for sampling without replacement which only makes sense when k > 1; otherwise, choice() or choices() is usually what you want.
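
As a rough illustration of the distinction drawn here, the following sketch calls the three functions side by side. The list contents and variable names are arbitrary examples.

```python
import random

items = ["a", "b", "c", "d"]

# choice(): exactly one element, returned bare (not wrapped in a list).
one = random.choice(items)

# choices(): sampling WITH replacement, so k may exceed len(items)
# and the same element can appear more than once.
with_replacement = random.choices(items, k=6)

# sample(): sampling WITHOUT replacement, so k must not exceed len(items)
# and each position in the population is used at most once.
without_replacement = random.sample(items, k=3)
```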
msg409286 - Author: Tilman Krummeck (TilmanKrummeck) * Date: 2021-12-29 07:06
I use this mostly in tests to randomize my inputs. So currently I'm doing something like this:

    result = random.sample(items, random.randint(0, len(items)))

I guess that anyone who omits 'k' doesn't care about the exact result (which is probably a common situation when using random functions). This would mostly be a convenience improvement for lazy people like me.
msg409287 - Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2021-12-29 07:10
For completeness:

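    import random

    def random_subset_with_counts(sequence, counts):
        result = []
        for x, k in zip(sequence, counts):
            # The bit count of k random bits is a Binomial(k, 1/2) draw (int.bit_count() needs Python 3.10+).
            result.extend([x] * random.getrandbits(k).bit_count())
        return result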
msg409288 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-29 07:12
Okay. Thanks for the quick response and the suggestion. I'm going to mark this one as closed. AFAICT, it distracts users from better solutions.

I did a quick code search for sample(). The k==1 case is rare, and in most cases the code should have used choice() or choices() instead. Accordingly, it doesn't make sense to make k==1 the default.

Also, I suspect (but don't know for sure) that most users benefit from getting the error message when k is omitted. That is more likely a user mistake than a correct design choice.
msg409289 - Author: Tilman Krummeck (TilmanKrummeck) * Date: 2021-12-29 07:19
My suggestion is not to set k=1 when it is omitted, but to assign it a random value between 0 and the maximum possible value, which is:

    sum(counts) if counts else len(population)
msg409290 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-29 07:27
> My suggestion is not to set k=1 when omitted but to assign it a random value

Sorry, I think that is just bizarre. Also, some populations are *very* large, so a minor user accident of omitting a parameter would result in a large unexpected output. For choices(), it would have been nice to have k default to the population size (because resampling is a common use case), but we didn't go down that path because of the likelihood of a large unexpected output. The same reasoning holds here as well.

If you want to go down this path, I recommend making your code explicit about what it is trying to do.  Something this unexpected should not be the implicit and default behavior.
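
One way to make that intent explicit is a small wrapper in the caller's own code. The sketch below is only an illustration: sample_random_size is a hypothetical helper name, not part of the random module, and it assumes Python 3.9 or later for the counts keyword of random.sample().

```python
import random

def sample_random_size(population, *, counts=None):
    """Return a without-replacement sample whose size is itself random.

    Hypothetical helper, not part of the standard library. The size k is
    drawn uniformly from 0 up to the largest value random.sample() accepts.
    """
    # Largest valid k: total of counts if given, else the population length.
    max_k = sum(counts) if counts is not None else len(population)
    k = random.randint(0, max_k)  # randint includes both endpoints
    return random.sample(population, k, counts=counts)

# Usage: the call site states plainly that the sample size is random.
subset = sample_random_size(["a", "b", "c", "d"])
weighted = sample_random_size(["x", "y"], counts=[2, 3])
```

Because the randomness of k lives in a named function, a reader of the test code sees the unusual behavior spelled out, and an accidental omission of k in a plain random.sample() call still raises an error.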
msg409291 - Author: Tilman Krummeck (TilmanKrummeck) * Date: 2021-12-29 07:31
Well, it's not bizarre; it's a use case I face quite often.

But thanks for the clarification. I hadn't had very large populations in mind, so this does indeed make sense.
msg409293 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-29 08:00
The use case isn't bizarre.  But having an API where that is the default behavior would be.  From the point of view of most users, such an API would be unusual and surprising (I don't know of any other random package that has such a default).
msg409300 - Author: Tilman Krummeck (TilmanKrummeck) * Date: 2021-12-29 11:23
Hmm, ok, that sounds obvious. Thanks for the clarification.
History
Date                 User            Action  Args
2022-04-11 14:59:53  admin           set     github: 90348
2021-12-29 11:23:46  TilmanKrummeck  set     messages: + msg409300
2021-12-29 08:00:05  rhettinger      set     messages: + msg409293
2021-12-29 07:31:52  TilmanKrummeck  set     messages: + msg409291
2021-12-29 07:27:29  rhettinger      set     messages: + msg409290
2021-12-29 07:19:04  TilmanKrummeck  set     messages: + msg409289
2021-12-29 07:12:24  rhettinger      set     status: open -> closed; resolution: rejected; messages: + msg409288; stage: resolved
2021-12-29 07:10:34  Dennis Sweeney  set     messages: + msg409287
2021-12-29 07:06:50  TilmanKrummeck  set     messages: + msg409286
2021-12-29 07:04:54  rhettinger      set     assignee: rhettinger; messages: + msg409285
2021-12-29 06:52:24  Dennis Sweeney  set     versions: + Python 3.11; nosy: + Dennis Sweeney, rhettinger; messages: + msg409284; components: + Library (Lib), - Extension Modules
2021-12-29 05:36:37  TilmanKrummeck  create