Issue 46190: Omit k in random.sample()


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90348

classification

Title:	Omit k in random.sample()
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.11

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Dennis Sweeney, TilmanKrummeck, rhettinger
Priority:	normal	Keywords:

Created on 2021-12-29 05:36 by TilmanKrummeck, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (11)
msg409283 - (view)	Author: Tilman Krummeck (TilmanKrummeck) *	Date: 2021-12-29 05:36
random.sample can be used to choose k items from a given sequence. Currently, k is a mandatory parameter. I suggest making k optional and, if it is omitted, picking a random value between 0 and the length of the sequence. Of course, doing this must also take any given value of 'counts' into account.
msg409284 - (view)	Author: Dennis Sweeney (Dennis Sweeney) *	Date: 2021-12-29 06:52
Can you describe more about your use-case for this? You can already do this now with something like the following:

    import random

    def random_subset(sequence):
        # One random byte per element; keep the element when its low bit is set.
        source = random.randbytes(len(sequence))
        return [x for x, r in zip(sequence, source) if r & 1]

You could add a random.shuffle() call at the end if your application needs it. For the case with counts, you could do getrandbits(i).bit_count() to get a binomial distribution to choose how many of each element to include.
msg409285 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-12-29 07:04
If all you want is a sample where k==1, then use choice(). That is clearer and more efficient. The sample() function is for sampling without replacement which only makes sense when k > 1; otherwise, choice() or choices() is usually what you want.
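A minimal sketch of the distinction drawn above, using a made-up `items` list that is not from the thread: `choice()` returns one element directly, while `sample()` with k=1 returns a one-element list.

```python
import random

items = ["a", "b", "c", "d"]  # hypothetical example data

# choice() returns a single element, not a list.
one = random.choice(items)

# sample() with k=1 returns a list that holds one element.
one_in_list = random.sample(items, 1)

# choices() samples with replacement, so k may exceed len(items).
with_replacement = random.choices(items, k=6)

# sample() draws without replacement, so no position is picked twice.
without_replacement = random.sample(items, 3)
```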
msg409286 - (view)	Author: Tilman Krummeck (TilmanKrummeck) *	Date: 2021-12-29 07:06
I use this mostly in tests to randomize my inputs. So currently I'm doing something like this:

    result = random.sample(items, random.randint(0, len(items)))

I guess if someone omitted 'k', they wouldn't care about the result (which is probably a use-case when using random functions). This would mostly be a convenience improvement for lazy guys like myself.
msg409287 - (view)	Author: Dennis Sweeney (Dennis Sweeney) *	Date: 2021-12-29 07:10
For completeness:

    import random

    def random_subset_with_counts(sequence, counts):
        result = []
        for x, k in zip(sequence, counts):
            # The number of set bits among k random bits is Binomial(k, 1/2).
            result.extend([x] * random.getrandbits(k).bit_count())
        return result
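One point worth keeping in mind when comparing the approaches in this thread: the bit-based subset keeps each element independently with probability 1/2, so its size follows a binomial distribution centred on half the length, whereas sample() with a randint() size gives every size from 0 to len(sequence) equal probability. The sketch below tallies both; the names `random_size_sample`, `items`, and `trials` are illustrative assumptions, not from the thread.

```python
import random
from collections import Counter

def random_subset(sequence):
    # Keep each element independently with probability 1/2.
    source = random.randbytes(len(sequence))
    return [x for x, r in zip(sequence, source) if r & 1]

def random_size_sample(sequence):
    # Hypothetical name: pick the size uniformly, then sample without replacement.
    return random.sample(sequence, random.randint(0, len(sequence)))

items = list(range(10))  # hypothetical example data
trials = 10_000

# Tally how often each result size occurs under the two approaches.
subset_sizes = Counter(len(random_subset(items)) for _ in range(trials))
sample_sizes = Counter(len(random_size_sample(items)) for _ in range(trials))

for size in range(len(items) + 1):
    print(size, subset_sizes[size], sample_sizes[size])
```

Which distribution is preferable depends on what the randomized test inputs are meant to exercise.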
msg409288 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-12-29 07:12
Okay. Thanks for the quick response and the suggestion. I'm going to mark this one as closed. AFAICT, it distracts users from better solutions. I did a quick code search for sample(). The k==1 case is rare and in most cases the code should have used choice() or choices() instead. Accordingly, it doesn't make sense to make k==1 the default. Also, I suspect (but don't know for sure) that most users benefit by having the error message when k is omitted. That is more likely a user mistake than a correct design choice.
msg409289 - (view)	Author: Tilman Krummeck (TilmanKrummeck) *	Date: 2021-12-29 07:19
My suggestion is not to set k=1 when omitted but to assign it a random value that is something between 0 and the maximum possible value which is: sum(counts) if counts else len(population)
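The proposed behaviour can already be written as an explicit helper in user code. The following is only a sketch under that assumption: the function name `sample_random_size` is hypothetical and nothing like it exists in the random module. It relies on the keyword-only `counts` parameter of `random.sample()`, available since Python 3.9.

```python
import random

def sample_random_size(population, counts=None):
    """Return a sample whose size is itself drawn uniformly at random.

    Hypothetical helper for illustration; not part of the random module.
    """
    # Largest legal k, using the expression given in the message above.
    max_k = sum(counts) if counts else len(population)
    k = random.randint(0, max_k)  # randint() is inclusive on both ends
    return random.sample(population, k, counts=counts)

# With counts, "red" can appear at most 3 times and "blue" at most 2 times.
picked = sample_random_size(["red", "blue"], counts=[3, 2])
```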
msg409290 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-12-29 07:27
> My suggestion is not to set k=1 when omitted but to assign it a random value

Sorry, I think that is just bizarre. Also, some populations are very large, so a minor user accident of omitting a parameter would result in a large unexpected output. For choices(), it would have been nice to have k default to the population size (because resampling is a common use case) but we didn't go that path because of the likelihood of a large unexpected output. The same reasoning holds here as well. If you want to go down this path, I recommend making your code explicit about what it is trying to do. Something this unexpected should not be the implicit and default behavior.
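As an illustration of the resampling use case mentioned for choices(), here is a minimal sketch with k spelled out explicitly; the `data` values are made up for the example.

```python
import random

data = [2.5, 3.1, 4.7, 5.0, 6.2]  # hypothetical observations

# Resample with replacement, drawing as many items as the population holds.
# Writing k explicitly keeps the output size visible at the call site.
resample = random.choices(data, k=len(data))

# A statistic computed on the resample, e.g. its mean.
resample_mean = sum(resample) / len(resample)
```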
msg409291 - (view)	Author: Tilman Krummeck (TilmanKrummeck) *	Date: 2021-12-29 07:31
Well, it's not bizarre, it's a use-case I'm facing quite often. But thanks for the clarification, I hadn't had very large populations in mind - this indeed makes sense.
msg409293 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2021-12-29 08:00
The use case isn't bizarre. But having an API where that is the default behavior would be. From the point of view of most users, such an API would be unusual and surprising (I don't know of any other random package that has such a default).
msg409300 - (view)	Author: Tilman Krummeck (TilmanKrummeck) *	Date: 2021-12-29 11:23
Hmm, ok, that sounds obvious. Thanks for the clarification.

History
Date	User	Action	Args
2022-04-11 14:59:53	admin	set	github: 90348
2021-12-29 11:23:46	TilmanKrummeck	set	messages: + msg409300
2021-12-29 08:00:05	rhettinger	set	messages: + msg409293
2021-12-29 07:31:52	TilmanKrummeck	set	messages: + msg409291
2021-12-29 07:27:29	rhettinger	set	messages: + msg409290
2021-12-29 07:19:04	TilmanKrummeck	set	messages: + msg409289
2021-12-29 07:12:24	rhettinger	set	status: open -> closed resolution: rejected messages: + msg409288 stage: resolved
2021-12-29 07:10:34	Dennis Sweeney	set	messages: + msg409287
2021-12-29 07:06:50	TilmanKrummeck	set	messages: + msg409286
2021-12-29 07:04:54	rhettinger	set	assignee: rhettinger messages: + msg409285
2021-12-29 06:52:24	Dennis Sweeney	set	versions: + Python 3.11 nosy: + Dennis Sweeney, rhettinger messages: + msg409284 components: + Library (Lib), - Extension Modules
2021-12-29 05:36:37	TilmanKrummeck	create