classification
Title: Add optional weights parameter to random.sample()
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2020-05-06 22:35 by rhettinger, last changed 2020-05-08 14:53 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19970 merged rhettinger, 2020-05-06 22:35
Messages (4)
msg368307 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-06 22:35
I've seen multiple requests for this and it isn't obvious how to do it with the existing tooling.  

The example currently given in the recipes section isn't scalable because it involves expanding the population into a big list with repeated elements:

      sample(['x', 'x', 'x', 'x', 'y', 'y', 'z'], k=5)

Example task:  Given an urn with 8 red balls, 5 blue balls, and 3 green balls, choose ten without replacement:

    >>> population = ['red', 'blue', 'green']
    >>> weights =    [  8,      5,      3   ]
    >>> sample(population, weights=weights, k=10)
    ['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'blue', 'red', 'blue']

I could also add *cum_weights* as an optional optimization but think it best to wait until someone asks for it ;-)
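
One way to avoid expanding the population, sketched here as an illustration rather than as the code in the linked PR, is to sample positions from a virtual expanded list and map each position back to its element with a cumulative-sum lookup. The helper name `sample_with_counts` and its `rng` argument are hypothetical; only `random.sample`, `itertools.accumulate`, and `bisect.bisect` are standard library calls.

```python
import random
from bisect import bisect
from itertools import accumulate


def sample_with_counts(population, counts, k, rng=random):
    """Sample k items without replacement from a multiset.

    Hypothetical helper: population[i] is treated as occurring counts[i]
    times, but the expanded list is never built. Counts are assumed to be
    non-negative integers.
    """
    if len(population) != len(counts):
        raise ValueError("population and counts must have the same length")
    # Running totals, e.g. [8, 5, 3] -> [8, 13, 16].
    cum_counts = list(accumulate(counts))
    total = cum_counts[-1] if cum_counts else 0
    # Pick k distinct positions in the virtual expanded list.
    positions = rng.sample(range(total), k)
    # bisect finds the element whose run of positions contains p.
    return [population[bisect(cum_counts, p)] for p in positions]


picked = sample_with_counts(['red', 'blue', 'green'], [8, 5, 3], k=10)
print(picked)
```

Memory use depends on the number of distinct elements and on k, not on the sum of the counts, so large counts stay cheap.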
msg368333 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-07 11:34
+1 for the functionality.

How about "counts" instead of "weights"?

I found the name "weights" misleading - my first thought was that this would be doing a weighted sampling without replacement (like NumPy's `random.choice(..., replace=False, p=weights)`).

Of course, now I've read the docs, I know better.
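
To make the distinction concrete, here is a minimal sketch of the other reading, weighted sampling without replacement, in which each distinct element can be picked at most once and the weights only bias the order of selection. The helper `weighted_sample_distinct` is hypothetical and uses sequential draws with `random.choices`; it is one common interpretation and is not claimed to match NumPy's algorithm.

```python
import random


def weighted_sample_distinct(population, weights, k, rng=random):
    """Draw k distinct items; each draw is proportional to the weights
    of the items that have not been picked yet (hypothetical helper)."""
    items = list(population)
    remaining = list(weights)
    if k > len(items):
        raise ValueError("k exceeds the number of distinct items")
    picked = []
    for _ in range(k):
        # Choose an index in proportion to the remaining weights.
        i = rng.choices(range(len(items)), weights=remaining)[0]
        picked.append(items.pop(i))
        remaining.pop(i)
    return picked


# Every result holds two different colours; 'red' can never repeat.
print(weighted_sample_distinct(['red', 'blue', 'green'], [8, 5, 3], k=2))
```

Under the repeat-count reading proposed in this issue, the same inputs with k=2 can return ['red', 'red'], which is why the two behaviours benefit from different parameter names.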
msg368374 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-07 20:49
> How about "counts" instead of "weights"?

That makes sense.
msg368445 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-08 14:53
New changeset 81a5fc38e81b424869f4710f48e9371dfa2d3b77 by Raymond Hettinger in branch 'master':
bpo-40541: Add optional *counts* parameter to random.sample() (GH-19970)
https://github.com/python/cpython/commit/81a5fc38e81b424869f4710f48e9371dfa2d3b77
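
With the parameter renamed, the urn example from the first message can be written as below. This is a usage sketch that assumes Python 3.9 or later, where `random.sample()` accepts the keyword-only *counts* argument; the variable names are illustrative.

```python
import random
from collections import Counter

population = ['red', 'blue', 'green']
counts = [8, 5, 3]  # 8 red, 5 blue, 3 green balls in the urn

# Draw ten balls without replacement.
picked = random.sample(population, counts=counts, k=10)
tally = Counter(picked)
print(picked)

# No colour can be drawn more often than it occurs in the urn.
assert all(tally[colour] <= n for colour, n in zip(population, counts))
```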
History
Date User Action Args
2020-05-08 14:53:54 rhettinger set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2020-05-08 14:53:18 rhettinger set messages: + msg368445
2020-05-07 20:49:09 rhettinger set messages: + msg368374
2020-05-07 11:34:58 mark.dickinson set messages: + msg368333
2020-05-07 11:25:31 mark.dickinson set nosy: + mark.dickinson
2020-05-06 22:35:42 rhettinger set keywords: + patch
    stage: patch review
    pull_requests: + pull_request19286
2020-05-06 22:35:20 rhettinger create