This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Weighted random.sample() (weighted sampling without replacement)
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: mark.dickinson, pablogsal, piotrjurkiewicz, rhettinger
Priority: normal Keywords: patch

Created on 2018-07-25 15:58 by piotrjurkiewicz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8465 closed piotrjurkiewicz, 2018-07-25 15:59
Messages (2)
msg322367 - (view) Author: Piotr Jurkiewicz (piotrjurkiewicz) * Date: 2018-07-25 15:58
Function random.choices(), which appeared in Python 3.6, allows to perform weighted random sampling with replacement. Function random.sample() performs random sampling without replacement, but cannot do it weighted.

I propose to enhance random.sample() to perform weighted sampling. That way all four possibilities will be supported:

- non-weighted sampling with replacement: random.choices(..., weights=None) (exists)

- weighted sampling with replacement: random.choices(..., weights=weights) (exists)

- non-weighted sampling without replacement: random.sample(..., weights=None) (exists)

- weighted sampling without replacement: random.sample(..., weights=weights) (NEW)

Rationale:

Weighted sampling without replacement is a popular problem. There are lot of questions on StackOverflow and similar sites how to implement it. Unfortunately, many proposed solutions are wrong, for example:

https://stackoverflow.com/a/353510/2178047
https://softwareengineering.stackexchange.com/a/233552/161807

or have excessive computational complexity (e.g. quadratic). There are lot of suggestions to use numpy.random.choice() to do that, which supports all four possibilities with a single function:

    numpy.random.choice(a, size=None, replace=True, p=None)

But of course this is an overkill to install numpy just to do that.

I think that this should be possible with stdlib, without the need to implement it by yourself or to install numpy. Especially, that it can be implemented in 2 lines (plus 4 lines of error checking), as you can see in the PR.
msg322402 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-07-26 06:42
Thank for the suggestion and patch, but I will decline.  To me, the reseviour style algorithm doesn't fit in well (with the slow sorting step and the frequent calls to the random in the key function). For large population sizes (which sample() was designed for), the API and algorithm perform badly.  For typical student "urn problems", the better approach is to pre-weight the population by the exact counts specified in the problem.  I recommend that you post a recipe for what you want to do, but I don't think the standard library is the place for it.
History
Date User Action Args
2022-04-11 14:59:03adminsetgithub: 78408
2018-07-26 06:42:37rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg322402

stage: patch review -> resolved
2018-07-25 16:25:17pablogsalsetnosy: + pablogsal
2018-07-25 16:19:57serhiy.storchakasetassignee: rhettinger

nosy: + rhettinger, mark.dickinson
versions: + Python 3.8
2018-07-25 15:59:48piotrjurkiewiczsetkeywords: + patch
stage: patch review
pull_requests: + pull_request7988
2018-07-25 15:58:34piotrjurkiewiczcreate