Author thomasahle
Recipients thomasahle
Date 2019-07-25.17:03:28
Message-id <1564074208.5.0.498175435162.issue37682@roundup.psfhosted.org>
Content
Given a generator `f()` we can use `random.sample(list(f()), 10)` to get a uniform sample of the values generated.
This is fine, and fast, as long as `list(f())` easily fits in memory.
However, if it doesn't, one has to implement the reservoir sampling algorithm as a pure Python function, which is much slower and not entirely straightforward to get right.
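
For illustration, a minimal sketch of what such a pure Python workaround might look like, using the classic "Algorithm R" variant of reservoir sampling. The name `reservoir_sample` and its `rng` parameter are hypothetical and not part of the `random` module.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from iterable.

    Hypothetical helper (not part of the random module). It makes a
    single pass and keeps only k items in memory at any time.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by drawing a
            # slot index in [0, i] and replacing only if it falls in
            # the reservoir.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("Sample larger than population")
    return reservoir


# Usage: sample 10 values from a generator without building a list.
values = reservoir_sample((x * x for x in range(1_000_000)), 10)
```

Unlike `random.sample`, this sketch does not return the selected items in random order, so the result would need an extra `random.shuffle` if the order matters.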

It seems that having a fast reservoir sampling implementation in `random.sample` to use for iterators would be both useful and make the API more predictable.

Currently, when passed an iterator, `random.sample` raises `TypeError: Population must be a sequence or set.`
This is inconsistent with most of the standard library, which accepts lists and iterators transparently.
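
A short snippet that reproduces the behaviour described above. Here `f` is just a stand-in generator, and the exact wording of the error message may differ between Python versions.

```python
import random


def f():
    # Stand-in generator; any iterator without a length behaves the same.
    yield from range(100)


try:
    random.sample(f(), 10)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# The current workaround: materialize the whole population first.
sample = random.sample(list(f()), 10)
```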

I apologize if this enhancement has already been discussed.
I wasn't able to find it.
If there is interest, I can write up a pull request.
I believe questions like this one make it clear that such functionality is wanted and non-obvious: https://stackoverflow.com/questions/12581437/python-random-sample-with-a-generator-iterable-iterator
History
Date                 User        Action  Args
2019-07-25 17:03:28  thomasahle  set     recipients: + thomasahle
2019-07-25 17:03:28  thomasahle  set     messageid: <1564074208.5.0.498175435162.issue37682@roundup.psfhosted.org>
2019-07-25 17:03:28  thomasahle  link    issue37682 messages
2019-07-25 17:03:28  thomasahle  create