Author thomasahle
Recipients thomasahle
Date 2019-07-25.17:03:28
Given a generator `f()` we can use `random.sample(list(f()), 10)` to get a uniform sample of the values generated.
This is fine, and fast, as long as `list(f())` easily fits in memory.
However, if it doesn't, one has to implement reservoir sampling as a pure Python function, which is much slower and not trivial to get right.
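
For reference, a minimal pure-Python sketch of reservoir sampling (the classic "Algorithm R") might look like the following. The name `reservoir_sample` and its `rng` parameter are illustrative assumptions, not part of the `random` module, and this is not a proposed implementation.

```python
import itertools
import random


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an iterable.

    Hypothetical helper (not part of the standard library). It makes one
    pass over the input and keeps only k items in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    iterator = iter(iterable)
    # Fill the reservoir with the first k items.
    reservoir = list(itertools.islice(iterator, k))
    if len(reservoir) < k:
        raise ValueError("Sample larger than population")
    # `seen` counts items consumed so far, including the current one.
    # The current item replaces a random reservoir slot with probability k / seen.
    for seen, item in enumerate(iterator, start=k + 1):
        j = rng.randrange(seen)
        if j < k:
            reservoir[j] = item
    return reservoir
```

Usage would be along the lines of `reservoir_sample(f(), 10)`. Note that, unlike `random.sample`, the order of the returned items is not itself random (early items tend to stay in their original slots), so the result would need a `random.shuffle` if order matters.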

It seems that having a fast reservoir sampling implementation in `random.sample` to use for iterators would be both useful and make the API more predictable.

Currently, passing an iterator to `random.sample` raises `TypeError: Population must be a sequence or set.`
This is inconsistent with most of the standard library, which accepts lists and iterators transparently.
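
A small sketch that reproduces the behaviour described above. The generator `f` here is only a stand-in for the `f()` mentioned earlier, and the exact wording of the error message may differ between Python versions.

```python
import random


def f():
    # Stand-in generator for illustration only.
    yield from range(100)


try:
    random.sample(f(), 10)
except TypeError as exc:
    # A generator is not a sequence, so random.sample rejects it.
    error = exc
else:
    error = None

# Materializing the generator first works, but holds every value in memory.
sample = random.sample(list(f()), 10)
```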

I apologize if this enhancement has already been discussed.
I wasn't able to find it.
If wanted, I can write up a pull request.
I believe questions like this one make it clear that such functionality is wanted and that the solution is non-obvious.
Date                 User        Action  Args
2019-07-25 17:03:28  thomasahle  set     recipients: + thomasahle
2019-07-25 17:03:28  thomasahle  set     messageid: <>
2019-07-25 17:03:28  thomasahle  link    issue37682 messages
2019-07-25 17:03:28  thomasahle  create