Author rhettinger
Recipients mark.dickinson, oscarbenjamin, rhettinger, tim.peters
Date 2020-07-17.20:28:48
Message-id <1595017728.91.0.829399135384.issue41311@roundup.psfhosted.org>
Content
Other implementations aren't directly comparable, but I thought I would check to see what others were doing:

* Scikit-learn uses reservoir sampling, but only when k / n > 0.99 (k being the sample size and n the population size). It also requires a follow-on step to shuffle the selections.

* NumPy does not use reservoir sampling.

* Julia's randsubseq() does not use reservoir sampling.  The docs guarantee that, "Complexity is linear in p*length(A), so this function is efficient even if p is small and A is large."
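
For reference, here is a minimal sketch of textbook reservoir sampling (Algorithm R) with the follow-on shuffle mentioned above. It is not the code of any of the libraries listed; the name reservoir_sample and its rng parameter are made up for illustration. The shuffle is needed because an unreplaced early element stays in its original slot, so without it the order of the result is correlated with the order of the input.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly without replacement, in random order.

    Hypothetical helper for illustration: plain Algorithm R plus a final shuffle.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # Follow-on step: the selection is a uniform subset, but its order is
    # not uniform until it is shuffled.
    rng.shuffle(reservoir)
    return reservoir
```

Note that this makes one pass over all n items and one random draw per item past the first k, so the cost is linear in n regardless of how small k is.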