Message 373853 - Python tracker

Author	rhettinger
Recipients	mark.dickinson, oscarbenjamin, rhettinger, tim.peters
Date	2020-07-17.20:28:48
Message-id	<1595017728.91.0.829399135384.issue41311@roundup.psfhosted.org>
In-reply-to

Content
Other implementations aren't directly comparable, but I thought I would check to see what others were doing:

* Scikit-learn uses reservoir sampling, but only when k / n > 0.99.  It also requires a follow-on step to shuffle the selections.

* NumPy does not use reservoir sampling.

* Julia's randsubseq() does not use reservoir sampling.  The docs guarantee that, "Complexity is linear in p*length(A), so this function is efficient even if p is small and A is large."
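
For readers unfamiliar with the two techniques being compared, here is a minimal sketch of each. It is illustrative only and is not the code used by scikit-learn, Julia, or CPython. The names reservoir_sample and bernoulli_subsequence are hypothetical. The first function is classic single-pass reservoir sampling (Algorithm R) and shows why a follow-on shuffle is needed: slot positions in the reservoir are correlated with input order. The second assumes that "linear in p*length(A)" can be achieved by jumping between selected elements with geometrically distributed gaps, which is one standard way to get that bound.

```python
import math
import random


def reservoir_sample(iterable, k, rng=random):
    """Pick k items uniformly at random from an iterable in one pass.

    Sketch of Algorithm R. Every input element is visited, so the cost is
    O(n) regardless of how small k is.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items, in input order.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # Without this step the result is not in random order. For example,
    # with k == n the reservoir is just the input in its original order.
    rng.shuffle(reservoir)
    return reservoir


def bernoulli_subsequence(seq, p, rng=random):
    """Keep each element of seq independently with probability p.

    Sketch using geometric skips: instead of testing every element, jump
    straight to the next selected index. The expected number of loop
    iterations is about p * len(seq).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 0.0:
        return []
    if p == 1.0:
        return list(seq)
    log_q = math.log1p(-p)  # log(1 - p), negative
    n = len(seq)
    out = []
    i = -1
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        # Gap to the next kept element is geometric on {1, 2, 3, ...}.
        i += 1 + int(math.log(u) / log_q)
        if i >= n:
            return out
        out.append(seq[i])
```

Note that the two functions answer different questions: reservoir_sample returns exactly k items, while bernoulli_subsequence returns a random number of items (about p * len(seq) on average) in their original order.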

History
Date	User	Action	Args
2020-07-17 20:28:48	rhettinger	set	recipients: + rhettinger, tim.peters, mark.dickinson, oscarbenjamin
2020-07-17 20:28:48	rhettinger	set	messageid: <1595017728.91.0.829399135384.issue41311@roundup.psfhosted.org>
2020-07-17 20:28:48	rhettinger	link	issue41311 messages
2020-07-17 20:28:48	rhettinger	create