Author rhettinger
Recipients mark.dickinson, oscarbenjamin, rhettinger, tim.peters
Date 2020-07-17.20:28:48
Other implementations aren't directly comparable, but I thought I would check to see what others were doing:

* scikit-learn uses reservoir sampling, but only when k / n > 0.99.  It also requires a follow-on step to shuffle the selections.

* numpy does not use reservoir sampling.

* Julia's randsubseq() does not use reservoir sampling.  The docs guarantee that, "Complexity is linear in p*length(A), so this function is efficient even if p is small and A is large."
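
For readers unfamiliar with the technique, here is a minimal sketch of basic reservoir sampling (Algorithm R) with the follow-on shuffle mentioned above. It is only an illustration, not the code from scikit-learn or any other library listed here; the function name `reservoir_sample` and its `rng` parameter are made up for this example.

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Pick k items uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration only (Algorithm R).
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1) and
            # overwrites a uniformly chosen slot.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # Survivors keep their original slots, so the order is not random
    # until the reservoir is shuffled.
    rng.shuffle(reservoir)
    return reservoir

if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5))
```

Note that the loop always consumes all n inputs and makes n - k calls to the random number generator, regardless of how small k is. The shuffle is needed because, without it, a sample with k == n would come back in the original input order.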
Date                 User        Action  Args
2020-07-17 20:28:48  rhettinger  set     recipients: + rhettinger, tim.peters, mark.dickinson, oscarbenjamin
2020-07-17 20:28:48  rhettinger  set     messageid: <>
2020-07-17 20:28:48  rhettinger  link    issue41311 messages
2020-07-17 20:28:48  rhettinger  create