Message373853
Other implementations aren't directly comparable, but I thought I would check to see what others were doing:
* Scikit-learn uses reservoir sampling but only when k / n > 0.99. Also, it requires a follow-on step to shuffle the selections.
* numpy does not use reservoir sampling.
* Julia's randsubseq() does not use reservoir sampling. The docs guarantee that, "Complexity is linear in p*length(A), so this function is efficient even if p is small and A is large."
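
For reference, here is a minimal sketch of the kind of reservoir sampling being compared above, including the follow-on shuffle mentioned for scikit-learn. It is not taken from any of those libraries; the function name `reservoir_sample` and the `rng` parameter are made up for illustration, and the algorithm shown is the textbook "Algorithm R".

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Pick k items uniformly at random from an iterable in a single pass.

    Hypothetical helper for illustration only (textbook Algorithm R),
    not the code used by scikit-learn, numpy, or Julia.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # Survivors from the first k items are still in their original slots,
    # so a follow-on shuffle is needed for the order to be random too.
    rng.shuffle(reservoir)
    return reservoir

if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5))
```

Note that the loop always visits all n inputs, so the work is proportional to n no matter how small k is.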

Date | User | Action | Args
2020-07-17 20:28:48 | rhettinger | set | recipients: + rhettinger, tim.peters, mark.dickinson, oscarbenjamin
2020-07-17 20:28:48 | rhettinger | set | messageid: <1595017728.91.0.829399135384.issue41311@roundup.psfhosted.org>
2020-07-17 20:28:48 | rhettinger | link | issue41311 messages
2020-07-17 20:28:48 | rhettinger | create |