Message 348474
> ISTM that if a generator produces so much data that it is infeasible to fit in memory, then it will also take a long time to loop over it and generate a random value for each entry.
Good point!
$ ./python -m timeit -s 'from random import sample as s' 's(range(10**6), 50)'
10000 loops, best of 5: 25.6 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**6)), 50)'
10 loops, best of 5: 31.5 msec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**6), 50)'
1 loop, best of 5: 328 msec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(range(10**8), 50)'
10000 loops, best of 5: 26.9 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**8)), 50)'
1 loop, best of 5: 3.41 sec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**8), 50)'
1 loop, best of 5: 36.5 sec per loop
It is also possible that a generator does not produce that many items, but each item takes so much memory that the total still does not fit in memory. But I suppose the generation time of larger items will be proportionally larger, so reservoir_sample() will be just as slow.
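For reference, the reservoir_sample() timed above is a proposed addition, not a stdlib function; its exact implementation is not shown in this message. The sketch below is a minimal version of the classic reservoir sampling technique (Algorithm R), which is the general approach such a function would take: it makes a single pass over the iterable, keeping a fixed-size reservoir and performing one random draw per item, which is why it is so much slower than sample() on an indexable sequence.

```python
import itertools
import random

def reservoir_sample(iterable, k):
    """Select k items uniformly at random from an iterable of unknown length.

    A sketch of Algorithm R, not the implementation under discussion.
    """
    it = iter(iterable)
    # Fill the reservoir with the first k items.
    reservoir = list(itertools.islice(it, k))
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # For the i-th item (0-based), replace a reservoir slot with
    # probability k / (i + 1), so every item ends up in the final
    # sample with equal probability k / n.
    for i, item in enumerate(it, start=k):
        j = random.randrange(i + 1)
        if j < k:
            reservoir[j] = item
    return reservoir
```

Note the one randrange() call per input item: for 10**8 items that is 10**8 random draws, consistent with the large constant factor seen in the timings above.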
Date                | User             | Action | Args
2019-07-26 07:22:58 | serhiy.storchaka | set    | recipients: + serhiy.storchaka, rhettinger, mark.dickinson, thomasahle
2019-07-26 07:22:58 | serhiy.storchaka | set    | messageid: <1564125778.39.0.257095861312.issue37682@roundup.psfhosted.org>
2019-07-26 07:22:58 | serhiy.storchaka | link   | issue37682 messages
2019-07-26 07:22:58 | serhiy.storchaka | create |