Message197540
Thank you Neil. It is interesting.
Vose's alias method has followed disadvantages (in comparison with the roulette wheel selection proposed above):
1. It operates with probabilities and uses floats, therefore it can be a little less accurate.
2. It consumes two random number (an integer and a float) for generating one sample. It can be fixed however (in the cost of additional precision lost).
3. While it has same time and memory O(n) cost for initialization, it has larger multiplication, Vose's alias method requires several times larger time and memory for initialization.
4. It requires more memory in process of generating samples.
However it has an advantage. It really has constant time cost to generate each sample.
Here are some benchmark results. "Roulette Wheel" is proposed above implementation. "Roulette Wheel 2" is its modification with normalized cumulative sums. It has twice more initialization time, but 1.5-2x faster generates each sample. "Vose's Alias" is an implementation of Vose's alias method directly translated from Java. "Vose's Alias 2" is optimized implementation which uses Python specific.
Second column is a size of distribution, third column is initialization time (in milliseconds), fourth column is time to generate each sample (in microseconds), fifth column is a number of generated samples after which this method will overtake "Roulette Wheel" (including initialization time).
Roulette Wheel 10 0.059 7.165 0
Roulette Wheel 2 10 0.076 4.105 5
Vose's Alias 10 0.129 13.206 -
Vose's Alias 2 10 0.105 6.501 69
Roulette Wheel 100 0.128 8.651 0
Roulette Wheel 2 100 0.198 4.630 17
Vose's Alias 100 0.691 12.839 -
Vose's Alias 2 100 0.441 6.547 148
Roulette Wheel 1000 0.719 10.949 0
Roulette Wheel 2 1000 1.458 5.177 128
Vose's Alias 1000 6.614 13.052 -
Vose's Alias 2 1000 3.704 6.531 675
Roulette Wheel 10000 7.495 13.249 0
Roulette Wheel 2 10000 14.961 6.051 1037
Vose's Alias 10000 69.937 13.830 -
Vose's Alias 2 10000 37.017 6.746 4539
Roulette Wheel 100000 73.988 16.180 0
Roulette Wheel 2 100000 148.176 8.182 9275
Vose's Alias 100000 690.099 13.808 259716
Vose's Alias 2 100000 391.367 7.095 34932
Roulette Wheel 1000000 743.415 19.493 0
Roulette Wheel 2 1000000 1505.409 8.930 72138
Vose's Alias 1000000 7017.669 13.798 1101673
Vose's Alias 2 1000000 4044.746 7.152 267507
As you can see Vose's alias method has very large initialization time. Non-optimized version will never overtake "Roulette Wheel" with small distributions (<100000), and even optimized version will never overtake "Roulette Wheel" with small distributions (<100000). Only with very large distributions Vose's alias method has an advantage (when you needs very larger number of samples).
Because for generating only one sample we need a method with fastest initialization we need "Roulette Wheel" implementation. And because large distributions are rare, I think there is no need in alternative implementation. In worst case for generating 1000000 samples from 1000000-elements distribution the difference between "Roulette Wheel" and "Vose's Alias 2" is a difference between 20 and 11 seconds. |
|
Date |
User |
Action |
Args |
2013-09-12 20:32:19 | serhiy.storchaka | set | recipients:
+ serhiy.storchaka, tim.peters, rhettinger, mark.dickinson, pitrou, eli.bendersky, aisaac, westley.martinez, NeilGirdhar, madison.may |
2013-09-12 20:32:18 | serhiy.storchaka | set | messageid: <1379017938.89.0.68957368481.issue18844@psf.upfronthosting.co.za> |
2013-09-12 20:32:18 | serhiy.storchaka | link | issue18844 messages |
2013-09-12 20:32:17 | serhiy.storchaka | create | |
|