Author serhiy.storchaka
Recipients aisaac, eli.bendersky, madison.may, mark.dickinson, neil.g, pitrou, rhettinger, serhiy.storchaka, tim.peters, westley.martinez
Date 2013-09-12.20:32:17
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1379017938.89.0.68957368481.issue18844@psf.upfronthosting.co.za>
In-reply-to
Content
Thank you Neil. It is interesting.

Vose's alias method has followed disadvantages (in comparison with the roulette wheel selection proposed above):

1. It operates with probabilities and uses floats, therefore it can be a little less accurate.

2. It consumes two random number (an integer and a float) for generating one sample. It can be fixed however (in the cost of additional precision lost).

3. While it has same time and memory O(n) cost for initialization, it has larger multiplication, Vose's alias method requires several times larger time and memory for initialization.

4. It requires more memory in process of generating samples.

However it has an advantage. It really has constant time cost to generate each sample.

Here are some benchmark results. "Roulette Wheel" is proposed above implementation. "Roulette Wheel 2" is its modification with normalized cumulative sums. It has twice more initialization time, but 1.5-2x faster generates each sample. "Vose's Alias" is an implementation of Vose's alias method directly translated from Java. "Vose's Alias 2" is optimized implementation which uses Python specific.

Second column is a size of distribution, third column is initialization time (in milliseconds), fourth column is time to generate each sample (in microseconds), fifth column is a number of generated samples after which this method will overtake "Roulette Wheel" (including initialization time).

Roulette Wheel          10     0.059     7.165         0
Roulette Wheel 2        10     0.076     4.105         5
Vose's Alias            10     0.129    13.206         -
Vose's Alias 2          10     0.105     6.501        69
Roulette Wheel         100     0.128     8.651         0
Roulette Wheel 2       100     0.198     4.630        17
Vose's Alias           100     0.691    12.839         -
Vose's Alias 2         100     0.441     6.547       148
Roulette Wheel        1000     0.719    10.949         0
Roulette Wheel 2      1000     1.458     5.177       128
Vose's Alias          1000     6.614    13.052         -
Vose's Alias 2        1000     3.704     6.531       675
Roulette Wheel       10000     7.495    13.249         0
Roulette Wheel 2     10000    14.961     6.051      1037
Vose's Alias         10000    69.937    13.830         -
Vose's Alias 2       10000    37.017     6.746      4539
Roulette Wheel      100000    73.988    16.180         0
Roulette Wheel 2    100000   148.176     8.182      9275
Vose's Alias        100000   690.099    13.808    259716
Vose's Alias 2      100000   391.367     7.095     34932
Roulette Wheel     1000000   743.415    19.493         0
Roulette Wheel 2   1000000  1505.409     8.930     72138
Vose's Alias       1000000  7017.669    13.798   1101673
Vose's Alias 2     1000000  4044.746     7.152    267507

As you can see Vose's alias method has very large initialization time. Non-optimized version will never overtake "Roulette Wheel" with small distributions (<100000), and even optimized version will never overtake "Roulette Wheel" with small distributions (<100000). Only with very large distributions Vose's alias method has an advantage (when you needs very larger number of samples).

Because for generating only one sample we need a method with fastest initialization we need "Roulette Wheel" implementation. And because large distributions are rare, I think there is no need in alternative implementation. In worst case for generating 1000000 samples from 1000000-elements distribution the difference between "Roulette Wheel" and "Vose's Alias 2" is a difference between 20 and 11 seconds.
History
Date User Action Args
2013-09-12 20:32:19serhiy.storchakasetrecipients: + serhiy.storchaka, tim.peters, rhettinger, mark.dickinson, pitrou, eli.bendersky, aisaac, westley.martinez, neil.g, madison.may
2013-09-12 20:32:18serhiy.storchakasetmessageid: <1379017938.89.0.68957368481.issue18844@psf.upfronthosting.co.za>
2013-09-12 20:32:18serhiy.storchakalinkissue18844 messages
2013-09-12 20:32:17serhiy.storchakacreate