Message316727
btw, this was my suggestion. Steven opened the issue on my behalf (I'm new).
1) Documentation change.
I would suggest that, to this paragraph:
"If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights."
The following sentence be added:
"A cum_weights sequence, if supplied, must be in strictly-ascending order, else incorrect results will be (silently) returned."
[BTW, in the current documentation, when I read this sentence:
:For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]," it wasn't clear to me that this was the format of the cum_weights *argument*. I thought that this conversion happened internally. So, I'd prefer that something more explicit be stated (especially the part about silently giving bad results).]
2) I'm giving up on suggesting a code change. However, I'll just
respond that
a) I believe that the big win of the cum_weights option is for people who already have the sequence in that form, rather than that they will save the O(n) cost of having the list built.
b) If I have big list, but also an other-than-tiny k value (eg, k=100), then the time (with the change) would be 400 time the O(log n) plus one times O(n), so this may or may not be significant.
c) I agree that, if someone did, eg, 400 *separate* calls, each with k=1, the cost would be higher. This seems unlikely to me but...
thanks
Paul Czyzewski |
|
Date |
User |
Action |
Args |
2018-05-15 20:56:58 | PaulSFO | set | recipients:
+ PaulSFO, rhettinger, paul.moore, mark.dickinson, steven.daprano, serhiy.storchaka, miss-islington |
2018-05-15 20:56:58 | PaulSFO | set | messageid: <1526417818.58.0.682650639539.issue33494@psf.upfronthosting.co.za> |
2018-05-15 20:56:58 | PaulSFO | link | issue33494 messages |
2018-05-15 20:56:58 | PaulSFO | create | |
|