classification
Title: allow weights in random.choice
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Christian.Kleineidam, aisaac, dkorchem, madison.may, mark.dickinson, neil.g, pitrou, rhettinger, serhiy.storchaka, tim.peters, westley.martinez, xksteven
Priority: normal Keywords: needs review, patch

Created on 2013-08-26 17:23 by aisaac, last changed 2016-04-07 16:25 by xksteven.

Files
File name Uploaded Description Edit
weighted_choice.diff madison.may, 2013-08-26 23:25 Preliminary implementation of random.choice optional arg "weights" review
weighted_choice_v2.diff madison.may, 2013-09-01 14:47 Move cumulative distribution calculation to separate function that returns an index generator review
weighted_choice_generator.patch serhiy.storchaka, 2013-09-11 22:29 review
wcg_bench.py serhiy.storchaka, 2013-09-12 20:32 Benchmarking different methods
weighted_choice_generator_2.patch serhiy.storchaka, 2014-08-10 08:48 review
weighted_choice_v3.diff xksteven, 2016-03-29 20:39 a new implementation of weighted choice
weighted_choice_v3.patch xksteven, 2016-03-29 21:04 weighted choice function
weighted_choice_v4.patch xksteven, 2016-04-06 22:49
weighted_choice_v5.patch xksteven, 2016-04-07 16:25 weighted choice function v5
Messages (52)
msg196229 - (view) Author: Alan Isaac (aisaac) Date: 2013-08-26 17:23
The need for weighted random choices is so common that it is addressed as a "common task" in the docs:
http://docs.python.org/dev/library/random.html

This enhancement request is to add an optional argument to random.choice, which must be a sequence of non-negative numbers (the weights) having the same length as the main argument.
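The "common task" recipe linked above can be wrapped into exactly this kind of function; a minimal sketch (the function name and the error handling are illustrative only, not from any attached patch):

```python
import bisect
import itertools
import random

def choice_with_weights(seq, weights):
    # Hypothetical helper matching the request: weights is a sequence of
    # non-negative numbers with the same length as seq.
    if len(seq) != len(weights):
        raise ValueError("seq and weights must have the same length")
    cumdist = list(itertools.accumulate(weights))
    if cumdist[-1] <= 0:
        raise ValueError("at least one weight must be positive")
    x = random.random() * cumdist[-1]
    return seq[bisect.bisect(cumdist, x)]
```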
msg196234 - (view) Author: Madison May (madison.may) * Date: 2013-08-26 18:55
+1. I've found myself in need of this feature often enough to wonder why it's not part of the stdlib.
msg196235 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-26 19:09
Agreed with the feature request. The itertools dance won't be easy to understand for many people.
msg196252 - (view) Author: Madison May (madison.may) * Date: 2013-08-26 23:25
I realize it's probably quite early to begin putting a patch together, but here's some preliminary code for anyone interested.  It builds off of the "common task" example in the docs and adds validation for the weights list.

There are a few design decisions I'd like to hash out.  
In particular: 

  - Should negative weights cause a ValueError to be raised, or should they be converted to 0s?
  - Should passing a list full of zeros as the weights arg raise a ValueError or be treated as if no weights arg was passed?
msg196551 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-08-30 14:43
[Madison May]
>  - Should negative weights cause a ValueError to be raised, or should they be converted to 0s?
>  - Should passing a list full of zeros as the weights arg raise a ValueError or be treated as if no weights arg was passed?

Both those seem like clear error conditions to me, though I think it would be fine if the second condition produced a ZeroDivisionError rather than a ValueError.

I'm not 100% sold on the feature request.  For one thing, the direct implementation is going to be inefficient for repeated sampling, building the table of cumulative sums each time random.choice is called.  A more efficient approach for many use-cases would do the precomputation once, returning some kind of 'distribution' object from which samples can be generated.  (Walker's aliasing method is one route for doing this efficiently, though there are others.)  I agree that this is a commonly needed and commonly requested operation;  I'm just not convinced either that an efficient implementation fits well into the random module, or that it makes sense to add an inefficient implementation.
msg196567 - (view) Author: Madison May (madison.may) * Date: 2013-08-30 18:16
[Mark Dickinson]
> Both those seem like clear error conditions to me, though I think it would be fine if the second condition produced a ZeroDivisionError rather than a ValueError.

Yeah, in hindsight it makes sense that both of those conditions should raise errors.  After all: "Explicit is better than implicit".

As far as optimization goes, could we potentially use functools.lru_cache to cache the cumulative distribution produced by the weights argument and optimize repeated sampling? 

Without @lru_cache:
>>> timeit.timeit("x = choice(list(range(100)), list(range(100)))", setup="from random import choice", number=100000)
36.7109281539997

With @lru_cache(max=128):
>>> timeit.timeit("x = choice(list(range(100)), list(range(100)))", setup="from random import choice", number=100000)
6.6788657720007905

Of course it's a contrived example, but you get the idea.

Walker's aliasing method looks intriguing.  I'll have to give it a closer look.  

I agree that an efficient implementation would be preferable but would feel out of place in random because of the return type.  I still believe a relatively inefficient addition to random.choice would be valuable, though.
msg196709 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-01 05:38
+1 for the overall idea.  I'll take a detailed look at the patch when I get a chance.
msg196711 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-01 05:43
The sticking point is going to be that we don't want to recompute the cumulative weights for every call to weighted_choice.

So there should probably be two functions:

  cw = make_cumulate_weights(weight_list) 
  x = choice(choice_list, cw)

This is similar to what was done with string.maketrans() and str.translate().
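The two-function split could look roughly like this (make_cumulate_weights is the working name from the message above; the bodies are assumptions, not code from a patch):

```python
import bisect
import itertools
import random

def make_cumulate_weights(weight_list):
    # One-time precomputation, analogous to string.maketrans().
    return list(itertools.accumulate(weight_list))

def choice(choice_list, cw):
    # Per-call work, analogous to str.translate(): one bisect into the table.
    return choice_list[bisect.bisect(cw, random.random() * cw[-1])]

cw = make_cumulate_weights([1, 0, 3])
picks = [choice('abc', cw) for _ in range(100)]
```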
msg196716 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-01 08:43
> A more efficient approach for many use-cases would do the precomputation once, returning some kind of 'distribution' object from which samples can be generated.

I like the idea of adding a family of distribution generators. They should check input parameters, do the precomputation once, and then generate an infinite sequence of specially distributed random numbers.
msg196721 - (view) Author: Madison May (madison.may) * Date: 2013-09-01 14:37
[Raymond Hettinger]
> The sticking point is going to be that we don't want to recompute the 
> cumulative weights for every call to weighted_choice.

> So there should probably be two functions:

>  cw = make_cumulate_weights(weight_list) 
>  x = choice(choice_list, cw)

That's pretty much how I broke things up when I decided to test out optimization with lru_cache.  That version of the patch is now attached.

[Serhiy Storchaka]
> I like the idea of adding a family of distribution generators. 
> They should check input parameters, do the precomputation once, and then generate an infinite sequence of specially distributed random numbers.

Would these distribution generators be implemented internally (see attached patch) or publicly exposed?
msg196728 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-01 19:16
> Would these distribution generators be implemented internally (see attached patch) or publicly exposed?

See issue18900. Even if that proposition is rejected, I think we should publicly expose a weighted choice_generator(). A generator, or a builder that returns a function, is the only way to implement this feature efficiently. Using lru_cache isn't good because several choice generators can be used in one program, and because it leaves large data in the cache long after it was last used.
msg196731 - (view) Author: Madison May (madison.may) * Date: 2013-09-01 19:47
> Using lru_cache isn't good because several choice generators can be used in one program, and because it leaves large data in the cache long after it was last used.

Yeah, I just did a quick search of the stdlib and only found one instance of lru_cache in use -- another sign that lru_cache is a bad choice.
msg196741 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-01 21:17
> I like the idea about adding a family of distribution generators

Let's stay focused on the OP's feature request for a weighted version of choice().

For the most part, it's not a good idea to "just add" a family of anything to the standard library.  We wait for user requests and use cases to guide the design and err on the side of less, rather than more.  This helps avoid bloat.   Also, it would be a good idea to start something like this as a third-party module, to let it iterate and mature before deciding whether there is sufficient user uptake to warrant inclusion in the standard library.

For the current request, we should also do some research on existing solutions in other languages.  This isn't new territory.  What do R, SciPy, Fortran, Matlab or other statistical packages already do?  Their experiences can be used to inform our design.  Alan Kay's big criticism of Python developers is that they have a strong propensity to invent from scratch rather than taking advantage of the mountain of work done by the developers who came before them.
msg196750 - (view) Author: Madison May (madison.may) * Date: 2013-09-01 23:08
> What do R, SciPy, Fortran, Matlab or other statistical packages already do? 

Numpy avoids recalculating the cumulative distribution by introducing a 'size' argument to numpy.random.choice().  The cumulative distribution is calculated once, then 'size' random choices are generated and returned.

Their overall implementation is quite similar to the method suggested in the python docs.  

>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]

The addition of a 'size' argument to random.choice() has already been discussed (and rejected) in Issue18414, but this was on the grounds that the standard idiom for generating a list of random choices ([random.choice(seq) for i in range(k)]) is obvious and efficient.
msg196761 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-09-02 00:31
Honestly, I think adding weights to any of the random functions is trivial enough to implement as is.  Just because something becomes a common task does not mean it ought to be added to the stdlib.

Anyway, from a user point of view, I think it'd be useful to be able to send a sequence to a function that'll weight the sequence for use by random.
msg196767 - (view) Author: Madison May (madison.may) * Date: 2013-09-02 03:00
Just ran across a great blog post on the topic of weighted random generation from Eli Bendersky for anyone interested:
http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
msg197507 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-11 22:29
The proposed patch adds two methods to the Random class and two module-level functions: weighted_choice() and weighted_choice_generator().

weighted_choice(data) accepts either a mapping or a sequence and returns a key or index x with probability proportional to data[x].

If you need several elements with the same distribution, use weighted_choice_generator(data), which returns an iterator that produces random keys or indices of the data. It is much faster than calling weighted_choice(data) repeatedly, and more flexible than generating a list of random values of a specified size (as in NumPy).
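The patch itself is attached above; the behaviour described here (key for a mapping, index for a sequence, one-time setup inside the generator) can be sketched roughly as:

```python
import bisect
import itertools
import random

def weighted_choice_generator(data):
    # Yields keys (for a mapping) or indices (for a sequence) forever,
    # with probability proportional to data[x]; setup happens only once.
    if hasattr(data, 'keys'):
        keys, weights = list(data), list(data.values())
    else:
        keys, weights = range(len(data)), data
    cumdist = list(itertools.accumulate(weights))
    while True:
        yield keys[bisect.bisect(cumdist, random.random() * cumdist[-1])]

def weighted_choice(data):
    return next(weighted_choice_generator(data))
```

Getting n values is then list(itertools.islice(weighted_choice_generator(data), n)).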
msg197512 - (view) Author: Neil Girdhar (neil.g) * Date: 2013-09-12 02:57
Should this really be implemented using the cumulative distribution and binary search algorithm?  Vose's Alias Method has the same initialization and memory usage cost (O(n)), but is constant time to generate each sample.

An excellent tutorial is here: http://www.keithschwarz.com/darts-dice-coins/
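For reference, a compact hand-written sketch of the method from that tutorial (treat the details as illustrative): the O(n) setup builds a probability table and an alias table; each sample then costs one randrange() plus one random().

```python
import random

def make_alias_table(weights):
    # Vose's alias method: O(n) setup, O(1) per sample.
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates probability mass to fill the small slot.
        scaled[l] += scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:   # leftovers are exactly 1.0 up to rounding
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```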
msg197540 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-12 20:32
Thank you Neil. It is interesting.

Vose's alias method has the following disadvantages (in comparison with the roulette wheel selection proposed above):

1. It operates on probabilities and uses floats, so it can be slightly less accurate.

2. It consumes two random numbers (an integer and a float) to generate one sample. This can be fixed, however (at the cost of additional precision loss).

3. While it has the same O(n) time and memory cost for initialization, the constant factor is larger: Vose's alias method requires several times more time and memory to initialize.

4. It requires more memory while generating samples.

However, it has an advantage: it really does have constant time cost per sample.

Here are some benchmark results. "Roulette Wheel" is the implementation proposed above. "Roulette Wheel 2" is a modification of it with normalized cumulative sums; it has twice the initialization time, but generates each sample 1.5-2x faster. "Vose's Alias" is an implementation of Vose's alias method translated directly from Java. "Vose's Alias 2" is an optimized implementation which uses Python specifics.

The second column is the size of the distribution, the third column is initialization time (in milliseconds), the fourth column is the time to generate each sample (in microseconds), and the fifth column is the number of generated samples after which the method overtakes "Roulette Wheel" (including initialization time).

Roulette Wheel          10     0.059     7.165         0
Roulette Wheel 2        10     0.076     4.105         5
Vose's Alias            10     0.129    13.206         -
Vose's Alias 2          10     0.105     6.501        69
Roulette Wheel         100     0.128     8.651         0
Roulette Wheel 2       100     0.198     4.630        17
Vose's Alias           100     0.691    12.839         -
Vose's Alias 2         100     0.441     6.547       148
Roulette Wheel        1000     0.719    10.949         0
Roulette Wheel 2      1000     1.458     5.177       128
Vose's Alias          1000     6.614    13.052         -
Vose's Alias 2        1000     3.704     6.531       675
Roulette Wheel       10000     7.495    13.249         0
Roulette Wheel 2     10000    14.961     6.051      1037
Vose's Alias         10000    69.937    13.830         -
Vose's Alias 2       10000    37.017     6.746      4539
Roulette Wheel      100000    73.988    16.180         0
Roulette Wheel 2    100000   148.176     8.182      9275
Vose's Alias        100000   690.099    13.808    259716
Vose's Alias 2      100000   391.367     7.095     34932
Roulette Wheel     1000000   743.415    19.493         0
Roulette Wheel 2   1000000  1505.409     8.930     72138
Vose's Alias       1000000  7017.669    13.798   1101673
Vose's Alias 2     1000000  4044.746     7.152    267507

As you can see, Vose's alias method has a very large initialization time. The non-optimized version never overtakes "Roulette Wheel" for small distributions (<100000), and even the optimized version needs hundreds or thousands of samples to do so. Only with very large distributions does Vose's alias method have an advantage (when you need a very large number of samples).

Because generating only one sample calls for the method with the fastest initialization, we need the "Roulette Wheel" implementation. And because large distributions are rare, I think there is no need for an alternative implementation. In the worst case, for generating 1000000 samples from a 1000000-element distribution, the difference between "Roulette Wheel" and "Vose's Alias 2" is the difference between 20 and 11 seconds.
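The patch internals aren't reproduced in this message; one plausible reading of the two roulette-wheel variants being compared is that the first bisects on raw cumulative sums (one multiply per sample) while the second spends an extra normalization pass so each sample is a bare bisect on random.random():

```python
import bisect
import itertools
import random

def rw_index(cumdist):
    # "Roulette Wheel" (as I read it): raw cumulative sums,
    # one multiplication per sample.
    return bisect.bisect(cumdist, random.random() * cumdist[-1])

def make_normalized(weights):
    # "Roulette Wheel 2" (as I read it): extra setup pass to normalize...
    total = sum(weights)
    return [c / total for c in itertools.accumulate(weights)]

def rw2_index(norm_cumdist):
    # ...so each sample is just a bisect of random.random().
    # (With inexact float sums, norm_cumdist[-1] may be slightly below
    # 1.0; a real implementation would guard that edge case.)
    return bisect.bisect(norm_cumdist, random.random())
```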
msg197862 - (view) Author: Madison May (madison.may) * Date: 2013-09-16 02:20
Serhiy, from a technical standpoint, your latest patch looks like a solid solution.  From a module design standpoint, we still have a few options to think through, though.  What if random.weighted_choice_generator were moved to random.choice_generator and refactored to take an array of weights as an optional argument?  Likewise, random.weighted_choice could still be implemented with an optional arg to random.choice.  Here are the pros and cons of each implementation as I see them.

Implementation: weighted_choice_generator + weighted_choice
Pros: 
Distinct functions help indicate that weighted_choice should be used in a different manner than choice -- [weighted_choice(x) for _ in range(n)] isn't efficient.
Can take Mapping or Sequence as argument.
Has a single parameter
Cons:
Key, not value, is returned
Requires two new functions
Dissimilar to random.choice
Long function name (weighted_choice_generator)

Implementation: choice_generator + optional arg to choice
Pros: 
Builds on existing code layout
Value returned directly
Only a single new function required
More compact function name

Cons:
Difficult to support Mappings
Two args required for choice_generator and random.choice
Users may use [choice(x, weights) for _ in range(n)] expecting efficient results
msg197865 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-09-16 04:00
I think Storchaka's solution is more transparent and I agree with him on the point that the choice generator should be exposed.
msg197866 - (view) Author: Madison May (madison.may) * Date: 2013-09-16 04:14
>  I think Storchaka's solution is more transparent and I agree with him on the point that the choice generator should be exposed.

Valid point -- transparency should be priority #1
msg198367 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-24 21:26
Most existing implementations produce just an index. That is why weighted_choice() accepts a bare weights list and returns an index. On the other hand, I think working with mappings would be a welcome feature too (especially because Counter is in the stdlib). Indexable sequences and mappings are similar: in both cases weighted_choice() returns a value which can be used as an index/key into the input argument.

If you need to choose an element from some sequence, just use seq[weighted_choice(weights)]. Actually weighted_choice() shares no code with choice() and has quite different use cases. They should be as dissimilar as possible. Perhaps we should even avoid the "choice" part in the function names (are there any ideas?) to emphasize this.
msg198372 - (view) Author: Madison May (madison.may) * Date: 2013-09-24 22:36
You have me convinced, Serhiy.  I see the value in making the two functions distinct.

For naming purposes, perhaps weighted_index() would be more descriptive.
msg223750 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-07-23 17:21
Closed issue 22048 as a duplicate of this one.
msg224947 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-06 16:12
Raymond, what is your opinion?
msg224949 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-06 16:21
I don't want to speak for Raymond, but the proposed API looks good, and it seems "Roulette Wheel 2" should be the implementation choice given its characteristics (simple, reasonably good and balanced performance).
msg224953 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-06 17:26
"Roulette Wheel 2" has twice slower initializations than "Roulette Wheel", but then generates every new item twice faster.

It is possible to implement a hybrid generator which yields the first item using "Roulette Wheel", then rescales cumulative_dist and continues with "Roulette Wheel 2". It would be as fast as "Roulette Wheel" when generating only one item, and as fast as "Roulette Wheel 2" when generating multiple items.
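The hybrid described here could be sketched as follows (an illustration of the idea only, not code from any attached patch):

```python
import bisect
import itertools
import random

def hybrid_choice_generator(weights):
    # First item: raw cumulative sums ("Roulette Wheel") --
    # the cheapest possible setup for the one-shot case.
    cumdist = list(itertools.accumulate(weights))
    total = cumdist[-1]
    yield bisect.bisect(cumdist, random.random() * total)
    # Subsequent items: rescale once, then continue as "Roulette Wheel 2".
    norm = [c / total for c in cumdist]
    while True:
        yield bisect.bisect(norm, random.random())
```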
msg224954 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-06 17:52
The setup cost of RW2 should always be within a small constant multiplier of RW's, so I'm not sure it's worth the hassle to complicate things. But it's your patch :)
msg224957 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-06 18:25
The non-generator weighted_choice() function is intended to produce exactly one item. That is a use case for such an optimization.
msg225128 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-10 08:48
Updated patch. Synchronized with tip and added optimizations.
msg225133 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-08-10 09:45
I'm averse to adding the generator magic and the level of complexity in this patch.  Please aim for the approach I outlined above (one function to build the cumulative weights and another function to choose the value).

Since this isn't a new problem, please take a look at how other languages and packages have solved the problem.
msg225137 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-10 10:32
Other languages have no feature as handy as generators. NumPy provides the size parameter to all functions and generates a bunch of random numbers at a time. This doesn't look Pythonic (the Python stdlib prefers iterators).

I believe a generator is the most Pythonic and most convenient solution to this issue in Python. And it is efficient enough.
msg225140 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-10 14:06
I agree with Serhiy. There is nothing "magic" about generators in Python. Also, the concept of an infinite stream of random numbers (or random whatevers) is perfectly common (/dev/urandom being an obvious example); it is not a notion we are inventing.

By contrast, the two-function approach only makes things clumsier for people since they have to remember to combine them.
msg225148 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-08-10 17:26
When I get a chance, I'll work up an approach that is consistent with the rest of the module in terms of implementation, restartability, and API.
msg226891 - (view) Author: Christian Kleineidam (Christian.Kleineidam) Date: 2014-09-15 00:39
I like the idea of adding a weights keyword to choice and creating an additional choice_generator() that also takes weights.

A choice_generator() could take a further argument to allow unique choices and be a generator version of sample().

In some cases you want to draw randomly from a sequence till you draw an item that fulfills certain criteria. At the moment neither the sample nor the shuffle method seems to be optimal for that use case.

Given that items are commonly drawn from an urn in math, urn seems a good alternative for choice_generator().

random.urn(data, *, weights=None, unique=False)
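The proposed random.urn() signature could behave along these lines; a naive sketch only (rebuilding the cumulative sums on every draw is deliberately simple -- a real implementation would be smarter):

```python
import bisect
import itertools
import random

def urn(data, *, weights=None, unique=False):
    # Hypothetical signature from the message above.
    data = list(data)
    weights = list(weights) if weights is not None else [1] * len(data)
    while data:
        cumdist = list(itertools.accumulate(weights))
        if cumdist[-1] <= 0:
            return
        i = bisect.bisect(cumdist, random.random() * cumdist[-1])
        yield data[i]
        if unique:          # sampling without replacement, like sample()
            del data[i], weights[i]
```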
msg262625 - (view) Author: Steven Basart (xksteven) * Date: 2016-03-29 20:39
Reopening this idea, but removing the generator from weighted choice.
msg262626 - (view) Author: Steven Basart (xksteven) * Date: 2016-03-29 21:04
The entire weighted choice function. I removed the generator and replaced it by adding an optional argument to specify the number of values you want the function to return.
msg262642 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-03-30 01:15
Thanks for the patch.   Can I get you to fill out a contributor agreement?
msg262649 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-30 05:02
What is wrong with generators?
msg262652 - (view) Author: Steven Basart (xksteven) * Date: 2016-03-30 07:11
Hello rhettinger.  I filled out the form, thanks for letting me know about it.  Is there anything else I have to do?

Hey serhiy.storchaka

There were several things "wrong" with the previous implementation in my opinion. 

1st, they tried to add too much, which, if allowed, would clutter up the random library if every function had both its implementation and an accompanying generator.  The other problem is that both were exposed as callables in the public API.  I would prefer the generator, if present, to be hidden; it would also have to be more sophisticated, to be able to check whether it was being called with new input.

2nd, adding the generator to the public API of the random library makes it far more confusing and obfuscates the true purpose of this function, which is to get a weighted choice.

So basically there is nothing wrong with generators, but they don't necessarily belong here, so I removed it to get back to the core principles of what the function should be doing, by making it simpler.
msg262656 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-30 08:56
I disagree.

My patch adds two functions because they serve two different purposes. weighted_choice() returns one random value, like the other functions in the random module. weighted_choice_generator() provides a more efficient way to generate random values, since the startup cost is more significant than for other random value generators. Generators are widely used in Python, especially in Python 3. If they are considered confusing, we should deprecate the builtins map(), filter(), zip() and the itertools module in the first place.

Your function, Steven, returns a list containing one random value by default. That does not match the interface of the other functions in the random module; it matches the interface of NumPy's random module. In Python you need two separate functions: one that returns a single value, and another that returns a list of values. But returning an iterator and generating values on demand is preferable in Python 3. Generators are more flexible. With weighted_choice_generator() it is easy to get the result of your function: list(islice(weighted_choice_generator(data), amount)). But generating a dynamic number of values with your interface is impossible.

Raymond, if you have free time now, could you please review weighted_choice_generator_2.patch?
msg262678 - (view) Author: Steven Basart (xksteven) * Date: 2016-03-30 18:48
Hey serhiy.storchaka

I can edit the code to output just one value if called with simply a list, and to return a list of values if called with the optional amount parameter.  My code also needs to check that amount >= 1.

My code was mostly just to restart this discussion as I personally like the idea of the function for weighted choice and would like it to be standard in the random library. 

I have no qualms with adding both weighted_choice and weighted_choice_generator, but my concern is that trying to add two functions at the same time is asking too much and won't go through.  The other thing is that I believe weighted_choice could suffice as just one function call.

My last concern is that generators are different from the other functions in random.py.  They are intuitive and accepted in builtins like map and zip, etc., but there aren't any other functions in the random library that return that type of object when called.  They instead return a numerical result.

Those are my concerns and hence why I rewrote the code.
msg262744 - (view) Author: Christian Kleineidam (Christian.Kleineidam) Date: 2016-04-01 15:43
A user can use map(), filter(), zip() without knowing anything about generators. In most cases those functions will do their magic and provide a finite number of outputs. 

The weighted_choice_generator, on the other hand, isn't as easy to use. If the user wants 5 values from it, they need to know about the `take()` recipe from the itertools docs or call `next()`.
msg262967 - (view) Author: Westley Martínez (westley.martinez) * Date: 2016-04-06 22:11
I still like Serhiy's implementation more. A function that returns a list instead of the item is unnatural and doesn't fit with the rest of the module.

I think there needs to be some discussion about use cases. What do users actually want? Maybe post this on the ideas list.
msg262970 - (view) Author: Steven Basart (xksteven) * Date: 2016-04-06 22:46
Okay so I added a few lines of code.  One to make it return a single number if amount == 1 and the other to check that the amount > 1.

The main difference I've noticed between this implementation (and previous versions) and, say, R is that R provides a boolean flag to ask whether to sample with replacement.

Here's their documentation and source code:
https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/library/base/man/sample.Rd

https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/library/base/R/sample.R

Maybe someone else can comment more on the use cases.  I can only say for myself that I've needed this function plenty of times when working with samples that have a non-uniform distribution.
msg262971 - (view) Author: Steven Basart (xksteven) * Date: 2016-04-06 22:49
I reuploaded the file.  The spacing on the "if amount < 1" check was off.  Hopefully it's fixed now.
msg262981 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-04-07 07:21
> One to make it return a single number if amount == 1 and the other to check that the amount > 1.

I think that's a dangerous API. Any code making a call to "weighted_choice(..., amount=n)" for variable n now has to be prepared to deal with two possible result types. It would be easy to introduce buggy code that fails in the corner case n = 1.
msg262982 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-04-07 07:43
> One to make it return a single number if amount == 1 and the other to check that the amount > 1.

Suggestion: if you want to go that way, return a single number if `amount` is not provided (so make the default value for `amount` None rather than 1). If `amount=1` is explicitly given, a list containing one item should be returned.

I also think there's no reason to raise an exception when `amount = 0`: just return an empty list.

For comparison, here's NumPy's "uniform" generator, which generates a scalar if the "size" parameter is not given, and an array if "size" is given, even if it's 1.

>>> np.random.uniform()
0.4964992470265117
>>> np.random.uniform(size=1)
array([ 0.64817717])
>>> np.random.uniform(size=0)
array([], dtype=float64)
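The suggestion above, in sketch form (a hypothetical weighted_choice signature for illustration, not any attached patch): amount=None returns a single value, while any explicit non-negative amount, including 0 and 1, returns a list.

```python
import bisect
import itertools
import random

def weighted_choice(seq, weights, amount=None):
    # amount=None -> a single value; amount=n (n >= 0) -> a list of n
    # values, mirroring how NumPy's size parameter behaves.
    cumdist = list(itertools.accumulate(weights))

    def one():
        return seq[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    if amount is None:
        return one()
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return [one() for _ in range(amount)]
```

This keeps the result type a function of the call site, never of the runtime value of a variable.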
msg262983 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-04-07 07:47
> Suggestion: if you want to go that way, return a single number if `amount` is not provided (so make the default value for `amount` None rather than 1). If `amount=1` is explicitly given, a list containing one item should be returned.

+1
msg262994 - (view) Author: Steven Basart (xksteven) * Date: 2016-04-07 16:23
Re-implemented with the suggested improvements taken into account. Thanks @mark.dickinson and @pitrou for the suggestions.

I also removed the redundant "fast path" portion of this code since it doesn't deal with generators anyway.

Let me know additional thoughts about it.
msg262995 - (view) Author: Steven Basart (xksteven) * Date: 2016-04-07 16:25
Left in a line of code that was supposed to be removed. Fixed.
History
Date User Action Args
2016-04-07 16:25:19xkstevensetfiles: + weighted_choice_v5.patch

messages: + msg262995
2016-04-07 16:24:47xkstevensetfiles: - weighted_choice_v5.patch
2016-04-07 16:23:32xkstevensetfiles: + weighted_choice_v5.patch

messages: + msg262994
2016-04-07 07:47:20pitrousetmessages: + msg262983
2016-04-07 07:43:46mark.dickinsonsetmessages: + msg262982
2016-04-07 07:21:09mark.dickinsonsetmessages: + msg262981
2016-04-06 22:49:33xkstevensetfiles: + weighted_choice_v4.patch

messages: + msg262971
2016-04-06 22:48:05xkstevensetfiles: - weighted_choice_v4.patch
2016-04-06 22:46:02xkstevensetfiles: + weighted_choice_v4.patch

messages: + msg262970
2016-04-06 22:11:45westley.martinezsetmessages: + msg262967
2016-04-01 15:43:11Christian.Kleineidamsetmessages: + msg262744
2016-03-30 18:48:14xkstevensetmessages: + msg262678
2016-03-30 08:56:22serhiy.storchakasetkeywords: + needs review

messages: + msg262656
versions: + Python 3.6, - Python 3.5
2016-03-30 07:11:55xkstevensetmessages: + msg262652
2016-03-30 05:02:04serhiy.storchakasetmessages: + msg262649
2016-03-30 01:15:24rhettingersetmessages: + msg262642
2016-03-29 21:04:47xkstevensetfiles: + weighted_choice_v3.patch

messages: + msg262626
2016-03-29 20:39:37xkstevensetfiles: + weighted_choice_v3.diff
nosy: + xksteven
messages: + msg262625

2014-09-15 00:39:05Christian.Kleineidamsetnosy: + Christian.Kleineidam
messages: + msg226891
2014-08-10 17:26:08rhettingersetmessages: + msg225148
2014-08-10 14:06:02pitrousetmessages: + msg225140
2014-08-10 10:32:42serhiy.storchakasetmessages: + msg225137
2014-08-10 09:45:12rhettingersetmessages: + msg225133
2014-08-10 08:48:30serhiy.storchakasetfiles: + weighted_choice_generator_2.patch

messages: + msg225128
2014-08-06 18:25:02serhiy.storchakasetmessages: + msg224957
2014-08-06 17:52:11pitrousetmessages: + msg224954
2014-08-06 17:27:00serhiy.storchakasetmessages: + msg224953
2014-08-06 16:21:49pitrousetmessages: + msg224949
2014-08-06 16:12:45serhiy.storchakasetmessages: + msg224947
versions: + Python 3.5, - Python 3.4
2014-07-23 17:22:32mark.dickinsonsetnosy: + dkorchem
2014-07-23 17:21:10mark.dickinsonsetmessages: + msg223750
2014-07-23 17:20:44mark.dickinsonlinkissue22048 superseder
2013-09-24 22:36:07madison.maysetmessages: + msg198372
2013-09-24 21:26:04serhiy.storchakasetmessages: + msg198367
2013-09-16 04:14:48madison.maysetmessages: + msg197866
2013-09-16 04:00:25westley.martinezsetmessages: + msg197865
2013-09-16 02:20:10madison.maysetmessages: + msg197862
2013-09-13 13:38:53eli.benderskysetnosy: - eli.bendersky
2013-09-12 20:32:18serhiy.storchakasetfiles: + wcg_bench.py

messages: + msg197540
2013-09-12 02:57:46neil.gsetnosy: + neil.g
messages: + msg197512
2013-09-11 22:29:56serhiy.storchakasetstage: patch review
2013-09-11 22:29:41serhiy.storchakasetfiles: + weighted_choice_generator.patch

messages: + msg197507
2013-09-02 03:00:51madison.maysetnosy: + eli.bendersky
messages: + msg196767
2013-09-02 00:31:23westley.martinezsetmessages: + msg196761
2013-09-01 23:08:54madison.maysetmessages: + msg196750
2013-09-01 21:17:48rhettingersetmessages: + msg196741
2013-09-01 19:47:51madison.maysetmessages: + msg196731
2013-09-01 19:16:37serhiy.storchakasetmessages: + msg196728
2013-09-01 14:47:16madison.maysetfiles: + weighted_choice_v2.diff
2013-09-01 14:45:49madison.maysetfiles: - weighted_choice_v2.diff
2013-09-01 14:37:04madison.maysetfiles: + weighted_choice_v2.diff

messages: + msg196721
2013-09-01 08:43:40serhiy.storchakasetmessages: + msg196716
2013-09-01 05:50:01westley.martinezsetnosy: + westley.martinez
2013-09-01 05:43:42rhettingersetmessages: + msg196711
2013-09-01 05:38:28rhettingersetassignee: rhettinger
messages: + msg196709
2013-08-30 18:16:30madison.maysetmessages: + msg196567
2013-08-30 14:43:41mark.dickinsonsetnosy: + tim.peters
messages: + msg196551
2013-08-26 23:25:58madison.maysetfiles: + weighted_choice.diff
keywords: + patch
messages: + msg196252
2013-08-26 19:09:04pitrousetnosy: + pitrou
messages: + msg196235
2013-08-26 18:55:35madison.maysetnosy: + madison.may
messages: + msg196234
2013-08-26 17:47:20serhiy.storchakasetnosy: + rhettinger, mark.dickinson, serhiy.storchaka

components: + Library (Lib)
versions: + Python 3.4
2013-08-26 17:23:45aisaaccreate