classification
Title: random.choice: raise IndexError on empty sequence even when not using getrandbits internally
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: rhettinger, selik, serhiy.storchaka, wolma
Priority: normal Keywords: patch

Created on 2018-04-01 20:04 by wolma, last changed 2018-04-05 16:24 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6338 merged wolma, 2018-04-01 20:13
PR 6387 merged miss-islington, 2018-04-05 15:20
PR 6388 merged miss-islington, 2018-04-05 15:20
Messages (10)
msg314787 - (view) Author: Wolfgang Maier (wolma) * Date: 2018-04-01 20:04
from https://docs.python.org/3/library/random.html#random.choice:

Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.

Indeed:
>>> import random
>>> random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolma/cpython/random.py", line 259, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence

but when not using getrandbits internally:
>>> class MyRandom(random.Random):
...     def random(self):
...         return super().random()
... 
>>> my_random=MyRandom()
>>> my_random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolma/cpython/random.py", line 257, in choice
    i = self._randbelow(len(seq))
  File "/home/wolma/cpython/random.py", line 245, in _randbelow
    rem = maxsize % n
ZeroDivisionError: integer division or modulo by zero

This is because the ValueError that random.choice tries to catch gets raised only in the getrandbits-dependent branch of Random._randbelow, but not in the branch using only Random.random (even though Random._randbelow suggests uniform behaviour.
msg314835 - (view) Author: Michael Selik (selik) * Date: 2018-04-02 22:33
If you're going to tackle this problem, this should probably be solved for the general case of n <= 0 rather than just n == 0.

In [1]: import random

In [2]: class Random(random.Random):
   ...:     def random(self):
   ...:         return  super().random()
   ...:

In [3]: r = Random()

In [4]: r._randbelow(-1) # Should raise a ValueError
Out[4]: 0

But honestly, if someone is going to override ``random`` I think we can expect them to realize they're stepping into the deep end.
msg314837 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-02 22:43
FWIW, it was intended that the error checking (when required) occur at higher levels in the API rather than in the inner-most non-public utility function.  Some calls to _randbelow cannot be zero or negative, so they shouldn't have to pay an penalty for the extra error check.  A comment to this effect should be added but I don't think the design should be changed.
msg314840 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-02 23:01
I'll mull this over for while.  There is some merit in getting the various components to fit together more uniformly.  There's also the option of having choice() catch either a ZeroDivisionError or ValueError.
msg314869 - (view) Author: Wolfgang Maier (wolma) * Date: 2018-04-03 08:43
@rhettinger: the reason the ValueError gets raised correctly in the getrandbits-dependent branch is because getrandbits itself does a n<=0 check (in C for random.Random, in Python for random.SystemRandom).
So I thought the real reason why _randbelow does not perform the check was that it would be redundant, which it isn't when only random is used.

So instead of thinking about the suggested check as something that impacts performance (which certainly it does), I would rather see it as adding something into that second branch that was forgotten and gave that branch a performance benefit over the other one, which it never deserved.
It is also worthwhile to remember that any performance impact of this will only hit subclasses of random.Random that define random, but not getrandbits. Your alternative idea of having random.choice catch (ValueError, ZeroDivisionError) would affect random.Random and all subclasses.
If a subclass defines getrandbits, it needs to perform that check anyway  and, thinking about this more, I suggest that the requirement for any user-provided getrandbits to raise ValueError on k <= 0 should actually be added to the getrandbits docs.
msg314870 - (view) Author: Wolfgang Maier (wolma) * Date: 2018-04-03 08:50
@selik: it's true _randbelow doesn't work for negative numbers, but the difference is that both branches are affected, the docstring does not make any guarantees about it, and no public part of the random module is affected by this behavior. In addition, "fixing" _randbelow for negative input cannot be done without impacting performance of several public methods of random.Random so I don't think this should be done.
msg314957 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-04 22:50
The patch looks fine.  Once a news blurb is added, it can go forward.
msg314990 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-05 15:19
New changeset 091e95e9004b794280ab35becec2c3e30dd5e96e by Raymond Hettinger (Wolfgang Maier) in branch 'master':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338)
https://github.com/python/cpython/commit/091e95e9004b794280ab35becec2c3e30dd5e96e
msg314991 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-05 16:02
New changeset baf304e82e1b54dbeee6b78ddf168e33ed8d557a by Raymond Hettinger (Miss Islington (bot)) in branch '3.7':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) (GH-6387)
https://github.com/python/cpython/commit/baf304e82e1b54dbeee6b78ddf168e33ed8d557a
msg314992 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-05 16:24
New changeset e25af930a300c01aa043745058a8c7f6c32d89ae by Raymond Hettinger (Miss Islington (bot)) in branch '3.6':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) (GH-6388)
https://github.com/python/cpython/commit/e25af930a300c01aa043745058a8c7f6c32d89ae
History
Date User Action Args
2018-04-05 16:24:53rhettingersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-04-05 16:24:29rhettingersetmessages: + msg314992
2018-04-05 16:02:14rhettingersetmessages: + msg314991
2018-04-05 15:20:53miss-islingtonsetpull_requests: + pull_request6097
2018-04-05 15:20:19miss-islingtonsetpull_requests: + pull_request6096
2018-04-05 15:19:54rhettingersetmessages: + msg314990
2018-04-04 22:50:02rhettingersetmessages: + msg314957
2018-04-04 22:49:30rhettingersetmessages: - msg314956
2018-04-04 22:49:16rhettingersetmessages: + msg314956
2018-04-03 08:50:00wolmasetmessages: + msg314870
2018-04-03 08:43:42wolmasetmessages: + msg314869
2018-04-02 23:01:09rhettingersetmessages: + msg314840
2018-04-02 22:43:06rhettingersetassignee: rhettinger
messages: + msg314837
2018-04-02 22:33:36seliksetnosy: + selik
messages: + msg314835
2018-04-01 20:13:45wolmasetkeywords: + patch
stage: patch review
pull_requests: + pull_request6050
2018-04-01 20:04:10wolmacreate