random.choice: raise IndexError on empty sequence even when not using getrandbits internally #77384

wm75 · 2018-04-01T20:04:11Z

BPO	33203
Nosy	@rhettinger, @serhiy-storchaka, @wm75, @selik
PRs	bpo-33203: Ensure random.choice always raises IndexError on empty sequence #6338 [3.7] bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) #6387 [3.6] bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) #6388

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2018-04-05.16:24:53.997>
created_at = <Date 2018-04-01.20:04:10.759>
labels = ['3.7', '3.8', 'type-bug', 'library']
title = 'random.choice: raise IndexError on empty sequence even when not using getrandbits internally'
updated_at = <Date 2018-04-05.16:24:53.997>
user = 'https://github.com/wm75'

bugs.python.org fields:

activity = <Date 2018-04-05.16:24:53.997>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2018-04-05.16:24:53.997>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2018-04-01.20:04:10.759>
creator = 'wolma'
dependencies = []
files = []
hgrepos = []
issue_num = 33203
keywords = ['patch']
message_count = 10.0
messages = ['314787', '314835', '314837', '314840', '314869', '314870', '314957', '314990', '314991', '314992']
nosy_count = 4.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'wolma', 'selik']
pr_nums = ['6338', '6387', '6388']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue33203'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

wm75 · 2018-04-01T20:04:11Z

from https://docs.python.org/3/library/random.html#random.choice:

Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.

Indeed:
>>> import random
>>> random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolma/cpython/random.py", line 259, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence

but when not using getrandbits internally:
>>> class MyRandom(random.Random):
...     def random(self):
...         return super().random()
... 
>>> my_random=MyRandom()
>>> my_random.choice([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolma/cpython/random.py", line 257, in choice
    i = self._randbelow(len(seq))
  File "/home/wolma/cpython/random.py", line 245, in _randbelow
    rem = maxsize % n
ZeroDivisionError: integer division or modulo by zero

This is because the ValueError that random.choice tries to catch gets raised only in the getrandbits-dependent branch of Random._randbelow, but not in the branch using only Random.random (even though Random._randbelow suggests uniform behaviour.

selik · 2018-04-02T22:33:36Z

If you're going to tackle this problem, this should probably be solved for the general case of n <= 0 rather than just n == 0.

In [1]: import random

In [2]: class Random(random.Random):
...: def random(self):
...: return super().random()
...:

In [3]: r = Random()

In [4]: r._randbelow(-1) # Should raise a ValueError
Out[4]: 0

But honestly, if someone is going to override random I think we can expect them to realize they're stepping into the deep end.

rhettinger · 2018-04-02T22:43:06Z

FWIW, it was intended that the error checking (when required) occur at higher levels in the API rather than in the inner-most non-public utility function. Some calls to _randbelow cannot be zero or negative, so they shouldn't have to pay an penalty for the extra error check. A comment to this effect should be added but I don't think the design should be changed.

rhettinger · 2018-04-02T23:01:09Z

I'll mull this over for while. There is some merit in getting the various components to fit together more uniformly. There's also the option of having choice() catch either a ZeroDivisionError or ValueError.

wm75 · 2018-04-03T08:43:43Z

@rhettinger: the reason the ValueError gets raised correctly in the getrandbits-dependent branch is because getrandbits itself does a n<=0 check (in C for random.Random, in Python for random.SystemRandom).
So I thought the real reason why _randbelow does not perform the check was that it would be redundant, which it isn't when only random is used.

So instead of thinking about the suggested check as something that impacts performance (which certainly it does), I would rather see it as adding something into that second branch that was forgotten and gave that branch a performance benefit over the other one, which it never deserved.
It is also worthwhile to remember that any performance impact of this will only hit subclasses of random.Random that define random, but not getrandbits. Your alternative idea of having random.choice catch (ValueError, ZeroDivisionError) would affect random.Random and all subclasses.
If a subclass defines getrandbits, it needs to perform that check anyway and, thinking about this more, I suggest that the requirement for any user-provided getrandbits to raise ValueError on k <= 0 should actually be added to the getrandbits docs.

wm75 · 2018-04-03T08:50:00Z

@selik: it's true _randbelow doesn't work for negative numbers, but the difference is that both branches are affected, the docstring does not make any guarantees about it, and no public part of the random module is affected by this behavior. In addition, "fixing" _randbelow for negative input cannot be done without impacting performance of several public methods of random.Random so I don't think this should be done.

rhettinger · 2018-04-04T22:50:02Z

The patch looks fine. Once a news blurb is added, it can go forward.

rhettinger · 2018-04-05T15:19:54Z

New changeset 091e95e by Raymond Hettinger (Wolfgang Maier) in branch 'master':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338)
091e95e

rhettinger · 2018-04-05T16:02:15Z

New changeset baf304e by Raymond Hettinger (Miss Islington (bot)) in branch '3.7':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) (GH-6387)
baf304e

rhettinger · 2018-04-05T16:24:30Z

New changeset e25af93 by Raymond Hettinger (Miss Islington (bot)) in branch '3.6':
bpo-33203: Ensure random.choice always raises IndexError on empty sequence (GH-6338) (GH-6388)
e25af93

wm75 mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 1, 2018

rhettinger self-assigned this Apr 2, 2018

rhettinger closed this as completed Apr 5, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

random.choice: raise IndexError on empty sequence even when not using getrandbits internally #77384

random.choice: raise IndexError on empty sequence even when not using getrandbits internally #77384

wm75 mannequin commented Apr 1, 2018

wm75 mannequin commented Apr 1, 2018

selik mannequin commented Apr 2, 2018

rhettinger commented Apr 2, 2018

rhettinger commented Apr 2, 2018

wm75 mannequin commented Apr 3, 2018

wm75 mannequin commented Apr 3, 2018

rhettinger commented Apr 4, 2018

rhettinger commented Apr 5, 2018

rhettinger commented Apr 5, 2018

rhettinger commented Apr 5, 2018

random.choice: raise IndexError on empty sequence even when not using getrandbits internally #77384

random.choice: raise IndexError on empty sequence even when not using getrandbits internally #77384

Comments

wm75 mannequin commented Apr 1, 2018

wm75 mannequin commented Apr 1, 2018

selik mannequin commented Apr 2, 2018

rhettinger commented Apr 2, 2018

rhettinger commented Apr 2, 2018

wm75 mannequin commented Apr 3, 2018

wm75 mannequin commented Apr 3, 2018

rhettinger commented Apr 4, 2018

rhettinger commented Apr 5, 2018

rhettinger commented Apr 5, 2018

rhettinger commented Apr 5, 2018