
classification
Title: Random.seed does not affect string hash randomization leading to non-intuitive results
Type: behavior Stage: resolved
Components: Documentation, Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Yuval S, docs@python, remi.lapeyre, rhettinger, tim.peters, xmorel
Priority: normal Keywords: patch

Created on 2020-04-18 21:25 by Yuval S, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
test.py Yuval S, 2020-04-18 21:25
Pull Requests
URL Status Linked
PR 19591 merged rhettinger, 2020-04-19 02:53
PR 19596 closed Yuval S, 2020-04-19 09:25
Messages (10)
msg366741 - (view) Author: Yuval S (Yuval S) * Date: 2020-04-18 21:25
The following code gives different results on each run, even though ``random.seed()`` is used:

>>> import random
>>> random.seed(6)
>>> x = set(str(i) for i in range(500))
>>> print(random.sample(x, 1))

presumably because of string hash randomization (see also #27706), unless ``PYTHONHASHSEED`` is set.

However, this is non-intuitive, especially as this source of randomness in Python is not mentioned in `Notes on Reproducibility <https://docs.python.org/3/library/random.html#notes-on-reproducibility>`_.

I would suggest this is either fixed (using the provided seed for string hash randomization as well) or documented.
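As a rough illustration of the report (a sketch, not part of the original message): the snippet below runs the same seeded code in two child interpreters with ``PYTHONHASHSEED`` pinned to the same value and compares the output. The helper name `run_with_hash_seed` is hypothetical, and the set is converted with `list()` first so the snippet also runs on Python versions where `random.sample()` no longer accepts sets.

```python
import os
import subprocess
import sys

# Seeded code whose result still depends on set iteration order.
SNIPPET = (
    "import random; random.seed(6); "
    "x = list(set(str(i) for i in range(500))); "
    "print(random.choice(x))"
)


def run_with_hash_seed(hash_seed):
    """Run SNIPPET in a child interpreter with PYTHONHASHSEED set (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    completed = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# With the hash seed pinned, two separate runs agree.
first = run_with_hash_seed(0)
second = run_with_hash_seed(0)
print(first == second)
```

Leaving ``PYTHONHASHSEED`` unset (or passing different values) will usually, though not necessarily always, give different outputs between runs.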
msg366742 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-04-18 21:48
String hash randomization is a security feature so it may be better to not disable it unless explicitly asked for. Maybe a note in random's documentation could be added?
msg366746 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-19 00:58
I'm going to deprecate the support for sets.  It was a design mistake at several levels.  Better to just remove it.
msg366747 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-04-19 01:05
Raymond, I think that removing sample(set) support is a different issue.  This report will just change its final example line to

>>> print(random.sample(list(x), 1))

or

>>> print(random.sample(tuple(x), 1))

and have the same complaint.
msg366748 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-19 02:12
I think the thing we can fix is the automatic set support, which is intrinsically broken with respect to reproducibility and which was likely not a good idea to begin with (because it adds an implicit and possibly unexpected O(n) conversion step, and because it doesn't match the API for choice()).

If someone converts a set to a list or tuple upstream from sample(), there isn't much we can do about it.   That wouldn't be much different from list(s)[0] giving different output from run to run.  That is a general FAQ and would apply to just about anything that takes a sequence or iterator to run.
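One way a caller can sidestep the set-ordering problem described above is to impose a deterministic order before sampling. This is a minimal sketch of that workaround, not something proposed in the thread; the helper name `reproducible_sample` is hypothetical, and it assumes the elements are mutually comparable so that `sorted()` works.

```python
import random


def reproducible_sample(items, k, seed):
    """Sample k items in a way that does not depend on set iteration order.

    Hypothetical helper: sorting fixes the order independently of string
    hashing, and a private Random instance avoids touching global state.
    """
    ordered = sorted(items)
    rng = random.Random(seed)
    return rng.sample(ordered, k)


x = set(str(i) for i in range(500))
first = reproducible_sample(x, 3, seed=6)
second = reproducible_sample(x, 3, seed=6)
print(first == second)
```

The sort costs O(n log n), so this trades some speed for output that should stay the same across runs regardless of ``PYTHONHASHSEED``.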
msg366749 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-04-19 02:30
Yup, I agree sample(set) is a misfeature.
msg366759 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-19 07:36
New changeset 4fe002045fcf40823154b709fef0948b2bc5e934 by Raymond Hettinger in branch 'master':
bpo-40325: Deprecate set object support in random.sample() (GH-19591)
https://github.com/python/cpython/commit/4fe002045fcf40823154b709fef0948b2bc5e934
msg366760 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-19 07:37
Yuval, thanks for the report.
msg366763 - (view) Author: Yuval S (Yuval S) * Date: 2020-04-19 08:25
Thank you for the attention and the quick fix. However, the current documentation for "Notes on Reproducibility" should still address this issue of hash randomization. It is not only `sample` that is affected, but any code that combines hashing of strings (or bytes or datetime objects) with random, e.g.

>>> import random
>>> random.seed(6)
>>> a = list(set(str(i) for i in range(500)))
>>> print(a[int(random.random() * 500)])

or, this

>>> import random
>>> import datetime
>>> random.seed(6)
>>> print(random.choice(range(hash(datetime.datetime(2000,1,1)) % 100)))

will still produce non-reproducible results even after the fix. Here is my suggestion for documentation:

> Hash randomization, which is enabled by default since version 3.3, is not affected by `random.seed()`. For this reason, code that relies on string hashes, such as code that relies on the ordering of `set` or `dict`, might be non-reproducible, unless string hash randomization is disabled or seeded (see: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED).

My vote would be to tie hash randomization to `random.seed()`; this would make all use cases more predictable, as well as allow `random.sample()` to support `set`.
msg378683 - (view) Author: Xavier Morel (xmorel) * Date: 2020-10-15 13:21
@rhettinger checking software against 3.9 there's a little issue with the way the check is done: if passed something which is *both* a sequence and a set (e.g. an ordered set), `random.sample` will trigger a warning, which I don't think is correct.

Should I open a new issue for that? Fix seems simple: just move the check for _Set inside the check for _Sequence, and raise if that doesn't pass either.
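The reordering suggested above could look roughly like the following. This is only a sketch under stated assumptions: `as_sequence` is a hypothetical stand-in for the population check in `random.sample()`, the warning and error messages are placeholders, and `OrderedSet` is a toy class written for this example rather than any real library's type.

```python
import warnings
from collections.abc import Sequence, Set


class OrderedSet(Sequence, Set):
    """Toy container that is both a Sequence and a Set (illustration only)."""

    def __init__(self, items):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self._items = list(dict.fromkeys(items))

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items


def as_sequence(population):
    """Hypothetical check: only non-sequences reach the set test."""
    if not isinstance(population, Sequence):
        if isinstance(population, Set):
            warnings.warn(
                "sampling from a set is deprecated",  # placeholder message
                DeprecationWarning,
                stacklevel=2,
            )
            return tuple(population)
        raise TypeError("population must be a sequence or a set")
    return population


def warnings_raised(population):
    """Return the warning categories emitted while checking population."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        as_sequence(population)
    return [w.category for w in caught]


ordered_set_warnings = warnings_raised(OrderedSet("abca"))
plain_set_warnings = warnings_raised({"a", "b", "c"})
print(ordered_set_warnings, plain_set_warnings)
```

With this ordering, an object that is both a sequence and a set is treated as a sequence and triggers no warning, while a plain `set` still does.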
History
Date                 User          Action  Args
2022-04-11 14:59:29  admin         set     github: 84505
2020-10-15 13:21:41  xmorel        set     nosy: + xmorel
                                           messages: + msg378683
2020-04-19 09:25:54  Yuval S       set     pull_requests: + pull_request18932
2020-04-19 08:25:51  Yuval S       set     messages: + msg366763
2020-04-19 07:37:40  rhettinger    set     status: open -> closed
                                           resolution: fixed
                                           messages: + msg366760
                                           stage: patch review -> resolved
2020-04-19 07:36:51  rhettinger    set     messages: + msg366759
2020-04-19 02:53:02  rhettinger    set     keywords: + patch
                                           stage: patch review
                                           pull_requests: + pull_request18926
2020-04-19 02:30:02  tim.peters    set     messages: + msg366749
2020-04-19 02:12:31  rhettinger    set     messages: + msg366748
2020-04-19 01:05:11  tim.peters    set     nosy: + tim.peters
                                           messages: + msg366747
2020-04-19 00:58:14  rhettinger    set     versions: + Python 3.9, - Python 3.7, Python 3.8
                                           nosy: + rhettinger
                                           messages: + msg366746
                                           assignee: docs@python -> rhettinger
                                           type: behavior
2020-04-18 21:48:09  remi.lapeyre  set     nosy: + remi.lapeyre
                                           messages: + msg366742
2020-04-18 21:25:19  Yuval S       create