random.seed(tuple) uses the randomized hash function and so is not reproductible #76735

johnnyd · 2018-01-15T09:08:22Z

BPO	32554
Nosy	@rhettinger, @mdickinson, @vstinner, @serhiy-storchaka
PRs	bpo-32554: Deprecate hashing arbitrary types in random.seed() #15382

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2019-08-22.16:20:08.633>
created_at = <Date 2018-01-15.09:08:22.137>
labels = ['extension-modules', 'type-bug', 'docs']
title = 'random.seed(tuple) uses the randomized hash function and so is not reproductible'
updated_at = <Date 2019-08-22.16:20:08.633>
user = 'https://bugs.python.org/johnnyd'

bugs.python.org fields:

activity = <Date 2019-08-22.16:20:08.633>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2019-08-22.16:20:08.633>
closer = 'rhettinger'
components = ['Documentation', 'Extension Modules']
creation = <Date 2018-01-15.09:08:22.137>
creator = 'johnnyd'
dependencies = []
files = []
hgrepos = []
issue_num = 32554
keywords = ['patch']
message_count = 9.0
messages = ['309956', '309957', '310009', '310019', '320360', '320361', '320383', '321759', '350209']
nosy_count = 7.0
nosy_names = ['rhettinger', 'mark.dickinson', 'vstinner', 'docs@python', 'serhiy.storchaka', 'johnnyd', 'poddster']
pr_nums = ['15382']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue32554'
versions = ['Python 3.6']

johnnyd · 2018-01-15T09:08:22Z

When seeding with a tuple that includes a string, the results are not consistent across new interpreters or processes.

For example, executing the following on a Linux machine yields a different result on each run:
python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the doc string of random.seed states: "Initialize internal state from hashable object."

The Python documentation does not say this. (https://docs.python.org/3.6/library/random.html#random.seed)

This is very confusing, I hope you can fix the behavior, not the doc string.

vstinner · 2018-01-15T09:13:25Z

random.seed(str) uses:

        if version == 2 and isinstance(a, (str, bytes, bytearray)):
            if isinstance(a, str):
                a = a.encode()
            a += _sha512(a).digest()
            a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default:
https://docs.python.org/dev/library/random.html#random.seed
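
The difference between the two code paths can be checked without relying on the old `random.seed(tuple)` behavior. The sketch below is only an illustration: `run_with_hash_seed` is a hypothetical helper, and the two `PYTHONHASHSEED` values are arbitrary. It starts fresh interpreters and compares `hash(("a", 1))` against the output of `random.seed("a")`.

```python
import os
import subprocess
import sys


def run_with_hash_seed(code, hash_seed):
    """Run `code` in a fresh interpreter with PYTHONHASHSEED set to `hash_seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


TUPLE_HASH = 'print(hash(("a", 1)))'
STR_SEED = 'import random; random.seed("a"); print(random.random())'

# The hash of a tuple containing a str depends on the per-process hash seed.
tuple_hashes = {run_with_hash_seed(TUPLE_HASH, s) for s in (1, 2)}

# Seeding with a str goes through SHA-512 instead of hash(), so the
# generated number should not depend on the hash seed.
str_seeded = {run_with_hash_seed(STR_SEED, s) for s in (1, 2)}

print("distinct tuple hashes:", len(tuple_hashes))
print("distinct str-seeded outputs:", len(str_seeded))
```

Setting `PYTHONHASHSEED` to a fixed value makes `hash()` repeatable for a given interpreter build, which is why the same hash seed always gives the same tuple hash.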

rhettinger · 2018-01-15T18:46:17Z

> This is very confusing, I hope you can fix the behavior, not the doc string.

I'll fix the docstring to make it more specific.

We really don't want to use hash(obj) because it produces too few bits of entropy.

serhiy-storchaka · 2018-01-15T21:49:39Z

Maybe deprecate using a hash?

rhettinger · 2018-06-24T06:59:04Z

> Maybe deprecate using a hash?

Any deprecation will likely break some existing code, but it would be nice to restrict input types to int, float, bytes, bytearray, or str. Then we could remove all references to hashing.

serhiy-storchaka · 2018-06-24T07:08:16Z

This is what I meant. Emit a deprecation warning for input types other than explicitly supported types (but I didn't think about float), and raise an error in future.
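
For callers who currently pass a tuple, one possible migration is to convert it to a supported type before seeding. The sketch below is an assumption, not something proposed in this thread: `stable_seed` is a hypothetical helper that hashes `repr(obj)` with SHA-512 and returns an int. It only gives repeatable results when `repr(obj)` is itself repeatable, which holds for tuples of ints and strings but not for objects whose repr includes a memory address.

```python
import hashlib
import random


def stable_seed(obj):
    """Hypothetical helper: derive an int seed from repr(obj) via SHA-512."""
    digest = hashlib.sha512(repr(obj).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Two generators seeded from equal tuples produce the same stream,
# regardless of the interpreter's hash randomization.
rng_a = random.Random(stable_seed(("a", 1)))
rng_b = random.Random(stable_seed(("a", 1)))

print(rng_a.random() == rng_b.random())
```

Passing `repr(obj)` directly as a str seed would also avoid `hash()`, since str seeds are already supported.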

rhettinger · 2018-06-24T19:44:56Z

I'm thinking of something like this:

$ git diff
diff --git a/Lib/random.py b/Lib/random.py
index 1e0dcc87ed..f479e66ada 100644
--- a/Lib/random.py
+++ b/Lib/random.py
@@ -136,12 +136,17 @@ class Random(_random.Random):
             x ^= len(a)
             a = -2 if x == -1 else x

-        if version == 2 and isinstance(a, (str, bytes, bytearray)):
+        elif version == 2 and isinstance(a, (str, bytes, bytearray)):
             if isinstance(a, str):
                 a = a.encode()
             a += _sha512(a).digest()
             a = int.from_bytes(a, 'big')

+        elif not isinstance(a, (type(None), int, float, str, bytes, bytearray)):
+            _warn('Seeding based on hashing is deprecated.\n'
+                  'The only supported seed types are None, int, float, '
+                  'str, bytes, and bytearray.', DeprecationWarning, 2)
+
         super().seed(a)
         self.gauss_next = None
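
The type check in the diff above can be tried in isolation. This is a standalone sketch rather than the patched `Random.seed`: `check_seed_type` and `SUPPORTED_SEED_TYPES` are hypothetical names, and the function only emits the warning without seeding anything.

```python
import warnings

# The types listed in the proposed warning message.
SUPPORTED_SEED_TYPES = (type(None), int, float, str, bytes, bytearray)


def check_seed_type(a):
    """Hypothetical helper: warn when `a` is outside the supported seed types.

    Returns True if `a` is a supported type, False otherwise.
    """
    if isinstance(a, SUPPORTED_SEED_TYPES):
        return True
    warnings.warn(
        'Seeding based on hashing is deprecated.\n'
        'The only supported seed types are None, int, float, '
        'str, bytes, and bytearray.',
        DeprecationWarning,
        stacklevel=2,
    )
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_seed_type(("a", 1))  # tuple: not a supported seed type, warns
    check_seed_type("a")       # str: supported, no warning

print([w.category.__name__ for w in caught])
```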

poddster · 2018-07-16T19:25:12Z

a) The issue below added documentation to py2.7 that calls out PYTHONHASHSEED, but the py3 documentation doesn't currently contain those words

https://bugs.python.org/issue27706

It'd be useful to have something similar whether the "behaviour" is fixed or not, as providing other objects (like a tuple) will still be non-deterministic.

b) I don't know if this is the correct issue to heap this on, but I think it might as you're looking at changing the seed function?

The documentation for object.__hash__ calls out str, bytes and datetime as being affected by PYTHONHASHSEED. Doesn't it seem odd that there's a workaround in the seed function for str and bytes, but not for datetime?

https://docs.python.org/3/reference/datamodel.html#object.__hash__

I mainly point this out as seeding random with the current date/time is idiomatic in many languages and environments (usually used when you log the seed to be able to recreate things later, or just blindly copying the historical use of srand(time(NULL)) from C programs!). Anyone shoving a datetime straight into seed() is going to find it non-deterministic and might not understand why, or even notice, especially as the documentation for seed() doesn't call this out.

Those "in the know" will get a unix timestamp out of the datetime and put that in seed instead, but I feel that falls under the same argument as users-in-the-know SHA512ing a string, mentioned above, which is undesirable and apparently something the function should implement and not users.

Would it be wise for datetime to have a specific implementation as well?
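
The Unix-timestamp workaround mentioned above might look like the sketch below. `seed_from_datetime` is a hypothetical helper, not part of the `random` module, and it assumes a timezone-aware datetime so that `timestamp()` does not depend on the machine's local time zone.

```python
import random
from datetime import datetime, timezone


def seed_from_datetime(rng, moment):
    """Hypothetical helper: seed `rng` from an aware datetime's Unix timestamp.

    Returns the int seed so it can be logged and reused later.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    seed = int(moment.timestamp())  # whole seconds since the epoch
    rng.seed(seed)
    return seed


moment = datetime(2018, 7, 16, 19, 25, 12, tzinfo=timezone.utc)

rng_a = random.Random()
rng_b = random.Random()
seed_a = seed_from_datetime(rng_a, moment)
seed_b = seed_from_datetime(rng_b, moment)

print(seed_a == seed_b, rng_a.random() == rng_b.random())
```

Truncating to whole seconds is a design choice for this sketch; sub-second resolution would need a different conversion.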

rhettinger · 2019-08-22T16:19:40Z

New changeset d0cdeaa by Raymond Hettinger in branch 'master':
bpo-32554: Deprecate hashing arbitrary types in random.seed() (GH-15382)
d0cdeaa

johnnyd mannequin added the extension-modules and type-bug labels Jan 15, 2018

vstinner added the docs label Jan 15, 2018

vstinner assigned docspython Jan 15, 2018

vstinner changed the title ~~random seed is not consistent when using tuples with a str element~~ random.seed(tuple) uses the randomized hash function and so is not reproductible Jan 15, 2018

rhettinger assigned rhettinger and unassigned docspython Jan 15, 2018

rhettinger closed this as completed Aug 22, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022
