random.seed(tuple) uses the randomized hash function and so is not reproducible #76735
When using a tuple that includes a string, the results are not consistent when invoking a new interpreter or process. For example, executing the following on a Linux machine will yield different results: Please note that the docstring of random.seed states: "Initialize internal state from hashable object." The Python documentation does not (https://docs.python.org/3.6/library/random.html#random.seed). This is very confusing. I hope you can fix the behavior, not the docstring. |
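The effect described in the report can be sketched as follows. This is only an illustration, assuming a CPython interpreter with salted str hashing: it evaluates `hash(('a', 1))` in child interpreters started with different `PYTHONHASHSEED` values. It calls `hash()` directly instead of `random.seed()`, because newer Python versions no longer accept tuple seeds. The helper name `tuple_hash_in_child` is made up for this example.

```python
import os
import subprocess
import sys


def tuple_hash_in_child(hash_seed):
    """Return hash(('a', 1)) as computed by a fresh interpreter.

    hash_seed is handed to the child through the PYTHONHASHSEED
    environment variable.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.run(
        [sys.executable, "-c", "print(hash(('a', 1)))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout)


if __name__ == "__main__":
    # Same hash seed: the tuple hashes identically in every process.
    print(tuple_hash_in_child(1) == tuple_hash_in_child(1))
    # Different hash seeds (effectively the default situation): the
    # hashes differ, so anything seeded from hash(obj) differs too.
    print(tuple_hash_in_child(1) != tuple_hash_in_child(2))
```

Pinning `PYTHONHASHSEED` makes the hash stable, but it is a process-wide setting and does not address the underlying reliance on `hash(obj)`.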
random.seed(str) uses:

    if version == 2 and isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default: |
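As a rough sketch of why the str/bytes path is reproducible, the quoted branch can be mirrored in a standalone helper. The name `int_seed_from_text` is hypothetical and not part of the `random` module. The result depends only on the input bytes, never on the per-process hash salt.

```python
import random
from hashlib import sha512


def int_seed_from_text(a):
    """Mirror the quoted version-2 branch.

    The input bytes are extended with their SHA-512 digest and the
    whole thing is read as one big-endian integer.
    """
    if isinstance(a, str):
        a = a.encode()
    a = bytes(a) + sha512(a).digest()
    return int.from_bytes(a, "big")


if __name__ == "__main__":
    seed = int_seed_from_text("abc")
    # The derived seed carries far more than the 64 bits of a hash() value.
    print(seed.bit_length())
    # Two generators built from the same derived seed agree.
    print(random.Random(seed).random() == random.Random(seed).random())
```

A caller who needs a reproducible seed from a tuple could, under the same assumptions, pass something like `repr(the_tuple)` through this helper rather than relying on `hash()`.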
I'll fix the docstring to make it more specific. We really don't want to use hash(obj) because it produces too few bits of entropy. |
Maybe deprecate using a hash? |
Any deprecation will likely break some existing code, but it would be nice to restrict input types to int, float, bytes, bytearray, or str. Then we could remove all reference to hashing. |
This is what I meant. Emit a deprecation warning for input types other than explicitly supported types (but I didn't think about float), and raise an error in future. |
I'm thinking of something like this:

$ git diff
diff --git a/Lib/random.py b/Lib/random.py
index 1e0dcc87ed..f479e66ada 100644
--- a/Lib/random.py
+++ b/Lib/random.py
@@ -136,12 +136,17 @@ class Random(_random.Random):
             x ^= len(a)
             a = -2 if x == -1 else x
 
-        if version == 2 and isinstance(a, (str, bytes, bytearray)):
+        elif version == 2 and isinstance(a, (str, bytes, bytearray)):
             if isinstance(a, str):
                 a = a.encode()
             a += _sha512(a).digest()
             a = int.from_bytes(a, 'big')
 
+        elif not isinstance(a, (type(None), int, float, str, bytes, bytearray)): |
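The diff above stops at the new type check. One way the deprecation idea from this thread could look is sketched below as a standalone wrapper. This is an assumption about the intent, not the actual patch: the function name, the warning text, and the fallback to `hash(a)` are all made up for illustration.

```python
import random
import warnings

# Types named in the discussion as explicitly supported seeds.
_SUPPORTED_SEED_TYPES = (type(None), int, float, str, bytes, bytearray)


def seed_with_type_check(rng, a=None):
    """Seed rng, warning first if a is not an explicitly supported type."""
    if not isinstance(a, _SUPPORTED_SEED_TYPES):
        warnings.warn(
            "seeding from hash(obj) is not reproducible across processes; "
            "pass None, int, float, str, bytes, or bytearray instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Hypothetical fallback: keep the old hash-based behaviour
        # during the deprecation period.
        a = hash(a)
    rng.seed(a)


if __name__ == "__main__":
    rng = random.Random()
    seed_with_type_check(rng, "abc")       # supported type, no warning
    seed_with_type_check(rng, ("a", 1))    # emits DeprecationWarning
```

Once the deprecation period ends, the warning branch would raise an error instead, as suggested earlier in the thread.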
a) The issue below added documentation to py2.7 that calls out PYTHONHASHSEED, but py3 doesn't currently contain those words: https://bugs.python.org/issue27706 It'd be useful to have something similar there whether the "behaviour" is fixed or not, as providing other objects (like a tuple) will still be non-deterministic.

b) I don't know if this is the correct issue to heap this on, but I think it might be, as you're looking at changing the seed function. The documentation for https://docs.python.org/3/reference/datamodel.html#object.__hash__ I mainly point this out as seeding random with the current date/time is idiomatic in many languages and environments (usually used when you log the seed to be able to recreate things later, or just blindly copying the historical use Those "in the know" will get a Unix timestamp out of the datetime and put that in seed instead, but I feel that falls under the same argument as users-in-the-know SHA512ing a string, mentioned above, which is undesirable and apparently something the function should implement and not users. Would it be wise for datetime to have a specific implementation as well? |
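The "in the know" workaround mentioned in the comment above might look like the sketch below: convert the datetime to a Unix timestamp, log it, and seed with the resulting int so that no hashing is involved. The helper name `seed_from_datetime` is hypothetical, and truncating to whole seconds is an arbitrary choice made for this example.

```python
import random
from datetime import datetime, timezone


def seed_from_datetime(moment):
    """Return an int seed: whole seconds since the Unix epoch."""
    return int(moment.timestamp())


if __name__ == "__main__":
    seed = seed_from_datetime(datetime.now(timezone.utc))
    # Log the seed so that the run can be recreated later.
    print(f"seed={seed}")
    rng = random.Random(seed)
    print(rng.random())
```

Because the seed is a plain int, rerunning with the logged value reproduces the same sequence regardless of hash randomization.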