random.seed(tuple) uses the randomized hash function and so is not reproductible #76735

Closed
johnnyd mannequin opened this issue Jan 15, 2018 · 9 comments
Labels
docs Documentation in the Doc dir extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@johnnyd
Mannequin

johnnyd mannequin commented Jan 15, 2018

BPO 32554
Nosy @rhettinger, @mdickinson, @vstinner, @serhiy-storchaka
PRs
  • bpo-32554: Deprecate hashing arbitrary types in random.seed() #15382

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2019-08-22.16:20:08.633>
    created_at = <Date 2018-01-15.09:08:22.137>
    labels = ['extension-modules', 'type-bug', 'docs']
    title = 'random.seed(tuple) uses the randomized hash function and so is not reproductible'
    updated_at = <Date 2019-08-22.16:20:08.633>
    user = 'https://bugs.python.org/johnnyd'

    bugs.python.org fields:

    activity = <Date 2019-08-22.16:20:08.633>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2019-08-22.16:20:08.633>
    closer = 'rhettinger'
    components = ['Documentation', 'Extension Modules']
    creation = <Date 2018-01-15.09:08:22.137>
    creator = 'johnnyd'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32554
    keywords = ['patch']
    message_count = 9.0
    messages = ['309956', '309957', '310009', '310019', '320360', '320361', '320383', '321759', '350209']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'vstinner', 'docs@python', 'serhiy.storchaka', 'johnnyd', 'poddster']
    pr_nums = ['15382']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32554'
    versions = ['Python 3.6']

    @johnnyd
    Mannequin Author

    johnnyd mannequin commented Jan 15, 2018

    When seeding with a tuple that includes a string, the results are not consistent across new interpreters or processes.

    For example, running the following on a Linux machine yields a different result on each invocation:
    python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

    Please note that the docstring of random.seed states: "Initialize internal state from hashable object."

    The Python documentation does not say this. (https://docs.python.org/3.6/library/random.html#random.seed)

    This is very confusing, I hope you can fix the behavior, not the doc string.

    @johnnyd johnnyd mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jan 15, 2018
    @vstinner
    Member

    random.seed(str) uses:

            if version == 2 and isinstance(a, (str, bytes, bytearray)):
                if isinstance(a, str):
                    a = a.encode()
                a += _sha512(a).digest()
                a = int.from_bytes(a, 'big')

    Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

    Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default:
    https://docs.python.org/dev/library/random.html#random.seed
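
A minimal sketch of the hash randomization described above, assuming a Python 3.7+ interpreter is available as `sys.executable`. The helper `tuple_hash` is hypothetical and exists only for this illustration. It starts a fresh interpreter with a chosen `PYTHONHASHSEED` and reports `hash(("a", 1))`, the value the report says ends up driving `random.seed()` for a tuple:

```python
import os
import subprocess
import sys


def tuple_hash(hash_seed):
    """Return hash(("a", 1)) as computed by a fresh interpreter.

    hash_seed is passed through the PYTHONHASHSEED environment variable.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash(("a", 1)))'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same PYTHONHASHSEED: the hash is stable across processes.
print(tuple_hash(1) == tuple_hash(1))
# Different PYTHONHASHSEED: the hash (and so any seed derived from it) changes.
print(tuple_hash(1) == tuple_hash(2))
```

When `PYTHONHASHSEED` is unset, each process picks its own random value, which matches the run-to-run differences in the original report.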

    @vstinner vstinner added the docs Documentation in the Doc dir label Jan 15, 2018
    @vstinner vstinner changed the title from "random seed is not consistent when using tuples with a str element" to "random.seed(tuple) uses the randomized hash function and so is not reproductible" Jan 15, 2018
    @rhettinger rhettinger assigned rhettinger and unassigned docspython Jan 15, 2018
    @rhettinger
    Contributor

    > This is very confusing, I hope you can fix the behavior, not the doc string.

    I'll fix the docstring to make it more specific.

    We really don't want to use hash(obj) because it produces too few bits of entropy.

    @serhiy-storchaka
    Member

    Maybe deprecate using a hash?

    @rhettinger
    Contributor

    > Maybe deprecate using a hash?

    Any deprecation will likely break some existing code, but it would be nice to restrict input types to int, float, bytes, bytearray, or str. Then we could remove all references to hashing.
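
For callers who still want to seed from a tuple, one possible workaround is to derive an int from a deterministic serialization of the object, in the spirit of the SHA-512 handling that random.seed() applies to str. This is only a sketch, not part of the random module: `stable_seed` is a hypothetical helper, and it assumes `repr(obj)` is deterministic for the object in question (true for tuples of str and int, not for objects whose repr includes a memory address).

```python
import hashlib
import random


def stable_seed(obj):
    """Derive a reproducible int seed from repr(obj).

    Assumption: repr(obj) does not vary between interpreter runs.
    """
    data = repr(obj).encode("utf-8")
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


# Two generators seeded from the same tuple produce the same stream,
# independent of PYTHONHASHSEED, because hash() is never called.
rng_a = random.Random(stable_seed(("a", 1)))
rng_b = random.Random(stable_seed(("a", 1)))
print(rng_a.random() == rng_b.random())  # True
```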

    @serhiy-storchaka
    Member

    This is what I meant. Emit a deprecation warning for input types other than the explicitly supported types (but I didn't think about float), and raise an error in the future.

    @rhettinger
    Contributor

    I'm thinking of something like this:

    $ git diff
    diff --git a/Lib/random.py b/Lib/random.py
    index 1e0dcc87ed..f479e66ada 100644
    --- a/Lib/random.py
    +++ b/Lib/random.py
    @@ -136,12 +136,17 @@ class Random(_random.Random):
                 x ^= len(a)
                 a = -2 if x == -1 else x
    -        if version == 2 and isinstance(a, (str, bytes, bytearray)):
    +        elif version == 2 and isinstance(a, (str, bytes, bytearray)):
                 if isinstance(a, str):
                     a = a.encode()
                 a += _sha512(a).digest()
                 a = int.from_bytes(a, 'big')

    +        elif not isinstance(a, (type(None), int, float, str, bytes, bytearray)):
    +            _warn('Seeding based on hashing is deprecated.\n'
    +                  'The only supported seed types are None, int, float, '
    +                  'str, bytes, and bytearray.', DeprecationWarning, 2)
    +
             super().seed(a)
             self.gauss_next = None
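
The type check in the diff above can be tried in isolation. The following is a standalone sketch, not the code in Lib/random.py: `warn_if_hash_seeded` and `SUPPORTED_SEED_TYPES` are hypothetical names, and the warning text is copied from the proposed patch.

```python
import warnings

# Types the proposed patch treats as explicitly supported seeds.
SUPPORTED_SEED_TYPES = (type(None), int, float, str, bytes, bytearray)


def warn_if_hash_seeded(a):
    """Emit a DeprecationWarning for seed types that would fall back to hash().

    Returns True if a warning was issued, False otherwise.
    """
    if not isinstance(a, SUPPORTED_SEED_TYPES):
        warnings.warn(
            "Seeding based on hashing is deprecated.\n"
            "The only supported seed types are None, int, float, "
            "str, bytes, and bytearray.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_hash_seeded(("a", 1))  # tuple: warns
    warn_if_hash_seeded("a")       # str: supported, no warning

print(len(caught))  # 1
```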

    @poddster
    Mannequin

    poddster mannequin commented Jul 16, 2018

    a) This below issue added doc to py2.7 that calls out PYTHONHASHSEED, but py doesn't currently contain those words

    https://bugs.python.org/issue27706

    It'd be useful to have something similar whether the "behaviour" is fixed or not, as providing other objects (like a tuple) will still be non-deterministic.

    b) I don't know if this is the correct issue to heap this on, but I think it might be, as you're looking at changing the seed function?

    The documentation for object.__hash__ calls out str, bytes and datetime as being affected by PYTHONHASHSEED. Doesn't it seem odd that there's a workaround in the seed function for str and bytes, but not for datetime?

    https://docs.python.org/3/reference/datamodel.html#object.__hash__

    I mainly point this out as seeding random with the current date/time is idiomatic in many languages and environments (usually used when you log the seed to be able to recreate things later, or just blindly copying the historical use of srand(time(NULL)) from C programs!). Anyone shoving a datetime straight into seed() is going to find it non-deterministic and might not understand why, or even notice, especially as the documentation for seed() doesn't call this out.

    Those "in the know" will get a unix timestamp out of the datetime and put that in seed instead, but I feel that falls under the same argument as users-in-the-know SHA512ing a string, mentioned above, which is undesirable and apparently something the function should implement and not users.

    Would it be wise for datetime to have a specific implementation as well?
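
The "in the know" approach mentioned above can be sketched as follows. This is only an illustration of converting a datetime to a Unix timestamp before seeding. The fixed date and the names `moment` and `seed_value` are assumptions made for the example, with a timezone-aware datetime chosen so the timestamp does not depend on the local timezone.

```python
import random
from datetime import datetime, timezone

# A fixed, timezone-aware moment stands in for "the current time" so the
# example is repeatable. In real use this might be datetime.now(timezone.utc).
moment = datetime(2018, 7, 16, 12, 0, 0, tzinfo=timezone.utc)

# Convert to an int before seeding. An int is an explicitly supported seed
# type, so hash() is never involved, and the value is easy to log and reuse.
seed_value = int(moment.timestamp())

rng_a = random.Random(seed_value)
rng_b = random.Random(seed_value)
print(rng_a.random() == rng_b.random())  # True
```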

    @rhettinger
    Contributor

    New changeset d0cdeaa by Raymond Hettinger in branch 'master':
    bpo-32554: Deprecate hashing arbitrary types in random.seed() (GH-15382)
    d0cdeaa

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022