Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic bytecode generation for frozenset #78903

Closed
peterebden mannequin opened this issue Sep 18, 2018 · 8 comments
Closed

Non-deterministic bytecode generation for frozenset #78903

peterebden mannequin opened this issue Sep 18, 2018 · 8 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@peterebden
Copy link
Mannequin

peterebden mannequin commented Sep 18, 2018

BPO 34722
Nosy @rhettinger, @benjaminp, @serhiy-storchaka, @yan12125, @peterebden, @pablogsal, @jefferyto
PRs
  • bpo-34722: Consistent serialization of sets in bytecode #9472
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-09-18.13:53:47.451>
    labels = ['interpreter-core', '3.9']
    title = 'Non-deterministic bytecode generation'
    updated_at = <Date 2021-09-05.02:16:59.920>
    user = 'https://github.com/peterebden'

    bugs.python.org fields:

    activity = <Date 2021-09-05.02:16:59.920>
    actor = 'yan12125'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2018-09-18.13:53:47.451>
    creator = 'Peter Ebden'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34722
    keywords = ['patch']
    message_count = 8.0
    messages = ['325644', '325657', '325658', '325736', '325837', '344238', '347820', '401066']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'benjamin.peterson', 'serhiy.storchaka', 'yan12125', 'Peter Ebden', 'pablogsal', 'jefferyto']
    pr_nums = ['9472']
    priority = 'high'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34722'
    versions = ['Python 3.9']

    @peterebden
    Copy link
    Mannequin Author

    peterebden mannequin commented Sep 18, 2018

    We've found that the following code produces non-deterministic bytecode,
    even post PEP-552:

    def test(x):
        if x in {'ONE', 'TWO', 'THREE'}:
            pass

    It's not too hard to test it:

    $ python3.7 -m compileall --invalidation-mode=unchecked-hash test.py
    Compiling 'test.py'...
    $ sha1sum __pycache__/test.cpython-37.pyc
    61e5682ca95e8707b4ef2a79f64566664dafd800  __pycache__/test.cpython-37.pyc
    $ rm __pycache__/test.cpython-37.pyc
    $ python3.7 -m compileall --invalidation-mode=unchecked-hash test.py
    Compiling 'test.py'...
    $ sha1sum __pycache__/test.cpython-37.pyc
    222a06621b491879e5317b34e9dd715bacd89b7d  __pycache__/test.cpython-37.pyc

    It looks like the peephole optimiser is converting the LOAD_CONST instructions
    for the set into a single LOAD_CONST for a frozenset which then serialises in
    nondeterministic order. One can hence work around it by setting PYTHONHASHSEED
    to a known value.

    I'm happy to help out with this if needed, although I don't have a lot of
    familiarity with the relevant code.

    @peterebden peterebden mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 18, 2018
    @rhettinger
    Copy link
    Contributor

    Have a look at line 512 in Python/marshal.c which calls PyObject_GetIter(). We would need to add PySequence_List() and PyList_Sort(). This will slow down marshaling but would make the bytecode deterministic.

    @serhiy-storchaka
    Copy link
    Member

    Not all types are orderable.

    >>> sorted({'', None})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '<' not supported between instances of 'NoneType' and 'str'

    @peterebden
    Copy link
    Mannequin Author

    peterebden mannequin commented Sep 19, 2018

    Thanks for the pointer, I'll have a bit more of a dig into it (although Serhiy makes a good point too...).

    @benjaminp
    Copy link
    Contributor

    Possibly we should just sort the individual marsahalled entries of the frozenset.

    @rhettinger
    Copy link
    Contributor

    Bumping up the priority a bit on this one. It would be nice to get it in for 3.8.

    @rhettinger rhettinger added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Jun 1, 2019
    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes and removed 3.8 only security fixes labels Jul 9, 2019
    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented Jul 13, 2019

    I encounter another case that leads to non-deterministic bytecode. For example, with commit e6b46aa + PR 9472, I got:

    $ cat foobar.py
    _m = None
    $ PYTHONHASHSEED=0 ./python -m compileall --invalidation-mode=unchecked-hash foobar.py
    Compiling 'foobar.py'...
    $ sha256sum __pycache__/foobar.cpython-39.pyc
    7f84b08d5536390d6ce4ccb2d65e259449c56549ee9cc67560f61771824f20ea  __pycache__/foobar.cpython-39.pyc
    $ rm __pycache__/foobar.cpython-39.pyc
    $ PYTHONHASHSEED=1 ./python -m compileall --invalidation-mode=unchecked-hash foobar.py
    Compiling 'foobar.py'...
    $ sha256sum __pycache__/foobar.cpython-39.pyc
    46dadbb92ad6e1e5b5f8abe9f107086cd231b2b80c15fe84f86e2081a6b8c428  __pycache__/foobar.cpython-39.pyc

    In this case there are no sets. I guess the cause is different. Should I open a new issue?

    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented Sep 5, 2021

    bpo-37596 merged a fix to enable deterministic frozensets. I think this issue can be closed?

    Regarding my last comment msg347820 - it seems similar to one of https://bugs.python.org/issue34033 or https://bugs.python.org/issue34093. I followed those tickets instead.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @methane methane changed the title Non-deterministic bytecode generation Non-deterministic bytecode generation for frozenset May 4, 2022
    @methane methane closed this as completed May 4, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants