classification
Title: Non-deterministic bytecode generation
Type:
Stage: patch review
Components: Interpreter Core
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Peter Ebden, benjamin.peterson, jefferyto, pablogsal, rhettinger, serhiy.storchaka, yan12125
Priority: high
Keywords: patch

Created on 2018-09-18 13:53 by Peter Ebden, last changed 2022-04-11 14:59 by admin.

Pull Requests
PR 9472 (open), linked by Peter Ebden on 2018-09-21 14:35
Messages (8)
msg325644 - Author: Peter Ebden (Peter Ebden) Date: 2018-09-18 13:53
We've found that the following code produces non-deterministic bytecode,
even after PEP 552:

def test(x):
    if x in {'ONE', 'TWO', 'THREE'}:
        pass

It's not too hard to test it:

$ python3.7 -m compileall --invalidation-mode=unchecked-hash test.py
Compiling 'test.py'...
$ sha1sum __pycache__/test.cpython-37.pyc
61e5682ca95e8707b4ef2a79f64566664dafd800  __pycache__/test.cpython-37.pyc
$ rm __pycache__/test.cpython-37.pyc
$ python3.7 -m compileall --invalidation-mode=unchecked-hash test.py
Compiling 'test.py'...
$ sha1sum __pycache__/test.cpython-37.pyc
222a06621b491879e5317b34e9dd715bacd89b7d  __pycache__/test.cpython-37.pyc

It looks like the peephole optimiser converts the LOAD_CONST instructions
for the set into a single LOAD_CONST for a frozenset, which is then serialised
in iteration order. That order depends on the per-process string hash seed, so
one can work around the problem by setting PYTHONHASHSEED to a known value.

I'm happy to help out with this if needed, although I don't have a lot of
familiarity with the relevant code.
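
The reproduction above can also be scripted. The following is a minimal sketch, not part of the original report: it compiles the same source in child interpreters started with different PYTHONHASHSEED values and prints a SHA-1 of the marshalled code object. The names SOURCE, CHILD and digest_for_seed are made up for illustration, and whether the digests differ depends on the interpreter version being run.

```python
import os
import subprocess
import sys

# The function from the report, as a source string.
SOURCE = "def test(x):\n    if x in {'ONE', 'TWO', 'THREE'}:\n        pass\n"

# Script run in the child interpreter: compile stdin and print a digest
# of the marshalled code object.
CHILD = (
    "import hashlib, marshal, sys\n"
    "code = compile(sys.stdin.read(), 'test.py', 'exec')\n"
    "print(hashlib.sha1(marshal.dumps(code)).hexdigest())\n"
)


def digest_for_seed(source, seed):
    """Compile `source` in a fresh interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=source,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for seed in (0, 1, 2):
        print(seed, digest_for_seed(SOURCE, seed))
```

If the digests printed for different seeds disagree, the interpreter is affected; with a fixed seed the digest should be stable from run to run.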
msg325657 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2018-09-18 15:19
Have a look at line 512 in Python/marshal.c which calls PyObject_GetIter().  We would need to add PySequence_List() and PyList_Sort().  This will slow down marshaling but would make the bytecode deterministic.
msg325658 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-09-18 15:29
Not all types are orderable.

>>> sorted({'', None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'str'
msg325736 - Author: Peter Ebden (Peter Ebden) Date: 2018-09-19 09:11
Thanks for the pointer, I'll have a bit more of a dig into it (although Serhiy makes a good point too...).
msg325837 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2018-09-20 04:53
Possibly we should just sort the individual marshalled entries of the frozenset.
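
To illustrate why sorting the marshalled entries sidesteps the orderability problem: bytes objects always compare with each other, even when the values they encode do not. The sketch below is only a Python-level illustration of that idea, not the C change in PR 9472, and stable_order is a hypothetical helper name. Note that the marshalled form of an object can include reference flags, so this is not a complete recipe for reproducible output.

```python
import marshal


def stable_order(items):
    """Order items by their marshalled bytes rather than by value."""
    # sorted() on the values themselves raises TypeError for {'', None};
    # using the marshalled bytes as the key avoids comparing the values.
    return sorted(items, key=marshal.dumps)


if __name__ == "__main__":
    mixed = frozenset({"", None})
    print(stable_order(mixed))
```

The result does not depend on the set's iteration order, because the sort key is computed from each element independently.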
msg344238 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2019-06-01 21:15
Bumping up the priority a bit on this one.  It would be nice to get it in for 3.8.
msg347820 - Author: yan12125 Date: 2019-07-13 13:52
I encountered another case that leads to non-deterministic bytecode. For example, with commit e6b46aafad3427463d6264a68824df4797e682f1 + PR 9472, I got:

$ cat foobar.py
_m = None
$ PYTHONHASHSEED=0 ./python -m compileall --invalidation-mode=unchecked-hash foobar.py
Compiling 'foobar.py'...
$ sha256sum __pycache__/foobar.cpython-39.pyc
7f84b08d5536390d6ce4ccb2d65e259449c56549ee9cc67560f61771824f20ea  __pycache__/foobar.cpython-39.pyc
$ rm __pycache__/foobar.cpython-39.pyc
$ PYTHONHASHSEED=1 ./python -m compileall --invalidation-mode=unchecked-hash foobar.py
Compiling 'foobar.py'...
$ sha256sum __pycache__/foobar.cpython-39.pyc
46dadbb92ad6e1e5b5f8abe9f107086cd231b2b80c15fe84f86e2081a6b8c428  __pycache__/foobar.cpython-39.pyc

In this case there are no sets. I guess the cause is different. Should I open a new issue?
msg401066 - Author: yan12125 Date: 2021-09-05 02:16
issue37596 merged a fix to enable deterministic frozensets. I think this issue can be closed?

Regarding my last comment, msg347820: it seems similar to either https://bugs.python.org/issue34033 or https://bugs.python.org/issue34093, so I followed those tickets instead.
History
Date                 User               Action  Args
2022-04-11 14:59:06  admin              set     github: 78903
2021-09-05 02:16:59  yan12125           set     messages: + msg401066
2020-04-12 16:46:23  jefferyto          set     nosy: + jefferyto
2019-07-13 13:52:34  yan12125           set     messages: + msg347820
2019-07-09 17:54:40  xtreak             set     nosy: + pablogsal
2019-07-09 15:12:12  serhiy.storchaka   set     versions: + Python 3.9, - Python 3.8
2019-07-09 12:37:25  yan12125           set     nosy: + yan12125
2019-06-01 21:15:26  rhettinger         set     priority: normal -> high; messages: + msg344238; versions: + Python 3.8, - Python 3.7
2018-09-21 14:35:48  Peter Ebden        set     keywords: + patch; stage: patch review; pull_requests: + pull_request8885
2018-09-20 04:53:17  benjamin.peterson  set     messages: + msg325837
2018-09-19 09:11:37  Peter Ebden        set     messages: + msg325736
2018-09-18 15:38:59  eric.snow          set     nosy: + benjamin.peterson
2018-09-18 15:29:44  serhiy.storchaka   set     nosy: + serhiy.storchaka; messages: + msg325658
2018-09-18 15:19:31  rhettinger         set     nosy: + rhettinger; messages: + msg325657
2018-09-18 13:53:47  Peter Ebden        create