Title: Reproducible pyc: frozenset is not serialized in a deterministic order
Type: Stage:
Components: Interpreter Core Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jefferyto, vstinner, yan12125
Priority: normal Keywords:

Created on 2019-07-15 15:05 by vstinner, last changed 2020-04-10 13:22 by yan12125.

Messages (2)
msg347969 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-15 15:05
See the bpo-29708 meta issue about reproducible builds.

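pyc files are not fully reproducible yet: frozenset items are not serialized in a deterministic order. String hashes are randomized per process by default, so the iteration order of a frozenset of strings can change from one run to the next, and marshal writes the items in that iteration order.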
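
To illustrate where such a frozenset comes from, the sketch below assumes CPython, whose compiler stores a constant set literal used in an `in` test as a frozenset constant in the code object. It then serializes the same frozenset with `marshal` in child interpreters started with explicit `PYTHONHASHSEED` values. The helper `dump_under_seed` is hypothetical. On the Python versions this report targets, different seeds can produce different bytes, while the same seed always produces the same bytes.

```python
import os
import subprocess
import sys

# CPython stores the constant set literal of an "in" test as a frozenset constant.
compiled = compile("x in {'spam', 'eggs', 'ham'}", "<example>", "eval")
frozen = [c for c in compiled.co_consts if isinstance(c, frozenset)]
print(frozen)

# Code run in the child: marshal the same frozenset and write the raw bytes.
CHILD = (
    "import marshal, sys\n"
    "sys.stdout.buffer.write(marshal.dumps(frozenset({'spam', 'eggs', 'ham'})))\n"
)

def dump_under_seed(seed: int) -> bytes:
    # Hypothetical helper: fix string hashing in the child via PYTHONHASHSEED.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD], env=env, capture_output=True, check=True
    )
    return result.stdout

# Compare the serialized bytes across a few hash seeds.
for seed in (0, 1, 2):
    print(seed, dump_under_seed(seed).hex())
```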

One solution would be to modify marshal to sort frozenset items before serializing them. The problem is how to handle items that cannot be compared with each other. Example:

>>> l=[float("nan"), b'bytes', 'unicode']
>>> l.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'bytes' and 'float'

One workaround for types that cannot be compared is to include the type name in the sort key:

>>> l.sort(key=lambda x: (type(x).__name__, x))
>>> l
[b'bytes', nan, 'unicode']
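
A minimal sketch of the type-name workaround as a reusable key, applied to a frozenset so that the resulting order no longer depends on hash values. The names `type_name_key` and `deterministic_items` are hypothetical, and the sort is done in Python purely for illustration; the proposal above is to do it inside marshal itself.

```python
import math

def type_name_key(item):
    # Compare the type name first so that items of different types are never
    # compared with each other; items of the same type are compared by value.
    return (type(item).__name__, item)

def deterministic_items(items):
    # Hypothetical helper: list the items of a set or frozenset in an order
    # that does not depend on hash values or insertion order.
    return sorted(items, key=type_name_key)

values = frozenset({math.nan, b'bytes', 'unicode', 'abc', 3, 1})
ordered = deterministic_items(values)
print(ordered)
```

One caveat this sketch does not address: NaN compares neither less than nor greater than any other float, and `sorted()` is stable, so a NaN mixed with other floats would keep whatever position the input (hash) order gave it.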

Note: comparing bytes with str emits a BytesWarning, which becomes an exception when running python3 -bb.
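
The sketch below runs two snippets in an isolated child interpreter started with `-bb` (the helper `run_bb` is hypothetical). A direct `==` between bytes and str fails with BytesWarning. The type-name key still sorts mixed bytes and str items, because tuple comparison stops at the differing type names and never compares the bytes object with the str object.

```python
import subprocess
import sys

def run_bb(source: str) -> subprocess.CompletedProcess:
    # Hypothetical helper: run `source` in an isolated child (-I ignores
    # PYTHON* environment variables) where BytesWarning is an error (-bb).
    return subprocess.run(
        [sys.executable, "-I", "-bb", "-c", source],
        capture_output=True,
        text=True,
    )

# Direct comparison of bytes with str: raises BytesWarning under -bb.
direct = run_bb("b, s = b'x', 'x'\nb == s")
print(direct.returncode, "BytesWarning" in direct.stderr)

# Type-name key: only 'bytes' and 'str' (both str) are compared.
keyed = run_bb(
    "items = [b'x', 'x']\n"
    "print(sorted(items, key=lambda x: (type(x).__name__, x)))"
)
print(keyed.returncode, keyed.stdout.strip())
```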

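Second problem: how should marshal handle a comparison that raises an error anyway? For example, with the key above, the tuples (1,) and ('a',) have the same type name, so they are compared by value, and comparing 1 with 'a' raises TypeError.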
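
One way to sidestep the error-handling question is to order items by their own marshalled bytes, since bytes objects always compare. This technique is not proposed in this report and is an assumption of the sketch. Format version 2 is used on the assumption that it predates marshal's reference flag (FLAG_REF) and interned-string markers, whose presence depends on runtime state rather than on the value. The names `marshal_key` and `marshal_ordered_items` are hypothetical. A nested frozenset element would itself be marshalled in hash order, so a real implementation would have to apply the same ordering recursively.

```python
import marshal

def marshal_key(item) -> bytes:
    # Serialize each item on its own; comparing bytes never raises TypeError,
    # whatever the item types are. Version 2 avoids runtime-dependent flags.
    return marshal.dumps(item, 2)

def marshal_ordered_items(items):
    # Hypothetical helper: order set items by their serialized form.
    return sorted(items, key=marshal_key)

# (1,) < ('a',) raises TypeError, but their marshalled bytes compare fine.
items = frozenset({(1,), ('a',), 2.5, b'bytes', 'unicode'})
print(marshal_ordered_items(items))
```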

Another solution would be to use the PYTHONHASHSEED environment variable: for example, if SOURCE_DATE_EPOCH is set, PYTHONHASHSEED would be set to 0. This option is not my favorite because it disables hash randomization, a security fix against denial of service attacks on dict and set.
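
A minimal sketch of that policy as a hypothetical build-tool helper (`build_env` and `child_hash` are illustrative names, and `1563202800` is an arbitrary example timestamp): when SOURCE_DATE_EPOCH is present, PYTHONHASHSEED is pinned to 0 unless a seed was already chosen, and two child interpreters then agree on the hash of the same string.

```python
import os
import subprocess
import sys

def build_env(base):
    # Hypothetical policy: if SOURCE_DATE_EPOCH is set, disable hash
    # randomization with PYTHONHASHSEED=0 unless a seed is already set.
    env = dict(base)
    if "SOURCE_DATE_EPOCH" in env:
        env.setdefault("PYTHONHASHSEED", "0")
    return env

def child_hash(env) -> str:
    # Print the hash of a fixed string in a fresh interpreter.
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('frozenset'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Start from the current environment without any pre-existing seed.
base = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
env = build_env({**base, "SOURCE_DATE_EPOCH": "1563202800"})
print(env["PYTHONHASHSEED"], child_hash(env), child_hash(env))
```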


Previous discussions on reproducible frozenset:


See also bpo-34093: "Reproducible pyc: FLAG_REF is not stable" and PEP 552 "Deterministic pycs".
msg366124 - Author: Chih-Hsuan Yen (yan12125) * Date: 2020-04-10 13:22
issue34722 also discusses frozenset, nondeterministic ordering and sorting. Maybe this ticket and that one cover the same issue?
Date                 User         Action  Args
2020-10-22 20:39:28  eric.araujo  link    issue29708 dependencies
2020-04-10 13:22:03  yan12125     set     nosy: + yan12125
                                          messages: + msg366124
2020-04-08 12:47:20  jefferyto    set     nosy: + jefferyto
2019-07-15 15:05:11  vstinner     create