Message347969
See bpo-29708 meta issue and https://reproducible-builds.org/ for reproducible builds.
pyc files are not fully reproducible yet: frozenset items are not serialized in a deterministic order.
One solution would be to modify marshal to sort frozenset items before serializing them. The issue is how to handle items which cannot be compared. Example:
>>> l=[float("nan"), b'bytes', 'unicode']
>>> l.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'bytes' and 'float'
One workaround for types which cannot be compared is to use the type name in the key used to compare items:
>>> l.sort(key=lambda x: (type(x).__name__, x))
>>> l
[b'bytes', nan, 'unicode']
Note: comparison between bytes and str raises a BytesWarning exception when using python3 -bb.
Second problem: what should marshal do if comparing items still raises an exception?
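A minimal sketch of how the type-name key could be combined with a fallback for items that still cannot be compared. The helper name stable_order is hypothetical and is not part of marshal; falling back to the serialized bytes is only one possible choice, and it assumes that every item can be marshalled. Only TypeError is caught, so a BytesWarning raised under python3 -bb would still propagate.

```python
import marshal


def stable_order(items):
    """Return items in an order that does not depend on hash values.

    Hypothetical helper for illustration; not an existing marshal API.
    """
    try:
        # First attempt: group by type name, then compare values
        # within the same type.
        return sorted(items, key=lambda x: (type(x).__name__, x))
    except TypeError:
        # Fallback (assumption): order by the marshalled bytes of each
        # item, since bytes objects can always be compared to each other.
        return sorted(items, key=marshal.dumps)
```

For example, stable_order([float("nan"), b'bytes', 'unicode']) takes the first path, while a list such as [(1, 'a'), ('a', 1)] takes the fallback because the two tuples share a type name but their elements cannot be compared.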
Another solution would be to use the PYTHONHASHSEED environment variable. For example, if SOURCE_DATE_EPOCH is set, PYTHONHASHSEED would be set to 0. This option is not my favorite because it disables hash randomization, a security fix against denial of service on dict and set:
https://python-security.readthedocs.io/vuln/hash-dos.html
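A small sketch to illustrate the effect of PYTHONHASHSEED on frozenset iteration order. The names SNIPPET and frozenset_order are hypothetical; the script only assumes that a child interpreter can be started with sys.executable. It shows that a fixed seed gives a repeatable order, not that this is the right fix for marshal.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: print the iteration order of a
# small frozenset of strings (str hashes depend on the hash seed).
SNIPPET = "print(list(frozenset(['a', 'b', 'c', 'd'])))"


def frozenset_order(seed):
    """Run a child Python with PYTHONHASHSEED=seed and return its output."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed, same order on every run.
    print(frozenset_order(0))
    print(frozenset_order(0))
    # With hash randomization enabled the order may change between runs.
    print(frozenset_order("random"))
    print(frozenset_order("random"))
```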
--
Previous discussions on reproducible frozenset:
* https://mail.python.org/pipermail/python-dev/2018-July/154604.html
* https://bugs.python.org/issue34093#msg321523
See also bpo-34093: "Reproducible pyc: FLAG_REF is not stable" and PEP 552 "Deterministic pycs".
Date | User | Action | Args
2019-07-15 15:05:11 | vstinner | set | recipients: + vstinner
2019-07-15 15:05:11 | vstinner | set | messageid: <1563203111.95.0.786255267379.issue37596@roundup.psfhosted.org>
2019-07-15 15:05:11 | vstinner | link | issue37596 messages
2019-07-15 15:05:11 | vstinner | create |