This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Title: Marshal output isn't completely deterministic.
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.11
Status: closed Resolution: duplicate
Dependencies: Superseder: Reproducible pyc: FLAG_REF is not stable. (bpo-34093)
Assigned To: Nosy List: eric.snow, gvanrossum, methane
Priority: normal Keywords: patch

Created on 2021-09-13 20:34 by eric.snow, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 28335 merged eric.snow, 2021-09-14 16:09
PR 28379 open eric.snow, 2021-09-16 04:38
Messages (9)
msg401724 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-13 20:34

The output from marshal (e.g. PyMarshal_WriteObjectToString(), marshal.dump()) may differ depending on whether the interpreter is a debug or non-debug build.  I found this while working on freezing stdlib modules.
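
For readers following along, here is a minimal sketch of how the flag at issue can be inspected from Python. It assumes the `FLAG_REF` value of `0x80` from CPython's `Python/marshal.c` and relies on the fact that marshal format versions below 3 never emit references. The helper name `top_level_has_flag_ref` is hypothetical and only looks at the first type byte.

```python
import marshal

# Assumption: high bit of a type byte, as defined in CPython's Python/marshal.c.
FLAG_REF = 0x80


def top_level_has_flag_ref(data: bytes) -> bool:
    """Return True if the first type byte of marshal output carries FLAG_REF."""
    return bool(data[0] & FLAG_REF)


value = ("spam", "spam")

# Format versions below 3 do not track references, so FLAG_REF is never set.
v2 = marshal.dumps(value, 2)

# In the current format the flag depends on the object's refcount at dump
# time, so the result below may vary between builds and call sites.
current = marshal.dumps(value)

print("version 2 sets FLAG_REF:", top_level_has_flag_ref(v2))
print("current format sets FLAG_REF:", top_level_has_flag_ref(current))

# Both encodings load back to an equal object; only the bytes can differ.
print(marshal.loads(v2) == marshal.loads(current) == value)
```

Dumping with an older format version avoids the flag entirely, but it also gives up the reference sharing that makes loading faster, so it is shown here only as a way to observe the difference.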
msg401726 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-13 20:34
FYI, I came up with a fix (for frozen modules, at least) in
msg401727 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-13 20:36
One consequence of this is that frozen module .h files can be different for debug vs. non-debug, which causes CI (and Windows builds) to fail.
msg401847 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-15 16:19
New changeset cbeb81971057d6c382f45ecce92df2b204d4106a by Eric Snow in branch 'main':
bpo-45020: Freeze some of the modules imported during startup. (gh-28335)
msg401869 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-09-15 17:59
I would propose that marshal internally make an extra pass over its input in order to determine which objects are referenced multiple times. This will speed up reading marshalled data (in addition to addressing the reproducibility issue with debug builds) at the cost of slowing down writing it, so there may need to be a way for 3rd party users to turn this off (or a way for importlib and compileall to turn it on).
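
The extra pass described above could be sketched roughly as follows. This is an illustration in pure Python, not marshal's actual C implementation: the names `count_occurrences` and `shared_ids` are hypothetical, and only a few container types are walked (code objects and other marshalable types are left out for brevity).

```python
from collections import Counter

# Assumption: only these containers are traversed in this sketch.
SEQUENCE_TYPES = (tuple, list, set, frozenset)


def count_occurrences(root):
    """First pass: count how many times each object is reached from root."""
    counts = Counter()
    stack = [root]
    while stack:
        obj = stack.pop()
        counts[id(obj)] += 1
        if counts[id(obj)] > 1:
            continue  # children were already visited the first time
        if isinstance(obj, dict):
            for key, val in obj.items():
                stack.append(key)
                stack.append(val)
        elif isinstance(obj, SEQUENCE_TYPES):
            stack.extend(obj)
    return counts


def shared_ids(root):
    """Return ids of objects reached more than once within the graph."""
    return {oid for oid, n in count_occurrences(root).items() if n > 1}


inner = [1, 2]
data = (inner, inner, [3])
shared = shared_ids(data)
print(id(inner) in shared)    # inner is reached twice
print(id(data[2]) in shared)  # the second list is reached once
```

A writer built on this would set the reference flag only for objects in `shared_ids(root)`, so the decision depends on the structure of the data being written rather than on the interpreter's reference counts at the time of the call.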
msg401884 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-15 18:56
That's a good idea.  It's certainly cleaner than the approach I took (optionally pass in to marshal.dumps() the list of "before" object/refcount pairs to compare in w_ref()).

Adding a flag to marshal.dumps() to opt out shouldn't be too big a deal.  (I expect all users of marshal will want the improvement by default.)
msg401927 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-09-16 07:56
FYI, this issue is a duplicate of, and I had made two pull requests to solve the issue.
msg401960 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-16 15:35
Thanks, Inada-san.  That's super helpful.
msg401977 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-09-16 18:23
I'm closing this in favor of bpo-34093.
Date                 User        Action  Args
2022-04-11 14:59:49  admin       set     github: 89349
2021-09-16 18:33:16  eric.snow   set     status: open -> closed
                                         superseder: Reproducible pyc: FLAG_REF is not stable.
                                         resolution: duplicate
                                         stage: patch review -> resolved
2021-09-16 18:23:41  eric.snow   set     messages: + msg401977
2021-09-16 15:35:55  eric.snow   set     messages: + msg401960
2021-09-16 07:56:09  methane     set     nosy: + methane
                                         messages: + msg401927
2021-09-16 04:38:27  eric.snow   set     pull_requests: + pull_request26794
2021-09-15 18:56:56  eric.snow   set     messages: + msg401884
2021-09-15 17:59:21  gvanrossum  set     nosy: + gvanrossum
                                         messages: + msg401869
2021-09-15 16:19:37  eric.snow   set     messages: + msg401847
2021-09-14 16:09:35  eric.snow   set     keywords: + patch
                                         stage: needs patch -> patch review
                                         pull_requests: + pull_request26746
2021-09-13 20:48:21  eric.snow   link    issue45020 dependencies
2021-09-13 20:36:49  eric.snow   set     messages: + msg401727
2021-09-13 20:34:45  eric.snow   set     messages: + msg401726
2021-09-13 20:34:10  eric.snow   create