Marshal output isn't completely deterministic. #89349
Comments
(See: #28107 (comment)) The output from marshal (e.g. PyMarshal_WriteObjectToString(), marshal.dump()) may differ depending on whether it was produced by a debug or a non-debug build. I found this while working on freezing stdlib modules.
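For anyone who wants to check this locally, here is a minimal sketch that hashes the marshalled form of a small compiled module, so the same script can be run under a debug and a non-debug interpreter and the digests compared. `SOURCE` and `marshal_digest` are made-up names for illustration, and detecting a debug build through `sys.gettotalrefcount` is an assumption about how the interpreter was configured. Only compare digests from the same Python version, since the bytecode itself changes between versions.

```python
import hashlib
import marshal
import sys

# Hypothetical sample module; any source text works here.
SOURCE = "def greet(name):\n    return 'hello ' + name\n"


def marshal_digest(source, filename="<demo>"):
    """Compile `source` and return the SHA-256 hex digest of its marshalled code."""
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # sys.gettotalrefcount is normally only present in debug builds.
    build = "debug" if hasattr(sys, "gettotalrefcount") else "non-debug"
    print(build, sys.version_info[:3], marshal_digest(SOURCE))
```

If the two builds print different digests for the same version, the marshalled bytes differ even though the code objects are equivalent.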
FYI, I came up with a fix (for frozen modules, at least) in #28107.
One consequence of this is that frozen module .h files can be different for debug vs. non-debug, which causes CI (and Windows builds) to fail.
I would propose that marshal internally make an extra pass over its input to determine which objects are referenced multiple times. This would speed up reading marshalled data (in addition to addressing the reproducibility issue with debug builds) at the cost of slowing down writing it, so there may need to be a way for third-party users to turn this off (or a way for importlib and compileall to turn it on).
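To make the "extra pass" idea concrete, here is a rough pure-Python sketch. It is not how marshal is implemented in C; it only illustrates counting how often each object is reached from the root, so that a writer could base its decision about back-references on the structure of the data alone. `count_references` and `shared_objects` are hypothetical names, and only the basic container types are walked.

```python
from collections import Counter


def count_references(root):
    """First pass: count how many times each object is reached from `root`."""
    counts = Counter()
    stack = [root]
    while stack:
        obj = stack.pop()
        counts[id(obj)] += 1
        if counts[id(obj)] > 1:
            # Children were already walked on the first visit (also stops cycles).
            continue
        if isinstance(obj, (tuple, list, set, frozenset)):
            stack.extend(obj)
        elif isinstance(obj, dict):
            for key, value in obj.items():
                stack.append(key)
                stack.append(value)
    return counts


def shared_objects(root):
    """Return the ids of objects that are reached more than once from `root`."""
    return {oid for oid, n in count_references(root).items() if n > 1}


if __name__ == "__main__":
    inner = [1, 2]
    root = (inner, inner, [3])
    print(id(inner) in shared_objects(root))
```

A writer built on this would emit a reference slot only for ids in the returned set. The result depends on the object graph alone, so it would not change between debug and non-debug builds.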
That's a good idea. It's certainly cleaner than the approach I took (optionally passing marshal.dumps() the list of "before" object/refcount pairs to compare in w_ref()). Adding a flag to marshal.dumps() to opt out shouldn't be too big a deal. (I expect all users of marshal will want the improvement by default.)
FYI, this issue is a duplicate of https://bugs.python.org/issue34093, and I have made two pull requests to solve it.
Thanks, Inada-san. That's super helpful.
I'm closing this in favor of bpo-34093.