Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Marshal output isn't completely deterministic. #89349

Closed
ericsnowcurrently opened this issue Sep 13, 2021 · 9 comments
Closed

Marshal output isn't completely deterministic. #89349

ericsnowcurrently opened this issue Sep 13, 2021 · 9 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

BPO 45186
Nosy @gvanrossum, @methane, @ericsnowcurrently
PRs
  • bpo-45020: Freeze some of the modules imported during startup. #28335
  • gh-78274: Produce consistent output from marshal.dumps(). #28379
  • Superseder
  • bpo-34093: Reproducible pyc: FLAG_REF is not stable.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-16.18:33:16.157>
    created_at = <Date 2021-09-13.20:34:10.239>
    labels = ['interpreter-core', 'type-bug', '3.11']
    title = "Marshal output isn't completely deterministic."
    updated_at = <Date 2021-09-16.18:33:16.157>
    user = 'https://github.com/ericsnowcurrently'

    bugs.python.org fields:

    activity = <Date 2021-09-16.18:33:16.157>
    actor = 'eric.snow'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-16.18:33:16.157>
    closer = 'eric.snow'
    components = ['Interpreter Core']
    creation = <Date 2021-09-13.20:34:10.239>
    creator = 'eric.snow'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45186
    keywords = ['patch']
    message_count = 9.0
    messages = ['401724', '401726', '401727', '401847', '401869', '401884', '401927', '401960', '401977']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'methane', 'eric.snow']
    pr_nums = ['28335', '28379']
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '34093'
    type = 'behavior'
    url = 'https://bugs.python.org/issue45186'
    versions = ['Python 3.11']

    @ericsnowcurrently
    Copy link
    Member Author

    (See: #28107 (comment))

    The output from marshal (e.g. PyMarshal_WriteObjectToString(), marshal.dump()) may be different depending on if it is a debug or non-debug build. I found this while working on freezing stdlib modules.

    @ericsnowcurrently ericsnowcurrently added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Sep 13, 2021
    @ericsnowcurrently
    Copy link
    Member Author

    FYI, I came up with a fix (for frozen modules, at least) in #28107.

    @ericsnowcurrently
    Copy link
    Member Author

    One consequence of this is that frozen module .h files can be different for debug vs. non-debug, which causes CI (and Windows builds) to fail.

    @ericsnowcurrently
    Copy link
    Member Author

    New changeset cbeb819 by Eric Snow in branch 'main':
    bpo-45020: Freeze some of the modules imported during startup. (gh-28335)
    cbeb819

    @gvanrossum
    Copy link
    Member

    I would propose that marshal internally make an extra pass over its input in order to determine which objects are referenced multiple times. This will speed up reading marshalled data (in addition to addressing the reproducibility issue with debug builds) at the cost of slowing down writing it, so there may need to be a way for 3rd party users to turn this off (or a way for importlib and compileall to turn it on).

    @ericsnowcurrently
    Copy link
    Member Author

    That's a good idea. It's certainly cleaner than the approach I took (optionally pass in to marshal.dumps() the list of "before" object/refcount pairs to compare in w_ref()).

    Adding a flag to marshal.dumps() to opt out shouldn't be too big a deal. (I expect all users of marshal will want the improvement by default.)

    @methane
    Copy link
    Member

    methane commented Sep 16, 2021

    FYI, This issue is duplicate of https://bugs.python.org/issue34093, and I had made two pull requests to solve the issue.

    @ericsnowcurrently
    Copy link
    Member Author

    Thanks, Inada-san. That's super helpful.

    @ericsnowcurrently
    Copy link
    Member Author

    I'm closing this in favor of bpo-34093.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants