Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pickletools.optimize aware of the MEMOIZE opcode. #64057

Closed
avassalotti opened this issue Dec 2, 2013 · 12 comments
Closed

Make pickletools.optimize aware of the MEMOIZE opcode. #64057

avassalotti opened this issue Dec 2, 2013 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@avassalotti
Copy link
Member

BPO 19858
Nosy @tim-one, @pitrou, @avassalotti, @serhiy-storchaka
Files
  • pickle_optimize_memoize.patch
  • pickle_optimize_memoize_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-12-16.18:08:13.740>
    created_at = <Date 2013-12-02.01:00:14.519>
    labels = ['type-bug', 'library']
    title = 'Make pickletools.optimize aware of the MEMOIZE opcode.'
    updated_at = <Date 2014-12-16.18:08:13.739>
    user = 'https://github.com/avassalotti'

    bugs.python.org fields:

    activity = <Date 2014-12-16.18:08:13.739>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-12-16.18:08:13.740>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-12-02.01:00:14.519>
    creator = 'alexandre.vassalotti'
    dependencies = []
    files = ['37317', '37469']
    hgrepos = []
    issue_num = 19858
    keywords = ['patch', 'needs review']
    message_count = 12.0
    messages = ['204985', '205032', '205051', '205055', '205071', '205074', '205348', '231861', '232658', '232738', '232748', '232752']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'pitrou', 'alexandre.vassalotti', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19858'
    versions = ['Python 3.4', 'Python 3.5']

    @avassalotti
    Copy link
    Member Author

    PEP-3154 introduced the MEMOIZE opcode which lowered the overhead of memoization compared to the PUT opcodes which were previously used.

    We should update pickletools.optimize to remove superfluous uses of this new opcode.

    @avassalotti avassalotti added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 2, 2013
    @serhiy-storchaka
    Copy link
    Member

    I afraid this is not a feature, but a bug. pickletools.optimize() can produce invalid pickled data when *PUT and MEMOIZE opcodes are used.

    >>> import pickle, pickletools
    >>> p = b'\x80\x04]\x94(\x8c\x04spamq\x01\x8c\x03ham\x94h\x02e.'
    >>> pickle.loads(p)
    ['spam', 'ham', 'ham']
    >>> pickletools.dis(p)
        0: \x80 PROTO      4
        2: ]    EMPTY_LIST
        3: \x94 MEMOIZE
        4: (    MARK
        5: \x8c     SHORT_BINUNICODE 'spam'
       11: q        BINPUT     1
       13: \x8c     SHORT_BINUNICODE 'ham'
       18: \x94     MEMOIZE
       19: h        BINGET     2
       21: e        APPENDS    (MARK at 4)
       22: .    STOP
    highest protocol among opcodes = 4
    >>> p2 = pickletools.optimize(p)
    >>> p2
    b'\x80\x04\x95\x13\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x04spam\x8c\x03ham\x94h\x02e.'
    >>> pickle.loads(p2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 2
    >>> pickletools.dis(p2)
        0: \x80 PROTO      4
        2: \x95 FRAME      19
       11: ]    EMPTY_LIST
       12: \x94 MEMOIZE
       13: (    MARK
       14: \x8c     SHORT_BINUNICODE 'spam'
       20: \x8c     SHORT_BINUNICODE 'ham'
       25: \x94     MEMOIZE
       26: h        BINGET     2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/serhiy/py/cpython/Lib/pickletools.py", line 2458, in dis
        raise ValueError(errormsg)
    ValueError: memo key 2 has never been stored into

    @avassalotti
    Copy link
    Member Author

    Well, that can only happen if MEMOIZE and PUT are both used together, which won't happen with the Pickler classes we support. The easiest thing to do here is to disable pickletools.optimize on protocol 4.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 2, 2013

    By the way, I was curious if there were many users of pickletools.optimize(), so I did a search and most matches seem to be copies of the stdlib source tree:

    http://code.ohloh.net/search?s=pickletools%20optimize&p=0&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true

    @tim-one
    Copy link
    Member

    tim-one commented Dec 3, 2013

    Is it _documented_ that MEMOIZE and PUT can't be used together? If not, it should be documented; and pickletools dis() and optimize() should verify that this restriction is honored in their inputs.

    @avassalotti
    Copy link
    Member Author

    MEMOIZE and PUT can be used together. They just need to not step on each other toes when they write to the memo table. As specified by PEP-3154, the memo index used by MEMOIZE is the number of elements currently in the memo table. This obviously means functions like, pickletools.optimize, have to be more careful when they rewrite pickles.

    @avassalotti
    Copy link
    Member Author

    Ah, I almost forgot! I did implement the verification in pickletools.dis() for MEMOIZE:

    http://hg.python.org/cpython/file/2612ea573ff7/Lib/pickletools.py#l2420

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which makes pickletools.optimize() to work with the MEMOIZE opcode.

    As side effect it also renumbers memoized values, this allows to use short BINPUT and BINGET instead of LONG_BINPUT and LONG_BINGET.

    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which addresses Antoine's comments. Also added dis() output for binary example in comments.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 16, 2014

    New changeset c49b7acba06f by Serhiy Storchaka in branch '3.4':
    Issue bpo-19858: pickletools.optimize() now aware of the MEMOIZE opcode, can
    https://hg.python.org/cpython/rev/c49b7acba06f

    New changeset e7dd739b4b4e by Serhiy Storchaka in branch 'default':
    Issue bpo-19858: pickletools.optimize() now aware of the MEMOIZE opcode, can
    https://hg.python.org/cpython/rev/e7dd739b4b4e

    @serhiy-storchaka
    Copy link
    Member

    Many thanks for all your reviews Antoine!

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 16, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants