New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make pickletools.optimize aware of the MEMOIZE opcode. #64057
Comments
PEP-3154 introduced the MEMOIZE opcode which lowered the overhead of memoization compared to the PUT opcodes which were previously used. We should update pickletools.optimize to remove superfluous uses of this new opcode. |
I afraid this is not a feature, but a bug. pickletools.optimize() can produce invalid pickled data when *PUT and MEMOIZE opcodes are used. >>> import pickle, pickletools
>>> p = b'\x80\x04]\x94(\x8c\x04spamq\x01\x8c\x03ham\x94h\x02e.'
>>> pickle.loads(p)
['spam', 'ham', 'ham']
>>> pickletools.dis(p)
0: \x80 PROTO 4
2: ] EMPTY_LIST
3: \x94 MEMOIZE
4: ( MARK
5: \x8c SHORT_BINUNICODE 'spam'
11: q BINPUT 1
13: \x8c SHORT_BINUNICODE 'ham'
18: \x94 MEMOIZE
19: h BINGET 2
21: e APPENDS (MARK at 4)
22: . STOP
highest protocol among opcodes = 4
>>> p2 = pickletools.optimize(p)
>>> p2
b'\x80\x04\x95\x13\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x04spam\x8c\x03ham\x94h\x02e.'
>>> pickle.loads(p2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 2
>>> pickletools.dis(p2)
0: \x80 PROTO 4
2: \x95 FRAME 19
11: ] EMPTY_LIST
12: \x94 MEMOIZE
13: ( MARK
14: \x8c SHORT_BINUNICODE 'spam'
20: \x8c SHORT_BINUNICODE 'ham'
25: \x94 MEMOIZE
26: h BINGET 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/serhiy/py/cpython/Lib/pickletools.py", line 2458, in dis
raise ValueError(errormsg)
ValueError: memo key 2 has never been stored into |
Well, that can only happen if MEMOIZE and PUT are both used together, which won't happen with the Pickler classes we support. The easiest thing to do here is to disable pickletools.optimize on protocol 4. |
By the way, I was curious if there were many users of pickletools.optimize(), so I did a search and most matches seem to be copies of the stdlib source tree: |
Is it _documented_ that MEMOIZE and PUT can't be used together? If not, it should be documented; and pickletools dis() and optimize() should verify that this restriction is honored in their inputs. |
MEMOIZE and PUT can be used together. They just need to not step on each other toes when they write to the memo table. As specified by PEP-3154, the memo index used by MEMOIZE is the number of elements currently in the memo table. This obviously means functions like, pickletools.optimize, have to be more careful when they rewrite pickles. |
Ah, I almost forgot! I did implement the verification in pickletools.dis() for MEMOIZE: http://hg.python.org/cpython/file/2612ea573ff7/Lib/pickletools.py#l2420 |
Here is a patch which makes pickletools.optimize() to work with the MEMOIZE opcode. As side effect it also renumbers memoized values, this allows to use short BINPUT and BINGET instead of LONG_BINPUT and LONG_BINGET. |
Ping. |
Here is a patch which addresses Antoine's comments. Also added dis() output for binary example in comments. |
New changeset c49b7acba06f by Serhiy Storchaka in branch '3.4': New changeset e7dd739b4b4e by Serhiy Storchaka in branch 'default': |
Many thanks for all your reviews Antoine! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: