cPickle dumps(tuple) != dumps(loads(dumps(tuple))) #52984
Sometimes, when I use cPickle to serialize tuples of strings, I get different dumps() results for the same tuple:

import cPickle
t = ('<s>', 'JOHN')
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(cPickle.dumps(t)))
assert s1 == s2 # AssertionError

With cPickle it doesn't matter which protocol I use for dumps(). The assertion passes if I use the pickle module instead of cPickle. This means that I can't use a serialized object as a key in a map/dict object.
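For comparison, here is a minimal Python 3 port of the same check. It assumes CPython 3, where `pickle.dumps` uses the C accelerator and its default protocol. For this particular tuple the round trip gives identical bytes, but that is an observation about this input, not a guarantee.

```python
import pickle

t = ('<s>', 'JOHN')
s1 = pickle.dumps(t)
s2 = pickle.dumps(pickle.loads(s1))

# The CPython 3 C pickler memoizes the two strings and the tuple regardless
# of their reference counts, so both pickles contain the same opcodes here.
print(s1 == s2)
```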
I don't think you can expect serialized results to always be equal. The output can depend on specifics of the internal algorithm, such as optimizations or dict iteration order.
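A small Python 3 illustration of that point. The objects are made up for this sketch. Values that compare equal can still pickle to different bytes, because the pickler records dict insertion order and which objects are shared.

```python
import pickle

# Two dicts that compare equal but were built in a different insertion order.
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}

# Two lists that compare equal: one holds the same inner list twice,
# the other holds two distinct (but equal) inner lists.
inner = ['abc']
shared = [inner, inner]
distinct = [['abc'], ['abc']]

print(d1 == d2, pickle.dumps(d1) == pickle.dumps(d2))
print(shared == distinct, pickle.dumps(shared) == pickle.dumps(distinct))
```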
There seems to be a bug somewhere in 2.x cPickle. Here is a somewhat simpler way to demonstrate it. The following code

from pickletools import dis
import cPickle
t = 1L, # use long for easy 3.x comparison
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(s1))
print(s1 == s2)
dis(s1)
dis(s2)

prints False. The difference is probably immaterial because nothing in the pickle uses the tuple again, so the PUT is redundant. However, the difference does not show up when the Python pickle module is used instead of cPickle, and it is not present in py3k. The comparable py3k code:

from pickletools import dis
import pickle
t = 1,
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)
print(s1 == s2)
dis(s1)
dis(s2)

produces True. Most likely the bug is benign and not worth fixing, but I would like to figure out what's going on and what changed in 3.x.
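To compare two pickles opcode by opcode instead of reading two `dis()` listings side by side, one option is a small helper built on `pickletools.genops`, which yields `(opcode, arg, pos)` triples. The helper name `opcode_names` is hypothetical. This sketch targets Python 3 and reruns the py3k comparison above.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order (hypothetical helper)."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


t = (1,)
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)

print(opcode_names(s1))
print(opcode_names(s1) == opcode_names(s2))
```

In the protocol 0 listing the tuple is followed by a PUT, which is the redundant opcode discussed in this thread.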
OK, the 2.7 behavior is explainable and correct. cPickle checks the reference count and does not generate a PUT for objects that have no other references:

>>> from pickletools import dis
>>> from cPickle import dumps
>>> dis(dumps(tuple([1])))
0: ( MARK
1: I INT 1
4: t TUPLE (MARK at 0)
5: . STOP
highest protocol among opcodes = 0
>>> t = 1,
>>> dis(dumps(t))
0: ( MARK
1: I INT 1
4: t TUPLE (MARK at 0)
5: p PUT 1
8: . STOP
highest protocol among opcodes = 0

This optimization is not available from Python, of course, so pickle.py behaves differently. The remaining question is why this optimization was removed from 3.x.
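In Python 3 the C and pure-Python picklers agree on this input: both emit a PUT for the tuple even though nothing refers to it again. The sketch below relies on `pickle._dumps`, a private, undocumented name in CPython's pickle.py, so treat it as an implementation detail that may change between versions.

```python
import pickle

t = (1,)

# pickle.dumps uses the C accelerator (_pickle) when it is available;
# pickle._dumps is CPython's private pure-Python implementation.
c_bytes = pickle.dumps(t, 0)
py_bytes = pickle._dumps(t, 0)

print(c_bytes)
print(c_bytes == py_bytes)
```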
I am speculating here; Alexandre probably knows the answer. The optimization of skipping PUT for unreferenced objects was probably removed because dropping it makes the _pickle module behave more like pickle, and because pickletools now has an optimize() function that can remove unused PUT opcodes more thoroughly. Closing as "invalid".
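The optimize() function mentioned above can be tried directly. This is a Python 3 sketch that reuses the one-element tuple from the earlier examples. It strips the unused PUT, and the result still unpickles to an equal tuple.

```python
import pickle
import pickletools

t = (1,)
raw = pickle.dumps(t, 0)

# optimize() drops PUT opcodes that no GET refers to. The tuple is never
# referenced again, so its PUT disappears from the optimized pickle.
optimized = pickletools.optimize(raw)

print(raw)
print(optimized)
print(pickle.loads(optimized) == t)
```

Running pickletools.optimize() on both sides before comparing is one way to remove unused memo entries from the comparison. It does not make pickle output canonical in general, since dict ordering and shared references still affect the bytes.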