classification
Title: cPickle dumps(tuple) != dumps(loads(dumps(tuple)))
Type: behavior Stage: resolved
Components: Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: Alberto.Planas.Domínguez, alexandre.vassalotti, belopolsky, pitrou
Priority: low Keywords:

Created on 2010-05-17 10:45 by Alberto.Planas.Domínguez, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg105896 - Author: Alberto Planas Domínguez (Alberto.Planas.Domínguez) Date: 2010-05-17 10:45
Sometimes, when I use cPickle to serialize tuples of strings, I get different dumps() results for the same tuple:

import cPickle
t = ('<s>', 'JOHN')
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(cPickle.dumps(t)))
assert s1 == s2     # AssertionError

With cPickle, it doesn't matter which protocol I use for dumps(). The assertion passes if I use the pickle module instead of cPickle.

This means that I can't use a serialized object as a key in a map/dict object.
msg105898 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-17 10:50
I don't think you can expect serialized results to always be equal. It can depend on specifics of the internal algorithm, such as optimizations or dict iteration order.
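
As an illustration of the dict-order point, here is a minimal sketch that is not part of the original discussion. It assumes Python 3.7 or later, where dicts preserve insertion order, and the names d1 and d2 are hypothetical. Two dicts that compare equal can still pickle to different byte strings, so equality is more safely checked on the loaded objects than on the bytes.

```python
import pickle

# Two equal dicts built in a different insertion order (hypothetical data).
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}

s1 = pickle.dumps(d1, 0)
s2 = pickle.dumps(d2, 0)

print(d1 == d2)                              # the objects compare equal
print(s1 == s2)                              # the byte strings need not match
print(pickle.loads(s1) == pickle.loads(s2))  # compare after loading instead
```

For the use case in the report, a tuple of strings is already hashable, so the tuple itself could probably serve as the dict key without going through pickle at all.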
msg110345 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 04:58
There seems to be a bug somewhere in 2.x cPickle.  Here is a somewhat simpler way to demonstrate the bug: the following code

from pickletools import dis
import cPickle
t = 1L, # use long for easy 3.x comparison
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(s1))
print(s1 == s2)
dis(s1)
dis(s2)

prints


False
    0: (    MARK
    1: L        LONG       1L
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        1
    9: .    STOP
highest protocol among opcodes = 0
    0: (    MARK
    1: L        LONG       1L
    5: t        TUPLE      (MARK at 0)
    6: .    STOP
highest protocol among opcodes = 0

The difference is probably immaterial because nothing in the pickle uses the tuple again, so the PUT is redundant. However, the difference does not show up when the pure-Python pickle module is used instead of cPickle, and it is not present in py3k.
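
For contrast, here is a minimal Python 3 sketch (not from the original report) of a case where the memo entry is actually used. The helper opcode_names and the variable names are hypothetical. When the same tuple object appears twice in a container, the second occurrence is written as a GET that refers back to the earlier PUT, which is what lets the unpickler restore the shared identity.

```python
import pickle
from pickletools import genops

def opcode_names(data):
    """Return the opcode names of a pickle, in order (helper for this sketch)."""
    return [opcode.name for opcode, arg, pos in genops(data)]

t = (1,)
shared = [t, t]  # the same tuple object appears twice

data = pickle.dumps(shared, 0)
names = opcode_names(data)
print(names)  # the second occurrence of t shows up as a GET

restored = pickle.loads(data)
print(restored[0] is restored[1])  # identity is preserved through the memo
```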

The comparable py3k code:

from pickletools import dis
import pickle
t = 1,
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)
print(s1 == s2)
dis(s1)
dis(s2)


produces

True
    0: (    MARK
    1: L        LONG       1
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        0
    9: .    STOP
highest protocol among opcodes = 0
    0: (    MARK
    1: L        LONG       1
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        0
    9: .    STOP
highest protocol among opcodes = 0


Most likely the bug is benign and not worth fixing, but I would like to figure out what's going on and what changed in 3.x.
msg110347 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 05:22
OK, the 2.7 behavior is explainable and correct.  cPickle checks the reference count and does not generate PUT for objects that have no other references:

>>> from pickletools import dis
>>> from cPickle import dumps
>>> dis(dumps(tuple([1])))
    0: (    MARK
    1: I        INT        1
    4: t        TUPLE      (MARK at 0)
    5: .    STOP
highest protocol among opcodes = 0
>>> t = 1,
>>> dis(dumps(t))
    0: (    MARK
    1: I        INT        1
    4: t        TUPLE      (MARK at 0)
    5: p    PUT        1
    8: .    STOP
highest protocol among opcodes = 0

This optimization is not available from Python, of course, so pickle.py behaves differently.

The remaining question is why this optimization was removed from 3.x.
msg110348 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 05:50
I am speculating here, while Alexandre probably knows the answer.  The optimization that skips PUT for unreferenced objects was probably removed because doing so makes the _pickle module behave more like pickle, and because pickletools now has an optimize function, which can provide a more thorough removal of unused PUT opcodes.
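
A minimal Python 3 sketch of that pickletools.optimize route, added for illustration and not taken from the original thread (the helper opcode_names is hypothetical). It pickles a one-element tuple with protocol 0, runs the result through optimize, and prints the opcode names before and after so any unused PUT can be seen to disappear.

```python
import pickle
import pickletools

def opcode_names(data):
    """Return the opcode names of a pickle, in order (helper for this sketch)."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

t = (1,)
raw = pickle.dumps(t, 0)
optimized = pickletools.optimize(raw)  # drops PUT opcodes that no GET refers to

print(opcode_names(raw))
print(opcode_names(optimized))
print(pickle.loads(optimized) == t)  # the optimized pickle still round-trips
```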

Closing as "invalid".
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 52984
2010-07-15 05:50:20 belopolsky set status: open -> closed
    versions: + Python 2.7, - Python 3.2
    messages: + msg110348
    resolution: not a bug
    stage: resolved
2010-07-15 05:22:54 belopolsky set messages: + msg110347
    versions: + Python 3.2, - Python 2.7
2010-07-15 04:58:56 belopolsky set versions: - Python 2.6
    nosy: + belopolsky
    messages: + msg110345
    assignee: belopolsky
2010-05-17 10:50:30 pitrou set priority: normal -> low
    versions: + Python 2.7
    nosy: + alexandre.vassalotti, pitrou
    messages: + msg105898
2010-05-17 10:45:17 Alberto.Planas.Domínguez create