cPickle dumps(tuple) != dumps(loads(dumps(tuple))) #52984

Closed
AlbertoPlanasDomnguez mannequin opened this issue May 17, 2010 · 5 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

AlbertoPlanasDomnguez mannequin commented May 17, 2010

BPO 8738
Nosy @abalkin, @pitrou, @avassalotti

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/abalkin'
closed_at = <Date 2010-07-15.05:50:20.703>
created_at = <Date 2010-05-17.10:45:17.651>
labels = ['type-bug', 'invalid']
title = 'cPickle dumps(tuple) != dumps(loads(dumps(tuple)))'
updated_at = <Date 2010-07-15.05:50:20.698>
user = 'https://bugs.python.org/AlbertoPlanasDomnguez'

bugs.python.org fields:

activity = <Date 2010-07-15.05:50:20.698>
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date = <Date 2010-07-15.05:50:20.703>
closer = 'belopolsky'
components = []
creation = <Date 2010-05-17.10:45:17.651>
creator = 'Alberto.Planas.Dom\xc3\xadnguez'
dependencies = []
files = []
hgrepos = []
issue_num = 8738
keywords = []
message_count = 5.0
messages = ['105896', '105898', '110345', '110347', '110348']
nosy_count = 4.0
nosy_names = ['belopolsky', 'pitrou', 'alexandre.vassalotti', 'Alberto.Planas.Dom\xc3\xadnguez']
pr_nums = []
priority = 'low'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8738'
versions = ['Python 2.7']

AlbertoPlanasDomnguez mannequin commented May 17, 2010

Sometimes, when I use cPickle to serialize tuples of strings, I get different dumps() results for the same tuple:

import cPickle
t = ('<s>', 'JOHN')
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(cPickle.dumps(t)))
assert s1 == s2     # AssertionError

With cPickle, it doesn't matter which protocol I use for dumps(). The assertion passes if I use the pickle module instead of cPickle.

This means that I can't use a serialized object as a key in a map/dict object.
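For comparison, here is a minimal Python 3 port of the reporter's check. It assumes a CPython 3 interpreter, where cPickle no longer exists and `pickle` uses the `_pickle` C accelerator automatically; the helper name `roundtrip_is_stable` is hypothetical. For this particular tuple the round trip reproduces the same bytes under every protocol, but that is not a general guarantee of the pickle format.

```python
import pickle


def roundtrip_is_stable(obj, proto):
    """Hypothetical helper: True if dumps(obj) == dumps(loads(dumps(obj)))."""
    first = pickle.dumps(obj, proto)
    second = pickle.dumps(pickle.loads(first), proto)
    return first == second


t = ('<s>', 'JOHN')

# Repeat the reporter's check for every protocol this interpreter supports.
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    print(proto, roundtrip_is_stable(t, proto))
```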

AlbertoPlanasDomnguez mannequin added the type-bug label May 17, 2010

pitrou commented May 17, 2010

I don't think you can expect serialized results to always be equal. It can depend on specifics of the internal algorithm, such as optimizations or dict iteration order.
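A small illustration of this point, assuming CPython 3: two tuples that compare equal can still pickle to different bytes, because the pickler memoizes objects by identity. When the same string object appears twice, the second occurrence is written as a memo reference rather than a second copy. The variable names are illustrative only.

```python
import pickle

x = "JOHN"
y = "".join(["JO", "HN"])  # equal to x, but a separate string object in CPython

shared = (x, x)    # the same object twice
distinct = (x, y)  # two equal but distinct objects

print(shared == distinct)                              # the tuples compare equal
print(pickle.dumps(shared) == pickle.dumps(distinct))  # but their pickles differ

# The sharing survives a round trip: both elements come back as one object.
a, b = pickle.loads(pickle.dumps(shared))
print(a is b)
```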

abalkin commented Jul 15, 2010

There seems to be a bug somewhere in 2.x cPickle. Here is a somewhat simpler way to demonstrate the bug: the following code

from pickletools import dis
import cPickle
t = 1L, # use long for easy 3.x comparison
s1 = cPickle.dumps(t)
s2 = cPickle.dumps(cPickle.loads(s1))
print(s1 == s2)
dis(s1)
dis(s2)

prints

False
    0: (    MARK
    1: L        LONG       1L
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        1
    9: .    STOP
highest protocol among opcodes = 0
    0: (    MARK
    1: L        LONG       1L
    5: t        TUPLE      (MARK at 0)
    6: .    STOP
highest protocol among opcodes = 0

The difference is probably immaterial, because nothing in the pickle uses the tuple again and the PUT is redundant. However, the difference does not show up when the Python pickle module is used instead of cPickle, and it is not present in py3k.

The comparable py3k code:

from pickletools import dis
import pickle
t = 1,
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)
print(s1 == s2)
dis(s1)
dis(s2)

produces

True
    0: (    MARK
    1: L        LONG       1
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        0
    9: .    STOP
highest protocol among opcodes = 0
    0: (    MARK
    1: L        LONG       1
    5: t        TUPLE      (MARK at 0)
    6: p    PUT        0
    9: .    STOP
highest protocol among opcodes = 0
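Instead of comparing two dis() listings by eye, the opcode sequences can be compared programmatically with `pickletools.genops()`. This is a sketch for Python 3; the helper name `opcode_names` is hypothetical. Under Python 3 both sequences contain the PUT that follows TUPLE, matching the py3k listing above.

```python
import pickle
import pickletools


def opcode_names(data):
    """Hypothetical helper: the opcode names of a pickle, in stream order."""
    # genops yields (OpcodeInfo, argument, position) triples.
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


t = (1,)
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)

print(opcode_names(s1))
print(opcode_names(s1) == opcode_names(s2))
```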

Most likely the bug is benign and not worth fixing, but I would like to figure out what's going on and what changed in 3.x.

abalkin self-assigned this Jul 15, 2010

abalkin commented Jul 15, 2010

OK, the 2.7 behavior is explainable and correct. cPickle checks the reference count and does not generate a PUT for objects that have no other references:

>>> from pickletools import dis
>>> from cPickle import dumps
>>> dis(dumps(tuple([1])))
    0: (    MARK
    1: I        INT        1
    4: t        TUPLE      (MARK at 0)
    5: .    STOP
highest protocol among opcodes = 0
>>> t = 1,
>>> dis(dumps(t))
    0: (    MARK
    1: I        INT        1
    4: t        TUPLE      (MARK at 0)
    5: p    PUT        1
    8: .    STOP
highest protocol among opcodes = 0

This optimization is not available from Python code, of course, so pickle.py behaves differently.

The remaining question is why this optimization was removed from 3.x.
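For contrast, here is a sketch of the same two calls under CPython 3, assuming (as the py3k output above suggests) that its C pickler performs no reference-count check before memoizing. Both the temporary tuple and the one bound to a name get a PUT. The helper name `has_put` is hypothetical.

```python
import pickle
import pickletools


def has_put(data):
    """Hypothetical helper: True if the pickle contains a PUT (text-mode memo store)."""
    return any(op.name == "PUT" for op, _arg, _pos in pickletools.genops(data))


t = 1,
print(has_put(pickle.dumps(tuple([1]), 0)))  # temporary tuple, like dumps(tuple([1]))
print(has_put(pickle.dumps(t, 0)))           # tuple bound to the name t
```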

abalkin commented Jul 15, 2010

I am speculating here; Alexandre probably knows the answer. The optimization that skips PUT for unreferenced objects was probably removed because dropping it makes the _pickle module behave more like pickle, and because pickletools now has an optimize() function, which provides a more thorough removal of unused PUT opcodes.

Closing as "invalid".
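A sketch of the `pickletools.optimize()` route, assuming Python 3: it strips the unused PUT from the protocol-0 pickle of a one-element tuple while keeping the pickle loadable. Applied to both sides of a round trip, it should also erase a PUT-only difference like the one cPickle produced in 2.7, although it is not a general canonicalization (memo references that are actually used, as with shared objects, remain).

```python
import pickle
import pickletools

t = (1,)
s1 = pickle.dumps(t, 0)
s2 = pickle.dumps(pickle.loads(s1), 0)

# optimize() drops PUT opcodes that no GET refers to; here the tuple is never reused.
o1 = pickletools.optimize(s1)
o2 = pickletools.optimize(s2)

print(len(o1) < len(s1))      # the optimized pickle is shorter
print(pickle.loads(o1) == t)  # and still unpickles to an equal tuple
print(o1 == o2)
```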

abalkin closed this as completed Jul 15, 2010
ezio-melotti transferred this issue from another repository Apr 10, 2022