Classification
Title: cPickle produces inconsistent output
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.6, Python 2.5
Process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, loewis, rb
Priority: normal
Keywords:

Created on 2009-03-19 15:53 by rb, last changed 2009-03-20 23:32 by loewis. This issue is now closed.

Messages (5)
msg83814 - Author: (rb) Date: 2009-03-19 15:53
The documentation states that the output of pickle and cPickle may be
different. However it is implied that the output of a particular module
will always be consistent within itself. This expectation fails for the
case below.

I am using the output of cPickle in order to generate a key to use for
external storage where the key is abstracted to a generic Python
(immutable) object. Without consistency this breaks for me; pickle is
too slow so I need to use cPickle.

$ python
Python 2.5.2 (r252:60911, Oct  5 2008, 19:29:17) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cPickle
>>> key = (1, u'foo')
>>> cPickle.dumps(key)
'(I1\nVfoo\ntp1\n.'
>>> cPickle.dumps((1, u'foo'))
'(I1\nVfoo\np1\ntp2\n.'

PythonWin 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> import cPickle
>>> key = (1,u'foo')
>>> cPickle.dumps(key)
'(I1\nVfoo\ntp1\n.'
>>> cPickle.dumps((1,u'foo'))
'(I1\nVfoo\np1\ntp2\n.'

Expected results: the output of the two dumps calls should be the same.
msg83826 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-03-19 23:00
I'm not quite sure why you expect them to be the same. The inputs are
different, after all - in one case, you have a Unicode object with a
single reference to it (from the tuple), in the second case, you have a
Unicode object with many more references:

py> import sys
py> sys.getrefcount(key[1])
2
py> sys.getrefcount((1,u'foo')[1])
5

That makes a difference for cPickle.
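
Editorial illustration, not part of the original message: the two outputs reported above can be compared opcode by opcode with the standard `pickletools` module. This is a minimal sketch written for Python 3 (an assumption; the thread uses Python 2 and cPickle), with the two pickle strings copied from the report as byte strings. The helper name `opcode_names` is hypothetical. It only shows that the streams differ in PUT (memo) opcodes and decode to the same value; it does not reproduce the refcount check itself.

```python
import pickle
import pickletools

# The two protocol-0 pickles reported above, as byte strings.
from_variable = b"(I1\nVfoo\ntp1\n."
from_literal = b"(I1\nVfoo\np1\ntp2\n."


def opcode_names(data):
    """Return the opcode names of a pickle stream, in order (hypothetical helper)."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


names_variable = opcode_names(from_variable)
names_literal = opcode_names(from_literal)

# Both streams decode to the same tuple.
same_value = pickle.loads(from_variable) == pickle.loads(from_literal)

# Ignoring PUT opcodes (memo writes), the opcode sequences are identical.
without_put_variable = [n for n in names_variable if n != "PUT"]
without_put_literal = [n for n in names_literal if n != "PUT"]

print(names_variable)
print(names_literal)
print(same_value, without_put_variable == without_put_literal)
```

The second stream carries one extra PUT, emitted right after the Unicode string, which is the object whose reference count differs between the two calls.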
msg83862 - Author: (rb) Date: 2009-03-20 14:23
Martin,

Sorry, I don't follow. I realise that the refcounts will be different;
but pickling an object should surely be independent of the refcount as
there is no need to include the refcount in the output?

What other way (using pickle or not) can I convert a generic immutable
Python object to a string to use as a key in external storage?

Currently the documentation points out that the output may be different
between pickle and cPickle which implies that the output will be
consistent for a single module.

If pickle is not required to produce consistent output for the same
input (and refcount isn't really part of the input in this case; it is
a side issue), then can this be documented?
msg83866 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-03-20 18:23
pickle is designed to provide persistent storage, not create keys for
objects. Changes to the format are fine as long as they are compatible.
msg83890 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-03-20 23:32
> Sorry, I don't follow. I realise that the refcounts will be different;
> but pickling an object should surely be independent of the refcount as
> there is no need to include the refcount in the output?

There certainly is a need to consider the refcount. Else the memo
would not work.

> What other way (using pickle or not) can I convert a generic immutable
> Python object to a string to use as a key in external storage?

You will have to come up with your own serialization function. There
are MANY reasons why using a pickle cannot work. For example, in a
dictionary, the order of keys is not guaranteed, and might change even
though the dictionaries compare equal.
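
Editorial illustration, not part of the original message: one possible shape for such a hand-written serialization function, as a minimal sketch in Python 3 (an assumption; the thread uses Python 2). The name `canonical_key`, the output format, and the set of supported types are all hypothetical. The idea is that the result depends only on the value, so equal inputs give equal keys regardless of reference counts, and unordered containers are sorted before being emitted.

```python
def canonical_key(obj):
    """Return a deterministic string for a nested immutable value.

    Hypothetical helper: supports None, bool, int, float, str, tuple
    and frozenset only, and raises TypeError for anything else.
    """
    if obj is None or isinstance(obj, (bool, int, float)):
        # The type name keeps 1, 1.0 and True distinct.
        return "%s:%r" % (type(obj).__name__, obj)
    if isinstance(obj, str):
        # The length prefix keeps strings containing separators unambiguous.
        return "str:%d:%s" % (len(obj), obj)
    if isinstance(obj, tuple):
        return "tuple(" + ",".join(canonical_key(x) for x in obj) + ")"
    if isinstance(obj, frozenset):
        # Sort the element keys so iteration order cannot affect the result.
        return "frozenset(" + ",".join(sorted(canonical_key(x) for x in obj)) + ")"
    raise TypeError("unsupported type: %s" % type(obj).__name__)


key = (1, "foo")
print(canonical_key(key))            # tuple(int:1,str:3:foo)
print(canonical_key((1, "foo")))     # same string as above
```

A real implementation would need to decide how to treat further types (bytes, nested dictionaries, user-defined classes) and whether the key format must stay stable across Python versions.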

> Currently the documentation points out that the output may be different
> between pickle and cPickle which implies that the output will be
> consistent for a single module.

It doesn't imply this at all. The sentence says just what it says: don't
be surprised if you pickle the same object with pickle and cPickle,
and get different results.

> If pickle is not required to produce consistent output for the same
> input (and refcount isn't really part of the input in this case; it is
> a side issue), then can this be documented?

It's certainly possible to document that, yes. Can you propose a
specific patch to the documentation?
History
Date                 User               Action  Args
2009-03-20 23:32:51  loewis             set     messages: + msg83890
2009-03-20 18:23:08  benjamin.peterson  set     status: pending -> closed; nosy: + benjamin.peterson; messages: + msg83866
2009-03-20 14:23:29  rb                 set     messages: + msg83862
2009-03-19 23:00:27  loewis             set     status: open -> pending; nosy: + loewis; messages: + msg83826; resolution: not a bug
2009-03-19 15:53:50  rb                 create