Please see the previous bug report for background
details:
[ python-Bugs-654866 ] pickle and cPickle not
equivalent
The basic problem is that in certain rather rare (but not
*that* rare either, imo) situations cPickle produces a
pickle file that cannot reconstruct the same class
instance identities as the pickle file produced by pickle.
In addition, this appears to affect only old-style class
instances, not new-style class instances. So changing a
class from old-style to new-style can change what cPickle
pickles.
Here are items from the current docs that are candidates
for clarification. My commentary appears within brackets:
The data streams the two modules produce are
guaranteed to be interchangeable. [Depends on the
definition of interchangeable. I'd like to see something
about reconstructability as well.]
The pickle module keeps track of the objects it has
already serialized, so that later references to the same
object won't be serialized again. [Not true for cPickle,
which apparently only keeps track if the refcount > 1.
Note that this statement has never been true about
instances of simple object types, like int and string.]
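As a rough illustration of the memo behaviour the docs describe, the sketch below pickles a list that holds the same inner list twice and looks for a memo-lookup opcode in the stream. It assumes a modern Python 3 interpreter, where cPickle no longer exists as a separate module, so it does not reproduce the refcount-dependent behaviour reported here. The variable names are made up for the example.

```python
import pickle
import pickletools

# One mutable object referenced twice from the same container.
shared = [1, 2]
data = pickle.dumps([shared, shared], protocol=2)

# Collect opcode names; the second reference to `shared` should show up
# as a memo lookup (an opcode ending in GET) instead of a second copy.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
memo_lookups = [name for name in opcode_names if name.endswith("GET")]

# After loading, both slots should refer to one and the same object.
restored = pickle.loads(data)
same_object = restored[0] is restored[1]
print(memo_lookups, same_object)
```

If the pickler skipped the memo for `shared`, the stream would contain two independent copies and `same_object` would be false, which is the kind of identity loss this report is concerned with.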
If the same object is pickled by multiple dump() calls, the
load() will all yield references to the same object.
[Depends on whether you consider an object pickled as
part of a container, and later pickled independently, as
pickling the same object. If you expect that load() will
yield references to the same object (and why wouldn't
you, right? But that's why I'm disturbed by this.) then you
need to be aware of the situations in which cPickle
decides not to keep track.]
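The scenario in question (an object pickled as part of a container, then pickled again on its own through the same pickler) can be sketched as follows. This is a minimal sketch assuming Python 3's pickle module and a single Pickler/Unpickler pair whose memo is not cleared between calls. It shows the behaviour the docs promise, not the cPickle case that breaks it, and the names are hypothetical.

```python
import io
import pickle

member = ["payload"]
container = {"member": member}

buf = io.BytesIO()
pickler = pickle.Pickler(buf, protocol=2)
pickler.dump(container)  # member is pickled as part of the container
pickler.dump(member)     # then pickled again independently

# Read both pickles back with one Unpickler so its memo is shared too.
buf.seek(0)
unpickler = pickle.Unpickler(buf)
loaded_container = unpickler.load()
loaded_member = unpickler.load()

# True only if the second dump() was written as a reference to the first.
identity_kept = loaded_container["member"] is loaded_member
print(identity_kept)
```

An application that relies on `identity_kept` being true is exactly the kind that needs to know when a pickler decides not to memoize an object.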
The pickle data stream produced by pickle and cPickle
are identical, so it is possible to use pickle and cPickle
interchangeably with existing pickles. [This statement is
part true and part false. The pickle data streams are not
identical - they are often cosmetically different and
occasionally substantially different. And this isn't really
the reason the data streams are interchangeable. That
has to do with the structure of the data stream, not the
content of the data stream. We need to make it clear that
the content isn't guaranteed to be identical, even though
the structure of existing pickles can be read by either
pickle or cPickle.]
There are additional minor differences in API between
cPickle and pickle, however for most applications, they
are interchangeable. More documentation is provided in
the pickle module documentation, which includes a list of
the documented differences. [Applications that care
about object identity will want to be aware of the
limitation of the cPickle memoization capability and how it
differs from the pickle version.]
Footnotes
... pickles [footnote 3.13]
Since the pickle data format is actually a tiny
stack-oriented programming language, and some freedom
is taken in the encodings of certain objects, it is possible
that the two modules produce different data streams for
the same input objects. However it is guaranteed that
they will always be able to read each other's data
streams. [Again, readability is not good enough for
applications that expect object reconstruction
equivalence.]
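To make the footnote's point concrete, here is a small sketch showing that two different opcode streams can describe the same value. As a stand-in for "pickle versus cPickle" it varies the protocol number instead of the module, which is an assumption made purely for illustration on Python 3.

```python
import pickle
import pickletools

value = {"a": (1, 2)}

# Two encodings of the same input; protocol choice is a stand-in for
# the encoding freedom the footnote mentions.
streams = {proto: pickle.dumps(value, protocol=proto) for proto in (0, 2)}

# Each stream is a small program of stack-machine opcodes.
opcodes = {
    proto: [op.name for op, arg, pos in pickletools.genops(data)]
    for proto, data in streams.items()
}

streams_differ = streams[0] != streams[2]
loads_agree = pickle.loads(streams[0]) == pickle.loads(streams[2]) == value
print(opcodes[0])
print(opcodes[2])
print(streams_differ, loads_agree)
```

Note that `loads_agree` only checks equality of the loaded values. It says nothing about whether shared references inside them were preserved, which is the gap between readability and reconstruction equivalence.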