classification
Title: pickle protocol 2 failure on int subclass
Type: behavior Stage: needs patch
Components: Documentation, Library (Lib) Versions: Python 3.2, Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Pickle breakage with reduction of recursive structures
View: 1062277
Assigned To: belopolsky Nosy List: ajaksu2, alexandre.vassalotti, andersjm, belopolsky, rhettinger
Priority: low Keywords:

Created on 2006-10-20 10:15 by andersjm, last changed 2010-08-16 20:00 by belopolsky. This issue is now closed.

Files
File name Uploaded Description
int_subclass_pickle_problem.py andersjm, 2006-10-20 10:15
issue1581183-test-py3k.py belopolsky, 2010-06-29 02:31
issue1581183-test.diff belopolsky, 2010-06-29 16:02
Messages (6)
msg30326 - Author: Anders J. Munch (andersjm) Date: 2006-10-20 10:15
I ran into problems pickling a set containing an int
subclass which holds a reference to an object which
holds a reference to the original object.

I reduced it to the attached
int_subclass_pickle_problem.py.  There are no problems
with pickle protocols 0 and 1, but the protocol 2 unit
tests fail with an exception.

This happens with both pickle and cPickle, although with
two different exceptions.

cPickle:
TypeError: ('set objects are unhashable', <type 'set'>,
([set([1])],))

pickle:
  File "....\lib\pickle.py", line 244, in memoize
    assert id(obj) not in self.memo
AssertionError

(For the full tracebacks, run the attached script.)

I checked whether this was because int implements
__reduce__ or __reduce_ex__, trumping my
__getstate__/__setstate__, but that doesn't seem to be
the case:

>>> int.__reduce_ex__ is object.__reduce_ex__
True
>>> int.__reduce__ is object.__reduce__
True

After the further simplification of replacing
   self.orig = [set([E.a]), set([E.a])]
with
   self.orig = E.a
cPickle starts working, but pickle still fails.

(Seen with Python 2.4.3 and Python 2.5, on W2K.)
msg84523 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-03-30 07:20
Confirmed on trunk, has tests.
msg108889 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-29 02:31
I reproduced the problem in py3k (both protocol 2 and 3).  See issue1581183-test-py3k.py attached.
msg108912 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-29 15:42
At least part of the problem has nothing to do with subclassing from int and instead is related to pickling objects with circular references.

I am attaching a patch that demonstrates the problem.  In issue1581183-test.diff, I modified memoize so that it does nothing, rather than failing an assert, if the object is already in the memo.  This makes the Python and C implementations behave the same, but they still fail to produce correct results.  Pickling with protocol 2 breaks the circular reference and instead creates two independent objects.
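
For comparison, here is a minimal sketch of what a correct round trip of a circular reference looks like, using only a built-in list so that no `__getnewargs__` is involved. It is not taken from the attached files; the helper name `cycle_survives` is made up for illustration.

```python
import pickle


def cycle_survives(protocol):
    """Return True if a self-referential list keeps its cycle after a round trip."""
    cycle = []
    cycle.append(cycle)  # the list now contains itself
    restored = pickle.loads(pickle.dumps(cycle, protocol))
    # With working memoization the restored list points back at itself
    # instead of at an independent copy.
    return restored[0] is restored


for protocol in (0, 1, 2):
    print(protocol, cycle_survives(protocol))
```

The pickler memoizes the list before it saves the list's contents, so the inner reference is written as a memo lookup rather than as a second object. The failure discussed here arises when an object has to be reachable through its own constructor arguments, before it has been memoized.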
msg108919 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-29 17:10
Upon further investigation, I conclude that the problem is in the user code.  I am attaching int_subclass_pickle_problem_fixed.py which fixes the user code as follows:

     def __getnewargs__(self):
-        return (int(self), self.an_enum)
+        return (int(self), None)

Note that with this change, the object is pickled correctly because __setstate__ takes care of resetting self.an_enum.

The problem is that self-referential state should not be passed via the __getnewargs__ mechanism.  This is because when the pickler starts writing newargs, it is already committed to creating a new object (otherwise it would not need to serialize newargs in the first place).  If the newargs contain the object that is being pickled, it will be serialized twice unless this situation is caught in memoize.

What can be improved is the diagnostics and possibly the documentation.  If, after saving newargs, the memo contains the object that is being pickled, an exception should be raised explaining that __getnewargs__() should not contain self-references.
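
The pattern described above can be sketched as follows. This is a hedged reconstruction, not the attached int_subclass_pickle_problem_fixed.py: the class names `Member` and `Enumeration` are hypothetical, and only the attribute name `an_enum` and the `(int(self), None)` newargs come from the diff above. It is written for Python 3.

```python
import pickle


class Member(int):
    """Hypothetical int subclass that points back at the object owning it."""

    def __new__(cls, value, an_enum=None):
        self = super().__new__(cls, value)
        self.an_enum = an_enum
        return self

    def __getnewargs__(self):
        # Only plain data goes here; no reference back to the owner.
        return (int(self), None)

    def __getstate__(self):
        # The self-referential part travels as state instead.
        return {"an_enum": self.an_enum}

    def __setstate__(self, state):
        self.an_enum = state["an_enum"]


class Enumeration:
    """Hypothetical owner; its attribute `a` refers back to it, forming a cycle."""

    def __init__(self):
        self.a = Member(1, self)


original = Enumeration()
clone = pickle.loads(pickle.dumps(original, protocol=2))
print(clone.a.an_enum is clone)
```

State is written only after the new object has been memoized, so the back-reference in the state resolves to the already-created object rather than triggering a second serialization.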
msg110390 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 21:55
I am going to close this as a duplicate of issue 1062277.  The latter has a patch, but Raymond questioned whether the proposed feature is desirable. [msg47268]  I am -1, but will look at the patch there.
History
Date User Action Args
2010-08-16 20:00:40 belopolsky set status: pending -> closed
2010-07-15 21:55:54 belopolsky set status: open -> pending
nosy: + rhettinger
messages: + msg110390
superseder: Pickle breakage with reduction of recursive structures
resolution: duplicate
2010-07-13 20:08:10 belopolsky set nosy: belopolsky, andersjm, ajaksu2, alexandre.vassalotti
components: + Documentation
2010-06-29 17:10:54 belopolsky set priority: normal -> low
keywords: - patch
messages: + msg108919
2010-06-29 16:02:05 belopolsky set files: + issue1581183-test.diff
2010-06-29 16:01:48 belopolsky set files: - issue1581183-test.diff
2010-06-29 15:42:35 belopolsky set files: + issue1581183-test.diff
assignee: belopolsky
messages: + msg108912
keywords: + patch
2010-06-29 02:31:55 belopolsky set files: + issue1581183-test-py3k.py
versions: + Python 3.2
nosy: + alexandre.vassalotti, belopolsky
messages: + msg108889
2009-03-30 07:20:14 ajaksu2 set versions: + Python 2.6
type: behavior
nosy: + ajaksu2
title: pickle protocol 2 failure on int subclass -> pickle protocol 2 failure on int subclass
messages: + msg84523
stage: needs patch
2006-10-20 10:15:05 andersjm create