Author eltoder
Recipients ajaksu2, alex, alexandre.vassalotti, bcroq, belopolsky, ddorfman, eltoder, mstefanro, pitrou, rhettinger, zzzeek
Date 2012-12-29.00:44:16
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1356741857.38.0.685467881353.issue1062277@psf.upfronthosting.co.za>
In-reply-to
Content
To recap, the issue is that pickle doesn't handle recursion via reduce arguments (i.e. arguments to the constructor function as returned in 2nd element of the tuple from __reduce__). This leads to 2 kind of effects:

class C:
    def __init__(self, x=None):
        self.x = x if x is not None else self
    def __reduce__(self):
        return C, (self.x,)

A. Recursion error:
>>> pickle.dumps(C())
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    pickle.dumps(C())
RuntimeError: maximum recursion depth exceeded while calling a Python object

This cannot be helped with the current reduce protocol. The error may be improved, but that's about it.

B. Duplication of object when unpickling:
>>> c = C([])
>>> c.x.append(c)
>>> c.x[0] is c
True
>>> c2 = pickle.loads(pickle.dumps(c))
>>> c2.x[0] is c2
False

This happens because list (or another recursion-friendly type) inside the problematic object handles recursion, but we still get the outer object, identical to the inner one.
This can be solved the same way as for tuple:
>>> t=([],1,2)
>>> t[0].append(t)
>>> t2 = pickle.loads(pickle.dumps(t))
>>> t2[0][0] is t2
True
>>> pickletools.dis(pickle.dumps(t))
    0: \x80 PROTO      3
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: K    BININT1    1
    9: K    BININT1    2
   11: \x87 TUPLE3
   12: q    BINPUT     1
   14: a    APPEND
   15: K    BININT1    1
   17: K    BININT1    2
   19: 0    POP
   20: 0    POP
   21: 0    POP
   22: h    BINGET     1
   24: .    STOP

After pickling its elements tuple checks if it got into memo. If it did, this means it was pickled by one of the elements, so it POPs all elements from the stack and fetches itself via GET. This is somewhat inefficient, but probably the best it can do.

I suggest we do 3 things:

1. Improve the documentation for __reduce__ function. It should mention that all state that a) can potentially point back to the object and b) not strictly necessary in the constructor function should be passed via the 3rd element of __reduce__ tuple (aka state) instead of the 2nd element, and applied by __setstate__. This handles recursion in robust and optimal way.

2. Check that all built-in/standard types follow this advice. I see that Stefan Mihaila already fixed sets.

3. To fix case B above add the memo check from save_tuple to save_reduce. While at it, we can consider checking after pickling every element instead of after pickling all elements, so we reduce the number of POPs and the wastage they cause.
History
Date User Action Args
2012-12-29 00:44:17eltodersetrecipients: + eltoder, rhettinger, belopolsky, ddorfman, pitrou, ajaksu2, alexandre.vassalotti, alex, zzzeek, bcroq, mstefanro
2012-12-29 00:44:17eltodersetmessageid: <1356741857.38.0.685467881353.issue1062277@psf.upfronthosting.co.za>
2012-12-29 00:44:17eltoderlinkissue1062277 messages
2012-12-29 00:44:16eltodercreate