Pickle breakage with reduction of recursive structures #41150

Open
ddorfman mannequin opened this issue Nov 8, 2004 · 10 comments
Labels
3.11 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

ddorfman mannequin commented Nov 8, 2004

BPO 1062277
Nosy @rhettinger, @abalkin, @pitrou, @devdanzin, @avassalotti, @alex, @eltoder, @opensdh
Superseder
  • bpo-9269: Cannot pickle self-referencing sets
Files
  • redcycle.diff: Diff
  • issue1062277-test-py3k.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2004-11-08.08:06:54.000>
    labels = ['type-bug', 'library', '3.11']
    title = 'Pickle breakage with reduction of recursive structures'
    updated_at = <Date 2021-12-16.23:07:00.339>
    user = 'https://bugs.python.org/ddorfman'

    bugs.python.org fields:

    activity = <Date 2021-12-16.23:07:00.339>
    actor = 'ajaksu2'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2004-11-08.08:06:54.000>
    creator = 'ddorfman'
    dependencies = []
    files = ['6357', '18011']
    hgrepos = []
    issue_num = 1062277
    keywords = ['patch']
    message_count = 10.0
    messages = ['47267', '47268', '47269', '82105', '110350', '110868', '178449', '221898', '255046', '255067']
    nosy_count = 12.0
    nosy_names = ['rhettinger', 'belopolsky', 'ddorfman', 'pitrou', 'ajaksu2', 'alexandre.vassalotti', 'alex', 'zzzeek', 'bcroq', 'eltoder', 'mstefanro', 'herring']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = '9269'
    type = 'behavior'
    url = 'https://bugs.python.org/issue1062277'
    versions = ['Python 3.11']

    ddorfman mannequin commented Nov 8, 2004

    Fix problems related to reduce cycles during pickling. A
    "reduce cycle" is what happens when a __reduce__
    implementation returns an args tuple that cycles back
    through the object being reduced. This can't work because
    the unpickler has to call the constructor with those args,
    but it doesn't yet have that object to place inside them.
    There are two problems related to this:

    1. The standard reduce implementation for proto < 2 in
      copy_reg (_reduce_ex) doesn't correctly deal with
      recursive structures. reduce_2 in typeobject.c does the
      right thing by using listitems and dictitems. Fix
      _reduce_ex by making it do the same thing. This is okay
      for proto < 2 because those arguments are a pickler-
      side feature. Tested in test_stdreducecycle.

    2. Our pickle implementations don't check for reduce
      cycles. This is somewhat cosmetic except that stack
      overflow protection is imperfect (for cPickle), causing
      crashes, and some kinds of cycles trigger asserts (in
      pickle). Fixed in pickle and cPickle by introducing a
      reducing_now set (see the sketch after this list); on
      entering save_reduce, the object id being saved must not
      be in that set, or else we've been called recursively
      while saving the callable or arguments. Tested in
      test_reduce_cycle.
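
    For illustration, here is a rough sketch of the reducing_now idea,
    expressed against the modern pure-Python pickler (pickle._Pickler is an
    internal name; CycleDetectingPickler and the exact hook point are
    hypothetical, not the attached patch):

    import io
    import pickle

    class CycleDetectingPickler(pickle._Pickler):   # pure-Python pickler
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._reducing_now = set()   # ids of objects whose reduce value
                                         # is currently being written

        def save_reduce(self, *args, obj=None, **kwargs):
            if obj is not None and id(obj) in self._reducing_now:
                # Re-entered while saving this object's own callable/args:
                # the reduce value cycles back through the object itself.
                raise pickle.PicklingError("reduce cycle while pickling %r" % (obj,))
            if obj is not None:
                self._reducing_now.add(id(obj))
            try:
                return super().save_reduce(*args, obj=obj, **kwargs)
            finally:
                if obj is not None:
                    self._reducing_now.discard(id(obj))

    class Cyclic:
        def __reduce__(self):
            return Cyclic, (self,)      # args cycle back through the object

    try:
        CycleDetectingPickler(io.BytesIO(), protocol=2).dump(Cyclic())
    except pickle.PicklingError as exc:
        print(exc)   # reported as a pickling error instead of blowing the stack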

    This shouldn't change any semantics. The fact that __reduce__
    shouldn't introduce cycles through args isn't documented, but
    it can't work any other way.

    Possible improvement: If we want to support reducing real
    immutable containers, we might have to relax the reduction
    cycle test to give the cycle a chance of resolving itself
    normally (by constructing a partial object to pass to the
    constructor and filling it in later). I'm not sure this
    trouble is worth it just to avoid writing a
    frozenset.__setstate__.

    See also: http://mail.python.org/pipermail/python-dev/2004-October/049714.html

    It would be very good if someone familiar with pickle
    reviewed this; I am still not very confident that I
    completely understand all the issues.

    @ddorfman ddorfman mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 8, 2004
    @rhettinger
    Contributor

    IMO, this is more of a feature request than a bug. Also, it
    needs to be fully thought-out and discussed. Putting this
    sort of thing in just before the Py2.4 release candidate is
    probably not wise.

    ddorfman mannequin commented Nov 9, 2004


    Triggering assertions (pickle) and producing incorrect output
    (cPickle) are certainly bugs, and being able to pickle a
    recursive structure is not a feature request. The copy_reg
    part of the patch is the real fix, and as it's almost a
    translation of what reduce_2 already does, it should be safe
    for 2.4. I agree that the rest of the patch--to check for
    cycles during pickling--should wait until 2.5.

    devdanzin mannequin commented Feb 14, 2009

    Patch has test, can someone try it? I think it might be fixed already.

    @devdanzin devdanzin mannequin added type-feature A feature request or enhancement labels Feb 14, 2009
    abalkin commented Jul 15, 2010

    The issue is still present in py3k. Attaching an updated patch with tests only. Is this the same as bpo-998998?

    @terryjreedy terryjreedy added stdlib Python modules in the Lib dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jul 20, 2010
    abalkin commented Jul 20, 2010

    As I explained in msg110617 under bpo-9269, it is possible that we can do better than simply detect reduce cycles and bail out.

    I am torn between two options:

    1. Reject this patch and wait until a proper solution is found.

    2. Admit that better is the enemy of good and start with proper error reporting on reduce cycles by accepting this patch.

    My only concern about this patch is that it is likely to affect performance because pickling will have to maintain the "reducing_now" dictionary that is parallel to the memo dictionary.

    @BreamoreBoy BreamoreBoy mannequin added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Aug 19, 2010
    eltoder mannequin commented Dec 29, 2012

    To recap, the issue is that pickle doesn't handle recursion via reduce arguments (i.e. the arguments to the constructor function, as returned in the 2nd element of the tuple from __reduce__). This leads to two kinds of effects:

    class C:
        def __init__(self, x=None):
            self.x = x if x is not None else self
        def __reduce__(self):
            return C, (self.x,)
    A. Recursion error:
    >>> pickle.dumps(C())
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        pickle.dumps(C())
    RuntimeError: maximum recursion depth exceeded while calling a Python object

    This cannot be helped with the current reduce protocol. The error may be improved, but that's about it.

    B. Duplication of object when unpickling:
    >>> c = C([])
    >>> c.x.append(c)
    >>> c.x[0] is c
    True
    >>> c2 = pickle.loads(pickle.dumps(c))
    >>> c2.x[0] is c2
    False
    
    This happens because the list (or another recursion-friendly type) inside the problematic object handles the recursion, but unpickling still constructs the outer object as a separate duplicate of the inner one.
    This can be solved the same way as for tuple:
    >>> t=([],1,2)
    >>> t[0].append(t)
    >>> t2 = pickle.loads(pickle.dumps(t))
    >>> t2[0][0] is t2
    True
    >>> pickletools.dis(pickle.dumps(t))
        0: \x80 PROTO      3
        2: ]    EMPTY_LIST
        3: q    BINPUT     0
        5: h    BINGET     0
        7: K    BININT1    1
        9: K    BININT1    2
       11: \x87 TUPLE3
       12: q    BINPUT     1
       14: a    APPEND
       15: K    BININT1    1
       17: K    BININT1    2
       19: 0    POP
       20: 0    POP
       21: 0    POP
       22: h    BINGET     1
       24: .    STOP

    After pickling its elements, the tuple checks whether it ended up in the memo. If it did, that means it was already pickled by one of its elements, so it POPs all the elements from the stack and fetches itself via GET. This is somewhat inefficient, but probably the best it can do.

    I suggest we do 3 things:

    1. Improve the documentation for the __reduce__ function. It should mention that all state that a) can potentially point back to the object and b) is not strictly necessary in the constructor function should be passed via the 3rd element of the __reduce__ tuple (aka state) instead of the 2nd element, and applied by __setstate__. This handles recursion in a robust and optimal way (see the sketch after this list).

    2. Check that all built-in/standard types follow this advice. I see that Stefan Mihaila already fixed sets.

    3. To fix case B above, add the memo check from save_tuple to save_reduce. While we're at it, we can consider checking after pickling each element instead of after pickling all of them, to reduce the number of POPs and the waste they cause.
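
    For illustration, a minimal sketch of the advice in point 1, applied to
    the recursive class C from above (the dict-shaped state and __setstate__
    shown here are just one possible layout, not prescribed by pickle):

    import pickle

    class C:
        def __init__(self, x=None):
            self.x = x if x is not None else self
        def __reduce__(self):
            # Keep anything that can point back to self out of the
            # constructor args; ship the recursive attribute as state.
            return C, (), {'x': self.x}
        def __setstate__(self, state):
            # Applied only after the new object exists and is memoized,
            # so a self-reference resolves to the object itself.
            self.x = state['x']

    c = C()                                # c.x is c
    c2 = pickle.loads(pickle.dumps(c))
    assert c2.x is c2                      # identity preserved, no recursion error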

    abalkin commented Jun 29, 2014

    Problem "B" has been resolved, but problem "A" is still there.

    Python 3.4.1 (default, Jun 29 2014, 15:26:46)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class C:
    ...     def __init__(self, x=None):
    ...         self.x = x if x is not None else self
    ...     def __reduce__(self):
    ...         return C, (self.x,)
    ...
    >>> import pickle
    >>> pickle.dumps(C())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: maximum recursion depth exceeded while calling a Python object
    >>> c = C([])
    >>> c.x.append(c)
    >>> c.x[0] is c
    True
    >>> c2 = pickle.loads(pickle.dumps(c))
    >>> c2.x[0] is c2
    True

    opensdh mannequin commented Nov 21, 2015

    Re msg110868: it's impossible to resolve a __reduce__ loop that involves only immutable intermediate objects (including none at all):

    class Direct:
      def __reduce__(self): return id,(self,) # obviously impossible
    class Indirect:
      # Can't create either the new object or the tuple first!
      def __reduce__(self): return id,((self,),)

    With pre-existing mutable objects, the same trick as for tuples certainly could be applied:

    class Mutable:
      def __init__(self): self.bracketed=[self]
      # Create list, REDUCE, then populate list
      def __reduce__(self): return id,(self.bracketed,)

    The way an analog of the tuple implementation would handle this is something like

    id # the dummy "constructor" in this example
    [] # empty list, memoized immediately as, say, #5
    id # try again to store the Mutable
    @5 # fetch from memo
    T1 # make singleton tuple
    R # reduce (have now succeeded in "making" the Mutable as, say, #20)
    APP # to the list
    T1 # finish the first __reduce__ attempt
    POP # abandon it, since the object has been created
    POP # abandon the "id" as well
    @20 # fetch the previous answer

    If the list were created directly in __reduce__, you'd still recurse infinitely trying to create each new list object. On the other hand, even complicated cases like this should work:

    class Switch:
      def __reduce__(self): return id,(self.lst,)
    a=Switch(); b=Switch()
    a.lst=[b,a]; b.lst=[a,b]

    where the pickling of, say, a would look something like

    id
    [] # -> #17
    id # trying b now
    [] # -> #42
    id # trying a again
    @17
    T1
    R # a done (#88)
    APP
    id # trying b again
    @42
    T1
    R # b done (#101)
    APP # b's list now populated
    POP
    POP # b abandoned
    @101 # b fetched
    APP # finally building a's list
    @88 # a is long done
    APP # a's list now populated
    POP
    POP # a abandoned
    @88 # final value: a

    Perhaps a technique for distinguishing these cases is to look for new objects having been created since the last time we visited an object. If there are none, we're in the immutable case and we lose. If there are yet more created between the second and third visits, on the other hand, we're generating more objects from __reduce__ every time and should again abort.
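
    For what it's worth, a rough sketch of that heuristic on top of the
    pure-Python pickler (ProgressCheckingPickler and its bookkeeping are
    hypothetical, not an existing patch):

    import pickle

    class ProgressCheckingPickler(pickle._Pickler):   # pure-Python pickler
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._visits = {}   # id(obj) -> (visit count, memo size at that visit)

        def save_reduce(self, *args, obj=None, **kwargs):
            if obj is not None:
                count, memo_size = self._visits.get(id(obj), (0, -1))
                if count and len(self.memo) == memo_size:
                    # Nothing new was memoized since the last visit: the
                    # immutable dead end, so give up.
                    raise pickle.PicklingError("unresolvable reduce cycle for %r" % (obj,))
                if count >= 2:
                    # Third visit and the memo keeps growing: __reduce__ is
                    # generating fresh objects every time, so also give up.
                    raise pickle.PicklingError("runaway reduce recursion for %r" % (obj,))
                self._visits[id(obj)] = (count + 1, len(self.memo))
            return super().save_reduce(*args, obj=obj, **kwargs)

    # e.g. pickling the Direct example above now raises PicklingError
    # instead of hitting the recursion limit:
    #     import io
    #     ProgressCheckingPickler(io.BytesIO()).dump(Direct())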

    eltoder mannequin commented Nov 21, 2015

    Davis, the possible cases (i.e. recursion via appropriate mutable objects) are already supported, and the impossible cases are, well, impossible. The only open issue is whether to implement better error handling for infinite recursion problems, instead of relying on the built-in maximum recursion depth.

    @devdanzin devdanzin mannequin added 3.11 only security fixes labels Dec 16, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022