Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pickle.py is limited by python's call stack #47369

Open
habnabit mannequin opened this issue Jun 16, 2008 · 17 comments
Open

pickle.py is limited by python's call stack #47369

habnabit mannequin opened this issue Jun 16, 2008 · 17 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@habnabit
Copy link
Mannequin

habnabit mannequin commented Jun 16, 2008

BPO 3119
Nosy @jcea, @pitrou, @avassalotti, @serhiy-storchaka, @matrixise
Files
  • pickle.patch: svn diff output against the python trunk.
  • pickle2.patch: svn diff output against the python trunk, take two.
  • pickle3.patch
  • pickle4.patch
  • graphtest.py: Example code triggering the issue
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2008-06-16.04:06:24.505>
    labels = ['3.7', 'type-bug', 'library']
    title = "pickle.py is limited by python's call stack"
    updated_at = <Date 2016-10-04.20:46:59.117>
    user = 'https://bugs.python.org/habnabit'

    bugs.python.org fields:

    activity = <Date 2016-10-04.20:46:59.117>
    actor = 'habnabit'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2008-06-16.04:06:24.505>
    creator = 'habnabit'
    dependencies = []
    files = ['10638', '11168', '13654', '18073', '43897']
    hgrepos = []
    issue_num = 3119
    keywords = ['patch', 'needs review']
    message_count = 17.0
    messages = ['68262', '69007', '69118', '69126', '69620', '71510', '85761', '85786', '101451', '101452', '110642', '110834', '268119', '270752', '271394', '278073', '278088']
    nosy_count = 9.0
    nosy_names = ['jafo', 'jcea', 'pitrou', 'alexandre.vassalotti', 'schmir', 'tleeuwenburg@gmail.com', 'serhiy.storchaka', 'matrixise', 'gavento']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue3119'
    versions = ['Python 3.7']

    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Jun 16, 2008

    Currently, pickle.py in the stdlib is limited by the python call stack.
    For deeply recursive data structures, the default recursion limit of 1000
    is not enough. The patch attached modifies pickle.py to instead use a
    deque object as a call stack. Pickler.save and other methods that increase
    the recursion depth are now generators which may yield either another
    generator or None, where yielding a generator adds it to the call stack.

    @habnabit habnabit mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 16, 2008
    @pitrou
    Copy link
    Member

    pitrou commented Jun 30, 2008

    Interesting. Why do you use a deque? You never seem to prepend or pop
    from the beginning of it, so using a plain list should be as fast.

    Also, your patch should add one or several unit tests to enforce the new
    behaviour.

    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Jul 2, 2008

    Ah, I didn't know that a list would be as fast for appending and popping.
    I knew that lists were optimized for .append() and .pop(), but I didn't
    know that a list would be just as fast as a deque if it was just used as a
    stack.

    And I'll be happy to write unit tests if it can be pointed out to me how
    exactly they can be written. Should it just test to make sure pickling a
    deeply nested object hierarchy can be pickled without raising a
    RuntimeError? I tried to make this as transparent as possible of a change.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 2, 2008

    Hi,

    Le mercredi 02 juillet 2008 à 20:43 +0000, Aaron Gallagher a écrit :

    Aaron Gallagher <habnabit@gmail.com> added the comment:

    Ah, I didn't know that a list would be as fast for appending and popping.
    I knew that lists were optimized for .append() and .pop(), but I didn't
    know that a list would be just as fast as a deque if it was just used as a
    stack.

    If you have a few minutes to spare, the best would be to measure the
    speed of both possibilities, and opt for the fastest.

    And I'll be happy to write unit tests if it can be pointed out to me how
    exactly they can be written. Should it just test to make sure pickling a
    deeply nested object hierarchy can be pickled without raising a
    RuntimeError?

    Yes, exactly. Just create a very deeply nested object (a list should be
    sufficient I suppose), pickle it, unpickle it, and check that the result
    is equal to the original. No need to do more. Don't bother catching the
    eventual RuntimeError, if it is raised the test will be failed anyway.

    Since your patch only regards the pickle module (not cPickle) I suppose
    the test could go directly into Lib/test/test_pickle.py.

    (oh and don't forget to check the test case fails without your
    patch :-))

    Thanks !

    @avassalotti
    Copy link
    Member

    On some benchmark of my own, I get a good 2x slowdown when this patch is
    applied. The benchmark consists of loading ~1.4 millions objects (mostly
    dict, list, str and int) from a pickle string. It is basically a torture
    test for the inner function calls overhead.

    Anyway, the slowdown doesn't bother me much. I think a "stackless"
    pickle module would be well appreciated by many Python users.

    There is a few things that stops me from applying this patch. First, the
    patch will break subclasses of Pickler that relies on the save() method
    (Although the method is an implementation detail that shouldn't used).
    Therefore, any patches that modifies the API of save() is straight out
    for 2.x. And for Python 3.0, your patch should also remove recursion
    from the _pickle module (the transparent C accelerator for pickle).

    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Aug 20, 2008

    Alright, sorry this took so long. Hopefully this can still be included in
    3.0.
    Included is a patch that no longer uses collections.deque and also adds a
    test case to test/test_pickle.py. The test catches RuntimeError and fails
    the unittest. I didn't see anything that would behave like the opposite of
    assertRaises except letting the exception propagate.

    @tleeuwenburggmailcom
    Copy link
    Mannequin

    tleeuwenburggmailcom mannequin commented Apr 8, 2009

    Aaron,

    Could you please upload another patch against the current trunk, then
    the issue could be flagged as requiring a patch review?

    Thanks,
    -Tennessee

    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Apr 8, 2009

    Okay, here's a new version for the py3k trunk. I'm assuming that this is
    not going to make it into 2.x at all, because of the API changes. This
    patch only touches the python version of the code and adds a unit test
    for testing whether pickle works with arbitrary nesting depth in
    test.pickletest. The test is disabled for the C pickle tests currently,
    since it causes python to run out of stack space and crash.

    So, does _pickle.Pickler's API need to be changed as well? Right now,
    pickle._Pickler.dump expects save (and save expects the other save_*
    methods) to either return None or an iterable, where the iterable yields
    either None or further iterables. I'm sure it wouldn't be hard to make
    _pickle.Pickler.dump work the same way, but since C doesn't have
    generators, I don't know how exactly the API would translate.

    @jafo
    Copy link
    Mannequin

    jafo mannequin commented Mar 21, 2010

    Sorry for the delay in getting to this patch.

    I've reviewed this patch and it seems fine to me. The only thing I see outstanding is the recommendation made by Alexandre about changing the C module to also implement this.

    Aaron: You may want to take your question on the implementation to the python-dev mailing list: Whether it is necessary to implement it in the C module, and if so suggestions on how or help on doing it. Is that something you can do, Aaron?

    @jafo
    Copy link
    Mannequin

    jafo mannequin commented Mar 21, 2010

    Ugh, I forgot to check the output of my test run before submitting the last reply. With the patch applied, "make test" fails with:

    test_pickle
    Fatal Python error: Cannot recover from stack overflow.
    Fatal Python error: Cannot recover from stack overflow.
    make: *** [test] Aborted (core dumped)

    This is on Python 3 trunk, with the pickle3.patch applied (which applied cleanly).

    For the Misc/NEWS I propose (in the Library section):

    • Issue bpo-3119: pickle.py can now handle deeply-nested data-structures
      without reaching the Python call stack limit. NOTE: the pickle save()
      method is now a generator, though sub-classes of Pickler shouldn't
      override save().

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 18, 2010

    Sean has stated that the patch seems fine, what needs to be done to take this forward?

    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Jul 19, 2010

    Here's a patch that fixes the unit tests. A new test was added that tried to test the C implementation of this though none exists.

    @gavento
    Copy link
    Mannequin

    gavento mannequin commented Jun 10, 2016

    Hey all,

    I would like to patch the C _pickle module to be non-recursive and help this patch go through. I have an idea on how to do that with a very small amount of changes below and I would like to get feedback and improvements before implementing it in a particular way or getting into details (basically to check if that is acceptable and whether I missed something major).

    Also, what is missing in the Python patch to get it merged?

    == Nonrecursive _pickle:

    First, observe that all the save_* recursive calls do not rely on the return value of save_* except for error checking (which is propagated as -1 to the top). Also, almost all work in save_* is done before calling save() recursively (most of the work after calling save()s is decref or other minor cleanup).

    I would propose a linked list of structures acting as a stack of objects and byte-sequences to be written (for the marks and opcodes). Imagine something like the following (a rough sketch, I need to check all the required info properly later):

    struct _PickleStack {_PickleStack* next; PyObject *obj; char write_opcode; /* ... objects to decref / cleanup after the write */ /* possibly pers_save flags */ };

    The pickle stack would be a part of Pickler and save() would push a value onto that stack instead of recursion. (The order of save() and calls within save_*() needs to be reversed, though.)

    Most objects would push a fixed number of objects to the pickle stack, for lists and dicts I would make pickle stack entries for all the items at once (e.g. as in batch_list()). An alternative would be to store the last item and the open iterator, but that seems wasteful.

    This should not slow things down significantly (if at all), as the only overhead is having own stack of small structures (likely less than stackframe size of save_*). If frequent de/allocation will be a problem, I would modify the allocation strategy later (to blocks or such).

    Are there any issues with changing the working of save() internally? The functionality of reduce, custom Picklers, persistent ids and dispatch tables should stay the same. The unpickling is not recursive and needs no changes (and has its own custom stack impl).

    Looking forward to your comments.

    @matrixise
    Copy link
    Member

    hello, I think we can close this issue because Python 3.1 is not supported. or we keep it if this issue is confirmed for Python 3.6.

    @gavento
    Copy link
    Mannequin

    gavento mannequin commented Jul 26, 2016

    The issue is still present in Python 2.7.12 and Python 3.5.2, and the implementation has not been changed in the master branch either.
    You can test it with the attached program constructing a graph (simplified, but a realistic application), or with the following obviously deep construction:

    import pickle, _pickle
    a=()
    for i in range(1000): a = (a,)
    pickle.dumps(a) # or _pickle.dumps(a)

    Any further comments or thoughts on the solution?

    @serhiy-storchaka
    Copy link
    Member

    It would be nice to support unlimitedly nested structures. C stack is more hard limit than Python stack. But the code of the pickle module (especially C implementation) is complicated and hardly optimized. I think it would be not easy to implement stackless version without significant loss of speed and readability. You can try, but I don't have much hope.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 4, 2016
    @habnabit
    Copy link
    Mannequin Author

    habnabit mannequin commented Oct 4, 2016

    Definitely not interested in pickle at all anymore.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    4 participants