classification
Title: pickle.py is limited by python's call stack
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, gavento, jafo, jcea, matrixise, pitrou, schmir, serhiy.storchaka, tleeuwenburg@gmail.com
Priority: low Keywords: needs review, patch

Created on 2008-06-16 04:06 by habnabit, last changed 2016-10-04 20:46 by habnabit.

Files
File name Uploaded Description Edit
pickle.patch habnabit, 2008-06-16 04:06 svn diff output against the python trunk. review
pickle2.patch habnabit, 2008-08-20 04:50 svn diff output against the python trunk, take two. review
pickle3.patch habnabit, 2009-04-08 22:41 review
pickle4.patch habnabit, 2010-07-19 23:12
graphtest.py gavento, 2016-07-26 16:59 Example code triggering the issue
Messages (17)
msg68262 - (view) Author: Aaron Gallagher (habnabit) Date: 2008-06-16 04:06
Currently, pickle.py in the stdlib is limited by the python call stack. 
For deeply recursive data structures, the default recursion limit of 1000 
is not enough. The patch attached modifies pickle.py to instead use a 
deque object as a call stack. Pickler.save and other methods that increase 
the recursion depth are now generators which may yield either another 
generator or None, where yielding a generator adds it to the call stack.
msg69007 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-06-30 14:46
Interesting. Why do you use a deque? You never seem to prepend or pop
from the beginning of it, so using a plain list should be as fast.

Also, your patch should add one or several unit tests to enforce the new
behaviour.
msg69118 - (view) Author: Aaron Gallagher (habnabit) Date: 2008-07-02 20:43
Ah, I didn't know that a list would be as fast for appending and popping. 
I knew that lists were optimized for .append() and .pop(), but I didn't 
know that a list would be just as fast as a deque if it was just used as a 
stack.

And I'll be happy to write unit tests if it can be pointed out to me how 
exactly they can be written. Should it just test to make sure pickling a 
deeply nested object hierarchy can be pickled without raising a 
RuntimeError? I tried to make this as transparent as possible of a change.
msg69126 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-02 21:10
Hi,

Le mercredi 02 juillet 2008 à 20:43 +0000, Aaron Gallagher a écrit :
> Aaron Gallagher <habnabit@gmail.com> added the comment:
> 
> Ah, I didn't know that a list would be as fast for appending and popping. 
> I knew that lists were optimized for .append() and .pop(), but I didn't 
> know that a list would be just as fast as a deque if it was just used as a 
> stack.

If you have a few minutes to spare, the best would be to measure the
speed of both possibilities, and opt for the fastest. 

> And I'll be happy to write unit tests if it can be pointed out to me how 
> exactly they can be written. Should it just test to make sure pickling a 
> deeply nested object hierarchy can be pickled without raising a 
> RuntimeError?

Yes, exactly. Just create a very deeply nested object (a list should be
sufficient I suppose), pickle it, unpickle it, and check that the result
is equal to the original. No need to do more. Don't bother catching the
eventual RuntimeError, if it is raised the test will be failed anyway.

Since your patch only regards the pickle module (not cPickle) I suppose
the test could go directly into Lib/test/test_pickle.py.

(oh and don't forget to check the test case fails without your
patch :-))

Thanks !
msg69620 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-07-13 20:26
On some benchmark of my own, I get a good 2x slowdown when this patch is
applied. The benchmark consists of loading ~1.4 millions objects (mostly
dict, list, str and int) from a pickle string. It is basically a torture
test for the inner function calls overhead.

Anyway, the slowdown doesn't bother me much. I think a "stackless"
pickle module would be well appreciated by many Python users. 

There is a few things that stops me from applying this patch. First, the
patch will break subclasses of Pickler that relies on the save() method
(Although the method is an implementation detail that shouldn't used).
Therefore, any patches that modifies the API of save() is straight out
for 2.x. And for Python 3.0, your patch should also remove recursion
from the _pickle module (the transparent C accelerator for pickle).
msg71510 - (view) Author: Aaron Gallagher (habnabit) Date: 2008-08-20 04:50
Alright, sorry this took so long. Hopefully this can still be included in 
3.0.
Included is a patch that no longer uses collections.deque and also adds a 
test case to test/test_pickle.py. The test catches RuntimeError and fails 
the unittest. I didn't see anything that would behave like the opposite of 
assertRaises except letting the exception propagate.
msg85761 - (view) Author: Tennessee Leeuwenburg (tleeuwenburg@gmail.com) (Python triager) Date: 2009-04-08 01:09
Aaron,

Could you please upload another patch against the current trunk, then
the issue could be flagged as requiring a patch review?

Thanks,
-Tennessee
msg85786 - (view) Author: Aaron Gallagher (habnabit) Date: 2009-04-08 22:41
Okay, here's a new version for the py3k trunk. I'm assuming that this is 
not going to make it into 2.x at all, because of the API changes. This 
patch only touches the python version of the code and adds a unit test 
for testing whether pickle works with arbitrary nesting depth in 
test.pickletest. The test is disabled for the C pickle tests currently, 
since it causes python to run out of stack space and crash.

So, does _pickle.Pickler's API need to be changed as well? Right now, 
pickle._Pickler.dump expects save (and save expects the other save_* 
methods) to either return None or an iterable, where the iterable yields 
either None or further iterables. I'm sure it wouldn't be hard to make 
_pickle.Pickler.dump work the same way, but since C doesn't have 
generators, I don't know how exactly the API would translate.
msg101451 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2010-03-21 20:16
Sorry for the delay in getting to this patch.

I've reviewed this patch and it seems fine to me.  The only thing I see outstanding is the recommendation made by Alexandre about changing the C module to also implement this.

Aaron: You may want to take your question on the implementation to the python-dev mailing list: Whether it is necessary to implement it in the C module, and if so suggestions on how or help on doing it.  Is that something you can do, Aaron?
msg101452 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2010-03-21 20:19
Ugh, I forgot to check the output of my test run before submitting the last reply.  With the patch applied, "make test" fails with:

test_pickle
Fatal Python error: Cannot recover from stack overflow.
Fatal Python error: Cannot recover from stack overflow.
make: *** [test] Aborted (core dumped)

This is on Python 3 trunk, with the pickle3.patch applied (which applied cleanly).

For the Misc/NEWS I propose (in the Library section):

- Issue #3119: pickle.py can now handle deeply-nested data-structures
  without reaching the Python call stack limit.  NOTE: the pickle save()
  method is now a generator, though sub-classes of Pickler shouldn't
  override save().
msg110642 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-18 12:14
Sean has stated that the patch seems fine, what needs to be done to take this forward?
msg110834 - (view) Author: Aaron Gallagher (habnabit) Date: 2010-07-19 23:12
Here's a patch that fixes the unit tests. A new test was added that tried to test the C implementation of this though none exists.
msg268119 - (view) Author: Tomas Gavenciak (gavento) * Date: 2016-06-10 12:51
Hey all,

I would like to patch the C _pickle module to be non-recursive and help this patch go through. I have an idea on how to do that with a very small amount of changes below and I would like to get feedback and improvements before implementing it in a particular way or getting into details (basically to check if that is acceptable and whether I missed something major).

Also, what is missing in the Python patch to get it merged?

== Nonrecursive _pickle:

First, observe that all the save_* recursive calls do not rely on the return value of save_* except for error checking (which is propagated as -1 to the top). Also, almost all work in save_* is done before calling save() recursively (most of the work after calling save()s is decref or other minor cleanup).

I would propose a linked list of structures acting as a stack of objects and byte-sequences to be written (for the marks and opcodes). Imagine something like the following (a rough sketch, I need to check all the required info properly later):

struct _PickleStack {_PickleStack* next; PyObject *obj; char write_opcode; /* ... objects to decref / cleanup after the write */ /* possibly pers_save flags */ };

The pickle stack would be a part of Pickler and save() would push a value onto that stack instead of recursion. (The order of save() and calls within save_*() needs to be reversed, though.)

Most objects would push a fixed number of objects to the pickle stack, for lists and dicts I would make pickle stack entries for all the items at once (e.g. as in batch_list()). An alternative would be to store the last item and the open iterator, but that seems wasteful.

This should not slow things down significantly (if at all), as the only overhead is having own stack of small structures (likely less than stackframe size of save_*). If frequent de/allocation will be a problem, I would modify the allocation strategy later (to blocks or such).

Are there any issues with changing the working of save() internally? The functionality of reduce, custom Picklers, persistent ids and dispatch tables should stay the same. The unpickling is not recursive and needs no changes (and has its own custom stack impl).

Looking forward to your comments.
msg270752 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-07-18 13:19
hello, I think we can close this issue because Python 3.1 is not supported. or we keep it if this issue is confirmed for Python 3.6.
msg271394 - (view) Author: Tomas Gavenciak (gavento) * Date: 2016-07-26 16:59
The issue is still present in Python 2.7.12 and Python 3.5.2, and the implementation has not been changed in the master branch either.
You can test it with the attached program constructing a graph (simplified, but a realistic application), or with the following obviously deep construction:

import pickle, _pickle
a=()
for i in range(1000): a = (a,)
pickle.dumps(a) # or _pickle.dumps(a)

Any further comments or thoughts on the solution?
msg278073 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-04 18:34
It would be nice to support unlimitedly nested structures. C stack is more hard limit than Python stack. But the code of the pickle module (especially C implementation) is complicated and hardly optimized. I think it would be not easy to implement stackless version without significant loss of speed and readability. You can try, but I don't have much hope.
msg278088 - (view) Author: Aaron Gallagher (habnabit) Date: 2016-10-04 20:46
Definitely not interested in pickle at all anymore.
History
Date User Action Args
2016-10-04 20:46:59habnabitsetnosy: - habnabit
2016-10-04 20:46:28habnabitsetnosy: jafo, jcea, pitrou, alexandre.vassalotti, schmir, habnabit, tleeuwenburg@gmail.com, serhiy.storchaka, matrixise, gavento
messages: + msg278088
2016-10-04 18:35:05serhiy.storchakasetpriority: normal -> low
stage: patch review -> needs patch
2016-10-04 18:34:42serhiy.storchakasetmessages: + msg278073
versions: + Python 3.7, - Python 3.6
2016-07-26 16:59:41gaventosetfiles: + graphtest.py

messages: + msg271394
versions: + Python 3.6, - Python 3.1
2016-07-18 13:19:15matrixisesetnosy: + matrixise
messages: + msg270752
2016-06-10 12:51:39gaventosetnosy: + serhiy.storchaka, gavento
messages: + msg268119
2014-02-03 19:17:42BreamoreBoysetnosy: - BreamoreBoy
2011-02-12 16:06:10eric.araujosetnosy: jafo, jcea, pitrou, alexandre.vassalotti, schmir, habnabit, tleeuwenburg@gmail.com, BreamoreBoy
superseder: eliminate recursion in pickling ->
2010-07-19 23:12:43habnabitsetfiles: + pickle4.patch
keywords: + patch
messages: + msg110834
2010-07-18 12:14:36BreamoreBoysetnosy: + BreamoreBoy
messages: + msg110642
2010-03-21 20:19:21jafosetmessages: + msg101452
2010-03-21 20:16:39jafosetpriority: normal
nosy: + jafo
messages: + msg101451

2009-06-29 00:19:33alexandre.vassalottilinkissue2480 superseder
2009-06-29 00:19:33alexandre.vassalottiunlinkissue2480 dependencies
2009-04-17 02:57:47tleeuwenburg@gmail.comsetkeywords: + needs review, - patch
stage: patch review
2009-04-08 22:41:22habnabitsetfiles: + pickle3.patch

messages: + msg85786
versions: + Python 3.1, - Python 2.6, Python 3.0
2009-04-08 01:09:44tleeuwenburg@gmail.comsetnosy: + tleeuwenburg@gmail.com
messages: + msg85761
2008-08-20 04:50:19habnabitsetfiles: + pickle2.patch
messages: + msg71510
2008-07-13 20:26:25alexandre.vassalottisetnosy: + alexandre.vassalotti
messages: + msg69620
2008-07-07 05:16:58schmirsetnosy: + schmir
2008-07-07 04:48:05gregory.p.smithlinkissue2480 dependencies
2008-07-07 04:47:48gregory.p.smithsetsuperseder: eliminate recursion in pickling
2008-07-02 21:10:04pitrousetmessages: + msg69126
2008-07-02 20:43:52habnabitsetmessages: + msg69118
2008-06-30 14:46:04pitrousetnosy: + pitrou
messages: + msg69007
2008-06-30 11:23:43jceasetnosy: + jcea
2008-06-16 04:06:24habnabitcreate