This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author gavento
Recipients alexandre.vassalotti, gavento, habnabit, jafo, jcea, pitrou, schmir, serhiy.storchaka, tleeuwenburg@gmail.com
Date 2016-06-10.12:51:39
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1465563099.98.0.595674307935.issue3119@psf.upfronthosting.co.za>
In-reply-to
Content
Hey all,

I would like to patch the C _pickle module to be non-recursive and help this patch go through. I have an idea on how to do that with a very small amount of changes below and I would like to get feedback and improvements before implementing it in a particular way or getting into details (basically to check if that is acceptable and whether I missed something major).

Also, what is missing in the Python patch to get it merged?

== Nonrecursive _pickle:

First, observe that all the save_* recursive calls do not rely on the return value of save_* except for error checking (which is propagated as -1 to the top). Also, almost all work in save_* is done before calling save() recursively (most of the work after calling save()s is decref or other minor cleanup).

I would propose a linked list of structures acting as a stack of objects and byte-sequences to be written (for the marks and opcodes). Imagine something like the following (a rough sketch, I need to check all the required info properly later):

struct _PickleStack {_PickleStack* next; PyObject *obj; char write_opcode; /* ... objects to decref / cleanup after the write */ /* possibly pers_save flags */ };

The pickle stack would be a part of Pickler and save() would push a value onto that stack instead of recursion. (The order of save() and calls within save_*() needs to be reversed, though.)

Most objects would push a fixed number of objects to the pickle stack, for lists and dicts I would make pickle stack entries for all the items at once (e.g. as in batch_list()). An alternative would be to store the last item and the open iterator, but that seems wasteful.

This should not slow things down significantly (if at all), as the only overhead is having own stack of small structures (likely less than stackframe size of save_*). If frequent de/allocation will be a problem, I would modify the allocation strategy later (to blocks or such).

Are there any issues with changing the working of save() internally? The functionality of reduce, custom Picklers, persistent ids and dispatch tables should stay the same. The unpickling is not recursive and needs no changes (and has its own custom stack impl).

Looking forward to your comments.
History
Date User Action Args
2016-06-10 12:51:41gaventosetrecipients: + gavento, jafo, jcea, pitrou, alexandre.vassalotti, schmir, habnabit, tleeuwenburg@gmail.com, serhiy.storchaka
2016-06-10 12:51:39gaventosetmessageid: <1465563099.98.0.595674307935.issue3119@psf.upfronthosting.co.za>
2016-06-10 12:51:39gaventolinkissue3119 messages
2016-06-10 12:51:39gaventocreate