
Author cwitty
Recipients cwitty
Date 2016-10-11.16:14:11
Message-id <1476202451.78.0.408768500752.issue28416@psf.upfronthosting.co.za>
Content
On creation, _pickle.Pickler caches any .persistent_id() method defined by a subclass (in the pers_func field of PicklerObject).  This causes a reference cycle (pickler -> bound method of pickler -> pickler), so the pickler is held in memory until the next cycle collection.  (Then, because of the pickler's memo table, any objects that this pickler has pickled are also held until the next cycle collection.)

From the source code, it looks like the same thing would happen with _pickle.Unpickler and .persistent_load(), but I haven't tested it.  Any fix should be applied to both classes.

I've attached a test file; when I run it with "python3 pickle_reference_cycle.py", all 3 print statements are executed.  I would prefer it if "Oops, still here" was not printed.  (I'm using Debian's python3.5 package, version 3.5.2-4 for amd64, but I believe the problem occurs across many versions of python3, looking at the history of _pickle.c.)
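
The attached file is not reproduced here. The following is only a minimal sketch of a script in the same spirit, not the original attachment: MyPickler and pickler_survives_del are hypothetical names, and a weak reference stands in for the print statements. Whether the pickler survives the "del" depends on the interpreter version, so no particular result is claimed for the first value.

```python
import gc
import io
import pickle
import weakref


class MyPickler(pickle.Pickler):
    """Hypothetical subclass; defining persistent_id() is what matters."""

    def persistent_id(self, obj):
        return None  # pickle every object the normal way


def pickler_survives_del():
    """Return (alive_after_del, alive_after_collect) for a fresh MyPickler."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from hiding the effect
    try:
        pickler = MyPickler(io.BytesIO())
        pickler.dump([1, 2, 3])
        ref = weakref.ref(pickler)
        del pickler
        # True here means the pickler is kept alive by a reference cycle.
        alive_after_del = ref() is not None
        gc.collect()
        alive_after_collect = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


if __name__ == "__main__":
    print(pickler_survives_del())
```

On an affected interpreter the first value would be True (the "Oops, still here" case); the second should be False everywhere, since a full collection reclaims the cycle.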

I don't see how to fix the problem with no performance impact.  (Setting pers_func at the beginning of dump() and clearing it at the end would have approximately the same performance in the common case that only one object was dumped per pickler, but would be slower when dumping multiple objects.)  If you decide not to fix the problem, could you at least describe the problem and a workaround in the documentation?
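
One possible workaround, sketched under assumptions rather than taken from any documentation: clear the memo once dumping is finished so the pickled objects are no longer referenced, then force a collection so the pickler itself is reclaimed. MyPickler and dump_and_release are hypothetical names; clear_memo() and gc.collect() are the existing standard library calls.

```python
import gc
import io
import pickle


class MyPickler(pickle.Pickler):
    """Hypothetical subclass that defines persistent_id()."""

    def persistent_id(self, obj):
        return None  # pickle every object the normal way


def dump_and_release(obj):
    """Pickle obj to bytes, then drop the pickler's hold on it."""
    buf = io.BytesIO()
    pickler = MyPickler(buf)
    pickler.dump(obj)
    pickler.clear_memo()  # the memo no longer references obj
    del pickler
    gc.collect()          # reclaim the pickler if it sits in a cycle
    return buf.getvalue()
```

Calling gc.collect() after every dump has its own cost, so clearing the memo alone may be the better trade-off when only the pickled objects, not the pickler, are large.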
History
Date                 User    Action  Args
2016-10-11 16:14:12  cwitty  set     recipients: + cwitty
2016-10-11 16:14:11  cwitty  set     messageid: <1476202451.78.0.408768500752.issue28416@psf.upfronthosting.co.za>
2016-10-11 16:14:11  cwitty  link    issue28416 messages
2016-10-11 16:14:11  cwitty  create