classification
Title: defining persistent_id in _pickle.Pickler subclass causes reference cycle
Type: resource usage Stage: resolved
Components: Extension Modules Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexandre.vassalotti, cwitty, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-10-11 16:14 by cwitty, last changed 2018-01-05 17:34 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
pickle_reference_cycle.py cwitty, 2016-10-11 16:14
Pull Requests
URL Status Linked Edit
PR 4080 merged serhiy.storchaka, 2017-10-23 09:23
PR 4653 merged python-dev, 2017-11-30 20:48
Messages (4)
msg278495 - (view) Author: Carl Witty (cwitty) Date: 2016-10-11 16:14
On creation, _pickle.Pickler caches any .persistent_id() method defined by a subclass (in the pers_func field of PicklerObject).  This causes a reference cycle (pickler -> bound method of pickler -> pickler), so the pickler is held in memory until the next cycle collection.  (Then, because of the pickler's memo table, any objects that this pickler has pickled are also held until the next cycle collection.)

Looking at the source code, it looks like the same thing would happen with _pickle.Unpickler and .persistent_load(), but I haven't tested it.  Any fix should be applied to both classes.

I've attached a test file; when I run it with "python3 pickle_reference_cycle.py", all 3 print statements are executed.  I would prefer it if "Oops, still here" was not printed.  (I'm using Debian's python3.5 package, version 3.5.2-4 for amd64, but I believe the problem occurs across many versions of python3, looking at the history of _pickle.c.)

I don't see how to fix the problem with no performance impact.  (Setting pers_func at the beginning of dump() and clearing it at the end would have approximately the same performance in the common case that only one object was dumped per pickler, but would be slower when dumping multiple objects.)  If you decide not to fix the problem, could you at least describe the problem and a workaround in the documentation?
msg304786 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-23 09:24
PR 4080 converts bound methods into unbound methods if possible.
msg307341 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-30 20:48
New changeset 986375ebde0dd5ff2b7349e445a06bd28a3a8ee2 by Serhiy Storchaka in branch 'master':
bpo-28416: Break reference cycles in Pickler and Unpickler subclasses (#4080)
https://github.com/python/cpython/commit/986375ebde0dd5ff2b7349e445a06bd28a3a8ee2
msg307345 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-30 21:30
New changeset c91bf742e542dceaf71042a44b5a04fb08bdda70 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-28416: Break reference cycles in Pickler and Unpickler subclasses (GH-4080) (#4653)
https://github.com/python/cpython/commit/c91bf742e542dceaf71042a44b5a04fb08bdda70
History
Date User Action Args
2018-01-05 17:34:05serhiy.storchakasetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: - Python 2.7
2017-11-30 21:30:41serhiy.storchakasetmessages: + msg307345
2017-11-30 20:48:42python-devsetpull_requests: + pull_request4565
2017-11-30 20:48:33serhiy.storchakasetmessages: + msg307341
2017-10-23 09:24:56serhiy.storchakasetmessages: + msg304786
2017-10-23 09:23:33serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request4050
2017-10-22 09:52:51serhiy.storchakasetassignee: serhiy.storchaka
versions: - Python 3.5
2016-10-11 19:12:08serhiy.storchakasetnosy: + serhiy.storchaka

type: behavior -> resource usage
versions: + Python 2.7, Python 3.6, Python 3.7
2016-10-11 17:59:23SilentGhostsetnosy: + alexandre.vassalotti
2016-10-11 16:14:11cwittycreate