classification
Title: When deepcopying, don't store immutable objects in the memo dict
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Carl.Friedrich.Bolz, alex, python-dev
Priority: normal Keywords: patch

Created on 2011-06-27 15:29 by alex, last changed 2011-06-27 21:22 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
d.diff alex, 2011-06-27 15:29 review
d.diff alex, 2011-06-27 15:50 review
d.diff alex, 2011-06-27 17:05 review
Messages (5)
msg139300 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-06-27 15:29
All storing immutable objects in the memo dict does is slow stuff down, due to having a larger hash table, and on some other Python's causing hilarious levels of GC pressure.  Using http://paste.pocoo.org/show/421310/ as a benchmark, CPython get's a 2x speedup on the deepcopy portion, and PyPy a 20x.  Patch is attached.
msg139303 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-06-27 15:50
A slightly cleverer version (or less clever, depending on how you approach the issue) that also works with tuples of immutable content.
msg139315 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-06-27 17:05
Switched to using assertIs, as merwok suggested.
msg139322 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-06-27 18:34
Amaury points out: this is not strictly about immutable objects, but rather objects who's deepcopy is themselves (identity-wise), in some (rare I think) cases this could provide a slowdown.  Specifically a case of [(1, 2, 3)] * 10000 would be slower, because it would review each tuple individually, rather than using the memo'd instance.  I suspect this case is not so common (to have the same identity object, who's deepcopy is itself such as a tuple or object with custom __deepcopy__, many times in a deepcopy object graph), but I have no proof of this.
msg139327 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-27 21:22
New changeset e24ad85e9608 by Benjamin Peterson in branch 'default':
don't memoize objects that are their own copies (closes #12422)
http://hg.python.org/cpython/rev/e24ad85e9608
History
Date User Action Args
2011-06-27 21:22:57python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg139327

resolution: fixed
stage: resolved
2011-06-27 18:34:51alexsetmessages: + msg139322
2011-06-27 17:05:06alexsetfiles: + d.diff

messages: + msg139315
2011-06-27 15:50:23alexsetfiles: + d.diff

messages: + msg139303
2011-06-27 15:41:09Carl.Friedrich.Bolzsetnosy: + Carl.Friedrich.Bolz
2011-06-27 15:29:44alexcreate