classification
Title: Improve copy.copy speed for built-in types (list/set/dict)
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexandre.vassalotti, josh.r, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-01-20 18:07 by josh.r, last changed 2016-03-06 13:05 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
copy_speedup.patch serhiy.storchaka, 2016-01-21 09:19 review
Messages (10)
msg258704 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-01-20 18:07
copy.copy uses a relatively high overhead approach to copying list, set and dict, using:

    def _copy_with_constructor(x):
        return type(x)(x)

This is despite the fact that all three types implement a .copy() method, and there is already a defined method:

    def _copy_with_copy_method(x):
        return x.copy()

that (in %timeit tests) runs with substantially less overhead, in percentage terms; for empty lists, sets and dicts, calling _copy_with_constructor or _copy_with_copy_method directly on them, the times on my machine (Python 3.5.0, Linux x86-64) are:

empty list: 281 ns (constructor), 137 ns (method)
empty set: 275 ns (constructor), 175 ns (method)
empty dict: 372 ns (constructor), 211 ns (method)

The method costs could be trimmed further if _copy_with_copy_method was changed from a Python implementation to using operator.methodcaller, e.g.

    try:
        # If we have _operator, avoids cost of importing Python code; it's part of core modules in CPython, already loaded for free
        from _operator import methodcaller  
    except ImportError:
        from operator import methodcaller

    _copy_with_copy_method = methodcaller('copy')

This doesn't save a whole lot more (shaves another 9-17 ns off the times for the Python version of _copy_with_copy_method), but it's nice in that it avoids reinventing the wheel in the copy module. Combining the two changes (to use methodcaller for _copy_with_copy_method and to have list, set and dict use _copy_with_copy_method) means we can get rid of both Python defined functions in favor of a single use of operator.methodcaller used by all types that previously used either of them.

Obviously not a high priority fix, but I noticed this while trying to figure out a way to fix zlib's lack of support in the copy module ( #26166 which apparently duplicates #25007 ) and how to work around it.
msg258712 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-20 20:21
methodcaller is not needed. We can use just list.copy etc.

Proposed patch speeds up copying list, dict, set, bytearray, slice, NotImplemented. It makes deepcopying list, tuple, dict a little faster.  It makes the code for copying and deepcopying using reduce protocol cleaner and a little faster. It cleans up the module and adds new tests for builtin types.
msg258731 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-01-21 03:10
Good point. Though I don't see any attached patches...
msg258748 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-21 09:19
Oh, sorry. Here is a patch.
msg259843 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-08 13:47
Here are results of microbenchmarks (time in microseconds).

                                copy             deepcopy
                         unpatched patched unpatched   patched

()                           0.993    1.02      5.25      5.38
[]                            1.53    1.12      4.81      5.08
set()                         1.47    1.21      24.6      18.3
frozenset()                  0.991       1      23.4      16.7
{}                            1.59    1.19      5.24      5.45
bytearray()                   8.76    1.11      17.5      11.2
slice(1,10,2)                 7.94       1        23      18.7
NotImplemented                4.75       1      5.82      2.09

tuple(range(10))             0.975    1.01      26.1      26.6
list(range(10))               1.92    1.33      25.7      25.8
set(range(10))                2.17    1.94      47.6      40.6
frozenset(range(10))         0.973       1      47.3      40.3
dict.fromkeys(range(10))      2.19    1.87      43.1      44.8
bytearray(range(10))          10.5    1.17      21.8      17.4

tuple(range(1000))           0.967    1.01  1.97e+03  2.07e+03
list(range(1000))             5.74    5.53  2.02e+03  1.98e+03
set(range(1000))              20.5    20.9  2.15e+03  2.06e+03
frozenset(range(1000))       0.973   0.995  2.14e+03  2.06e+03
dict.fromkeys(range(1000))    49.6    49.3   3.8e+03  3.92e+03
bytearray(range(10))*100      11.5    1.47      23.5      18.9
msg261047 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-01 10:14
If there are no other comments, I'm going to commit the patch in short time.
msg261240 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-03-06 09:36
One suggestion:

    def _deepcopy_list(x, memo, deepcopy=deepcopy):
        y = []
        memo[id(x)] = y
        y[:] = [deepcopy(a, memo) for a in x]
        return y

This looks nicer and should run faster by taking advantage of the LIST_APPEND opcode.

It should be noted that the core concept of the patch is all about the minimizing the calling overhead of the copying operation (the cost to call type(x) and the overhead in the type constructors for parsing the positional and keyword arguments).  When it comes to actually copying the data in the containers, there underlying code to do the copying is the same for both the patched and unpatched version (that is why you see a speed-up for copying empty containers and almost no change for containers that have data in them).
msg261243 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-06 10:46
> This looks nicer and should run faster by taking advantage of the LIST_APPEND opcode.

The difference is insignificantly (less than 1%) for large lists (list(range(10000))), but it is 3-4% slower for small lists (list(range(10))) and 20-25% slower for empty lists.

On 2.7 your code always has advantage, but on 3.x seems it doesn't have any advantage (thanks to overhead of using a generator).
msg261245 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-03-06 11:24
Then go ahead with the patch as is.
msg261250 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-06 13:04
New changeset 496878ac412d by Serhiy Storchaka in branch 'default':
Issue #26167: Minimized overhead in copy.copy() and copy.deepcopy().
https://hg.python.org/cpython/rev/496878ac412d

New changeset 52d7a308e3b4 by Serhiy Storchaka in branch '3.5':
Issue #26167: Backported copy tests.
https://hg.python.org/cpython/rev/52d7a308e3b4

New changeset 8554423dd392 by Serhiy Storchaka in branch '2.7':
Issue #26167: Backported copy tests.
https://hg.python.org/cpython/rev/8554423dd392
History
Date User Action Args
2016-03-06 13:05:01serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-03-06 13:04:36python-devsetnosy: + python-dev
messages: + msg261250
2016-03-06 11:24:12rhettingersetmessages: + msg261245
2016-03-06 10:46:35serhiy.storchakasetmessages: + msg261243
2016-03-06 09:36:34rhettingersetnosy: + rhettinger
messages: + msg261240
2016-03-01 10:14:26serhiy.storchakasetmessages: + msg261047
2016-02-08 13:47:18serhiy.storchakasetmessages: + msg259843
2016-01-21 09:19:10serhiy.storchakasetfiles: + copy_speedup.patch
keywords: + patch
messages: + msg258748
2016-01-21 03:10:38josh.rsetmessages: + msg258731
2016-01-20 20:21:39serhiy.storchakasetversions: - Python 3.5
nosy: + alexandre.vassalotti, serhiy.storchaka

messages: + msg258712

assignee: serhiy.storchaka
stage: patch review
2016-01-20 18:13:47josh.rsettitle: Improve copy.copy speed for built-in types -> Improve copy.copy speed for built-in types (list/set/dict)
2016-01-20 18:07:38josh.rcreate