
classification
Title: Speed-up dict.copy() up to 5.5 times.
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: methane, rhettinger, serhiy.storchaka, vstinner, yselivanov
Priority: normal
Keywords:

Created on 2017-08-10 21:31 by yselivanov, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL      Status  Linked
PR 3067  merged  yselivanov, 2017-08-10 21:35
Messages (13)
msg300117 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-10 21:31
It's possible to significantly improve the performance of shallow dict copies.  Currently, PyDict_Copy creates a new empty dict object and then inserts the key/value pairs into it one by one.

My idea is to simply memcpy the whole keys/items region and do the necessary increfs afterwards.  This works just fine for non-key-sharing dicts.
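
A minimal sketch of that idea in C (field and helper names follow CPython 3.7's Objects/dictobject.c; `clone_keys` is a hypothetical name, and allocating the new PyDictObject itself is elided, so treat this as an illustration of the technique rather than the PR's code):

    static PyDictKeysObject *
    clone_keys(PyDictKeysObject *keys)
    {
        /* _PyDict_KeysSize() covers the hash index plus the entry array. */
        Py_ssize_t size = _PyDict_KeysSize(keys);
        PyDictKeysObject *copy = PyObject_MALLOC(size);
        if (copy == NULL) {
            PyErr_NoMemory();
            return NULL;
        }

        /* One bulk copy instead of one insertion per key... */
        memcpy(copy, keys, size);

        /* ...then take the references the new table now holds. */
        PyDictKeyEntry *ep = DK_ENTRIES(copy);
        for (Py_ssize_t i = 0; i < copy->dk_nentries; i++, ep++) {
            Py_XINCREF(ep->me_key);
            Py_XINCREF(ep->me_value);
        }
        copy->dk_refcnt = 1;
        return copy;
    }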

With the following simple microbenchmark:

    import time

    N = 1000000

    for size in [0, 1, 10, 20, 50, 100, 500, 1000]:
        d = dict([(str(i), i) for i in range(size)])

        t = time.monotonic()
        for i in range(N):
            d.copy()
        e = time.monotonic() - t

        print(f'dict(size={size}).copy() x {N} times:\t {e:.4f}')


Output for 3.7 master:

    dict(size=0).copy() x 1000000 times:     0.1299
    dict(size=1).copy() x 1000000 times:     0.1499
    dict(size=10).copy() x 1000000 times:    0.3758
    dict(size=20).copy() x 1000000 times:    0.7722
    dict(size=50).copy() x 1000000 times:    1.2784
    dict(size=100).copy() x 1000000 times:   2.5128
    dict(size=500).copy() x 1000000 times:   12.8968
    dict(size=1000).copy() x 1000000 times:  25.4276


Output for patched 3.7:

    dict(size=0).copy() x 1000000 times:     0.1352
    dict(size=1).copy() x 1000000 times:     0.1285
    dict(size=10).copy() x 1000000 times:    0.1632
    dict(size=20).copy() x 1000000 times:    0.3076
    dict(size=50).copy() x 1000000 times:    0.3663
    dict(size=100).copy() x 1000000 times:   0.5140
    dict(size=500).copy() x 1000000 times:   2.3419
    dict(size=1000).copy() x 1000000 times:  4.6176
msg300120 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-10 21:43
> PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one.

Why not create a "preallocated" dict in that case? _PyDict_NewPresized()
msg300123 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-10 21:46
>> PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one.

> Why not creating a "preallocated" dict in that case? _PyDict_NewPresized()

I don't think it's related to the proposed patch.  Please take a look at the PR.  `_PyDict_NewPresized` and inserting entries one by one will not be faster than a `memcpy`.
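
For contrast, a presize-then-insert copy along the lines Victor suggests would look roughly like this sketch (`copy_by_insert` is a hypothetical name).  It avoids intermediate resizes, but each key still pays one hash-and-probe on insertion, which is exactly the cost the memcpy approach eliminates:

    static PyObject *
    copy_by_insert(PyObject *src)
    {
        PyObject *key, *value;
        Py_ssize_t pos = 0;
        /* Allocate the target at its final size to skip intermediate
           resizes; every key is still hashed and probed individually. */
        PyObject *dst = _PyDict_NewPresized(PyDict_GET_SIZE(src));
        if (dst == NULL)
            return NULL;
        while (PyDict_Next(src, &pos, &key, &value)) {
            if (PyDict_SetItem(dst, key, value) < 0) {
                Py_DECREF(dst);
                return NULL;
            }
        }
        return dst;
    }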
msg300153 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-08-11 06:06
I like the idea.
One worrying point is how to deal with a dirty dict (one whose entry array has holes left by deletions).
How about doing it only when ma_used == keys->dk_nentries?
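
In terms of the 3.7 struct fields, that guard could read like the following sketch (the ma_values check additionally rules out key-sharing dicts):

    /* Take the fast path only for a combined table with no holes:
       every allocated entry is live, so a bulk memcpy is safe. */
    if (mp->ma_values == NULL &&                   /* not key-sharing */
        mp->ma_used == mp->ma_keys->dk_nentries)   /* no deleted holes */
    {
        /* ... bulk-copy fast path ... */
    }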

Slightly off topic: copy-on-write could be implemented via dk_refcnt.
A function called with `**kwargs` receives a temporary copy of the dict,
and CoW would reduce the cost of that temporary dict.
msg300155 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-11 07:10
The PR adds over 50 lines of code to optimise a feature that is not used very often. There are two obvious ways of copying a dict, dict(d) and d.copy(); the PR optimises only the latter, and I'm not sure it is the more common of the two. The PR also duplicates low-level code, which increases the maintenance cost.

The PR changes behavior: currently, copying has the side effect of compacting the dict.

>>> import sys
>>> d = dict.fromkeys(range(2000))
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> sys.getsizeof(dict(d))
41020
>>> for i in range(1000): del d[i]
... 
>>> sys.getsizeof(dict(d))
20544
>>> sys.getsizeof(d.copy())
20544
>>> sys.getsizeof(d)
41020
>>> d = dict.fromkeys(range(2000))
>>> for i in range(1999): del d[i]
... 
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
136

The PR preserves the non-compact layout in the copy.
msg300159 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-11 09:24
> >>> d = dict.fromkeys(range(2000))
> >>> for i in range(1999): del d[i]
> ...
> >>> sys.getsizeof(d)
> 41020
> >>> sys.getsizeof(d.copy())
> 136

Why doesn't "del" compact the dict?
msg300163 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-11 14:48
> I like the idea.
> One worrying point is how to deal with a dirty dict.
> How about doing it only when ma_used == keys->dk_nentries?

I've added this check.  See the updated PR.

> The PR changes behavior: currently, copying has the side effect of compacting the dict.

The check INADA suggested keeps compaction on copy when it is needed: a dirty dict falls back to the old, compacting path.

> The PR adds over 50 lines of code to optimise a feature that is not used very often.

I started to look into the problem because I need this for my upcoming PEP, so please don't dismiss this idea right away.

I also think that copying a dict isn't a "not very often used feature"; it depends on your frame of reference.  Some applications copy dicts a lot.  50 lines of code that speed up one of the core methods by 5.5x is a fair price to pay.

> There are two obvious ways of copying a dict, dict(d) and d.copy()

That can also be easily optimized, btw.  I'll see if I can do that without impacting the performance of creating new dicts.

> The PR also duplicates low-level code, which increases the maintenance cost.

FWIW, the PR doesn't duplicate any of the code.  It provides a new implementation that is more efficient than the old approach.
msg300164 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-11 14:49
> Why "del" doesn't compact the dict?

This is a good question, btw.
msg300423 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-17 13:55
A side effect of this patch is making dict.copy() atomic. That would be a worthy feature if extended to the dict constructor as well. For now the only way of making an atomic (or almost atomic) copy of a dict is dict(list(d.items())), which isn't very time- or memory-efficient.

If you make dict copying remove holes and extend your patch to the dict constructor, it will be more useful.

Look at the set implementation. It doesn't just use memcpy; it contains a specialized insertion implementation for the case where all items are known to be unique. Fast copying is even more important for sets, since copying is more common for sets: it is part of the set operations, and converting a set to a frozenset is common.
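
For reference, the set fast path Serhiy points to has roughly this shape (simplified from set_insert_clean in Objects/setobject.c, which additionally uses linear and perturbed probing; this is a sketch, not the real routine):

    /* Insert a key known not to be in the table yet: probe for the
       first free slot and never call the key's __eq__. */
    static void
    insert_clean(setentry *table, size_t mask, PyObject *key, Py_hash_t hash)
    {
        size_t i = (size_t)hash & mask;
        while (table[i].key != NULL)
            i = (i + 1) & mask;      /* simplified probe sequence */
        table[i].key = key;
        table[i].hash = hash;
    }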
msg310380 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-21 19:56
I've pushed a new version of the patch that I intend to merge tomorrow.

The last version has only one minor change: it uses the fast path for "slightly" non-compact dicts too (dicts with *at most* 3 unused entries).  This protects us from the pathological case where a huge dict is almost emptied with pop/del and then copied -- we do want that copy to be compact.
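
As a guard, the relaxed condition described above amounts to something like this sketch (not the literal merged code):

    /* Allow at most 3 unused entries: a dict that saw a handful of
       deletions still copies via memcpy, while a nearly-emptied huge
       dict falls through to the compacting slow path. */
    if (mp->ma_values == NULL &&
        mp->ma_keys->dk_nentries - mp->ma_used <= 3)
    {
        /* ... bulk-copy fast path ... */
    }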

I believe the real issue is that del and pop don't compact dicts from time to time, but I don't want that issue to hold up this patch in any way.

> If you make dict copying remove holes and extend your patch to the dict constructor, it will be more useful.

It's already useful -- I'm supporting a large code base (>0.5M LOC) that uses dict.copy() extensively, and it shows up in profiles.  I've seen the same in many other places (ORMs in particular love to store information in dicts and use dict.copy() to track dirty state/changes).  Please don't say that dict.copy() is not a common operation or that dict(other_dict) is more common than other_dict.copy() -- that's simply incorrect.

> A side effect of this patch is making dict.copy() atomic. That would be a worthy feature if extended to the dict constructor as well.

I agree.  I'll work on that later in a follow-up PR.  Let's move in small steps.
msg310426 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-22 16:02
Yury: Would you mind opening an issue to investigate why dicts are not compacted automatically?
msg310428 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-22 16:37
Victor: done; https://bugs.python.org/issue32623
msg310432 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-22 16:54
New changeset b0a7a037b8fde56b62f886d5188bced7776777b4 by Yury Selivanov in branch 'master':
bpo-31179: Make dict.copy() up to 5.5 times faster. (#3067)
https://github.com/python/cpython/commit/b0a7a037b8fde56b62f886d5188bced7776777b4
History
Date                 User              Action  Args
2022-04-11 14:58:50  admin             set     github: 75362
2018-01-22 16:55:17  yselivanov        set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-01-22 16:54:45  yselivanov        set     messages: + msg310432
2018-01-22 16:37:55  yselivanov        set     messages: + msg310428
2018-01-22 16:02:07  vstinner          set     messages: + msg310426
2018-01-21 19:56:47  yselivanov        set     messages: + msg310380
2017-08-17 13:55:20  serhiy.storchaka  set     messages: + msg300423
2017-08-11 14:49:02  yselivanov        set     messages: + msg300164
2017-08-11 14:48:01  yselivanov        set     messages: + msg300163
2017-08-11 09:24:27  vstinner          set     messages: + msg300159
2017-08-11 07:10:53  serhiy.storchaka  set     nosy: + rhettinger
2017-08-11 07:10:08  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg300155
2017-08-11 06:06:47  methane           set     messages: + msg300153
2017-08-10 21:46:08  yselivanov        set     messages: + msg300123
2017-08-10 21:43:02  vstinner          set     messages: + msg300120
2017-08-10 21:35:14  yselivanov        set     pull_requests: + pull_request3104
2017-08-10 21:31:33  yselivanov        create