classification
Title: Speed-up dict.copy() up to 5.5 times.
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: haypo, inada.naoki, rhettinger, serhiy.storchaka, yselivanov
Priority: normal Keywords:

Created on 2017-08-10 21:31 by yselivanov, last changed 2017-08-17 13:55 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 3067 open yselivanov, 2017-08-10 21:35
Messages (9)
msg300117 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-10 21:31
It's possible to significantly improve performance of shallow dict copy.  Currently, PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one.

My idea is to simply memcpy the whole keys/items region and do the necessary increfs after it.  This works just fine for non-key-sharing dicts.
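(As a Python-level reminder of what "shallow" means here — an illustration, not part of the patch: only references are copied, so the copy shares value objects with the original while the tables themselves are independent.)

```python
# Illustration only: dict.copy() is shallow, so values are shared by
# reference (the C implementation just increfs them), while the
# key/value tables themselves are independent.
d = {"a": [1, 2]}
c = d.copy()

assert c["a"] is d["a"]   # same list object; only the reference was copied
c["b"] = 3
assert "b" not in d       # mutating the copy's table doesn't touch the original
```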

With the following simple microbenchmark:

    import time

    N = 1000000

    for size in [0, 1, 10, 20, 50, 100, 500, 1000]:
        d = dict([(str(i), i) for i in range(size)])

        t = time.monotonic()
        for i in range(N):
            d.copy()
        e = time.monotonic() - t

        print(f'dict(size={size}).copy() x {N} times:\t {e:.4f}')


Output for 3.7 master:

    dict(size=0).copy() x 1000000 times:     0.1299
    dict(size=1).copy() x 1000000 times:     0.1499
    dict(size=10).copy() x 1000000 times:    0.3758
    dict(size=20).copy() x 1000000 times:    0.7722
    dict(size=50).copy() x 1000000 times:    1.2784
    dict(size=100).copy() x 1000000 times:   2.5128
    dict(size=500).copy() x 1000000 times:   12.8968
    dict(size=1000).copy() x 1000000 times:  25.4276


Output for patched 3.7:

    dict(size=0).copy() x 1000000 times:     0.1352
    dict(size=1).copy() x 1000000 times:     0.1285
    dict(size=10).copy() x 1000000 times:    0.1632
    dict(size=20).copy() x 1000000 times:    0.3076
    dict(size=50).copy() x 1000000 times:    0.3663
    dict(size=100).copy() x 1000000 times:   0.5140
    dict(size=500).copy() x 1000000 times:   2.3419
    dict(size=1000).copy() x 1000000 times:  4.6176
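(For reference, the per-size speedups implied by the two runs above — computed from the reported numbers, not re-measured:)

```python
# Speedup ratios derived from the timings reported above.
baseline = {0: 0.1299, 1: 0.1499, 10: 0.3758, 20: 0.7722,
            50: 1.2784, 100: 2.5128, 500: 12.8968, 1000: 25.4276}
patched = {0: 0.1352, 1: 0.1285, 10: 0.1632, 20: 0.3076,
           50: 0.3663, 100: 0.5140, 500: 2.3419, 1000: 4.6176}

for size in baseline:
    print(f'size={size}: {baseline[size] / patched[size]:.2f}x')
```

The size=1000 case is where the "up to 5.5 times" figure in the title comes from.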
msg300120 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-08-10 21:43
> PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one.

Why not create a "preallocated" dict in that case, with _PyDict_NewPresized()?
msg300123 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-10 21:46
>> PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one.

> Why not create a "preallocated" dict in that case, with _PyDict_NewPresized()?

I don't think it's related to the proposed patch.  Please take a look at the PR.  `_PyDict_NewPresized` and inserting entries one by one will not be faster than a `memcpy`.
msg300153 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-08-11 06:06
I like the idea.
One worrying point is how to deal with a dirty dict.
How about doing it only when ma_used == keys->dk_nentries?
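("Dirty" here means a dict whose entry table still contains holes left by deletions, i.e. ma_used < dk_nentries. This is only indirectly observable from Python, e.g. via sys.getsizeof — a sketch relying on the CPython implementation detail that dicts don't shrink on deletion:)

```python
import sys

# CPython implementation detail: deleting keys leaves dummy holes in the
# entry table rather than shrinking it, so the allocated size is unchanged.
d = dict.fromkeys(range(100))
size_before = sys.getsizeof(d)
for i in range(90):
    del d[i]

assert sys.getsizeof(d) == size_before  # still the full-size table
assert len(d) == 10                     # but only 10 live entries (ma_used)
```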

Slightly off topic: copy-on-write could be implemented via dk_refcnt. Functions that just pass `**kwargs` along create a temporary copy of the dict, and CoW would reduce the cost of those temporary dicts.
msg300155 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-11 07:10
The PR adds over 50 lines of code for optimising a not-very-often-used feature. There are two obvious ways of copying, dict(d) and d.copy(); the PR optimises only the latter, and I'm not sure it is the more common one. The PR also duplicates the low-level code, which increases the maintenance cost.

The PR changes the behavior. Currently the effect of copying is compacting the dict.

>>> import sys
>>> d = dict.fromkeys(range(2000))
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> sys.getsizeof(dict(d))
41020
>>> for i in range(1000): del d[i]
... 
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(dict(d))
20544
>>> sys.getsizeof(d.copy())
20544
>>> d = dict.fromkeys(range(2000))
>>> for i in range(1999): del d[i]
... 
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
136

The PR preserves the non-compact layout in the copy.
msg300159 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-08-11 09:24
>>>> d = dict.fromkeys(range(2000))
>>>> for i in range(1999): del d[i]
> ...
>>>> sys.getsizeof(d)
> 41020
>>>> sys.getsizeof(d.copy())
> 136

Why doesn't "del" compact the dict?
msg300163 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-11 14:48
> I like the idea.
> One worrying point is how to deal with a dirty dict.
> How about doing it only when ma_used == keys->dk_nentries?

I've added this check.  See the updated PR.

> The PR changes the behavior. Currently the effect of copying is compacting the dict.

The check that INADA suggested enables compacting on copy, if it is needed.

> The PR adds over 50 lines of code for optimising a not-very-often-used feature.

I started to look into this because I need it for my upcoming PEP, so please don't dismiss the idea right away.

I also think that copying a dict isn't a "not very often used feature"; it depends on your frame of reference. In some applications you copy dicts a lot. 50 lines of code speeding up one of the core methods 5.5x is a fair price to pay.

> There are two obvious ways of copying, dict(d) and d.copy()

That can also be easily optimized, btw.  I'll see if I can do that without impacting the performance of creating new dicts.

> The PR duplicates the low-level code, which increases the maintenance cost.

FWIW, the PR doesn't duplicate any code.  It provides a new implementation that is more efficient than the old approach.
msg300164 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-08-11 14:49
> Why doesn't "del" compact the dict?

This is a good question, btw.
msg300423 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-17 13:55
A side effect of this patch is making dict.copy() atomic. This would be a worthy feature if extended to the dict constructor. For now the only way of making an atomic (or almost atomic) copy of a dict is dict(list(d.items())), and that isn't very time- or memory-efficient.
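(The non-atomicity being worked around is visible from Python: iterating a dict fails if it is resized mid-iteration, which is what a concurrent writer can cause during a plain dict(d), whereas list(d.items()) materializes a snapshot first. A single-threaded illustration, forcing the mutation by hand:)

```python
d = dict.fromkeys(range(8))

# Mutating a dict while iterating over it is detected and rejected,
# which is why dict(d) is not atomic in the presence of concurrent writers.
try:
    for k in d:
        d[k + 100] = None  # grows the dict mid-iteration
    raised = False
except RuntimeError:
    raised = True
assert raised

# list(d.items()) builds the item list first; dict() then works
# from that stable snapshot.
snapshot = dict(list(d.items()))
d[999] = None
assert 999 not in snapshot
```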

If you make dict copying remove holes and extend your patch to the dict constructor, it could be more useful.

Look at the set implementation. It doesn't just use memcpy; it contains a specialized insertion implementation for the case when all items are unique. Fast copying is even more important for sets, since copying is more common there: it is a part of set operations, and it is common to convert a set to a frozenset.
History
Date User Action Args
2017-08-17 13:55:20serhiy.storchakasetmessages: + msg300423
2017-08-11 14:49:02yselivanovsetmessages: + msg300164
2017-08-11 14:48:01yselivanovsetmessages: + msg300163
2017-08-11 09:24:27hayposetmessages: + msg300159
2017-08-11 07:10:53serhiy.storchakasetnosy: + rhettinger
2017-08-11 07:10:08serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg300155
2017-08-11 06:06:47inada.naokisetmessages: + msg300153
2017-08-10 21:46:08yselivanovsetmessages: + msg300123
2017-08-10 21:43:02hayposetmessages: + msg300120
2017-08-10 21:35:14yselivanovsetpull_requests: + pull_request3104
2017-08-10 21:31:33yselivanovcreate