classification
Title: Clean up and speed up dict iteration
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: inada.naoki, ned.deily, python-dev, rhettinger, serhiy.storchaka, vstinner, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-09-16 22:10 by serhiy.storchaka, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
dict_iter.patch serhiy.storchaka, 2016-09-16 22:10 review
dict_iter2.patch serhiy.storchaka, 2016-09-17 18:21 review
dict_iter3.patch inada.naoki, 2016-09-28 08:43 review
dict_iter4.patch inada.naoki, 2016-09-28 11:53 more PEP7 compliant code review
dict_iter5.patch inada.naoki, 2016-09-29 03:23
dict_iter6.patch inada.naoki, 2016-09-29 08:45 review
dict_iter7.patch inada.naoki, 2016-09-29 09:29 review
dict_iter8.patch inada.naoki, 2016-10-02 05:43 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (29)
msg276755 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-16 22:10
In some circumstances iterating dict under 3.6 can be 20% slower than under 3.5.

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6))" -- "list(d)"

Python 3.5: Median +- std dev: 33.8 ms +- 0.7 ms
Python 3.6: Median +- std dev: 37.8 ms +- 0.5 ms

Seems this is compiler and platform specific, it is reproducible only with GCC on 32 bit.
 
Proposed patch restores 3.5 performance and simplifies the code.

Python 3.6 patched: Median +- std dev: 33.7 ms +- 0.7 ms

Other types of iteration:

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6)); v = d.values()" -- "list(v)"
Python 3.5:            Median +- std dev: 26.2 ms +- 0.7 ms
Python 3.6 unpatched:  Median +- std dev: 28.0 ms +- 0.6 ms
Python 3.6 patched:    Median +- std dev: 26.3 ms +- 1.1 ms

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6)); v = d.items()" -- "list(v)"
Python 3.5:            Median +- std dev: 232 ms +- 6 ms
Python 3.6 unpatched:  Median +- std dev: 259 ms +- 6 ms
Python 3.6 patched:    Median +- std dev: 243 ms +- 9 ms

_PyDict_Next():
$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6))" -- "set(d)"
Python 3.5:            Median +- std dev: 68.3 ms +- 1.8 ms
Python 3.6 unpatched:  Median +- std dev: 68.1 ms +- 2.5 ms
Python 3.6 patched:    Median +- std dev: 66.0 ms +- 1.2 ms

PyDict_Next():
$ ./python -m perf timeit -s "from _testcapi import test_dict_iteration as t" -- "t()"
Python 3.5:            Median +- std dev: 3.31 ms +- 0.10 ms
Python 3.6 unpatched:  Median +- std dev: 3.51 ms +- 0.09 ms
Python 3.6 patched:    Median +- std dev: 3.43 ms +- 0.09 ms
msg276789 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-09-17 12:44
Serhiy, I don't understand your patch. You delete the logic about split table and then iterating split table actually fails.

>>> class C:
...     pass
... 
>>> a, b = C(), C()
>>> a.a, a.b = 1, 2
>>> list(b.__dict__)
['a', 'b']
>>> 

Without attributes now b also get entries when iterating.
msg276811 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-17 18:21
Good catch Xiang Zhang! I missed this point. Here is fixed patch.

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6))" -- "list(d)"
Python 3.5:            Median +- std dev: 33.8 ms +- 0.7 ms
Python 3.6 unpatched:  Median +- std dev: 37.8 ms +- 0.5 ms
Python 3.6 patched:    Median +- std dev: 34.0 ms +- 0.6 ms

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6)); v = d.values()" -- "list(v)"
Python 3.5:            Median +- std dev: 26.2 ms +- 0.7 ms
Python 3.6 unpatched:  Median +- std dev: 28.0 ms +- 0.6 ms
Python 3.6 patched:    Median +- std dev: 26.1 ms +- 0.8 ms

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6)); v = d.items()" -- "list(v)"
Python 3.5:            Median +- std dev: 232 ms +- 6 ms
Python 3.6 unpatched:  Median +- std dev: 259 ms +- 6 ms
Python 3.6 patched:    Median +- std dev: 249 ms +- 6 ms

$ ./python -m perf timeit -s "d = dict.fromkeys(range(10**6))" -- "set(d)"
Python 3.5:            Median +- std dev: 68.3 ms +- 1.8 ms
Python 3.6 unpatched:  Median +- std dev: 68.1 ms +- 2.5 ms
Python 3.6 patched:    Median +- std dev: 63.7 ms +- 1.5 ms

$ ./python -m perf timeit -s "from _testcapi import test_dict_iteration as t" -- "t()"
Python 3.5:            Median +- std dev: 3.31 ms +- 0.10 ms
Python 3.6 unpatched:  Median +- std dev: 3.51 ms +- 0.09 ms
Python 3.6 patched:    Median +- std dev: 3.43 ms +- 0.05 ms
msg276821 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-17 19:17
Also consider adding __length_hint__ to the various mapping views.  That would help with a common case of listing the view.
msg276823 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-17 19:57
Dict iterators already have __length_hint__. Dict views have __len__.
msg276830 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-17 20:02
Yes, I missed seeing that.
msg276868 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 02:00
Also, please take a look at resizing.  It looks like it is doing way too much work.  The original keys, values, and hashes shouldn't move at all, only the indices array needs to updated.

Here is the pure python version from the original proof-of-concept at https://code.activestate.com/recipes/578375

    def _resize(self, n):
        '''Reindex the existing hash/key/value entries.
           Entries do not get moved, they only get new indices.
           No calls are made to hash() or __eq__().

        '''
        n = 2 ** n.bit_length()                     # round-up to power-of-two
        self.indices = self._make_index(n)
        for index, hashvalue in enumerate(self.hashlist):
            for i in Dict._gen_probes(hashvalue, n-1):
                if self.indices[i] == FREE:
                    break
            self.indices[i] = index
        self.filled = self.used

Likewise, it looks like there is room for improvement in dict_copy().  It can memcpy() all four arrays and then incref all the keys.  That should be considerably faster than zeroing new arrays and applying re-insertion logic.
msg276873 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-09-18 07:49
Serhiy: Patch LGTM except two trivial comments on Rietveld.

Raymond: With such implementation keys, values and hashes are all organized together and there seems no _resize operation can only adjust hashes without breaking the entire layout.
msg276889 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-18 13:29
LGTM, thanks.
msg276899 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 17:15
> Raymond: With such implementation keys, values and hashes are all
> organized together and there seems no _resize operation can only 
> adjust hashes without breaking the entire layout.

None of those needs to change during a resize.  Only indices array needs to be rebuilt.
msg276902 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 18:00
We can still clean this up for Python 3.6.  We're in feature freeze, not development freeze.  The compact dict patch was very rough when it landed and is expected to continue to be polished so that the expected benefits are realized.
msg276903 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-18 18:37
Actually most of optimization is not specific for new dict implementation.
msg276915 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 21:14
Here is a rough (very rough and not compileable) sketch of the direction that resizing should take:

static void
insert_indices_clean(PyDictObject *mp, Py_hash_t hash)
{
    size_t i, perturb;
    PyDictKeysObject *k = mp->ma_keys;
    size_t mask = (size_t)DK_SIZE(k)-1;

    i = hash & mask;
    for (perturb = hash; dk_get_index(k, i) != DKIX_EMPTY;
         perturb >>= PERTURB_SHIFT) {
	i = mask & ((i << 2) + i + perturb + 1);
    }
    dk_set_index(k, i, k->dk_nentries);
}

static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
    Py_ssize_t i, newsize;
    PyDictKeyEntry *ep0;

    /* Find the smallest table size > minused. */
    for (newsize = PyDict_MINSIZE;
         newsize <= minused && newsize > 0;
         newsize <<= 1)
        ;
    if (newsize <= 0) {
        PyErr_NoMemory();
        return -1;
    }

    realloc(dk->dk_indicies, es * newsize);
    memset(&dk->dk_indices.as_1[0], 0xff, es * size);
    dk->dk_size = size;

    for (i = 0; i < mp->dk_nentries; i++) {
        PyDictKeyEntry *ep = &ep0[i];
        if (ep->me_value != NULL) {
            insert_indices_clean(mp, ep->me_hash);
        }
    }
    return 0;
}
msg276919 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-18 21:29
Raymond, I think this is different issue.
msg276922 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 22:15
Moved dictresize() discussion to http://bugs.python.org/issue28199
msg277501 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-27 07:35
Serhiy, may I update your patch, if you're busy in this week?
msg277502 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-27 07:40
Feel free to do this. I'm not such busy, I'm experimenting with other variants.
msg277596 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-28 08:43
Updated with small refactoring.
Confirmed passes quicktest and no warning from clang on OS X.
msg277612 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-28 12:48
Draft Misc/NEWS entry (and commit message) is:

 Core and Builtins
 -----------------

+- Issue #28183: Optimize dict iteration.
+
msg277631 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-28 14:32
I reviewed dict_iter4.patch on Rietveld.
msg277687 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-29 08:45
recreate patch with different option, since Rietveld doesn't accept dict_iter5.patch
msg277699 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-29 10:37
dict_iter7.patch LGTM, go ahead. Just dont forget to mention Serhiy Storchaka as the co-author ;-)
msg277701 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-29 10:45
Please wait until I merge my changes.
msg277713 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-29 18:45
It looks to me that the patch contain unneeded changes. Some changes don't affect performance, others don't look well justified. Are there benchmarks that confirm the benefit of these changes?
msg277718 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-29 19:48
Serhiy Storchaka added the comment:
> It looks to me that the patch contain unneeded changes. Some changes don't affect performance, others don't look well justified. Are there benchmarks that confirm the benefit of these changes?

Hum, since I expect disagreements on the changes, I propose to split
the patch into two parts: minimum change to optimize, and a second
change (later, once the first one is merged) to cleanup further the
code.

"Some changes don't affect performance"

Well, a cleanup is supposed to not affect performances at all :-)
msg277752 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-30 13:08
Refactoring includes replacing PyDict_Next with _PyDict_Next.
It can improve performance by skipping some checks.
But it would be too small to see difference from benchmark.

I'll reduce diff size, hopefully in this weekend.
msg277854 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-10-02 05:43
dict_iter8.patch is based on dict_iter3.patch.  Added some comments and fixing indents.
No change about _PyDict_Next API.
msg278387 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-09 20:09
New changeset 112714f3745d by Serhiy Storchaka in branch '3.6':
Issue #28183: Optimize and cleanup dict iteration.
https://hg.python.org/cpython/rev/112714f3745d
msg278388 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-09 20:17
Few weeks I tried to rewrite loops in more fast way, but I have not found anything significantly faster or clearer. Some forms of was a few percent faster, but I would be difficult to explain this effect. Most likely it was an artifact of the compilation.

Removed unrelated changes from dict_iter8.patch and added additional check in _PyDict_Next() before committing.
History
Date User Action Args
2017-03-31 16:36:09dstufftsetpull_requests: + pull_request852
2016-10-09 20:17:19serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg278388

stage: resolved
2016-10-09 20:09:24python-devsetnosy: + python-dev
messages: + msg278387
2016-10-02 05:43:32inada.naokisetfiles: + dict_iter8.patch

messages: + msg277854
2016-09-30 13:08:05inada.naokisetmessages: + msg277752
2016-09-29 19:48:04vstinnersetmessages: + msg277718
2016-09-29 18:45:37serhiy.storchakasetmessages: + msg277713
2016-09-29 10:45:35serhiy.storchakasetmessages: + msg277701
2016-09-29 10:37:09vstinnersetmessages: + msg277699
2016-09-29 09:29:46inada.naokisetfiles: + dict_iter7.patch
2016-09-29 08:45:50inada.naokisetfiles: + dict_iter6.patch

messages: + msg277687
2016-09-29 03:23:46inada.naokisetfiles: + dict_iter5.patch
2016-09-28 14:32:48vstinnersetmessages: + msg277631
2016-09-28 12:48:48inada.naokisetmessages: + msg277612
2016-09-28 11:53:01inada.naokisetfiles: + dict_iter4.patch
versions: + Python 3.7
2016-09-28 08:43:05inada.naokisetfiles: + dict_iter3.patch

messages: + msg277596
2016-09-27 07:40:12serhiy.storchakasetmessages: + msg277502
2016-09-27 07:35:06inada.naokisetmessages: + msg277501
2016-09-18 22:15:03rhettingersetmessages: + msg276922
2016-09-18 21:29:45serhiy.storchakasetmessages: + msg276919
2016-09-18 21:14:27rhettingersetmessages: + msg276915
2016-09-18 18:37:20serhiy.storchakasetmessages: + msg276903
2016-09-18 18:00:56rhettingersetmessages: + msg276902
versions: + Python 3.6, - Python 3.7
2016-09-18 17:45:35serhiy.storchakasetassignee: serhiy.storchaka
versions: - Python 3.6
2016-09-18 17:15:57rhettingersetmessages: + msg276899
2016-09-18 13:29:08inada.naokisetmessages: + msg276889
2016-09-18 07:49:43xiang.zhangsetmessages: + msg276873
2016-09-18 02:00:04rhettingersetmessages: + msg276868
2016-09-17 20:02:41rhettingersetmessages: + msg276830
2016-09-17 19:57:07serhiy.storchakasetmessages: + msg276823
2016-09-17 19:17:11rhettingersetmessages: + msg276821
2016-09-17 18:21:42serhiy.storchakasetfiles: + dict_iter2.patch

messages: + msg276811
2016-09-17 12:44:32xiang.zhangsetmessages: + msg276789
2016-09-16 22:26:37vstinnersetnosy: + inada.naoki, xiang.zhang
2016-09-16 22:10:07serhiy.storchakacreate