New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compact dict resizing is doing too much work #72386
Comments
The dictresize() method unnecessarily copies the keys, values, and hashes as well as making insertions in the index table. Only the latter step is necessary or desirable. Here in the pure Python code for resizing taking from the original proof-of-concept code at https://code.activestate.com/recipes/578375 def _resize(self, n):
'''Reindex the existing hash/key/value entries.
Entries do not get moved, they only get new indices.
No calls are made to hash() or __eq__().
And here is a rough sketch of what it would look like in the C code (very rough, not yet compileable): static void
insert_indices_clean(PyDictObject *mp, Py_hash_t hash)
{
size_t i, perturb;
PyDictKeysObject *k = mp->ma_keys;
size_t mask = (size_t)DK_SIZE(k)-1; i = hash & mask;
for (perturb = hash; dk_get_index(k, i) != DKIX_EMPTY;
perturb >>= PERTURB_SHIFT) {
i = mask & ((i << 2) + i + perturb + 1);
}
dk_set_index(k, i, k->dk_nentries);
} static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
Py_ssize_t i, newsize;
PyDictKeyEntry *ep0;
/* Find the smallest table size > minused. */
for (newsize = PyDict_MINSIZE;
newsize <= minused && newsize > 0;
newsize <<= 1)
;
if (newsize <= 0) {
PyErr_NoMemory();
return -1;
}
/* Loop over hashes, skipping NULLs, inserting new indices */
for (i = 0; i < mp->dk_nentries; i++) {
PyDictKeyEntry *ep = &ep0[i];
if (ep->me_value != NULL) {
insert_indices_clean(mp, ep->me_hash);
}
}
return 0;
} |
Just before the re-insertion, we should also do a compaction-in-place for the keys/values/hashes array if it has a significant number of holes for previously deleted entries. |
Then how about entries(key, value pairs)? The size of entries does not match the available hash slots. For example, an empty dict allocates a hash array with 5 available slots and 5 entries. Now we resize the hash array to size 16 and it can afford 10 entries but you only get room for 5 entries. How could we insert the 6th entry? |
Current compact ordered dict implementation is bit different from yours. But it's easy to detect clean dict. Your suggestion can be used when:
I think dictresize for split table can be split function. But I don't know it can improve performance or readability. |
Does it mean there is a chance to improve OrderedDict to use new dict implementation, if it seems safe enough? (After OrderedDict implementation is improved, functools.lru_cache can use OrderedDict and remove doubly linked list too.) |
functools.lru_cache can use just ordered dict. But simple implementation is 1.5 times slower. I'm working on this. I think that changing implementation of lru_cache and OrderedDict is a new feature and can came only in 3.7, when new dict implementation will be more tested. |
As written in comment, reusing keys object breaks odict implementation. |
$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];'
....................
Median +- std dev: 363 us +- 11 us
$ ./install/bin/python3.6 -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' # patched
....................
Median +- std dev: 343 us +- 17 us
$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];'
....................
Median +- std dev: 362 us +- 11 us
$ ./install/bin/python3.6 -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' # patched
....................
Median +- std dev: 342 us +- 14 us
$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];'
....................
Median +- std dev: 364 us +- 11 us |
Oh, your message reminded me that I always wanted an option in the timeit module to run the benchmark on two Python versions and then directly compare the result. I just added the feature to perf and then I released perf 0.7.11, enjoy! The output is more compact and it's more reliable because the comparison ensures that the difference is significant. --- $ export PYTHONPATH=~/prog/GIT/perf
$ ./python-resize -m perf timeit --inherit-environ=PYTHONPATH --compare-to=./python-ref -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' --rigorous
python-ref: ........................................ 77.6 us +- 1.8 us
python-resize: ........................................ 74.8 us +- 1.9 us Median +- std dev: [python-ref] 77.6 us +- 1.8 us -> [python-resize] 74.8 us +- 1.9 us: 1.04x faster I can reproduced the 4% speedup. (I didn't review the patch yet.) |
dictresize.patch is quite big, it seems like half of the patch is more cleanup/refactoring. Can you please split the patch into two parts, to only a smaller patch just for the optimization part? |
Ah, I'm sorry. |
For the simple case with no dummy entries, I was expecting a fast path that just realloced the keys/values/hashes arrays and then updated the index table with reinsertion logic that only touches the indices. Use realloc() is nice because it makes it possible that the keys/values/hashes don't have to be recopied and if they did, it would use a fast memcpy to move them to the newly resized array. |
Since entries array is embedded in PyDictKeysObject, we can't realloc entries. Split table may have enough size of ma_values at first in typical case. |
@Haypo, could you review this? |
What if split copying entries from inserting indices? First fill a continuous array of entries (could use memcpy if there were not deletions), then build a hashtable of continuous indices. Following patch implements this idea. |
Hmm, what's the advantage? |
LGTM. |
The advantage is using memcpy in case of combined table without deletions. This is common case of creating dict without pre-resizing: dict(list), dict(iterator), etc. In future, when will be possible to reuse old entries array, this also could help in case of multiple modifications without significant size growth. First squeeze entries (skipping NULL values), then clear and rebuild index table. |
Current code and my patch called insertdict_clean() or insert_index() for each entry. Cons of Serhiy's patch is it's two pass. If entries are larger than L2 cache, (My forecast is no performance difference between my patch and Serhiy's on amd64 |
I doubt how many memcpy could benefit. Two pass does not necessarily make faster. I make a simple test: With dictresize3, (I make insert_index inline): [bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(6)}' 'dict(d)' With dictresize4: [bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(6)}' 'dict(d)' Just like INAKA states, there is hardly any difference. And you need 2 pass for dicts with deleted member. |
Remark about perf timeout: you can use --compare-to with the patched |
Yes, the difference is pretty small. The largest difference I got is 7% in following microbenchmark: $ ./python -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)'
Unpatched: Median +- std dev: 80.6 us +- 0.5 us
dictresize3.patch: Median +- std dev: 80.9 us +- 2.8 us
dictresize4.patch: Median +- std dev: 75.4 us +- 2.1 us I suppose this is due to using memcpy. In other microbenchmarks the difference usually is 1-2%. Xiang, your microbenchmark don't work for this patch because dict(d) make only one resizing.
Agree. But on other hand, dictresize3.patch needs to fit in L2 cache all indices table and prefetch two entries arrays: old and new. dictresize4.patch in every loop works with smaller memory: two entries arrays in the first loop and an indices table and new entries arrays in the second loop. |
If you inline insert_index, dictresize3 is not that bad. ./python3 -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)' But don't bother on microbenchmark, just move on. I just think the logic is not as clear as dictresize3. But the easiness for future modification makes sense. |
Serhiy, would you commit it by 3.6b3? -- sent from mobile |
New changeset 6b88dfc7b25d by Serhiy Storchaka in branch '3.6': New changeset f0fbc6071d7e by Serhiy Storchaka in branch 'default': |
I don't have strong preferences, but at this moment dictresize4.patch looks a little better to me. Maybe I wrong and in future optimizations we will returned to insert_index(). Yet one opportunity for future optimization -- inline dk_get_index() and dk_set_index() and move check for indices width out of the loop. I don't know whether there is significant effect of this. |
In pypa/setuptools#836, I've pinpointed this commit as implicated in dictionaries spontaneously losing keys. I have not yet attempted to replicate the issue in a standalone environment, and I'm hoping someone with a better understanding of the implementation can devise a reproduction that distills the issue that setuptools seems only to hit in very specialized conditions. Please let me know if I can help by providing more detail in the environment where this occurs or by filing another ticket. Somewhere we should capture that this is a regression pending release in 3.6.0b3 today, and for that reason, I'm adding Ned to this ticket. |
Thanks, Jason, for the heads-up. Serhiy, can you take a look at this quickly? I'm going to hold 360b3 until we have a better idea what's going on. |
I have rolled back the changes. |
Thanks, Serhiy! Jason, can you verify that there is no longer a 3.6 regression with your tests? |
Testing now... |
Confirmed. Tests are now passing where they were failing before. Thanks for the quick response! |
Excellent, thanks everyone! I'll leave this open for re-evaluation for 3.7. |
I use gdb to run setuptools test suite and find the assumption, split tables are always dense is broken for both dictresize3 and dictresize4. #0 0x00007ffff71171c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007ffff7118e2a in __GI_abort () at abort.c:89
#2 0x00007ffff71100bd in __assert_fail_base (fmt=0x7ffff7271f78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x5e4b90 "oldvalues[i] != ((void *)0)", file=file@entry=0x5e4aa0 "Objects/dictobject.c", line=line@entry=1270,
function=function@entry=0x5e59f0 <__PRETTY_FUNCTION__.12083> "dictresize") at assert.c:92
#3 0x00007ffff7110172 in __GI___assert_fail (assertion=assertion@entry=0x5e4b90 "oldvalues[i] != ((void *)0)",
file=file@entry=0x5e4aa0 "Objects/dictobject.c", line=line@entry=1270, function=function@entry=0x5e59f0 <__PRETTY_FUNCTION__.12083> "dictresize")
at assert.c:101
#4 0x000000000048bddc in dictresize (mp=mp@entry=0x7ffff219d2b0, minused=<optimized out>) at Objects/dictobject.c:1270
#5 0x000000000048bf93 in insertion_resize (mp=mp@entry=0x7ffff219d2b0) at Objects/dictobject.c:1100
#6 0x000000000048c5fd in insertdict (mp=mp@entry=0x7ffff219d2b0, key=key@entry=0x7ffff579c3c0, hash=-3681610201421769281,
value=value@entry=0x7ffff07f56e8) at Objects/dictobject.c:1136
#7 0x000000000048fdfd in PyDict_SetItem (op=op@entry=0x7ffff219d2b0, key=key@entry=0x7ffff579c3c0, value=value@entry=0x7ffff07f56e8)
at Objects/dictobject.c:1572
#8 0x0000000000492cb5 in _PyObjectDict_SetItem (tp=tp@entry=0xd52548, dictptr=0x7ffff080cbd8, key=key@entry=0x7ffff579c3c0,
value=value@entry=0x7ffff07f56e8) at Objects/dictobject.c:4274
#9 0x000000000049df8a in _PyObject_GenericSetAttrWithDict (obj=0x7ffff080cbb8, name=0x7ffff579c3c0, value=0x7ffff07f56e8, dict=dict@entry=0x0)
at Objects/object.c:1172
#10 0x000000000049e0cf in PyObject_GenericSetAttr (obj=<optimized out>, name=<optimized out>, value=<optimized out>) at Objects/object.c:1194
#11 0x000000000049d80e in PyObject_SetAttr (v=v@entry=0x7ffff080cbb8, name=name@entry=0x7ffff579c3c0, value=value@entry=0x7ffff07f56e8)
at Objects/object.c:932 Thanks to Victor's _PyDict_CheckConsistency, it's easy then to find even without dictresize3 and dictresize4 (the current version), the test suite still fails (#define DEBUG_PYDICT). #0 0x00007ffff71171c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007ffff7118e2a in __GI_abort () at abort.c:89
#2 0x00007ffff71100bd in __assert_fail_base (fmt=0x7ffff7271f78 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x5e53a0 "mp->ma_values[i] != ((void *)0)", file=file@entry=0x5e4d00 "Objects/dictobject.c", line=line@entry=498,
function=function@entry=0x5e5dd0 <__PRETTY_FUNCTION__.11869> "_PyDict_CheckConsistency") at assert.c:92
#3 0x00007ffff7110172 in __GI___assert_fail (assertion=assertion@entry=0x5e53a0 "mp->ma_values[i] != ((void *)0)",
file=file@entry=0x5e4d00 "Objects/dictobject.c", line=line@entry=498,
function=function@entry=0x5e5dd0 <__PRETTY_FUNCTION__.11869> "_PyDict_CheckConsistency") at assert.c:101
#4 0x000000000048ba17 in _PyDict_CheckConsistency (mp=mp@entry=0x7ffff0806e68) at Objects/dictobject.c:498
#5 0x00000000004927a3 in PyDict_SetDefault (d=d@entry=0x7ffff0806e68, key=0x7ffff2ffcdd8, defaultobj=0x8abf20 <_Py_NoneStruct>)
at Objects/dictobject.c:2807
#6 0x0000000000492854 in dict_setdefault (mp=0x7ffff0806e68, args=<optimized out>) at Objects/dictobject.c:2824
#7 0x0000000000499469 in _PyCFunction_FastCallDict (func_obj=func_obj@entry=0x7ffff0f2f8c8, args=args@entry=0x105afe8, nargs=nargs@entry=2,
kwargs=kwargs@entry=0x0) at Objects/methodobject.c:234
#8 0x0000000000499815 in _PyCFunction_FastCallKeywords (func=func@entry=0x7ffff0f2f8c8, stack=stack@entry=0x105afe8, nargs=nargs@entry=2,
kwnames=kwnames@entry=0x0) at Objects/methodobject.c:295
#9 0x0000000000537b6f in call_function (pp_stack=pp_stack@entry=0x7fffffff5cd0, oparg=oparg@entry=2, kwnames=kwnames@entry=0x0)
at Python/ceval.c:4793 From the backtrace we can see PyDict_SetDefault breaks the invariant. And reading the code, yes, it doesn't handle split table separately. I simply replace the logic in PyDict_SetDefault with insertdict to make a test. It doesn't fail, even with dictresize4. An easy example to reproduce: >>> class C:
... pass
...
>>> c1, c2 = C(), C()
>>> c1.a, c1.b = 1, 2
>>> c2.__dict__.setdefault('b', None)
python: Objects/dictobject.c:498: _PyDict_CheckConsistency: Assertion `mp->ma_values[i] != ((void *)0)' failed.
Aborted (core dumped) |
Thanks, Xiang. Shard-key dict is very hard to maintain... |
New changeset 39f33c15243b by Serhiy Storchaka in branch 'default': |
Re-committed. It might be dangerous to commit this in 3.6 at that stage. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: