classification
Title: Optimize dict equality test
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, jcea, loewis, pitrou, python-dev, rhettinger, serhiy.storchaka
Priority: low Keywords: easy, patch

Created on 2012-11-26 20:26 by rhettinger, last changed 2012-12-02 18:12 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
dict_equal_hash_reuse.patch serhiy.storchaka, 2012-11-26 21:26 review
Messages (7)
msg176453 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-11-26 20:26
The code for dict_equal() in Objects/dictobject.c currently loops over the key/value pairs in self and uses PyDict_GetItem() to check for the corresponding key/value pair in the other dictionary.  This causes an unnecessary call to PyObject_Hash().

Instead, the code should loop over the key/value/hash triplets in self and do a direct lookup in the other dictionary with " ep = (otherdict->ma_lookup)(otherdict, key, hash)".   The reuses the known hash value for the key; thereby avoiding the potentially slow call to PyObject_Hash().

See _PyDict_Contains() for an example of how to do a lookup when the hash value is already known.

Note, the optimized path should be used only when PyDict_CheckExact() is true.
msg176456 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-26 21:26
Here is a simple patch.

> Note, the optimized path should be used only when PyDict_CheckExact() is true.

Actually this is not needed. dict_equal() uses the same code for dict subclasses.
msg176485 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-11-27 19:26
The patch looks good to me.
msg176488 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-27 20:10
And here is a synthetic microbenchmark:

$ ./python -m timeit -s "n=10**3; k=2; a={(i,)*k:i for i in range(n)}; b={(i,)*k:i for i in range(n)}"  "a == b"

Vanilla: 251 usec per loop
Patched: 195 usec per loop

$ ./python -m timeit -s "n=10**3; k=2; a={(i,)*k:i for i in range(n)}; b=dict(a)"  "a == b"

Vanilla: 116 usec per loop
Patched: 58.6 usec per loop

The use of tuple keys is quite common.
msg176506 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-11-28 02:11
The patch looks correct.  If the tests pass, go ahead and apply it.
msg176801 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-02 18:11
New changeset d12785ecca72 by Antoine Pitrou in branch 'default':
Issue #16562: Optimize dict equality testing.
http://hg.python.org/cpython/rev/d12785ecca72
msg176802 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-02 18:12
> The patch looks correct.  If the tests pass, go ahead and apply it.

Done.
History
Date User Action Args
2012-12-02 18:12:09pitrousetstatus: open -> closed

nosy: + pitrou
messages: + msg176802

resolution: fixed
stage: patch review -> resolved
2012-12-02 18:11:43python-devsetnosy: + python-dev
messages: + msg176801
2012-11-28 02:11:08rhettingersetmessages: + msg176506
2012-11-27 20:54:53hayposetnosy: + haypo
2012-11-27 20:10:28serhiy.storchakasetmessages: + msg176488
2012-11-27 19:26:57loewissetnosy: + loewis
messages: + msg176485
2012-11-26 21:26:17serhiy.storchakasetfiles: + dict_equal_hash_reuse.patch

nosy: + serhiy.storchaka
messages: + msg176456

keywords: + patch
stage: patch review
2012-11-26 20:40:41jceasetnosy: + jcea
2012-11-26 20:26:14rhettingercreate