Issue 16562: Optimize dict equality test

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60766

classification

Title:	Optimize dict equality test
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	jcea, loewis, pitrou, python-dev, rhettinger, serhiy.storchaka, vstinner
Priority:	low	Keywords:	easy, patch

Created on 2012-11-26 20:26 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
dict_equal_hash_reuse.patch	serhiy.storchaka, 2012-11-26 21:26		review

Messages (7)
msg176453 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2012-11-26 20:26
The code for dict_equal() in Objects/dictobject.c currently loops over the key/value pairs in self and uses PyDict_GetItem() to check for the corresponding key/value pair in the other dictionary. This causes an unnecessary call to PyObject_Hash(). Instead, the code should loop over the key/value/hash triplets in self and do a direct lookup in the other dictionary with " ep = (otherdict->ma_lookup)(otherdict, key, hash)". The reuses the known hash value for the key; thereby avoiding the potentially slow call to PyObject_Hash(). See _PyDict_Contains() for an example of how to do a lookup when the hash value is already known. Note, the optimized path should be used only when PyDict_CheckExact() is true.
msg176456 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-11-26 21:26
Here is a simple patch. > Note, the optimized path should be used only when PyDict_CheckExact() is true. Actually this is not needed. dict_equal() uses the same code for dict subclasses.
msg176485 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-11-27 19:26
The patch looks good to me.
msg176488 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-11-27 20:10
And here is a synthetic microbenchmark: $ ./python -m timeit -s "n=10*3; k=2; a={(i,)k:i for i in range(n)}; b={(i,)k:i for i in range(n)}" "a == b" Vanilla: 251 usec per loop Patched: 195 usec per loop $ ./python -m timeit -s "n=103; k=2; a={(i,)k:i for i in range(n)}; b=dict(a)" "a == b" Vanilla: 116 usec per loop Patched: 58.6 usec per loop The use of tuple keys is quite common.
msg176506 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2012-11-28 02:11
The patch looks correct. If the tests pass, go ahead and apply it.
msg176801 - (view)	Author: Roundup Robot (python-dev)	Date: 2012-12-02 18:11
New changeset d12785ecca72 by Antoine Pitrou in branch 'default': Issue #16562: Optimize dict equality testing. http://hg.python.org/cpython/rev/d12785ecca72
msg176802 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-12-02 18:12
> The patch looks correct. If the tests pass, go ahead and apply it. Done.

History
Date	User	Action	Args
2022-04-11 14:57:38	admin	set	github: 60766
2012-12-02 18:12:09	pitrou	set	status: open -> closed nosy: + pitrou messages: + msg176802 resolution: fixed stage: patch review -> resolved
2012-12-02 18:11:43	python-dev	set	nosy: + python-dev messages: + msg176801
2012-11-28 02:11:08	rhettinger	set	messages: + msg176506
2012-11-27 20:54:53	vstinner	set	nosy: + vstinner
2012-11-27 20:10:28	serhiy.storchaka	set	messages: + msg176488
2012-11-27 19:26:57	loewis	set	nosy: + loewis messages: + msg176485
2012-11-26 21:26:17	serhiy.storchaka	set	files: + dict_equal_hash_reuse.patch nosy: + serhiy.storchaka messages: + msg176456 keywords: + patch stage: patch review
2012-11-26 20:40:41	jcea	set	nosy: + jcea
2012-11-26 20:26:14	rhettinger	create