Issue 23601: use small object allocator for dict key storage

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67789

classification

Title:	use small object allocator for dict key storage
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Mark.Shannon, jtaylor, lemburg, pitrou, python-dev, rhettinger, scoder, serhiy.storchaka, tim.peters, vstinner
Priority:	normal	Keywords:	patch

Created on 2015-03-07 11:24 by jtaylor, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
0001-use-small-object-allocator-for-keys-object.patch	jtaylor, 2015-03-07 11:24

Messages (24)
msg237439 - (view)	Author: Julian Taylor (jtaylor)	Date: 2015-03-07 11:24
dictionary creation spends a not insignificant amount of time in malloc allocating keys objects. Python has a nice small object allocator that avoids a lot of this overhead and falls back to malloc for larger allocations. Is there a reason the dictionary does not use that allocator for its keys objects? doing so e.g. via attached incomplete patch improves small dict creation performance by 15%. import timeit print(timeit.repeat("dict(a=5, b=2)")) with change: [0.42825599999923725, 0.4272580000015296, 0.4362329999985377] without [0.5160610000002634, 0.5181720000000496, 0.518421999999191] or is there something I am overlooking and the use of PyMem_Malloc instead of PyObject_Malloc is an intentional design decision?
msg237442 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-03-07 12:57
See also issue16465.
msg237444 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-03-07 13:06
Interesting. I can reproduce the speedup on 64-bit Linux (Ubuntu 14.10).
msg237445 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-03-07 13:16
$ ./python -m timeit "dict(a=5, b=2)" Unpatched: 100000 loops, best of 3: 2.5 usec per loop issue16465 patch: 1000000 loops, best of 3: 1.87 usec per loop issue23601 patch: 1000000 loops, best of 3: 1.98 usec per loop
msg237448 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2015-03-07 13:43
There seem to be quite a few other places where this simple optimization could make sense as well, perhaps even going as far as converting all uses of PyMem_MALLOC to PyObject_MALLOC. There was a time when the small memory allocator did not return free arenas to the system allocator, but those are long ago. The only detail to watch out for is not mixing the malloc/free APIs. In particular, memory which is allocated in the interpreter and then deallocated elsewhere in extensions needs to continue using the PyMem_* APIs.
msg237473 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2015-03-07 20:03
I don't remember why PyMem_Malloc rather than PyObject_MALLOC was used, it may have been inherited from the allocation of dict tables in the earlier implementation. My only concern is that the benchmark only tests performance for very small dictionaries. The small object allocator is limited to 512 bytes before it falls back on malloc, so for larger dict keys the speed up would vanish. A benchmark with a range of sizes would be more convincing, if only to show that it is no slower for larger dicts.
msg237477 - (view)	Author: Julian Taylor (jtaylor)	Date: 2015-03-07 21:06
PyObject_Malloc just calls malloc above the threshold so there is no problem for larger dicts. For larger dicts the performance of malloc is also irrelevant as the time will be spent elsewhere.
msg246520 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2015-07-09 19:43
I still think that this is a good idea, but I would like to see a small speed test for large objects. Just to be sure that it is no slower.
msg246522 - (view)	Author: Julian Taylor (jtaylor)	Date: 2015-07-09 20:01
Large objects are just if size > 512: return malloc(size) there is no reason it should be slower. Also for large objects allocation speed does not matter as much.
msg246526 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2015-07-09 20:46
Indeed there is no obvious reason why they should be slower. But when it comes to optimisation, never trust your (or anyone else's) intuition. Running a simple check is always worth the effort.
msg246544 - (view)	Author: Stefan Behnel (scoder) *	Date: 2015-07-10 04:34
It's generally worth running the benchmark suite for this kind of optimisation. Being mostly Python code, it should benefit quite clearly from dictionary improvements, but it should also give an idea of how much of an improvement actual Python code (and not just micro-benchmarks) can show. And it can help detecting unexpected regressions that would not necessarily be revealed by micro-benchmarks. https://hg.python.org/benchmarks/ And I'm with Mark: when it comes to performance optimisations, repeating even a firm intuition doesn't save us from validating that this intuition actually matches reality. Anything that seems obvious at first sight may still be proven wrong by benchmarks, and has often enough been so in the past.
msg246546 - (view)	Author: Julian Taylor (jtaylor)	Date: 2015-07-10 07:32
Your benchmarks are not affected by this change see the other issue. They are also not representative of every workload out there. I can at least see the argument why you didn't want to put the other variant of this change in as it made the code a tiny bit more complicated, but I do not understand the reluctance for this variant. It doesn't change the complexity of the code one bit. If you doubt the performance of pythons own small object allocator, python should maybe stop using it alltogether?
msg246571 - (view)	Author: Stefan Behnel (scoder) *	Date: 2015-07-10 17:26
> Your benchmarks are not affected by this change see the other issue. Then you ran them, I assume? Could you show the output here that you got? > They are also not representative of every workload out there. Certainly not, but they do provide a certain variety of workloads that reduce the likelihood of not spotting a regression that this change might introduce. Note that people seem to be rather in favour of your proposed change. If you can show that there are no visible drawbacks, one of them will most likely apply it for you.
msg246603 - (view)	Author: Julian Taylor (jtaylor)	Date: 2015-07-11 12:53
ok I ran it again, but note the machine was under use the full time so the results are likely have no meaning. python perf.py -r -b default /tmp/normal/bin/python3 /tmp/opt/bin/python3 Min: 0.399279 -> 0.376527: 1.06x faster Avg: 0.410819 -> 0.383315: 1.07x faster Significant (t=49.29) Stddev: 0.00450 -> 0.00330: 1.3631x smaller ### etree_iterparse ### Min: 0.639638 -> 0.630989: 1.01x faster Avg: 0.658744 -> 0.641842: 1.03x faster Significant (t=14.82) Stddev: 0.00959 -> 0.00617: 1.5557x smaller ### etree_parse ### Min: 0.433050 -> 0.377830: 1.15x faster Avg: 0.444014 -> 0.389695: 1.14x faster Significant (t=43.28) Stddev: 0.01010 -> 0.00745: 1.3570x smaller ### tornado_http ### Min: 0.335834 -> 0.326492: 1.03x faster Avg: 0.346100 -> 0.334186: 1.04x faster Significant (t=13.66) Stddev: 0.01024 -> 0.00689: 1.4864x smaller The following not significant results are hidden, use -v to show them: 2to3, django_v2, etree_process, fastpickle, fastunpickle, json_dump_v2, json_load, nbody, regex_v8. /tmp$ /tmp/normal/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))' [0.5112445619997743, 0.5141109460000735, 0.5185121280010208] /tmp$ /tmp/opt/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))' [0.4426167189994885, 0.4465744609988178, 0.4467797579982289]
msg246791 - (view)	Author: Stefan Behnel (scoder) *	Date: 2015-07-16 06:40
Benchmark results look good to me (although a header line is missing) and seem to match my expectations. Sounds like we should allow this change.
msg246822 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2015-07-16 20:36
+1 from me.
msg246825 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-07-16 20:50
If use small object allocator for dict key storage, why not use it for dict value storage?
msg246829 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2015-07-16 20:59
Yes, but that shouldn't block this issue. I've opened issue 24648 instead.
msg259217 - (view)	Author: Julian Taylor (jtaylor)	Date: 2016-01-29 17:45
ping, this has been sitting for 4 years and two python releases. Its about time this stupidly simple thing gets merged.
msg259218 - (view)	Author: Tim Peters (tim.peters) *	Date: 2016-01-29 17:54
+1 from me. Julian, you have the patience of a saint ;-)
msg259236 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-01-30 05:06
If there are no objects, I'll apply the patch tomorrow.
msg259284 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-01-31 16:56
New changeset 425fec4a79c7 by Raymond Hettinger in branch 'default': Issue #23601: Use small object allocator for dict key objects https://hg.python.org/cpython/rev/425fec4a79c7
msg259286 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-01-31 16:57
Thanks for the patch and for your patience.
msg259291 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-01-31 17:48
Nice change. I opened the issue #26249 to continue the investigation.

History
Date	User	Action	Args
2022-04-11 14:58:13	admin	set	github: 67789
2016-03-11 13:02:04	serhiy.storchaka	link	issue16465 superseder
2016-01-31 17:48:49	vstinner	set	nosy: + vstinner messages: + msg259291
2016-01-31 16:57:34	rhettinger	set	status: open -> closed resolution: fixed messages: + msg259286 stage: patch review -> resolved
2016-01-31 16:56:33	python-dev	set	nosy: + python-dev messages: + msg259284
2016-01-30 05:06:39	rhettinger	set	assignee: rhettinger messages: + msg259236 versions: + Python 3.6, - Python 3.5
2016-01-29 17:54:14	tim.peters	set	messages: + msg259218
2016-01-29 17:45:15	jtaylor	set	messages: + msg259217
2015-07-16 20:59:18	Mark.Shannon	set	messages: + msg246829
2015-07-16 20:50:32	serhiy.storchaka	set	messages: + msg246825
2015-07-16 20:36:43	Mark.Shannon	set	messages: + msg246822
2015-07-16 06:40:25	scoder	set	messages: + msg246791
2015-07-11 12:53:44	jtaylor	set	messages: + msg246603
2015-07-10 17:26:56	scoder	set	messages: + msg246571
2015-07-10 07:32:49	jtaylor	set	messages: + msg246546
2015-07-10 04:34:50	scoder	set	nosy: + scoder messages: + msg246544
2015-07-09 20:46:15	Mark.Shannon	set	messages: + msg246526
2015-07-09 20:01:02	jtaylor	set	messages: + msg246522
2015-07-09 19:43:52	Mark.Shannon	set	messages: + msg246520
2015-03-07 21:06:57	jtaylor	set	messages: + msg237477
2015-03-07 20:03:00	Mark.Shannon	set	messages: + msg237473
2015-03-07 13:43:18	lemburg	set	messages: + msg237448
2015-03-07 13:16:49	serhiy.storchaka	set	messages: + msg237445
2015-03-07 13:06:22	pitrou	set	type: performance messages: + msg237444 stage: patch review
2015-03-07 12:57:11	serhiy.storchaka	set	nosy: + rhettinger, pitrou, serhiy.storchaka messages: + msg237442
2015-03-07 12:48:33	Mark.Shannon	set	nosy: + Mark.Shannon
2015-03-07 11:40:40	SilentGhost	set	nosy: + lemburg, tim.peters
2015-03-07 11:24:41	jtaylor	create