Issue 31954: Don't prevent dict optimization by coupling with OrderedDict

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76135

classification

Title:	Don't prevent dict optimization by coupling with OrderedDict
Type:	enhancement	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	eric.snow, methane, rhettinger, serhiy.storchaka, tim.peters
Priority:	normal	Keywords:	patch

Created on 2017-11-06 09:38 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 4292	open	serhiy.storchaka, 2017-11-06 09:40
PR 4901	open	serhiy.storchaka, 2017-12-16 11:56

Messages (15)
msg305623 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-11-06 09:38
Currently OrderedDict uses a table of nodes that mirrors the dict table. For keeping it in sync it saves the size and address of the dict table. There are two issues with this. First, this prevent some kind of dict optimization. When dict is resized (after exhausting usable entries at the end of table) it should allocate a new table even if it's size isn't changed. Second, this doesn't guarantees that both tables are in sync. If the dict table was reallocated twice before using OrderedDict methods, it can have the same size and address, but totally different layout of elements. Proposed PR adds a new flag to dict object. It is set when OrderedDict creates its table, and is cleared when dict reallocates its table or moves items in the same table.
msg305626 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-11-06 10:24
Later I'm going to add a flag that will allow regular dicts reuse holes when delete items, while keep OrderedDict and dicts used as class namespace ordered. I'm not sure there will be a benefit, but at least this will open an option.
msg305715 - (view)	Author: Inada Naoki (methane) *	Date: 2017-11-07 07:54
Quick microbench: ``` $ ./python -m perf timeit --compare-to=`pwd`/python.master -s 'd = dict.fromkeys(range(1000))' -- "for i in range(1000): del d[i] d[i]=i " python.master: ..................... 124 us +- 9 us python: ..................... 116 us +- 0 us Mean +- std dev: [python.master] 124 us +- 9 us -> [python] 116 us +- 0 us: 1.06x faster (-6%) ```
msg305721 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-11-07 08:51
Thank you for your microbenchmark Inada. The difference is small, but repeating it with different modifications almost always show small speedup. The only problem is that this change increases the size of dict object by one word (up to 3% for small dicts). I don't know what is better place for this flag, the dict object itself or the dict keys object.
msg305722 - (view)	Author: Inada Naoki (methane) *	Date: 2017-11-07 09:11
I don't know why dk_lookup is in dictkeys object. But I think it's because sharing 1 word from all key-sharing dict. So ma_clean flag can be in dictkeys object for same reason. BTW, We use dk_lookup function pointer and it tooks 1 word. But PyPy use flags for it. So they can pack other informations into same word. static dict_lookup_func lookup_funcs = {lookdict_unicode_nodummy, lookdict_unicode, lookdict_split, lookdict}; ... unsigned int ma_clean:1; unsigned int ma_lookup_func:2; // lookup_funcs[ma_lookup_func] ... In this way, we can have more flags for future optimization. (e.g. "all keys are interned string and comparing pointer is enough for searching interned key" flag).
msg308459 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-12-16 08:14
Could you please make a review Raymond?
msg308460 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-12-16 08:26
This looks reasonable to me. The previous design was too tightly coupled. The regular dict shouldn't have to care about the OrderedDict at all.
msg308462 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-12-16 08:32
But this increases the size of regular dicts. Is it appropriate?
msg308465 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-12-16 08:47
I also have misgivings about the extra field. Perhaps solicit some other opinions about whether the trade-off is worth it. Maybe Tim can provide an insight. My suspicion is that the current tight coupling is going to cost us in other ways. The OrderedDict is now less important than ever, so we really don't want regular dicts to have to pay any price just for the convenience of OrderedDict.
msg308473 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-12-16 12:00
PR 4901 implements Inada's idea about changing dk_lookup to an index. This decreases the size, but can have negative performance effect (I didn't make benchmarks).
msg308635 - (view)	Author: Inada Naoki (methane) *	Date: 2017-12-19 12:46
pyperformance doesn't show significant performance degration. (using commit 5d8d3d1) $ ./python -m perf compare_to -G --min-speed=2 default.json patched.json Slower (5): - pickle_list: 9.40 us +- 0.98 us -> 9.96 us +- 1.20 us: 1.06x slower (+6%) - pickle_dict: 63.1 us +- 0.6 us -> 65.6 us +- 5.2 us: 1.04x slower (+4%) - regex_effbot: 5.39 ms +- 0.09 ms -> 5.60 ms +- 0.35 ms: 1.04x slower (+4%) - genshi_xml: 188 ms +- 2 ms -> 193 ms +- 3 ms: 1.03x slower (+3%) - pickle: 22.4 us +- 0.2 us -> 22.9 us +- 0.3 us: 1.02x slower (+2%) Faster (5): - mako: 38.8 ms +- 0.2 ms -> 37.6 ms +- 0.2 ms: 1.03x faster (-3%) - regex_v8: 44.3 ms +- 0.7 ms -> 43.1 ms +- 0.4 ms: 1.03x faster (-3%) - scimark_monte_carlo: 254 ms +- 10 ms -> 248 ms +- 6 ms: 1.03x faster (-3%) - scimark_fft: 740 ms +- 20 ms -> 721 ms +- 14 ms: 1.03x faster (-2%) - regex_dna: 289 ms +- 5 ms -> 282 ms +- 9 ms: 1.02x faster (-2%) Benchmark hidden because not significant (50)...
msg340183 - (view)	Author: Inada Naoki (methane) *	Date: 2019-04-14 05:57
There are some more aggressive ideas. When Eric created C version of OrderedDict, he intended to use it for PEP 468. Unlike pure Python implementation, PyDict_GetItem returns value, not node of linked list. But now, PEP 468 is implemented in regular dict. How about raising DeprecationWarning when OrderedDict is passed to PyDict_* APIs? LRU implementation of functools is much more efficient than OrderedDict. OrderedDict can be achieve same performance and efficiency when node of linked list is stored in underlaying dict.
msg340185 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-04-14 06:17
It the pure Python implementation PyDict_GetItem also returns value, not node of linked list. > How about raising DeprecationWarning when OrderedDict is passed to PyDict_* APIs? This would violate the Liskov substitution principle and add an overhead for using PyDict_* APIs with regular dicts.
msg340289 - (view)	Author: Eric Snow (eric.snow) *	Date: 2019-04-15 16:07
Please don't miss the fact that the main reason for mirroring the dict table is to get O(1) node lookup (in the linked list). Otherwise most lookup-dependent operations, like __delitem__(), would become O(n); whereas in the pure-Python implementation they are O(1). This is all explained in the notes at the top of Objects/odictobject.c. Also, I didn't change anything in the dict implementation to rely on the OrderedDict implementation. So while I would say OrderedDict is coupled to dict, I wouldn't say the reverse, that dict is coupled to OrderedDict. If dict changes then OrderedDict must be updated apporpropriately, but not vice-versa. That should still hold.
msg340306 - (view)	Author: Inada Naoki (methane) *	Date: 2019-04-15 23:11
@Serhiy > It the pure Python implementation PyDict_GetItem also > returns value, not node of linked list. I missed pure Python implementation used two dicts. @Eric > Please don't miss the fact that the main reason for mirroring the dict table is to get O(1) node lookup (in the linked list). I don't miss it, of course. I'm proposing make linked list node as Python Object, and store it directly into dict, like LRU implementation in _functools. In this idea, if dict.__getitem__ is called directly, a node of linked list is returned instead of value in the node. I must admit this idea is too aggressive. If we can redesign OrderedDict from scratch, I propose OrderedDict uses dict, without inheriting it. But it is too late.

History
Date	User	Action	Args
2022-04-11 14:58:54	admin	set	github: 76135
2019-04-15 23:11:52	methane	set	messages: + msg340306
2019-04-15 16:07:07	eric.snow	set	messages: + msg340289
2019-04-14 06:17:54	serhiy.storchaka	set	messages: + msg340185
2019-04-14 05:57:56	methane	set	nosy: + eric.snow messages: + msg340183
2019-04-12 23:26:20	cheryl.sabella	set	versions: + Python 3.8, - Python 3.7
2017-12-19 12:46:33	methane	set	messages: + msg308635
2017-12-16 12:00:02	serhiy.storchaka	set	messages: + msg308473
2017-12-16 11:56:50	serhiy.storchaka	set	pull_requests: + pull_request4796
2017-12-16 08:47:00	rhettinger	set	assignee: rhettinger -> messages: + msg308465
2017-12-16 08:32:11	serhiy.storchaka	set	messages: + msg308462
2017-12-16 08:26:57	rhettinger	set	messages: + msg308460
2017-12-16 08:14:21	serhiy.storchaka	set	assignee: rhettinger messages: + msg308459
2017-11-07 09:11:08	methane	set	messages: + msg305722
2017-11-07 08:51:15	serhiy.storchaka	set	messages: + msg305721
2017-11-07 07:54:46	methane	set	messages: + msg305715
2017-11-06 10:24:11	serhiy.storchaka	set	messages: + msg305626
2017-11-06 09:40:47	serhiy.storchaka	set	keywords: + patch stage: patch review pull_requests: + pull_request4254
2017-11-06 09:38:01	serhiy.storchaka	create