Issue 46713: Provide a C implementation of collections.abc.KeysView and friends

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90869

classification

Title:	Provide a C implementation of collections.abc.KeysView and friends
Type:	performance	Stage:
Components:		Versions:	Python 3.11

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jab, rhettinger
Priority:	normal	Keywords:

Created on 2022-02-10 19:19 by jab, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg413020 - (view)	Author: Joshua Bronson (jab) *	Date: 2022-02-10 19:19
As suggested by @rhettinger in https://bugs.python.org/msg409443, I'm creating a feature request for C implementations of collections.abc.KeysView, ValuesView, and ItemsView. Because these do not currently benefit from C speedups, they're a lot slower than their dict_keys, dict_values, and dict_items counterparts. As a result, libraries that implement custom Mapping types that are backed by dicts are incentivized to override the implementations of keys(), values(), and items() they inherit from collections.abc.Mapping to instead return their backing dicts' mapping views, causing a potential abstraction leak. An example can be found in https://github.com/jab/bidict, which implements bidirectional mapping types that wrap a forward and an inverse dict which are kept in sync with one another. >>> from bidict import * >>> bi = bidict({1: 'one', 2: 'two'}) >>> bi.items() # Overridden for performance: dict_items([(1, 'one'), (2, 'two')]) Ditto for OrderedBidict: >>> OrderedBidict(bi).keys() _OrderedBidictItemsView(OrderedBidict([(1, 'one'), (2, 'two')])) (The _OrderedBidictItemsView is a custom view whose __iter__ uses the implementation inherited by its collections.abc.ItemsView base class so that the correct order is respected, but proxies other method calls through to the backing dict's dict_items object: https://github.com/jab/bidict/blob/2ab42a/bidict/_orderedbidict.py#L90-L150) Here is a microbenchmark of calling __eq__ on an _OrderedBidictItemsView vs. a collections.abc.ItemsView, to estimate the performance impact (using Python 3.10): ❯ set setup ' from collections.abc import ItemsView from bidict import OrderedBidict d = dict(zip(range(9999), range(9999))) ob = OrderedBidict(d)' ❯ python -m pyperf timeit -s $setup 'ob.items() == d.items()' -o 1.json ❯ python -m pyperf timeit -s $setup 'ItemsView(ob) == d.items()' -o 2.json ❯ pyperf compare_to 2.json 1.json Mean +- std dev: [2] 4.21 ms +- 1.10 ms -> [1] 168 us +- 6 us: 25.13x faster This demonstrates a potentially significant speedup. Similar microbenchmarks for ItemsView vs. dict_items, as well as KeysView vs. both dict_keys and _OrderedBidictKeysView, also indicate similarly significant potential. Note that the performance benefits of this may propagate to other code as well. For example, bidicts' __eq__ methods are implemented in terms of their itemsviews (see https://github.com/jab/bidict/blob/2ab42a/bidict/_base.py#L285-L286), so speeding up bidict.items().__eq__ speeds up bidict.__eq__ commensurately.
msg413093 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2022-02-11 18:55
Some thoughts: * Other than set operations, most of the pure python code in the dict view ABCs are fast pass throughs. There is no point in rewriting these in C: def __contains__(self, key): return key in self._mapping def __iter__(self): yield from self._mapping def __len__(self): return len(self._mapping) * For the set operations, the pure python code has only a simple for-loop and contains test. There isn't much fat to be cut: def __ge__(self, other): if not isinstance(other, Set): return NotImplemented if len(self) < len(other): return False for elem in other: if elem not in self: return False return True * Possibly the eval-loop overhead can be eliminated with Python by using itertools: def __ge__(self, other): if not isinstance(other, Set): return NotImplemented if len(self) < len(other): return False for elem in filterfalse(self.__contains__, other): return False return True * That leaves the question of why the dict views are so much faster (presuming that the posted timings are representative). I haven't looked in detail, but the first candidate that comes to mind is that dictviews_to_set() has a fast path for exact dicts. That lets it bypass the mapping proxy and exploit the fast path in PySet_New for exact dicts. That fast path reuses the stored hash values and exploits knowing that the input has no duplicates. Off-hand, I don't see how that can generalize to the arbitrary mappings supports by the view ABCs. * Summary: A first look, it doesn't seem like a C rewrite of the view ABCs would bear fruit.
msg413095 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2022-02-11 20:10
- arbitrary mappings supports by the view ABCs + arbitrary mappings supported by the view ABCs - A first look, + At first glance,

History
Date	User	Action	Args
2022-04-11 14:59:56	admin	set	github: 90869
2022-02-11 20:10:43	rhettinger	set	messages: + msg413095
2022-02-11 18:55:55	rhettinger	set	messages: + msg413093
2022-02-10 22:32:24	AlexWaygood	set	nosy: + rhettinger type: performance versions: + Python 3.11
2022-02-10 19:19:45	jab	create