This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Provide a C implementation of collections.abc.KeysView and friends
Type: performance Stage:
Components: Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jab, rhettinger
Priority: normal Keywords:

Created on 2022-02-10 19:19 by jab, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg413020 - (view) Author: Joshua Bronson (jab) * Date: 2022-02-10 19:19
As suggested by @rhettinger in https://bugs.python.org/msg409443, I'm creating a feature request for C implementations of collections.abc.KeysView, ValuesView, and ItemsView.

Because these do not currently benefit from C speedups, they're a lot slower than their dict_keys, dict_values, and dict_items counterparts. As a result, libraries that implement custom Mapping types that are backed by dicts are incentivized to override the implementations of keys(), values(), and items() they inherit from collections.abc.Mapping to instead return their backing dicts' mapping views, causing a potential abstraction leak.

An example can be found in https://github.com/jab/bidict, which implements bidirectional mapping types that wrap a forward and an inverse dict which are kept in sync with one another.

>>> from bidict import *
>>> bi = bidict({1: 'one', 2: 'two'})
>>> bi.items()  # Overridden for performance:
dict_items([(1, 'one'), (2, 'two')])

Ditto for OrderedBidict:

>>> OrderedBidict(bi).keys()
_OrderedBidictItemsView(OrderedBidict([(1, 'one'), (2, 'two')]))


(The _OrderedBidictItemsView is a custom view whose __iter__ uses the implementation inherited by its collections.abc.ItemsView base class so that the correct order is respected, but proxies other method calls through to the backing dict's dict_items object: https://github.com/jab/bidict/blob/2ab42a/bidict/_orderedbidict.py#L90-L150)


Here is a microbenchmark of calling __eq__ on an _OrderedBidictItemsView vs. a collections.abc.ItemsView, to estimate the performance impact (using Python 3.10):

❯ set setup '
    from collections.abc import ItemsView
    from bidict import OrderedBidict
    d = dict(zip(range(9999), range(9999)))
    ob = OrderedBidict(d)'

❯ python -m pyperf timeit -s $setup 'ob.items() == d.items()' -o 1.json

❯ python -m pyperf timeit -s $setup 'ItemsView(ob) == d.items()' -o 2.json

❯ pyperf compare_to 2.json 1.json
Mean +- std dev: [2] 4.21 ms +- 1.10 ms -> [1] 168 us +- 6 us: 25.13x faster


This demonstrates a potentially significant speedup. Similar microbenchmarks for ItemsView vs. dict_items, as well as KeysView vs. both dict_keys and _OrderedBidictKeysView, also indicate similarly significant potential.

Note that the performance benefits of this may propagate to other code as well. For example, bidicts' __eq__ methods are implemented in terms of their itemsviews (see https://github.com/jab/bidict/blob/2ab42a/bidict/_base.py#L285-L286), so speeding up bidict.items().__eq__ speeds up bidict.__eq__ commensurately.
msg413093 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-11 18:55
Some thoughts:

* Other than set operations, most of the pure python code in the dict view ABCs are fast pass throughs.  There is no point in rewriting these in C:

    def __contains__(self, key):
        return key in self._mapping

    def __iter__(self):
        yield from self._mapping

    def __len__(self):
        return len(self._mapping)

* For the set operations, the pure python code has only a simple for-loop and contains test.  There isn't much fat to be cut:

    def __ge__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        if len(self) < len(other):
            return False
        for elem in other:
            if elem not in self:
                return False
        return True

* Possibly the eval-loop overhead can be eliminated with Python by using itertools:

    def __ge__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        if len(self) < len(other):
            return False
        for elem in filterfalse(self.__contains__, other):
            return False
        return True

* That leaves the question of why the dict views are so much faster (presuming that the posted timings are representative).  I haven't looked in detail, but the first candidate that comes to mind is that dictviews_to_set() has a fast path for exact dicts.  That lets it bypass the mapping proxy and exploit the fast path in PySet_New for exact dicts.  That fast path reuses the stored hash values and exploits knowing that the input has no duplicates.  Off-hand, I don't see how that can generalize to the arbitrary mappings supports by the view ABCs.  

* Summary:  A first look, it doesn't seem like a C rewrite of the view ABCs would bear fruit.
msg413095 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-11 20:10
- arbitrary mappings supports by the view ABCs
+ arbitrary mappings supported by the view ABCs

- A first look,
+ At first glance,
History
Date User Action Args
2022-04-11 14:59:56adminsetgithub: 90869
2022-02-11 20:10:43rhettingersetmessages: + msg413095
2022-02-11 18:55:55rhettingersetmessages: + msg413093
2022-02-10 22:32:24AlexWaygoodsetnosy: + rhettinger

type: performance
versions: + Python 3.11
2022-02-10 19:19:45jabcreate