This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: New .mapping attribute is broken for some existing uses of dict views
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.10
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: jab, rhettinger
Priority: normal Keywords:

Created on 2021-10-29 16:36 by jab, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg405317 - (view) Author: Joshua Bronson (jab) * Date: 2021-10-29 16:36
As of bpo-40890 (released in Python 3.10), dict views now provide a public .mapping attribute, intended to allow users to recover a mappingproxy pointing to the original mapping.

However, this new attribute can actually point to the wrong mapping for some existing uses of dict views. And since the .mapping attribute is read-only, these existing uses have no way to set it to the correct value.

My bidict library (see https://github.com/jab/bidict) provides an example of this.

A bidict implements a bidirectional mapping by building on top of two dicts (i.e. regular old one-directional mappings) internally -- one for the forward direction, and one for the inverse. When you call e.g. keys() or values() on a bidict, you get back a dict_keys view from one of the backing dicts, because this is a much more optimized implementation of these views than collections.abc.KeysView would be:

>>> import bidict
>>> b = bidict.bidict(one=1, two=2)
>>> b
bidict({'one': 1, 'two': 2})
>>> b.keys()
dict_keys(['one', 'two'])
>>> b.values()
dict_keys([1, 2])


However, now that these built-in dict_keys objects provide a .mapping attribute in Python 3.10, it points to one of the internal, backing dicts in this case, which is an implementation detail, rather than to the bidict instance:

>>> b.keys().mapping  # wrong
mappingproxy({'one': 1, 'two': 2})
>>> b.values().mapping  # wrong
mappingproxy({1: 'one', 2: 'two'})


Instead of the above, you should get:

>>> b.keys().mapping  # corrected:
mappingproxy(bidict({'one': 1, 'two': 2}))
>>> b.values().mapping  # corrected:
mappingproxy(bidict({'one': 1, 'two': 2}))


Since the .mapping attribute is read-only, there's no way for bidict to both keep exposing the optimized dict_keys implementations, which up till now have been perfectly correct, while now exposing a correct .mapping attribute for users of Python 3.10+.

(Other bidict types demonstrate this problem more by exposing even more obviously-unintended implementation details via this new .mapping attribute:

>>> f = bidict.FrozenOrderedBidict(b)
>>> f
FrozenOrderedBidict([('one', 1), ('two', 2)])
>>> f.keys().mapping  # ouch
mappingproxy({'one': _Node(prv=..., self=..., nxt=...), 'two': _Node(prv=..., self=..., nxt=...)})

Those internal _Node objects were never meant to be exposed to consumers, they're an implementation detail.)


It looks like cases like this were not considered when discussing bpo-40890, and if they had been, I wonder if the implementation would have been accepted as-is.

Here are some potential directions for how to improve things for the future:

1. Expose a way for dict view users like bidict to continue to use optimized dict view implementations while opting out of the new .mapping attribute

2. Make the .mapping attribute no longer read-only, so libraries like bidict can set it correctly before exposing it to users

3. Merely update the documentation in https://docs.python.org/3/library/stdtypes.html#:~:text=dictview.mapping,in%20version%203.10. to mention that, because the .mapping attribute is read-only, it may not point to the original, intended mapping, but rather some internal mapping that the user was not intended to be exposed to.


Looking forward to hearing your thoughts, and thanks for your consideration.
msg405417 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-10-31 19:46
Sorry, but this isn't bug in Python.  The documented and supported API is the MappingView ABC where the _mapping attribute is private.

As an optimization, the bidict project has elected to forgo the supported API and instead use a concrete implementation designed specifically for actual dictionaries.  This is an unsupported use and the constructor is not a public API.

We don't even do this internally in the standard library. OrderedDict, for example, has its own odict_keys() type for the C version and _OrderedDictKeysView() for the pure python version.

It would be possible for us to add a set_mapping() method or make the attribute writeable but that would send us down the path for having to support non-dicts throughout and would tie our hands with respect to implementation techniques that rely on the mapping being an actual dictionary.  That isn't worth it for a code optimization and it loses the benefits arising from separate concrete and abstract classes.

Likewise, I don't think a documentation update makes sense because we don't document a constructor, so there is no documented way for a user to get into this position.

For the bidict project, several options come to mind:

1) To continue down the path of using dict_keys, dict_values, and dict_items, consider building a subclass that hides the mapping attribute or that raises a NotImplementedError. That is what collections.Counter() does with the inherited fromkeys() classmethod.

2) Build a C or Cython extension to implement an optimized concrete implementation designed specifically for bidict. That is what collections.OrderedDict() does for keys, values, and items.

3) Forgo the optimization and use the supported MappingView ABC.  That is what collections.ChainMap() does.

4) Just document that *mapping* behaves differently in bidict.

Thank you for the clear bug report with examples. It made it easier to drill into what was happening.
msg405421 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-10-31 21:43
Hmm, I'm looking through the bidict code a bit more.  Rather than saying the dict views are being used in an unsupported way, it is more accurate to say that your intended meaning for the *mapping* attribute differs from the published meaning.

Note, the implementation is already skating on thin ice.  The values() call unexpectedly returns an instance of dict_keys().  At first, I was surprised that this got past the type checker -- you can do set operations with KeysView but not with a ValuesView.

    >>> b.values()
    dict_keys([1, 2])

One suggestion is to document *mapping* do exactly what it currently does.  The mappingproxy means that you aren't promising a specific upstream implementation.   

Also consider that bidict could guarantee the meaning of *mapping* and its underlying store.  This would allow users to make fast conversions to other mapping types:

    # This is weird, but useful. 
    # It might be nice to guarantee it.
    >>> OrderedDict(b.values().mapping)
    >>> OrderedDict([(1, 'a'), (2, 'b')])
msg409431 - (view) Author: Joshua Bronson (jab) * Date: 2021-12-31 19:35
Dear Raymond,

Thanks so much for the detailed response!

I wonder if you could explain how this case is different from the following:

>>> c = collections.ChainMap({1: 1}, {1: 2})
>>> iter(c)
<dict_keyiterator object at ...>
>>> isinstance(c, dict)  # it's fine, we <3 duck typing!
False


Now suppose a new .mapping attribute were added to dict_keyiterator objects in a future version of Python, like the one that was added to dict view objects in Python 3.10. Then, wouldn't ChainMap find itself in a similar predicament as this issue is raising?

>>> iter(c).mapping
mappingproxy({1: None})


As in my example above, the {1: None} mapping here is an implementation detail of ChainMap.__iter__() that was never intended to be leaked to users. (Ref: <https://github.com/python/cpython/blob/e18d815/Lib/collections/__init__.py#L998-L1002>)

Note also that ChainMap.__iter__() returns a dict_keyiterator object even though issubclass(ChainMap, dict) is False. We consider this just fine though, because the dict_keyiterator object currently behaves exactly like the generator object we would get if the code had done a `yield from d` rather than a `return iter(d)`. Except for being faster.

This parallels the implementations of bidict.keys()/values()/items(), which currently return dict_keys/dict_values/dict_items objects generated from internal data, that behave exactly like KeysViews(b)/ValuesView(b)/ItemsView(b) would*, except for being faster. And, as this issue is raising, except for this new .mapping attribute in Python 3.10+ now leaking internal state.

* I even have the passing property-based tests to prove it: <https://github.com/jab/bidict/pull/217/files#diff-995af13b14fe897c5d200fa97ed88fad03e401b2fc0cc167624d482ea512ba96R431-R459> :)


(I also have counterpoints for your other feedback, but wanted to post this part first. And sorry for my delay in responding – hope it's not too late! And finally thanks so much again for considering this and for the time you took to give feedback on bidict – there's literally no better-qualified person in the world. I so appreciate it!)
msg409443 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-01 00:25
To the list of four options I suggested for bidict, you can add one more.  Create a feature request for a C implementation of collections.abc.KeyView. ISTM that your core issue is that bidict won't use the intended and guaranteed solution because the KeysView ABC is thought to be too slow (you should test this -- the `yield from self._mapping` code is very fast).


> .mapping attribute in Python 3.10+ now leaking internal state.

We do not agree on this essential point.  The "leak", as in "all abstractions are leaky", is in the bidict code, not in the dict_keys code.  The .mapping attribute itself is accurately refers to its underlying mapping.

This is consistent with how other wrappers provide a __wrapped__ attribute (or __fget__, __func__, func, maps, etc).  If some class ever uses an lru_cache, cache, property, boundmethod, partial object, or chainmap, then it is possible for the user to unwrap one level of abstraction.


> Now suppose a new .mapping attribute were added to dict_keyiterator objects in a future version of Python

We don't do that for iterator objects.  If we did, then ChainMap would have the same options as bidict has.  Most likely, we would leave it as an undocumented, non-guaranteed implementation detail (much like random.randrange is a bound method with a __func__ attribute rather than an actual function).

If we thought users really cared about the exact object returned and its full API, then we would have several options:
1) Document that that aspect of the API is non-guaranteed,
2) Document that that aspect is guaranteed.  This would lock in our design choice forever, but all allow users the opportunity to exploit the new capability.
3) Wrap the iterator to hide parts of the API we don't want to expose.  For example, return `chain(d)` instead of `iter(d)`.


> I even have the passing property-based tests to prove it

That tests an invented property that isn't true and wasn't ever intended to be true: "The API of the concrete dict_keys type is the same as the API of KeysView ABC".  The correct property is: "The API of the concrete dict_keys type implements the KeysView ABC but may do other things as well". This is the common and normal relationship between ABCs and concrete classes.  It is no more surprising that Mapping not supporting __or__ while dict does.


> I also have counterpoints for your other feedback,

I fear that we have an asymmetry of time available to explore every possible nuance including the counterfactual you gave in the last post.  Doing so eats heavily into my limited time available for development.

I don't think there is more that can be done here.  We already have an official and supported API for your purpose. The dict_keys concrete API is unlikely to ever be built-out as a general purpose optimization of the KeysView ABC (that violates a number of design principles regarding separation of concerns, tight coupling, premature optimization, and the proper relationship between concrete and abstract classes).

Please consider using one of the four options (now up to five) that I suggested previously.
msg413023 - (view) Author: Joshua Bronson (jab) * Date: 2022-02-10 20:48
Thank you for confirming that ChainMap.__iter__() would be in the same boat as bidict if a similar .mapping attribute were ever added to dict_keyiterators. The specifics of this issue are interesting, but even more interesting to me is whatever learnings we can generalize from this.


After testing that the performance impact would be significant, I created the feature request you suggested in https://bugs.python.org/issue46713. Thanks for suggesting that.


In the meantime, I've updated the relevant docstrings:

>>> help(b.keys)
keys() -> KeysView[~KT] method of bidict.bidict instance
    A set-like object providing a view on the contained keys.
    
    When *b._fwdm* is a :class:`dict`, *b.keys()* returns a
    *dict_keys* object that behaves exactly the same as
    *collections.abc.KeysView(b)*, except for
    
      - offering better performance
    
      - being reversible on Python 3.8+
    
      - having a .mapping attribute in Python 3.10+
        that exposes a mappingproxy to *b._fwdm*.

>>> help(b.values)
values() -> bidict.BidictKeysView[~VT] method of bidict.bidict instance
    A set-like object providing a view on the contained values.
    
    Since the values of a bidict are equivalent to the keys of its inverse,
    this method returns a set-like object for this bidict's values
    rather than just a collections.abc.ValuesView.
    This object supports set operations like union and difference,
    and constant- rather than linear-time containment checks,
    and is no more expensive to provide than the less capable
    collections.abc.ValuesView would be.
    
    See :meth:`keys` for more information.

etc.


Regarding:
> The values() call unexpectedly returns an instance of dict_keys(). At first, I was surprised that this got past the type checker -- you can do set operations with KeysView but not with a ValuesView.

Check out https://github.com/jab/bidict/blob/82f931/bidict/_base.py#L38-L45:

```
class BidictKeysView(t.KeysView[KT], t.ValuesView[KT]):
    """Since the keys of a bidict are the values of its inverse (and vice versa),
    the ValuesView result of calling *bi.values()* is also a KeysView of *bi.inverse*.
    """


dict_keys: t.Type[t.KeysView[t.Any]] = type({}.keys())
BidictKeysView.register(dict_keys)
```

See also https://github.com/python/typeshed/issues/4435 for a request that typeshed use a Protocol (structural typing) here.


Thanks again for taking the time to look at my code and discuss so generously.
History
Date User Action Args
2022-04-11 14:59:51adminsetgithub: 89833
2022-02-10 20:48:44jabsetmessages: + msg413023
2022-01-01 00:25:24rhettingersetmessages: + msg409443
2021-12-31 19:35:56jabsetmessages: + msg409431
2021-10-31 21:43:50rhettingersetmessages: + msg405421
2021-10-31 19:46:24rhettingersetstatus: open -> closed
messages: + msg405417

assignee: rhettinger
resolution: not a bug
stage: resolved
2021-10-29 16:36:50jabcreate