classification
Title: collections.ChainMap should have a get_where method
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Zahari.Dim, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-09-05 14:00 by Zahari.Dim, last changed 2018-09-11 20:21 by Zahari.Dim. This issue is now closed.

Messages (9)
msg324632 - (view) Author: Zahari Dim (Zahari.Dim) Date: 2018-09-05 14:00
When using ChainMap I have frequently needed to know the mapping inside the list
that contains the effective instance of a particular key. I have needed this
when using ChainMap to contain a piece of configuration with multiple sources,
like for example

```
from mycollections import ChainMap
configsources = ["Command line", "Config file", "Defaults"]
config = ChainMap(config_from_commandline(), config_from_file(),
                  default_config())

class BadConfigError(Exception): pass
def get_key(key):
    try:
        index, value = config.get_where(key)
    except KeyError as e:
        raise BadConfigError(f"No such key: '{key}'") from e
    try:
        result = validate(key, value)
    except ValidationError as e:
        raise BadConfigError(f"Key '{key}' defined in {configsources[index] }"
                             f"is invalid: {e}") from e
    return result
```

I have also needed this when implementing custom DSLs (e.g. specifying which
context is a particular construct allowed to see).

I think this method would be generally useful for the ChainMap class and
moreover the best way of implementing it I can think of is  by copying the
`__getitem__` method and retaining the index:

```
class ChainMap(collections.ChainMap):
    def get_where(self, key):
        for i, mapping in enumerate(self.maps):
            try:
                return i, mapping[key]             # can't use 'key in mapping' with defaultdict
            except KeyError:
                pass
        return self.__missing__(key)            # support subclasses that define __missing__
```

I'd be happy to write a patch that does just this.
msg324661 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-09-06 05:07
I haven't run across this requirement before but it does seem plausible that a person might want to know which underlying mapping found a match (compare with the "which" utility in Bash). On the other hand, we haven't had requests for anything like this for other lookup chains such as determining where a variable appears in the sequence locals-to-nested-scopes-to-globals-to-builtins.

Also, I'm not sure I like the proposed API (the method name and signature).  Perhaps, this should be a docs recipe for a ChainMap subclass or be an example of a standalone search function that the takes the *maps* attribute as one of its arguments.  Will discuss this with the other core devs to get their thoughts.
msg324681 - (view) Author: Zahari Dim (Zahari.Dim) Date: 2018-09-06 10:20
I believe an argument for including this functionality in the standard library
is that it facilitates writing better error messages and thus better code. Some
results that are returned when one searches for *python ChainMap* are:

  - <https://stackoverflow.com/questions/23392976/what-is-the-purpose-of-collections-chainmap>
  - <http://www.blog.pythonlibrary.org/2016/03/29/python-201-what-is-a-chainmap/>
  - <http://rahmonov.me/posts/python-chainmap/>

All of these mention prominently a layered configuration of some kind. I would
argue that all of the examples would benefit from error checking done along the
lines of the snippet above.

An additional consideration is that the method is best implemented by copying
the `__getitem__` method, which, while short, contains a couple of non trivial
details.

One analog could be `re.search`, which returns an object with information of
both the value that is found and its location, though the `span` attribute of
the Match object. Maybe the method could be called ChainMap.search?
On Thu, Sep 6, 2018 at 6:07 AM Raymond Hettinger <report@bugs.python.org> wrote:
>
>
> Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:
>
> I haven't run across this requirement before but it does seem plausible that a person might want to know which underlying mapping found a match (compare with the "which" utility in Bash). On the other hand, we haven't had requests for anything like this for other lookup chains such as determining where a variable appears in the sequence locals-to-nested-scopes-to-globals-to-builtins.
>
> Also, I'm not sure I like the proposed API (the method name and signature).  Perhaps, this should be a docs recipe for a ChainMap subclass or be an example of a standalone search function that the takes the *maps* attribute as one of its arguments.  Will discuss this with the other core devs to get their thoughts.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue34586>
> _______________________________________
msg324725 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-09-07 08:00
> I would argue that all of the examples would benefit from error
> checking done along the lines of the snippet above.

ISTM that this is the wrong stage to perform validation of allowable values.  That should occur upstream when the underlying mappings are first created.  At that earlier stage it possible to give a more timely response to erroneous input and there is access to more information (such as the line and row number of an error in a configuration file).  

It doesn't make sense to me to defer value validation downstream after a ChainMap instance has been formed and after a successful lookup has occurred. That just complicates the task of tracing back to the root cause.

> Maybe the method could be called ChainMap.search?

That would be better than get_where().
msg324730 - (view) Author: Zahari Dim (Zahari.Dim) Date: 2018-09-07 08:55
> ISTM that this is the wrong stage to perform validation of allowable values.  That should occur upstream when the underlying mappings are first created.  At that earlier stage it possible to give a more timely response to erroneous input and there is access to more information (such as the line and row number of an error in a configuration file).
>
> It doesn't make sense to me to defer value validation downstream after a ChainMap instance has been formed and after a successful lookup has occurred. That just complicates the task of tracing back to the root cause.

This is certainly the case in the situation where the validation only
depends on the value of the corresponding configuration entry, as it
admittedly does in the example above. However the example was
oversimplified insofar non trivial validation depends on the whole
ensemble configuration settings. For example taking the example
described at the top of
<http://rahmonov.me/posts/python-chainmap/>
I think it would be useful to have an error message of the form:
f"User '{db_username}', defined in {configsetttings[user_index]} is
not found in database '{database}', defined in
{configsettings[database_index]}'

>
> > Maybe the method could be called ChainMap.search?
>
> That would be better than get_where().
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue34586>
> _______________________________________
msg324834 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-08 12:15
I concur with Raymond. The purpose of ChainMap is providing a mapping that hides the implementation detail of using several mappings as fallbacks. If you don't want to hide the implementation detail, you don't need to use ChainMap.

ChainMap exposes underlying mappings as the maps attribute, so you can use this implementation detail if you know that it is a ChainMap an not a general mapping. It is easy to write a code for searching what mapping contains the specified key.

    for m in cm.maps:
        if key in m:
            found = m
            break
    else:
        # raise an error or set a default,
        # what is appropriate for your concrete case

or even

    found = next(m for m in cm.maps if key in m)

if the key is always found or StopIteration is appropriate.

I don't think that such trivial snatch of code is worth adding a special ChainMap method or even documenting it explicitly in the ChainMap documentation.
msg324842 - (view) Author: Zahari Dim (Zahari.Dim) Date: 2018-09-08 14:00
On Sat, Sep 8, 2018 at 1:15 PM Serhiy Storchaka <report@bugs.python.org> wrote:
>
>
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
>
> I concur with Raymond. The purpose of ChainMap is providing a mapping that hides the implementation detail of using several mappings as fallbacks. If you don't want to hide the implementation detail, you don't need to use ChainMap.
>
> ChainMap exposes underlying mappings as the maps attribute, so you can use this implementation detail if you know that it is a ChainMap an not a general mapping. It is easy to write a code for searching what mapping contains the specified key.

I don't know where the idea that the underlying mappings are an
implementation detail comes from. It certainly isn't from the
documentation, which mentions uses such as nested scopes and
templates, which cannot be attained with a single mapping. It also
doesn't match my personal usage, where as discussed, even the simpler
cases benefit from information on the underlying mappings. It is a
surprising claim to make given than the entirety of the public
interface specific to ChainMap (maps, new_child and parents) deals
with the fact that there is more structure than one mapping. I also
have a hard time discerning this idea from Raymond's messages.

>
>     for m in cm.maps:
>         if key in m:
>             found = m
>             break
>     else:
>         # raise an error or set a default,
>         # what is appropriate for your concrete case

This "trivial snatch of code" contains at least two issues that make
it fail in situations where the actual implementation of `__getitem__`
would work, opening the door for hard to diagnose corner cases. If
anything, in my opinion, the fact that this code is being proposed as
an alternative reinforces the idea that the implementation of the
searching method should be in the standard library.
msg324964 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-09-11 00:28
I've discussed this with other core devs and spent a good deal of time evaluating this proposal.  I'm going to pass on the this one but do think it was a inspired suggestion.  Thank you for the proposal.

----------

Note, the original get_where() recipe has an issue.  Upon successful lookup, it returns a 2-tuple but on failure it calls __missing__ which typically returns a scalar (if it doesn't raise an exception).

The proposed use case is plausible but rare.  As Serhiy mentioned, it is also reasonable for the user to write their own loops over the *maps* attribute.  That loop can have subtleties but it is better to let the user choose between try/except semantics and if-else semantics according to the use case.

FWIW, I tried out several variants of the proposal including "cm.index(key) -> int" and "cm.search(key) -> mapping".  Each of the variants had some awkwardness when trying to use it there didn't seem to be a variant that was a net win.
msg325056 - (view) Author: Zahari Dim (Zahari.Dim) Date: 2018-09-11 20:21
>
> I've discussed this with other core devs and spent a good deal of time evaluating this proposal.  I'm going to pass on the this one but do think it was a inspired suggestion.  Thank you for the proposal.

Thank you for taking the time to consider it. I understand that there
are many proposals.

>
> ----------
>
> Note, the original get_where() recipe has an issue.  Upon successful lookup, it returns a 2-tuple but on failure it calls __missing__ which typically returns a scalar (if it doesn't raise an exception).

FWIW this was intended to work when `__missing__` was subclassed to
raise a more specific exception. The case where it is made to return a
value clearly doesn't play well with the proposed method, and would
likely need to be subclassed as well. I consider this an acceptable
trade off because I find this use case rather esoteric: the same
functionality could be achieved in an arguably clearer way by passing
a mapping with the desired missing semantics as the outermost scope.
History
Date User Action Args
2018-09-11 20:21:27Zahari.Dimsetmessages: + msg325056
2018-09-11 00:28:56rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg324964

stage: resolved
2018-09-08 14:00:26Zahari.Dimsetmessages: + msg324842
2018-09-08 12:15:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg324834
2018-09-07 08:55:35Zahari.Dimsetmessages: + msg324730
2018-09-07 08:00:45rhettingersetmessages: + msg324725
2018-09-06 10:20:59Zahari.Dimsetmessages: + msg324681
2018-09-06 05:07:10rhettingersetmessages: + msg324661
2018-09-05 15:29:01rhettingersetassignee: rhettinger

nosy: + rhettinger
2018-09-05 14:00:10Zahari.Dimcreate