Issue 34586: collections.ChainMap should have a get_where method

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78767

classification

Title:	collections.ChainMap should have a get_where method
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Zahari.Dim, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2018-09-05 14:00 by Zahari.Dim, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (9)
msg324632 - (view)	Author: Zahari Dim (Zahari.Dim)	Date: 2018-09-05 14:00
When using ChainMap I have frequently needed to know the mapping inside the list that contains the effective instance of a particular key. I have needed this when using ChainMap to contain a piece of configuration with multiple sources, like for example ``` from mycollections import ChainMap configsources = ["Command line", "Config file", "Defaults"] config = ChainMap(config_from_commandline(), config_from_file(), default_config()) class BadConfigError(Exception): pass def get_key(key): try: index, value = config.get_where(key) except KeyError as e: raise BadConfigError(f"No such key: '{key}'") from e try: result = validate(key, value) except ValidationError as e: raise BadConfigError(f"Key '{key}' defined in {configsources[index] }" f"is invalid: {e}") from e return result ``` I have also needed this when implementing custom DSLs (e.g. specifying which context is a particular construct allowed to see). I think this method would be generally useful for the ChainMap class and moreover the best way of implementing it I can think of is by copying the `__getitem__` method and retaining the index: ``` class ChainMap(collections.ChainMap): def get_where(self, key): for i, mapping in enumerate(self.maps): try: return i, mapping[key] # can't use 'key in mapping' with defaultdict except KeyError: pass return self.__missing__(key) # support subclasses that define __missing__ ``` I'd be happy to write a patch that does just this.
msg324661 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-09-06 05:07
I haven't run across this requirement before but it does seem plausible that a person might want to know which underlying mapping found a match (compare with the "which" utility in Bash). On the other hand, we haven't had requests for anything like this for other lookup chains such as determining where a variable appears in the sequence locals-to-nested-scopes-to-globals-to-builtins. Also, I'm not sure I like the proposed API (the method name and signature). Perhaps, this should be a docs recipe for a ChainMap subclass or be an example of a standalone search function that the takes the maps attribute as one of its arguments. Will discuss this with the other core devs to get their thoughts.
msg324681 - (view)	Author: Zahari Dim (Zahari.Dim)	Date: 2018-09-06 10:20
I believe an argument for including this functionality in the standard library is that it facilitates writing better error messages and thus better code. Some results that are returned when one searches for python ChainMap are: - <https://stackoverflow.com/questions/23392976/what-is-the-purpose-of-collections-chainmap> - <http://www.blog.pythonlibrary.org/2016/03/29/python-201-what-is-a-chainmap/> - <http://rahmonov.me/posts/python-chainmap/> All of these mention prominently a layered configuration of some kind. I would argue that all of the examples would benefit from error checking done along the lines of the snippet above. An additional consideration is that the method is best implemented by copying the `__getitem__` method, which, while short, contains a couple of non trivial details. One analog could be `re.search`, which returns an object with information of both the value that is found and its location, though the `span` attribute of the Match object. Maybe the method could be called ChainMap.search? On Thu, Sep 6, 2018 at 6:07 AM Raymond Hettinger <report@bugs.python.org> wrote: > > > Raymond Hettinger <raymond.hettinger@gmail.com> added the comment: > > I haven't run across this requirement before but it does seem plausible that a person might want to know which underlying mapping found a match (compare with the "which" utility in Bash). On the other hand, we haven't had requests for anything like this for other lookup chains such as determining where a variable appears in the sequence locals-to-nested-scopes-to-globals-to-builtins. > > Also, I'm not sure I like the proposed API (the method name and signature). Perhaps, this should be a docs recipe for a ChainMap subclass or be an example of a standalone search function that the takes the maps attribute as one of its arguments. Will discuss this with the other core devs to get their thoughts. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue34586> > _______________________________________
msg324725 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-09-07 08:00
> I would argue that all of the examples would benefit from error > checking done along the lines of the snippet above. ISTM that this is the wrong stage to perform validation of allowable values. That should occur upstream when the underlying mappings are first created. At that earlier stage it possible to give a more timely response to erroneous input and there is access to more information (such as the line and row number of an error in a configuration file). It doesn't make sense to me to defer value validation downstream after a ChainMap instance has been formed and after a successful lookup has occurred. That just complicates the task of tracing back to the root cause. > Maybe the method could be called ChainMap.search? That would be better than get_where().
msg324730 - (view)	Author: Zahari Dim (Zahari.Dim)	Date: 2018-09-07 08:55
> ISTM that this is the wrong stage to perform validation of allowable values. That should occur upstream when the underlying mappings are first created. At that earlier stage it possible to give a more timely response to erroneous input and there is access to more information (such as the line and row number of an error in a configuration file). > > It doesn't make sense to me to defer value validation downstream after a ChainMap instance has been formed and after a successful lookup has occurred. That just complicates the task of tracing back to the root cause. This is certainly the case in the situation where the validation only depends on the value of the corresponding configuration entry, as it admittedly does in the example above. However the example was oversimplified insofar non trivial validation depends on the whole ensemble configuration settings. For example taking the example described at the top of <http://rahmonov.me/posts/python-chainmap/> I think it would be useful to have an error message of the form: f"User '{db_username}', defined in {configsetttings[user_index]} is not found in database '{database}', defined in {configsettings[database_index]}' > > > Maybe the method could be called ChainMap.search? > > That would be better than get_where(). > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue34586> > _______________________________________
msg324834 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-09-08 12:15
I concur with Raymond. The purpose of ChainMap is providing a mapping that hides the implementation detail of using several mappings as fallbacks. If you don't want to hide the implementation detail, you don't need to use ChainMap. ChainMap exposes underlying mappings as the maps attribute, so you can use this implementation detail if you know that it is a ChainMap an not a general mapping. It is easy to write a code for searching what mapping contains the specified key. for m in cm.maps: if key in m: found = m break else: # raise an error or set a default, # what is appropriate for your concrete case or even found = next(m for m in cm.maps if key in m) if the key is always found or StopIteration is appropriate. I don't think that such trivial snatch of code is worth adding a special ChainMap method or even documenting it explicitly in the ChainMap documentation.
msg324842 - (view)	Author: Zahari Dim (Zahari.Dim)	Date: 2018-09-08 14:00
On Sat, Sep 8, 2018 at 1:15 PM Serhiy Storchaka <report@bugs.python.org> wrote: > > > Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment: > > I concur with Raymond. The purpose of ChainMap is providing a mapping that hides the implementation detail of using several mappings as fallbacks. If you don't want to hide the implementation detail, you don't need to use ChainMap. > > ChainMap exposes underlying mappings as the maps attribute, so you can use this implementation detail if you know that it is a ChainMap an not a general mapping. It is easy to write a code for searching what mapping contains the specified key. I don't know where the idea that the underlying mappings are an implementation detail comes from. It certainly isn't from the documentation, which mentions uses such as nested scopes and templates, which cannot be attained with a single mapping. It also doesn't match my personal usage, where as discussed, even the simpler cases benefit from information on the underlying mappings. It is a surprising claim to make given than the entirety of the public interface specific to ChainMap (maps, new_child and parents) deals with the fact that there is more structure than one mapping. I also have a hard time discerning this idea from Raymond's messages. > > for m in cm.maps: > if key in m: > found = m > break > else: > # raise an error or set a default, > # what is appropriate for your concrete case This "trivial snatch of code" contains at least two issues that make it fail in situations where the actual implementation of `__getitem__` would work, opening the door for hard to diagnose corner cases. If anything, in my opinion, the fact that this code is being proposed as an alternative reinforces the idea that the implementation of the searching method should be in the standard library.
msg324964 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-09-11 00:28
I've discussed this with other core devs and spent a good deal of time evaluating this proposal. I'm going to pass on the this one but do think it was a inspired suggestion. Thank you for the proposal. ---------- Note, the original get_where() recipe has an issue. Upon successful lookup, it returns a 2-tuple but on failure it calls __missing__ which typically returns a scalar (if it doesn't raise an exception). The proposed use case is plausible but rare. As Serhiy mentioned, it is also reasonable for the user to write their own loops over the maps attribute. That loop can have subtleties but it is better to let the user choose between try/except semantics and if-else semantics according to the use case. FWIW, I tried out several variants of the proposal including "cm.index(key) -> int" and "cm.search(key) -> mapping". Each of the variants had some awkwardness when trying to use it there didn't seem to be a variant that was a net win.
msg325056 - (view)	Author: Zahari Dim (Zahari.Dim)	Date: 2018-09-11 20:21
> > I've discussed this with other core devs and spent a good deal of time evaluating this proposal. I'm going to pass on the this one but do think it was a inspired suggestion. Thank you for the proposal. Thank you for taking the time to consider it. I understand that there are many proposals. > > ---------- > > Note, the original get_where() recipe has an issue. Upon successful lookup, it returns a 2-tuple but on failure it calls __missing__ which typically returns a scalar (if it doesn't raise an exception). FWIW this was intended to work when `__missing__` was subclassed to raise a more specific exception. The case where it is made to return a value clearly doesn't play well with the proposed method, and would likely need to be subclassed as well. I consider this an acceptable trade off because I find this use case rather esoteric: the same functionality could be achieved in an arguably clearer way by passing a mapping with the desired missing semantics as the outermost scope.

History
Date	User	Action	Args
2022-04-11 14:59:05	admin	set	github: 78767
2018-09-11 20:21:27	Zahari.Dim	set	messages: + msg325056
2018-09-11 00:28:56	rhettinger	set	status: open -> closed resolution: rejected messages: + msg324964 stage: resolved
2018-09-08 14:00:26	Zahari.Dim	set	messages: + msg324842
2018-09-08 12:15:54	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg324834
2018-09-07 08:55:35	Zahari.Dim	set	messages: + msg324730
2018-09-07 08:00:45	rhettinger	set	messages: + msg324725
2018-09-06 10:20:59	Zahari.Dim	set	messages: + msg324681
2018-09-06 05:07:10	rhettinger	set	messages: + msg324661
2018-09-05 15:29:01	rhettinger	set	assignee: rhettinger nosy: + rhettinger
2018-09-05 14:00:10	Zahari.Dim	create