classification
Title: Dictionary union. (PEP 584)
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Aaron Hall, Raps Uk, brandtbucher, chaburkland, curtisbucher, gvanrossum, josh.r, justjais, mark.dickinson, rhettinger, scoder, serhiy.storchaka, slam, steve.dower, xtreak
Priority: normal Keywords: patch

Created on 2019-02-28 04:18 by brandtbucher, last changed 2020-07-07 05:39 by Raps Uk. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12088 merged brandtbucher, 2019-02-28 04:19
PR 18659 merged brandtbucher, 2020-02-25 19:19
PR 18729 merged brandtbucher, 2020-03-02 05:36
PR 18814 merged brandtbucher, 2020-03-06 19:13
PR 18832 merged curtisbucher, 2020-03-07 19:38
PR 18911 merged chaburkland, 2020-03-11 02:34
PR 18931 merged brandtbucher, 2020-03-11 16:27
PR 18967 merged brandtbucher, 2020-03-12 19:14
PR 19106 merged curtisbucher, 2020-03-21 23:00
PR 19127 merged curtisbucher, 2020-03-23 22:15
PR 19221 merged curtisbucher, 2020-03-30 00:12
Messages (66)
msg336798 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2019-02-28 04:18
...as discussed in python-ideas. Semantically:

d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3
d1 += d2 <-> d1.update(d2)

Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new __add__/__radd__/__iadd__ methods.
msg336803 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 06:33
I believe that Guido rejected this when it was proposed a few years ago.
msg336808 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-02-28 07:07
Python ideas discussion in 2015 : https://mail.python.org/pipermail/python-ideas/2015-February/031748.html
LWN summary : https://lwn.net/Articles/635397/
msg336810 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-28 07:29
I believe it was proposed and rejected multiple times.
msg336811 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 07:36
For the record, I'm opposed to the idea.

* Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

* We already have ChainMap() which presents a single view of multiple mappings with any copying.

* It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

* Many other APIs are modeled on the dict API, so we should not grow the API unless there is a big win.  The effects would be pervasive.

* I don't see other languages going down this path, nor am I seeing dict subclasses that implement this functionality.  Those are indications that this more of a "fun thing we could do" rather than a "thing that people need".

* The existing code already reads nicely:

     options.update(user_selections)

  That reads more like self explanatory English than:

     options += user_selections

  The latter takes more effort to correctly parse and
  makes it less clear that you're working with dicts.

* It isn't self-evident that the right operand needs to be another dictionary. If a person is trying to "add a key / value pair" to an existing dictionary, the "+=" operator would be tempting but it wouldn't work.
msg336812 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-28 07:52
> * It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

But I have other arguments against it:

* It conflicts with the plus operator of Counter (which is a specialized dict): Counter(a=2) + Counter(a=3) == Counter(a=5), but the proposed idea makes dict(a=2) + dict(a=3) == dict(a=3).

* We already have a syntax for dict merging: {**d1, **d2}. It works with arbitrary mappings, in contrary to the plus operator, which needs a special support in argument types.
msg336816 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 08:23
> In Python, the plus operator for sequences (strings, lists, 
> tuples) is non-commutative.

For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand.

Also with sequences the + operator actually means "add to", but with dictionaries it means "add/or replace" which is contrary to the normal meaning of plus.  I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.

> We already have a syntax for dict merging: {**d1, **d2}. 
> It works with arbitrary mappings,

This is a good point.
msg336820 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-28 09:20
> We already have a syntax for dict merging: {**d1, **d2}. 

Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.


> It works with arbitrary mappings,

The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".


> Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)


> We already have ChainMap() which presents a single view of multiple mappings with any copying.

Which is a different use case that is unlikely to go away with this proposal.


> makes it less clear that you're working with dicts.

This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists.

Let's wait for the PEP, I'd say.
msg336847 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-28 16:21
scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP 448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention.

You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP 448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years.

In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:

1. It doesn't provide any new functionality, just one more way to do it; += is satisfied by .update, + is satisfied (more generally and efficiently) by the unpacking generalizations

2. It's needlessly confusing; addition is, for all existing types in the standard library I can think of, lossless; the information from both sides of the + is preserved in some form, either by addition or concatenation (and in the concatenation case, addition is happening, just to the length of the resulting sequence, and order is preserved). Addition for dictionaries would introduce new rules specific to dicts that do not exist for any other type regarding loss of values, non-additive resulting length, etc. Those rules would likely be similar to those of dict literals and the update method, but they'd need to be made explicit. By contrast, the PEP 448 unpacking generalization rules followed the existing rules for dict literals; no special rules occur, it just behaves intuitively (if you already knew the rules for dict literals without unpacking being involved).

3. Almost any generic, duck-typing based code for which addition makes sense will not make sense for dicts simply because it loosens the definition of addition too much to be useful, so best case, it still raises TypeError (when dicts added to non-dict things), worst case, it silently operates in a way that violates the rules of both addition and concatenation rather than raising a TypeError that the generic code could use to determine the correct thing to do.

4. The already mentioned conflict with Counter (which already has an addition operator, with lossless semantics)

5. (Minor) It means PyDict_Type needs a non-NULL tp_as_number, so now it's slightly slower to reject dicts as being non-numeric at the C layer

Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too.

Basically, I think the PEP 448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly.

The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation).
msg336848 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-02-28 16:34
I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread!
msg336849 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-28 16:42
Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP 448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2)

A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best:

"I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

"Unless someone is seriously championing PEP 448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP 448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another."

As it happens, PEP 448 was put in 3.5, and we got the one obvious way to do it.

Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP 572:

(d3 := d1.copy()).update(d2)

I think I'll stick with d3 = {**d1, **d2} though. :-)
msg336854 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-02-28 17:30
Current python-ideas thread for the issue : https://mail.python.org/pipermail/python-ideas/2019-February/055509.html
msg337094 - (view) Author: Viktor Kharkovets (slam) * Date: 2019-03-04 12:09
If we're going to forget about commutativity of +, should we also implement +/+= for sets?
msg337107 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-03-04 13:05
> should we also implement +/+= for sets?

The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets.

In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket.
msg337266 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:54
Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"?
https://www.python.org/dev/peps/pep-0584/
msg337267 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:55
> Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"?
> https://www.python.org/dev/peps/pep-0584/

Ah yes, it's written in the title of the PR. I add it to the bug title as well.
msg358305 - (view) Author: Aaron Hall (Aaron Hall) * Date: 2019-12-12 21:16
Another obvious way to do it, but I'm +1 on it.

A small side point however - PEP 584 reads:

> To create a new dict containing the merged items of two (or more) dicts, one can currently write:

> {**d1, **d2}

> but this is neither obvious nor easily discoverable. It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].

...

> [2] Non-string keys: https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html

The references cited does not back this assertion up. Perhaps the intent is to reference the "cool/weird hack" dict(d1, **d2) (see https://mail.python.org/pipermail/python-dev/2010-April/099485.html and https://mail.python.org/pipermail/python-dev/2010-April/099459.html), which allowed any hashable keys in Python 2 but only strings in Python 3.

If I see {**d1, **d2}, my expectations are that this is the new generalized unpacking and I currently expect any keys to be allowed, and the PEP should be updated to accurately reflect this to prevent future misunderstandings.
msg362619 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-25 03:46
PEP 584 has been approved by the Steering Council (at my recommendation). We will shortly begin landing PRs related to this.
msg362620 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-25 03:47
New changeset eb8ac57af26c4eb96a8230eba7492ce5ceef7886 by Brandt Bucher in branch 'master':
bpo-36144: Dictionary Union (PEP 584) (#12088)
https://github.com/python/cpython/commit/eb8ac57af26c4eb96a8230eba7492ce5ceef7886
msg362621 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-25 03:49
While the main code has been merged now, I propose to keep this issue open until some other things have happened:

- Documentation
- Add | operators to some dict subclasses in the stdlib
- (What else?)
msg362626 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-25 06:29
My current PR plans are:

- Docs. This will include the dict docs and the whatsnew 3.9. I assume we have no plans to cover this in the tutorials, etc. Let me know if I'm missing anything here.
- collections.defaultdict, with tests. I don't think this needs docs beyond a short "changed in version 3.9" note.
- collections.OrderedDict, with tests. Ditto defaultdict on docs.
- collections.ChainMap, ditto.
- types.MappingProxy, ditto.

I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

Guido, okay to tag you on these for review?
msg362645 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-25 15:07
Yup, great plan.

On Mon, Feb 24, 2020 at 22:29 Brandt Bucher <report@bugs.python.org> wrote:

>
> Brandt Bucher <brandtbucher@gmail.com> added the comment:
>
> My current PR plans are:
>
> - Docs. This will include the dict docs and the whatsnew 3.9. I assume we
> have no plans to cover this in the tutorials, etc. Let me know if I'm
> missing anything here.
> - collections.defaultdict, with tests. I don't think this needs docs
> beyond a short "changed in version 3.9" note.
> - collections.OrderedDict, with tests. Ditto defaultdict on docs.
> - collections.ChainMap, ditto.
> - types.MappingProxy, ditto.
>
> I'll also create a BPO issue to discuss whether the dict subclasses in
> http.cookies should be updated.
>
> That should do it for CPython; I'm planning on updating typeshed and
> adding a handful of tests to mypy for TypedDict, etc. after these are
> landed.
>
> Guido, okay to tag you on these for review?
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue36144>
> _______________________________________
>
-- 
--Guido (mobile)
msg362660 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-02-25 20:05
Not sure if this is a big deal or not, and it seems likely that the preexisting behaviour of .update() and ** unpacking have already decided it, but is it intentional that you end up with the first-seen key and the last-seen value in the case of collisions?

class C:
    def __init__(self, *a): self.a = a
    def __hash__(self): return hash(self.a[0])
    def __eq__(self, o): return self.a[0] == o.a[0]
    def __repr__(self): return f"C{self.a}"

>>> c1 = C(1, 1); c1
C(1, 1)
>>> c2 = C(1, 2); c2
C(1, 2)

For set union we get the first seen value:
>>> {c1} | {c2}
{C(1, 1)}

For dict union we get the first seen key and the last seen value:
>>> {c1: 'a'} | {c2: 'b'}
{C(1, 1): 'b'}

But similarly for dict unpack (and .update(); code left as an exercise to the reader):
>>> {**{c1: 'a'}, **{c2: 'b'}}
{C(1, 1): 'b'}

So the union of two dicts may contain .items() elements that were not in either of the inputs.

Honestly, I've never noticed this before, as the only time I create equivalent objects with meaningfully-distinct identities is to use with sets. I just figured I'd try it out after seeing suggestions that the dict union operands were transposed from set union.
msg362663 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-25 20:19
As a somewhat simpler example:

>>> f = {False: False}
>>> z = {0: 0}
>>> f | z
{False: 0}
>>> {**f, **z}
{False: 0}
>>> f.update(z); f
{False: 0}

Though these hairier cases aren't explicitly addressed, the conflict behavior is covered in the Rationale and Reference Implementation sections of the PEP. All of the above examples share code (`dict_update_arg`), and that's definitely intentional. I for one think it would be confusing (and probably a bug) if one of the examples above gave a different key-value pair!

I find it makes more sense if you see a set as valueless keys (rather than keyless values).
msg362666 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-02-25 20:42
That's a much simpler example. And of course:

>>> z[False] = False
>>> z
{0: False}

So the precedent is well established that the key doesn't get updated with the value.

No further questions, yer honour ;)
msg362726 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-26 20:01
New changeset d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f by Brandt Bucher in branch 'master':
bpo-36144: Document PEP 584 (GH-18659)
https://github.com/python/cpython/commit/d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f
msg362729 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-26 20:03
@Brandt: you have some more followup PRs planned right? Let's keep this issue open until you've done all of those.
msg362740 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-26 21:20
Yep. I'm currently working on OrderedDict, defaultdict, and MappingProxyType.

My brother is looking to make his first contribution, so he'll be taking care of ChainMap.
msg362757 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2020-02-27 00:29
What is ChainMap going to do? Normally, the left-most argument to ChainMap is the "top level" dict, but in a regular union scenario, last value wins. 

Seems like layering the right hand side's dict on top of the left hand side's would match dict union semantics best, but it feels... wrong, given ChainMap's normal left-to-right precedence. And top-mostness affects which dict receives all writes, so if  chain1 |= chain2 operates with dict-like precedence (chain2 layers over chain1), then that also means the target of writes/deletions/etc. changes to what was on top in chain2.
msg362759 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 00:56
The plan is to follow dict’s semantics. The |= operator will basically delegate to the first map in the chain. The | operator will create a new ChainMap where the first map is the merged result of the old first map, and the others are the same.

So, basically update / copy-and-update, respectively.
msg362761 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2020-02-27 01:26
Sorry, I think I need examples to grok this in the general case. ChainMap unioned with dict makes sense to me (it's equivalent to update or copy-and-update on the top level dict in the ChainMap). But ChainMap unioned with another ChainMap is less clear. Could you give examples of what the expected end result is for:

    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 3, 'c': 4}
    d3 = {'a': 5, 'd': 6}
    d4 = {'d': 7, 'e': 8}
    cm1 = ChainMap(d1, d2)
    cm2 = ChainMap{d3, d4)

followed by either:

    cm3 = cm1 | cm2

or
    cm1 |= cm2

? As in, what is the precise state of the ChainMap cm3 or the mutated cm1, referencing d1, d2, d3 and d4 when they are still incorporated by references in the chain?

My impression from what you said is that the plan would be for the updated cm1 to preserve references to d1 and d2 only, with the contents of cm2 (d3 and d4) effectively flattened and applied as an in-place update to d1, with an end result equivalent to having done:

    cm1 = ChainMap(d1, d2)
    d1 |= d4
    d1 |= d3

(except the key ordering would actually follow d3 first, and d4 second), while cm3 would effectively be equivalent to having done (note ordering):

    cm3 = ChainMap(d1 | d4 | d3, d2)

though again, key ordering would be based on d1, then d3, then d4, not quite matching the union behavior. And a reference to d2 would be preserved in the final result, but not any other original dict. Is that correct? If so, it seems like it's wasting ChainMap's key feature (lazy accumulation of maps), where:

    cm1 |= cm2

could be equivalent to either:

    cm1.maps += cm2.maps

though that means cm1 wins overlaps, where normal union would have cm2 win, or to hew closer to normal union behavior, make it equivalent to:

    cm1.map[:0] = cm2.maps

prepending all of cm2's maps to have the same duplicate handling rules as regular dicts (right side wins) at the expense of changing which map cm1 uses as the target for writes and deletes. In either case it would hew to the spirit of ChainMap, making dict "union"-ing an essentially free operation, in exchange for increasing the costs of lookups that don't hit the top dict.
msg362774 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-27 05:59
I think for `|=` the only choice is for it to be essentially an alias to `.update()`. So that means `cm |= other` becomes `cm.maps[0].update(other)`.

For `|` we are breaking new ground and we could indeed make `cm | other` do something like `ChainMap(other, *cm.maps)`.

I've not used ChainMap much (though I've seen some code that uses it) so I'm probably not the best judge of whether this is a good feature to have.

Note that `other | cm` will just do whatever `other.__or__` does, since ChainMap isn't a true subclass of dict, so it will not fall back to `cm.__ror__`. Basically ChainMap will not get control in this case.

Other thoughts:

- Maybe `cm1 | cm2` (both ChainMaps) ought to return `ChainMap(*cm2.maps, *cm1.maps)`?

- These semantics make `|=` behave rather differently from `|`. Is that okay? If not, which of them should change, and how?
msg362779 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 07:04
> I think for `|=` the only choice is for it to be essentially an alias to `.update()`. So that means `cm |= other` becomes `cm.maps[0].update(other)`.

Agreed.

> These semantics make `|=` behave rather differently from `|`. Is that okay? If not, which of them should change, and how?

I don’t like this. Let me try to explain why:

So far (and to the best of my knowledge), setting and updating values on a ChainMap works exactly the same as it does for dict, with all of the same semantics (the docs themselves even say that “all of the usual dictionary methods are supported”… which now could be interpreted as meaning | and |= as well). It’s only when deleting or using the new interfaces that things get more specialized.

But that doesn’t really apply here. Having different (or worse, inconsistent) behavior for these operators, I feel, would be more confusing than helpful. Remember, a major goal of this proposal is to aid in duck typing.

So, Josh’s understanding of my intended semantics is correct, I propose that, for:

    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 3, 'c': 4}
    d3 = {'a': 5, 'd': 6}
    d4 = {'d': 7, 'e': 8}
    cm1 = ChainMap(d1, d2)
    cm2 = ChainMap{d3, d4)

    cm3 = cm1 | cm2

Gives cm3 a value of:

    ChainMap(d1 | d4 | d3, d2)  # Or, equivalently: ChainMap(d1 | dict(cm2), d2)

And:

    cm1 |= cm2

Is equivalent to:

    d1 |= cm2

I don’t want to change which map is "first", and I think changing the winning behavior from that of dict will create more problems than it solves. We only need to look at how ChainMap handles the update method… it keeps the same exact behavior, rather than trying to be lazy or reversed or something.

If we *are* deciding to do something different, then I think it should have no relationship to PEP 584, which reasons out a carefully considered merge operation for dict, not ChainMap. But, it would also probably need a different operator, and be able to stand on its own merits.
msg362810 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-27 15:43
I had just come to a different conclusion. Maybe ChainMap should just not
grow `|` and `|=` operators? That way there can be no confusion. `dict() |
ChainMap()` and `ChainMap() | dict()` will fail because ChainMap doesn't
inherit from dict. (Note that in your last message, `d1 |= cm2` will fail
for this reason. You can of course fix that with `d1 |= dict(cm2)`,
although IIUC there's no reason one of the maps couldn't be some other
[Mutable]Mapping.)
msg362813 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 15:57
> Note that in your last message, `d1 |= cm2` will fail for this reason. You can of course fix that with `d1 |= dict(cm2)`, although IIUC there's no reason one of the maps couldn't be some other [Mutable]Mapping.

Mappings and iterables are fine for the in-place variant. :)

>>> from collections import ChainMap
>>> d = {}
>>> c = ChainMap({"r": 2, "d":2})
>>> d |= c
>>> d
{'r': 2, 'd': 2}

I think it would be confusing to have `ChainMap | ChainMap` behave subtly different than `dict | ChainMap`. It would be *especially* odd if it also differed subtly from `ChainMap | dict`.

To recap:

+1 on adding the operators with dict semantics,
+0 on no PEP 584 for ChainMap.
-0 on implementing them, but changing the winning behavior by concatenating the maps lists or something. This would probably make more sense to me as a `+` operator, honestly. :(
-1 for having the operators behave differently (other than performance shortcuts) for `cm | d`, `cm | cm`, `cm |= d`, `cm |= cm`.
msg362816 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-27 17:01
OK, assuming `|=` gets the same semantics as update(), can you repeat once
more (without motivation) what the specification for `cm | other` will be?
msg362817 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 17:10
I believe that:

    cm | other

Should return the equivalent of:

    ChainMap(cm.maps[0] | dict(other), *cm.maps[1:])
msg362819 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 17:18
...however, I could also see the (similar):

    ChainMap(other, *cm.maps)  # Note that `other` is the original reference here.

Being okay as well. Maybe even better, now that I've written it out.
msg362821 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-27 17:26
OK, that makes sense, it works similar to ChainMap.copy(), which copies maps[0] and keeps links to the rest. So in particular `cm | {}` will do the same thing as cm.copy().

Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

def __or__(self, other):
    new = self.copy()
    new |= other  # OR new.update(other) ???
    return new

def __ior__(self, other):
    self.update(other)
    return self

Note that there is no ChainMap.update() definition -- it relies on MutableMapping.update().

I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

def __ror__(self, other):
    new = other.copy()
    new.update(self)
    return new

Note that this doesn't return a ChainMap but an instance of type(other).  If other doesn't have a copy() method it'll fail.

As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)
msg362823 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-27 17:28
I didn't see your second reply, with `ChainMap(other, *cm.maps)`.

I'm not so keen on that, because its special behavior can't be mimicked by `|=`.
msg362827 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 17:49
> Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

Yeah, I was imagining something like that... I used the cast for brevity in my reply but that probably wasn't helpful.

Note that for __or__, we probably want to check the type of the argument (for either dict or ChainMap, or maybe just Mapping), to keep it from working on an iterable of key-value pairs.

> I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

Agreed. Again, we can check for Mapping here to assure success for the copy() move.

> As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

Yup, see above. I think a check for Mapping should be fine.
msg362828 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 17:52
Just to clarify:

If we decide to check isinstance(other, (ChainMap, dict)), '|' should probably be used.

If we decide to check isinstance(other, Mapping), I think the copy/update methods should be used.
msg362831 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-27 20:00
1.

def __or__(self, other):
    return self.__class__(self.maps[0] | other, *self.maps[1:])

def __ror__(self, other):
    return other | dict(self)


2.

def __or__(self, other):
    return self.__class__(other, *self.maps)

def __ror__(self, other):
    return self.__class__(*self.maps, other)


There are problems with both variants, so I think it may be better to not add this operator to ChainMap.
msg362835 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-27 20:12
I think we're only seriously considering the first variant (although implemented slightly differently, see my last two messages). And __ror__ would probably change, returning the type of self.

What are the "problems" with it, exactly? We seem to be in agreement that the update behavior is reasonable, even for ChainMaps.
msg362852 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-02-28 00:48
We already have somewhat different semantics of `|` for Counter, and hence I think it's fine to give it the most useful semantics for ChainMap given that class's special behavior. I think we've come up with the right solution there.

Let's stop the debate and put up a PR.
msg362855 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-02-28 00:58
Sounds good, I'll have these up soon.
msg363526 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-06 17:24
New changeset 57c9d1725689dde068a7fccaa7500772ecd16d2e by Brandt Bucher in branch 'master':
bpo-36144: Implement defaultdict union (GH-18729)
https://github.com/python/cpython/commit/57c9d1725689dde068a7fccaa7500772ecd16d2e
msg363527 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-06 17:34
Still waiting for ChainMap -- what else?
msg363528 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-03-06 17:51
My brother will have a ChainMap PR up soon. I'm just finishing up MappingProxyType, myself. Probably both this weekend.

Then I'll move on to OrderedDict, which looks like it could be tricky. I'll need to familiarize myself with the implementation better (unless there's somebody who is already familiar with it who wants to take over). It looks well-commented, though.

I think we can pass on the http.cookies subclasses since there don't appear to be any experts/maintainers for that module.
msg363613 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-07 19:03
New changeset 4663f66f3554dd8e2ec130e40f6abb3c6a514775 by Brandt Bucher in branch 'master':
bpo-36144: Update MappingProxyType with PEP 584's operators (#18814)
https://github.com/python/cpython/commit/4663f66f3554dd8e2ec130e40f6abb3c6a514775
msg363625 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-03-08 02:36
Issue 39857 just reminded me that we should update os._Environ as well (the type of os.environ and os.environb).

I have another first-timer who will probably want to take it.
msg363889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-11 09:14
Once this issue will be done, would you mind to update PEP 584? Currently, it only says "This PEP proposes adding merge (|) and update (|=) operators to the built-in dict class." It doesn't mention that other types like OrderedDict, MappingProxy or ChainMap are updated as well.
msg363953 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-03-11 18:04
Yes, I can add a section explaining that after the PEP was accepted, we decided to add the operators to several non-dict mappings as well. I'm also going to add some explanation as to why Mapping/MutableMapping didn't grow them, too.
msg364106 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-13 16:04
New changeset d648ef10c5c7659ed3c9f34d5c751dc55e2c6007 by Charles Burkland in branch 'master':
bpo-36144: Update os.environ and os.environb for PEP 584  (#18911)
https://github.com/python/cpython/commit/d648ef10c5c7659ed3c9f34d5c751dc55e2c6007
msg364107 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-13 16:06
New changeset 6d674a1bf456945eb758e85c11484a9f1494f2b4 by Brandt Bucher in branch 'master':
bpo-36144: OrderedDict Union (PEP 584) (#18967)
https://github.com/python/cpython/commit/6d674a1bf456945eb758e85c11484a9f1494f2b4
msg364195 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-03-14 19:31
Three other MutableMappings we might want to update:

- shelve.Shelf
- weakref.WeakKeyDictionary
- weakref.WeakValueDictionary

Shelf is up in the air, since it doesn't look like it defines a copy() equivalent... I also have no experience with it. Since it's a MutableMapping subclass, (not a dict subclass), we could in theory hold off on updating this until someone asks for it, without backward compatibility issues.

I think the other two should be updated, though. I can coordinate PRs for them this week.
msg364196 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-14 20:05
I definitely think we should leave Shelf alone, it's a toy class from a different era.

It makes sense to update the weak dicts; hopefully the | and |= operators can be implemented in terms of other, more primitive operations, so we will have assurance that these classes' essential behavior is preserved.
msg364887 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-23 19:02
New changeset f393b2c588559162dc2e77f8079a42e48558870a by Curtis Bucher in branch 'master':
bpo-36144: Add PEP 584 operators to collections.ChainMap (#18832)
https://github.com/python/cpython/commit/f393b2c588559162dc2e77f8079a42e48558870a
msg364900 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-23 20:49
New changeset 25e580a73c163f472fdeb5489bebef85da21655c by Curtis Bucher in branch 'master':
bpo-36144: Add union operators to WeakKeyDictionary (#19106)
https://github.com/python/cpython/commit/25e580a73c163f472fdeb5489bebef85da21655c
msg364969 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-25 01:51
New changeset 8f1ed21ecf57cc8b8095d9d1058af2b9b3ed0413 by Curtis Bucher in branch 'master':
bpo-36144: Add union operators to WeakValueDictionary584 (#19127)
https://github.com/python/cpython/commit/8f1ed21ecf57cc8b8095d9d1058af2b9b3ed0413
msg364992 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-03-25 15:24
And... that's it! Big thanks to everybody who had a part in making this happen.
msg365245 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-29 04:56
I'm guessing this can be closed?
msg365333 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-03-30 16:54
I guess we should keep this open until Raymond Hettinger has given feedback on https://github.com/python/cpython/pull/18832 (where we have the option of changing to Brandt's proposal from https://github.com/python/cpython/pull/18832#issuecomment-596910350).
msg371520 - (view) Author: Sumit Jaiswal (justjais) Date: 2020-06-15 05:50
First off, thanks for adding the feature, it's much appreciated.

But it'd be great if you guys can enable list merge for the dict having list as its value, in current form I believe it's handling only "key: value" merge.

for e.g.:
>>> d1 = {'spam': [1, 2, 3]}
>>> d2 = {'spam': [2, 3, 4]}
>>> d1 | d2
>>> {'spam': [1, 2, 3, 4]}
msg371549 - (view) Author: Brandt Bucher (brandtbucher) * (Python triager) Date: 2020-06-15 14:24
Similar behavior was considered and ultimately rejected by the PEP as being too specialized:

https://www.python.org/dev/peps/pep-0584/#concatenate-values-in-a-list

What you're asking for it subtly different, and even *more* specialized than that. I'd recommend just subclassing dict and overriding the operator, as the PEP suggests.
msg373205 - (view) Author: Raps Uk (Raps Uk) Date: 2020-07-07 05:39
Taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

>>> c = dict(a.items() | b.items())
This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

http://net-informations.com/python/ds/merge.htm
History
Date User Action Args
2020-07-07 05:39:40Raps Uksetnosy: + Raps Uk
messages: + msg373205
2020-06-15 14:24:34brandtbuchersetmessages: + msg371549
2020-06-15 05:50:31justjaissetnosy: + justjais
messages: + msg371520
2020-03-30 16:54:02gvanrossumsetmessages: + msg365333
2020-03-30 00:12:40curtisbuchersetpull_requests: + pull_request18583
2020-03-29 04:56:12gvanrossumsetmessages: + msg365245
stage: resolved -> patch review
2020-03-25 15:24:24brandtbuchersetstatus: open -> closed
resolution: fixed
messages: + msg364992

stage: patch review -> resolved
2020-03-25 01:51:35gvanrossumsetmessages: + msg364969
2020-03-23 22:15:37curtisbuchersetpull_requests: + pull_request18488
2020-03-23 20:49:53gvanrossumsetmessages: + msg364900
2020-03-23 19:02:12gvanrossumsetmessages: + msg364887
2020-03-21 23:43:55vstinnersetnosy: - vstinner
2020-03-21 23:00:46curtisbuchersetpull_requests: + pull_request18467
2020-03-14 20:05:43gvanrossumsetmessages: + msg364196
2020-03-14 19:31:31brandtbuchersetmessages: + msg364195
2020-03-13 16:06:08gvanrossumsetmessages: + msg364107
2020-03-13 16:04:47gvanrossumsetmessages: + msg364106
2020-03-12 19:14:27brandtbuchersetpull_requests: + pull_request18315
2020-03-11 18:04:08brandtbuchersetmessages: + msg363953
2020-03-11 16:27:33brandtbuchersetpull_requests: + pull_request18283
2020-03-11 09:14:19vstinnersetnosy: + vstinner
messages: + msg363889
2020-03-11 02:34:44chaburklandsetnosy: + chaburkland
pull_requests: + pull_request18266
2020-03-08 02:36:48brandtbuchersetmessages: + msg363625
2020-03-07 19:38:52curtisbuchersetnosy: + curtisbucher
pull_requests: + pull_request18189
2020-03-07 19:03:13gvanrossumsetmessages: + msg363613
2020-03-06 19:13:51brandtbuchersetpull_requests: + pull_request18172
2020-03-06 17:51:08brandtbuchersetmessages: + msg363528
2020-03-06 17:34:15gvanrossumsetmessages: + msg363527
2020-03-06 17:24:14gvanrossumsetmessages: + msg363526
2020-03-02 05:36:21brandtbuchersetpull_requests: + pull_request18086
2020-02-28 00:58:56brandtbuchersetmessages: + msg362855
2020-02-28 00:48:12gvanrossumsetmessages: + msg362852
2020-02-27 20:12:28brandtbuchersetmessages: + msg362835
2020-02-27 20:00:33serhiy.storchakasetmessages: + msg362831
2020-02-27 17:52:25brandtbuchersetmessages: + msg362828
2020-02-27 17:49:23brandtbuchersetmessages: + msg362827
2020-02-27 17:28:42gvanrossumsetmessages: + msg362823
2020-02-27 17:26:54gvanrossumsetmessages: + msg362821
2020-02-27 17:18:57brandtbuchersetmessages: + msg362819
2020-02-27 17:10:59brandtbuchersetmessages: + msg362817
2020-02-27 17:01:25gvanrossumsetmessages: + msg362816
2020-02-27 15:57:52brandtbuchersetmessages: + msg362813
2020-02-27 15:43:45gvanrossumsetmessages: + msg362810
2020-02-27 07:04:31brandtbuchersetmessages: + msg362779
2020-02-27 05:59:41gvanrossumsetmessages: + msg362774
2020-02-27 01:26:37josh.rsetmessages: + msg362761
2020-02-27 00:56:25brandtbuchersetmessages: + msg362759
2020-02-27 00:29:35josh.rsetmessages: + msg362757
2020-02-26 21:20:54brandtbuchersetmessages: + msg362740
2020-02-26 20:03:22gvanrossumsetmessages: + msg362729
2020-02-26 20:01:54gvanrossumsetmessages: + msg362726
2020-02-25 21:16:20brandtbuchersettitle: Dictionary addition. (PEP 584) -> Dictionary union. (PEP 584)
2020-02-25 20:42:02steve.dowersetmessages: + msg362666
2020-02-25 20:19:59brandtbuchersetmessages: + msg362663
2020-02-25 20:05:25steve.dowersetnosy: + steve.dower
messages: + msg362660
2020-02-25 19:19:08brandtbuchersetpull_requests: + pull_request18020
2020-02-25 15:07:40gvanrossumsetmessages: + msg362645
2020-02-25 06:29:00brandtbuchersetmessages: + msg362626
2020-02-25 03:49:09gvanrossumsetmessages: + msg362621
2020-02-25 03:47:37gvanrossumsetmessages: + msg362620
2020-02-25 03:46:03gvanrossumsetmessages: + msg362619
versions: + Python 3.9, - Python 3.8
2019-12-12 21:16:45Aaron Hallsetnosy: + Aaron Hall
messages: + msg358305
2019-03-06 01:55:57vstinnersetnosy: - vstinner
2019-03-06 01:55:53vstinnersetmessages: + msg337267
title: Dictionary addition. -> Dictionary addition. (PEP 584)
2019-03-06 01:54:38vstinnersetnosy: + vstinner
messages: + msg337266
2019-03-04 13:05:30scodersetmessages: + msg337107
2019-03-04 12:09:17slamsetnosy: + slam
messages: + msg337094
2019-02-28 17:30:25xtreaksetmessages: + msg336854
2019-02-28 16:42:22josh.rsetmessages: + msg336849
2019-02-28 16:34:00gvanrossumsetmessages: + msg336848
2019-02-28 16:21:02josh.rsetnosy: + josh.r
messages: + msg336847
2019-02-28 10:12:23mark.dickinsonsetnosy: + mark.dickinson
2019-02-28 09:20:07scodersetnosy: + scoder
messages: + msg336820
2019-02-28 08:23:10rhettingersetmessages: + msg336816
2019-02-28 07:52:06serhiy.storchakasetmessages: + msg336812
2019-02-28 07:36:52rhettingersetmessages: + msg336811
2019-02-28 07:29:11serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg336810
2019-02-28 07:07:07xtreaksetnosy: + xtreak
messages: + msg336808
2019-02-28 06:33:14rhettingersetnosy: + gvanrossum
messages: + msg336803
2019-02-28 05:01:17xtreaksetnosy: + rhettinger
2019-02-28 04:19:14brandtbuchersetkeywords: + patch
stage: patch review
pull_requests: + pull_request12098
2019-02-28 04:18:58brandtbuchercreate