classification
Title: Dictionary addition. (PEP 584)
Type: enhancement
Stage: patch review
Components: Interpreter Core
Versions: Python 3.8

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brandtbucher, gvanrossum, josh.r, mark.dickinson, rhettinger, scoder, serhiy.storchaka, slam, xtreak
Priority: normal
Keywords: patch

Created on 2019-02-28 04:18 by brandtbucher, last changed 2019-03-06 01:55 by vstinner.

Pull Requests
URL       Status  Linked
PR 12088  open    brandtbucher, 2019-02-28 04:19
Messages (16)
msg336798 - (view) Author: Brandt Bucher (brandtbucher) * Date: 2019-02-28 04:18
...as discussed in python-ideas. Semantically:

d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3
d1 += d2 <-> d1.update(d2)

Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new __add__/__radd__/__iadd__ methods.
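
For illustration only, the semantics above can be sketched in pure Python as a dict subclass. This is not the C implementation in the attached patch. The class name AddableDict is hypothetical, and returning NotImplemented for non-dict operands is an assumption rather than something stated in this message.

```python
class AddableDict(dict):
    """Hypothetical pure-Python sketch of the proposed + and += semantics."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)   # copy the left operand
        new.update(other)        # right operand wins on shared keys
        return new

    def __radd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(other)  # here "other" is the left operand
        new.update(self)
        return new

    def __iadd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        self.update(other)       # mutate in place, like d1.update(d2)
        return self


d1 = AddableDict(a=1, b=2)
d2 = {"b": 3, "c": 4}

d3 = d1 + d2          # new object; d1 is left untouched
d4 = d2 + d1          # goes through __radd__, so d1's values win
alias = d1
d1 += d2              # same object, updated in place
```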
msg336803 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 06:33
I believe that Guido rejected this when it was proposed a few years ago.
msg336808 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-02-28 07:07
python-ideas discussion in 2015: https://mail.python.org/pipermail/python-ideas/2015-February/031748.html
LWN summary: https://lwn.net/Articles/635397/
msg336810 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-28 07:29
I believe it was proposed and rejected multiple times.
msg336811 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 07:36
For the record, I'm opposed to the idea.

* Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

* We already have ChainMap() which presents a single view of multiple mappings without any copying.

* It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

* Many other APIs are modeled on the dict API, so we should not grow the API unless there is a big win.  The effects would be pervasive.

* I don't see other languages going down this path, nor am I seeing dict subclasses that implement this functionality.  Those are indications that this is more of a "fun thing we could do" rather than a "thing that people need".

* The existing code already reads nicely:

     options.update(user_selections)

  That reads more like self-explanatory English than:

     options += user_selections

  The latter takes more effort to correctly parse and
  makes it less clear that you're working with dicts.

* It isn't self-evident that the right operand needs to be another dictionary. If a person is trying to "add a key / value pair" to an existing dictionary, the "+=" operator would be tempting but it wouldn't work.
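
As a small sketch of the two existing alternatives named in the list above (ChainMap and in-place update), using made-up dictionaries. The variable names and values are illustrative assumptions, not taken from any real code base.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
user_selections = {"user": "alice"}

# ChainMap searches the first mapping, then the next; nothing is copied.
options_view = ChainMap(user_selections, defaults)
user_from_view = options_view["user"]
color_before = options_view["color"]

# Because it is a view, a later change to an underlying dict shows through.
defaults["color"] = "blue"
color_after = options_view["color"]

# The in-place alternative: mutate one dict with the other's items.
options = dict(defaults)
options.update(user_selections)
```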
msg336812 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-28 07:52
> * It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

But I have other arguments against it:

* It conflicts with the plus operator of Counter (which is a specialized dict): Counter(a=2) + Counter(a=3) == Counter(a=5), but the proposed idea makes dict(a=2) + dict(a=3) == dict(a=3).

* We already have a syntax for dict merging: {**d1, **d2}. It works with arbitrary mappings, in contrast to the plus operator, which needs special support in the argument types.
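
Both points can be checked with a short sketch. The proposed dict addition is emulated here with the existing unpacking syntax, since the + operator for dicts is only a proposal; MappingProxyType is used merely as one example of a non-dict mapping.

```python
from collections import Counter
from types import MappingProxyType

# Counter addition sums the counts for shared keys.
counter_sum = Counter(a=2) + Counter(a=3)

# The proposed dict addition would keep the right-hand value instead
# (emulated with unpacking, which has the same last-one-wins rule).
dict_merge = {**dict(a=2), **dict(a=3)}

# Unpacking accepts any mapping, not only dict instances.
read_only = MappingProxyType({"x": 1})
merged = {**read_only, **{"y": 2}}
```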
msg336816 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-28 08:23
> In Python, the plus operator for sequences (strings, lists, 
> tuples) is non-commutative.

For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand.

Also with sequences the + operator actually means "add to", but with dictionaries it means "add or replace", which is contrary to the normal meaning of plus.  I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.
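
A short sketch of the ordering point, using copy-and-update to stand in for the proposed operator and assuming insertion-ordered dicts (guaranteed since Python 3.7). The sample dictionaries are made up for illustration.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "a": 10, "c": 30}

merged = d1.copy()
merged.update(d2)

# Shared keys keep the position they had in d1 (left operand),
# but take the value from d2 (right operand); new keys are appended.
merged_items = list(merged.items())
```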

> We already have a syntax for dict merging: {**d1, **d2}. 
> It works with arbitrary mappings,

This is a good point.
msg336820 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-28 09:20
> We already have a syntax for dict merging: {**d1, **d2}. 

Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.


> It works with arbitrary mappings,

The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".


> Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)
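
To make the list comparison concrete, here is a small sketch of how + and += already differ for lists, next to the dict.update call that the proposed += would mirror. The variable names are illustrative.

```python
items = [1, 2]
items_alias = items

items += [3]                # list.__iadd__ extends the same object
same_after_iadd = items_alias is items

items = items + [4]         # list.__add__ builds a new list
same_after_add = items_alias is items

options = {"a": 1}
options_alias = options
options.update({"b": 2})    # what the proposed "options += {...}" would do
```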


> We already have ChainMap() which presents a single view of multiple mappings without any copying.

Which is a different use case that is unlikely to go away with this proposal.


> makes it less clear that you're working with dicts.

This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists.

Let's wait for the PEP, I'd say.
msg336847 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-28 16:21
scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP 448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention.
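
A minimal sketch of that failure on Python 3, with made-up mappings X and M where M has a non-string key:

```python
X = {"a": 1}
M = {1: "one"}          # non-string key

try:
    dict(X, **M)        # keyword arguments must have string names
except TypeError as exc:
    error = exc
else:
    error = None

# The unpacking syntax from PEP 448 has no such restriction.
merged = {**X, **M}
```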

You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP 448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years.

In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:

1. It doesn't provide any new functionality, just one more way to do it; += is satisfied by .update, + is satisfied (more generally and efficiently) by the unpacking generalizations

2. It's needlessly confusing; addition is, for all existing types in the standard library I can think of, lossless; the information from both sides of the + is preserved in some form, either by addition or concatenation (and in the concatenation case, addition is happening, just to the length of the resulting sequence, and order is preserved). Addition for dictionaries would introduce new rules specific to dicts that do not exist for any other type regarding loss of values, non-additive resulting length, etc. Those rules would likely be similar to those of dict literals and the update method, but they'd need to be made explicit. By contrast, the PEP 448 unpacking generalization rules followed the existing rules for dict literals; no special rules occur, it just behaves intuitively (if you already knew the rules for dict literals without unpacking being involved).

3. Almost any generic, duck-typing based code for which addition makes sense will not make sense for dicts simply because it loosens the definition of addition too much to be useful, so best case, it still raises TypeError (when dicts are added to non-dict things), worst case, it silently operates in a way that violates the rules of both addition and concatenation rather than raising a TypeError that the generic code could use to determine the correct thing to do.

4. The already mentioned conflict with Counter (which already has an addition operator, with lossless semantics)

5. (Minor) It means PyDict_Type needs a non-NULL tp_as_number, so now it's slightly slower to reject dicts as being non-numeric at the C layer

Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too.
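
The "lossless" distinction drawn in points 2 and above can be sketched as follows. The dict merge is emulated with unpacking, since dict + dict is only proposed here, and the sample values are illustrative.

```python
left, right = [1, 2], [2, 3]
concatenated = left + right          # every element of both inputs survives

d1, d2 = {"a": 1, "b": 2}, {"b": 3}
merged = {**d1, **d2}                # d1["b"] is silently replaced

union = {"a", "b"} | {"b"}           # lengths do not add up, but there
                                     # is no associated value to lose
```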

Basically, I think the PEP 448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly.
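
A brief sketch of the composability point, with made-up contents for dicta, dictb and dictc. The two-step variant only mimics what chained addition would have to do; it is not code from the patch.

```python
dicta = {"a": 1}
dictb = {"b": 2}
dictc = {"a": 3, "c": 4}

# One display builds the result directly, with no intermediate dict.
dictabc = {**dicta, **dictb, **dictc}

# Chained pairwise merging needs a temporary for the first two operands.
dictab = {**dicta, **dictb}
dictabc_two_step = {**dictab, **dictc}

# Extra key/value pairs can be mixed in at any position.
with_extra = {**dicta, "extra": 0, **dictb}
```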

The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation).
msg336848 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-02-28 16:34
I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread!
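
As a sketch of the "same keys, different value merging" observation, with the dict merge emulated through unpacking. Note that Counter addition drops results that are not positive, so the key sets are only guaranteed to match for positive counts, as in this made-up example.

```python
from collections import Counter

c1 = Counter(a=2, b=1)
c2 = Counter(a=3, c=4)

added = c1 + c2             # Counter: sums values for common keys
merged = {**c1, **c2}       # plain merge: right-hand value wins
```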
msg336849 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-02-28 16:42
Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP 448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2)

A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best:

"I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

"Unless someone is seriously championing PEP 448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP 448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another."

As it happens, PEP 448 was put in 3.5, and we got the one obvious way to do it.

Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP 572:

(d3 := d1.copy()).update(d2)

I think I'll stick with d3 = {**d1, **d2} though. :-)
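
For comparison, both spellings side by side (the first requires Python 3.8 or later; the sample dicts are illustrative):

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3}

# PEP 572: bind the copy to d3, then update it in the same statement.
(d3 := d1.copy()).update(d2)

# PEP 448: the unpacking spelling gives the same result.
d3_unpacked = {**d1, **d2}
```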
msg336854 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-02-28 17:30
Current python-ideas thread for the issue: https://mail.python.org/pipermail/python-ideas/2019-February/055509.html
msg337094 - (view) Author: Viktor Kharkovets (slam) * Date: 2019-03-04 12:09
If we're going to forget about commutativity of +, should we also implement +/+= for sets?
msg337107 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-03-04 13:05
> should we also implement +/+= for sets?

The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets.
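
The parallel between "|" on sets and on integer bits, and the current lack of "+" for sets, can be sketched like this:

```python
set_union = {1, 2} | {2, 3}          # "in either set"
bit_union = 0b0101 | 0b0011          # "bit set in either integer"

s = {1}
s_alias = s
s |= {2}                             # in-place union keeps the same object

try:
    {1} + {2}                        # sets do not define + today
except TypeError as exc:
    add_error = exc
else:
    add_error = None
```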

In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket.
msg337266 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:54
Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"?
https://www.python.org/dev/peps/pep-0584/
msg337267 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:55
> Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"?
> https://www.python.org/dev/peps/pep-0584/

Ah yes, it's written in the title of the PR. I added it to the bug title as well.
History
Date                 User              Action  Args
2019-03-06 01:55:57  vstinner          set     nosy: - vstinner
2019-03-06 01:55:53  vstinner          set     messages: + msg337267
                                               title: Dictionary addition. -> Dictionary addition. (PEP 584)
2019-03-06 01:54:38  vstinner          set     nosy: + vstinner
                                               messages: + msg337266
2019-03-04 13:05:30  scoder            set     messages: + msg337107
2019-03-04 12:09:17  slam              set     nosy: + slam
                                               messages: + msg337094
2019-02-28 17:30:25  xtreak            set     messages: + msg336854
2019-02-28 16:42:22  josh.r            set     messages: + msg336849
2019-02-28 16:34:00  gvanrossum        set     messages: + msg336848
2019-02-28 16:21:02  josh.r            set     nosy: + josh.r
                                               messages: + msg336847
2019-02-28 10:12:23  mark.dickinson    set     nosy: + mark.dickinson
2019-02-28 09:20:07  scoder            set     nosy: + scoder
                                               messages: + msg336820
2019-02-28 08:23:10  rhettinger        set     messages: + msg336816
2019-02-28 07:52:06  serhiy.storchaka  set     messages: + msg336812
2019-02-28 07:36:52  rhettinger        set     messages: + msg336811
2019-02-28 07:29:11  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg336810
2019-02-28 07:07:07  xtreak            set     nosy: + xtreak
                                               messages: + msg336808
2019-02-28 06:33:14  rhettinger        set     nosy: + gvanrossum
                                               messages: + msg336803
2019-02-28 05:01:17  xtreak            set     nosy: + rhettinger
2019-02-28 04:19:14  brandtbucher      set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request12098
2019-02-28 04:18:58  brandtbucher      create