Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary union. (PEP 584) #80325

Closed
brandtbucher opened this issue Feb 28, 2019 · 66 comments
Closed

Dictionary union. (PEP 584) #80325

brandtbucher opened this issue Feb 28, 2019 · 66 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@brandtbucher
Copy link
Member

BPO 36144
Nosy @gvanrossum, @rhettinger, @mdickinson, @scoder, @serhiy-storchaka, @zooba, @MojoVampire, @aaronchall, @3lnc, @tirkarthi, @brandtbucher, @curtisbucher, @chaburkland, @justjais
PRs
  • bpo-36144: Dictionary Union (PEP 584) #12088
  • bpo-36144: Document PEP 584. #18659
  • bpo-36144: Implement defaultdict union. #18729
  • bpo-36144: Update MappingProxyType with PEP 584's operators #18814
  • bpo-36144: Add PEP 584 operators to collections.ChainMap #18832
  • bpo-36144: Update os.environ and os.environb for PEP 584  #18911
  • bpo-36144: Simplify defaultdict.__or__ #18931
  • bpo-36144: OrderedDict Union (PEP 584) #18967
  • bpo-36144: Add union operators to WeakKeyDictionary #19106
  • bpo-36144: Add union operators to WeakValueDictionary584 #19127
  • bpo-36144: Fixed test in test_collections.py #19221
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-25.15:24:24.165>
    created_at = <Date 2019-02-28.04:18:58.342>
    labels = ['interpreter-core', 'type-feature', '3.9']
    title = 'Dictionary union. (PEP 584)'
    updated_at = <Date 2020-07-07.05:39:40.553>
    user = 'https://github.com/brandtbucher'

    bugs.python.org fields:

    activity = <Date 2020-07-07.05:39:40.553>
    actor = 'Raps Uk'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-25.15:24:24.165>
    closer = 'brandtbucher'
    components = ['Interpreter Core']
    creation = <Date 2019-02-28.04:18:58.342>
    creator = 'brandtbucher'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36144
    keywords = ['patch']
    message_count = 66.0
    messages = ['336798', '336803', '336808', '336810', '336811', '336812', '336816', '336820', '336847', '336848', '336849', '336854', '337094', '337107', '337266', '337267', '358305', '362619', '362620', '362621', '362626', '362645', '362660', '362663', '362666', '362726', '362729', '362740', '362757', '362759', '362761', '362774', '362779', '362810', '362813', '362816', '362817', '362819', '362821', '362823', '362827', '362828', '362831', '362835', '362852', '362855', '363526', '363527', '363528', '363613', '363625', '363889', '363953', '364106', '364107', '364195', '364196', '364887', '364900', '364969', '364992', '365245', '365333', '371520', '371549', '373205']
    nosy_count = 15.0
    nosy_names = ['gvanrossum', 'rhettinger', 'mark.dickinson', 'scoder', 'serhiy.storchaka', 'steve.dower', 'josh.r', 'Aaron Hall', 'slam', 'xtreak', 'brandtbucher', 'curtisbucher', 'chaburkland', 'justjais', 'Raps Uk']
    pr_nums = ['12088', '18659', '18729', '18814', '18832', '18911', '18931', '18967', '19106', '19127', '19221']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36144'
    versions = ['Python 3.9']

    @brandtbucher
    Copy link
    Member Author

    ...as discussed in python-ideas. Semantically:

    d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3
    d1 += d2 <-> d1.update(d2)

    Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new __add__/radd/iadd methods.

    @brandtbucher brandtbucher added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Feb 28, 2019
    @rhettinger
    Copy link
    Contributor

    I believe that Guido rejected this when it was proposed a few years ago.

    @tirkarthi
    Copy link
    Member

    @serhiy-storchaka
    Copy link
    Member

    I believe it was proposed and rejected multiple times.

    @rhettinger
    Copy link
    Contributor

    For the record, I'm opposed to the idea.

    • Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

    • We already have ChainMap() which presents a single view of multiple mappings with any copying.

    • It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

    • Many other APIs are modeled on the dict API, so we should not grow the API unless there is a big win. The effects would be pervasive.

    • I don't see other languages going down this path, nor am I seeing dict subclasses that implement this functionality. Those are indications that this more of a "fun thing we could do" rather than a "thing that people need".

    • The existing code already reads nicely:

         options.update(user_selections)

    That reads more like self explanatory English than:

     options += user_selections
    

    The latter takes more effort to correctly parse and
    makes it less clear that you're working with dicts.

    • It isn't self-evident that the right operand needs to be another dictionary. If a person is trying to "add a key / value pair" to an existing dictionary, the "+=" operator would be tempting but it wouldn't work.

    @serhiy-storchaka
    Copy link
    Member

    • It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

    In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

    But I have other arguments against it:

    • It conflicts with the plus operator of Counter (which is a specialized dict): Counter(a=2) + Counter(a=3) == Counter(a=5), but the proposed idea makes dict(a=2) + dict(a=3) == dict(a=3).

    • We already have a syntax for dict merging: {**d1, **d2}. It works with arbitrary mappings, in contrary to the plus operator, which needs a special support in argument types.

    @rhettinger
    Copy link
    Contributor

    In Python, the plus operator for sequences (strings, lists,
    tuples) is non-commutative.

    For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand.

    Also with sequences the + operator actually means "add to", but with dictionaries it means "add/or replace" which is contrary to the normal meaning of plus. I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.

    We already have a syntax for dict merging: {**d1, **d2}.
    It works with arbitrary mappings,

    This is a good point.

    @scoder
    Copy link
    Contributor

    scoder commented Feb 28, 2019

    We already have a syntax for dict merging: {**d1, **d2}.

    Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.

    It works with arbitrary mappings,

    The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".

    Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

    That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)

    We already have ChainMap() which presents a single view of multiple mappings with any copying.

    Which is a different use case that is unlikely to go away with this proposal.

    makes it less clear that you're working with dicts.

    This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists.

    Let's wait for the PEP, I'd say.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Feb 28, 2019

    scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP-448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention.

    You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP-448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years.

    In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:

    1. It doesn't provide any new functionality, just one more way to do it; += is satisfied by .update, + is satisfied (more generally and efficiently) by the unpacking generalizations

    2. It's needlessly confusing; addition is, for all existing types in the standard library I can think of, lossless; the information from both sides of the + is preserved in some form, either by addition or concatenation (and in the concatenation case, addition is happening, just to the length of the resulting sequence, and order is preserved). Addition for dictionaries would introduce new rules specific to dicts that do not exist for any other type regarding loss of values, non-additive resulting length, etc. Those rules would likely be similar to those of dict literals and the update method, but they'd need to be made explicit. By contrast, the PEP-448 unpacking generalization rules followed the existing rules for dict literals; no special rules occur, it just behaves intuitively (if you already knew the rules for dict literals without unpacking being involved).

    3. Almost any generic, duck-typing based code for which addition makes sense will not make sense for dicts simply because it loosens the definition of addition too much to be useful, so best case, it still raises TypeError (when dicts added to non-dict things), worst case, it silently operates in a way that violates the rules of both addition and concatenation rather than raising a TypeError that the generic code could use to determine the correct thing to do.

    4. The already mentioned conflict with Counter (which already has an addition operator, with lossless semantics)

    5. (Minor) It means PyDict_Type needs a non-NULL tp_as_number, so now it's slightly slower to reject dicts as being non-numeric at the C layer

    Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too.

    Basically, I think the PEP-448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly.

    The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation).

    @gvanrossum
    Copy link
    Member

    I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread!

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Feb 28, 2019

    Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP-448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2)

    A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best:

    "I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

    "Unless someone is seriously championing PEP-448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP-448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another."

    As it happens, PEP-448 was put in 3.5, and we got the one obvious way to do it.

    Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP-572:

    (d3 := d1.copy()).update(d2)

    I think I'll stick with d3 = {**d1, **d2} though. :-)

    @tirkarthi
    Copy link
    Member

    Current python-ideas thread for the issue : https://mail.python.org/pipermail/python-ideas/2019-February/055509.html

    @3lnc
    Copy link
    Mannequin

    3lnc mannequin commented Mar 4, 2019

    If we're going to forget about commutativity of +, should we also implement +/+= for sets?

    @scoder
    Copy link
    Contributor

    scoder commented Mar 4, 2019

    should we also implement +/+= for sets?

    The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets.

    In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket.

    @vstinner
    Copy link
    Member

    vstinner commented Mar 6, 2019

    Is this issue directly or indirectly related to the PEP-584 "Add + and - operators to the built-in dict class"?
    https://www.python.org/dev/peps/pep-0584/

    @vstinner
    Copy link
    Member

    vstinner commented Mar 6, 2019

    Is this issue directly or indirectly related to the PEP-584 "Add + and - operators to the built-in dict class"?
    https://www.python.org/dev/peps/pep-0584/

    Ah yes, it's written in the title of the PR. I add it to the bug title as well.

    @vstinner vstinner changed the title Dictionary addition. Dictionary addition. (PEP 584) Mar 6, 2019
    @aaronchall
    Copy link
    Mannequin

    aaronchall mannequin commented Dec 12, 2019

    Another obvious way to do it, but I'm +1 on it.

    A small side point however - PEP-584 reads:

    To create a new dict containing the merged items of two (or more) dicts, one can currently write:

    {**d1, **d2}

    but this is neither obvious nor easily discoverable. It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].

    ...

    [2] Non-string keys: https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html

    The references cited does not back this assertion up. Perhaps the intent is to reference the "cool/weird hack" dict(d1, **d2) (see https://mail.python.org/pipermail/python-dev/2010-April/099485.html and https://mail.python.org/pipermail/python-dev/2010-April/099459.html), which allowed any hashable keys in Python 2 but only strings in Python 3.

    If I see {**d1, **d2}, my expectations are that this is the new generalized unpacking and I currently expect any keys to be allowed, and the PEP should be updated to accurately reflect this to prevent future misunderstandings.

    @gvanrossum
    Copy link
    Member

    PEP-584 has been approved by the Steering Council (at my recommendation). We will shortly begin landing PRs related to this.

    @gvanrossum gvanrossum added 3.9 only security fixes and removed 3.8 only security fixes labels Feb 25, 2020
    @gvanrossum
    Copy link
    Member

    New changeset eb8ac57 by Brandt Bucher in branch 'master':
    bpo-36144: Dictionary Union (PEP-584) (bpo-12088)
    eb8ac57

    @gvanrossum
    Copy link
    Member

    While the main code has been merged now, I propose to keep this issue open until some other things have happened:

    • Documentation
    • Add | operators to some dict subclasses in the stdlib
    • (What else?)

    @brandtbucher
    Copy link
    Member Author

    My current PR plans are:

    • Docs. This will include the dict docs and the whatsnew 3.9. I assume we have no plans to cover this in the tutorials, etc. Let me know if I'm missing anything here.
    • collections.defaultdict, with tests. I don't think this needs docs beyond a short "changed in version 3.9" note.
    • collections.OrderedDict, with tests. Ditto defaultdict on docs.
    • collections.ChainMap, ditto.
    • types.MappingProxy, ditto.

    I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

    That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

    Guido, okay to tag you on these for review?

    @gvanrossum
    Copy link
    Member

    Yup, great plan.

    On Mon, Feb 24, 2020 at 22:29 Brandt Bucher <report@bugs.python.org> wrote:

    Brandt Bucher <brandtbucher@gmail.com> added the comment:

    My current PR plans are:

    • Docs. This will include the dict docs and the whatsnew 3.9. I assume we
      have no plans to cover this in the tutorials, etc. Let me know if I'm
      missing anything here.
    • collections.defaultdict, with tests. I don't think this needs docs
      beyond a short "changed in version 3.9" note.
    • collections.OrderedDict, with tests. Ditto defaultdict on docs.
    • collections.ChainMap, ditto.
    • types.MappingProxy, ditto.

    I'll also create a BPO issue to discuss whether the dict subclasses in
    http.cookies should be updated.

    That should do it for CPython; I'm planning on updating typeshed and
    adding a handful of tests to mypy for TypedDict, etc. after these are
    landed.

    Guido, okay to tag you on these for review?

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue36144\>


    --
    --Guido (mobile)

    @zooba
    Copy link
    Member

    zooba commented Feb 25, 2020

    Not sure if this is a big deal or not, and it seems likely that the preexisting behaviour of .update() and ** unpacking have already decided it, but is it intentional that you end up with the first-seen key and the last-seen value in the case of collisions?

    class C:
        def __init__(self, *a): self.a = a
        def __hash__(self): return hash(self.a[0])
        def __eq__(self, o): return self.a[0] == o.a[0]
        def __repr__(self): return f"C{self.a}"
    >>> c1 = C(1, 1); c1
    C(1, 1)
    >>> c2 = C(1, 2); c2
    C(1, 2)
    
    For set union we get the first seen value:
    >>> {c1} | {c2}
    {C(1, 1)}
    
    For dict union we get the first seen key and the last seen value:
    >>> {c1: 'a'} | {c2: 'b'}
    {C(1, 1): 'b'}
    
    But similarly for dict unpack (and .update(); code left as an exercise to the reader):
    >>> {**{c1: 'a'}, **{c2: 'b'}}
    {C(1, 1): 'b'}

    So the union of two dicts may contain .items() elements that were not in either of the inputs.

    Honestly, I've never noticed this before, as the only time I create equivalent objects with meaningfully-distinct identities is to use with sets. I just figured I'd try it out after seeing suggestions that the dict union operands were transposed from set union.

    @brandtbucher
    Copy link
    Member Author

    As a somewhat simpler example:

    >>> f = {False: False}
    >>> z = {0: 0}
    >>> f | z
    {False: 0}
    >>> {**f, **z}
    {False: 0}
    >>> f.update(z); f
    {False: 0}

    Though these hairier cases aren't explicitly addressed, the conflict behavior is covered in the Rationale and Reference Implementation sections of the PEP. All of the above examples share code (dict_update_arg), and that's definitely intentional. I for one think it would be confusing (and probably a bug) if one of the examples above gave a different key-value pair!

    I find it makes more sense if you see a set as valueless keys (rather than keyless values).

    @gvanrossum
    Copy link
    Member

    OK, that makes sense, it works similar to ChainMap.copy(), which copies maps[0] and keeps links to the rest. So in particular cm | {} will do the same thing as cm.copy().

    Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

    def __or__(self, other):
        new = self.copy()
        new |= other  # OR new.update(other) ???
        return new
    
    def __ior__(self, other):
        self.update(other)
        return self

    Note that there is no ChainMap.update() definition -- it relies on MutableMapping.update().

    I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

    def __ror__(self, other):
        new = other.copy()
        new.update(self)
        return new

    Note that this doesn't return a ChainMap but an instance of type(other). If other doesn't have a copy() method it'll fail.

    As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

    @gvanrossum
    Copy link
    Member

    I didn't see your second reply, with ChainMap(other, *cm.maps).

    I'm not so keen on that, because its special behavior can't be mimicked by |=.

    @brandtbucher
    Copy link
    Member Author

    Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

    Yeah, I was imagining something like that... I used the cast for brevity in my reply but that probably wasn't helpful.

    Note that for __or__, we probably want to check the type of the argument (for either dict or ChainMap, or maybe just Mapping), to keep it from working on an iterable of key-value pairs.

    I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

    Agreed. Again, we can check for Mapping here to assure success for the copy() move.

    As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

    Yup, see above. I think a check for Mapping should be fine.

    @brandtbucher
    Copy link
    Member Author

    Just to clarify:

    If we decide to check isinstance(other, (ChainMap, dict)), '|' should probably be used.

    If we decide to check isinstance(other, Mapping), I think the copy/update methods should be used.

    @serhiy-storchaka
    Copy link
    Member

    def __or__(self, other):
        return self.__class__(self.maps[0] | other, *self.maps[1:])
    
    def __ror__(self, other):
        return other | dict(self)
    def __or__(self, other):
        return self.__class__(other, *self.maps)
    
    def __ror__(self, other):
        return self.__class__(*self.maps, other)

    There are problems with both variants, so I think it may be better to not add this operator to ChainMap.

    @brandtbucher
    Copy link
    Member Author

    I think we're only seriously considering the first variant (although implemented slightly differently, see my last two messages). And __ror__ would probably change, returning the type of self.

    What are the "problems" with it, exactly? We seem to be in agreement that the update behavior is reasonable, even for ChainMaps.

    @gvanrossum
    Copy link
    Member

    We already have somewhat different semantics of | for Counter, and hence I think it's fine to give it the most useful semantics for ChainMap given that class's special behavior. I think we've come up with the right solution there.

    Let's stop the debate and put up a PR.

    @brandtbucher
    Copy link
    Member Author

    Sounds good, I'll have these up soon.

    @gvanrossum
    Copy link
    Member

    New changeset 57c9d17 by Brandt Bucher in branch 'master':
    bpo-36144: Implement defaultdict union (GH-18729)
    57c9d17

    @gvanrossum
    Copy link
    Member

    Still waiting for ChainMap -- what else?

    @brandtbucher
    Copy link
    Member Author

    My brother will have a ChainMap PR up soon. I'm just finishing up MappingProxyType, myself. Probably both this weekend.

    Then I'll move on to OrderedDict, which looks like it could be tricky. I'll need to familiarize myself with the implementation better (unless there's somebody who is already familiar with it who wants to take over). It looks well-commented, though.

    I think we can pass on the http.cookies subclasses since there don't appear to be any experts/maintainers for that module.

    @gvanrossum
    Copy link
    Member

    New changeset 4663f66 by Brandt Bucher in branch 'master':
    bpo-36144: Update MappingProxyType with PEP-584's operators (bpo-18814)
    4663f66

    @brandtbucher
    Copy link
    Member Author

    bpo-39857 just reminded me that we should update os._Environ as well (the type of os.environ and os.environb).

    I have another first-timer who will probably want to take it.

    @vstinner
    Copy link
    Member

    Once this issue will be done, would you mind to update PEP-584? Currently, it only says "This PEP proposes adding merge (|) and update (|=) operators to the built-in dict class." It doesn't mention that other types like OrderedDict, MappingProxy or ChainMap are updated as well.

    @brandtbucher
    Copy link
    Member Author

    Yes, I can add a section explaining that after the PEP was accepted, we decided to add the operators to several non-dict mappings as well. I'm also going to add some explanation as to why Mapping/MutableMapping didn't grow them, too.

    @gvanrossum
    Copy link
    Member

    New changeset d648ef1 by Charles Burkland in branch 'master':
    bpo-36144: Update os.environ and os.environb for PEP-584 (bpo-18911)
    d648ef1

    @gvanrossum
    Copy link
    Member

    New changeset 6d674a1 by Brandt Bucher in branch 'master':
    bpo-36144: OrderedDict Union (PEP-584) (bpo-18967)
    6d674a1

    @brandtbucher
    Copy link
    Member Author

    Three other MutableMappings we might want to update:

    • shelve.Shelf
    • weakref.WeakKeyDictionary
    • weakref.WeakValueDictionary

    Shelf is up in the air, since it doesn't look like it defines a copy() equivalent... I also have no experience with it. Since it's a MutableMapping subclass, (not a dict subclass), we could in theory hold off on updating this until someone asks for it, without backward compatibility issues.

    I think the other two should be updated, though. I can coordinate PRs for them this week.

    @gvanrossum
    Copy link
    Member

    I definitely think we should leave Shelf alone, it's a toy class from a different era.

    It makes sense to update the weak dicts; hopefully the | and |= operators can be implemented in terms of other, more primitive operations, so we will have assurance that these classes' essential behavior is preserved.

    @gvanrossum
    Copy link
    Member

    New changeset f393b2c by Curtis Bucher in branch 'master':
    bpo-36144: Add PEP-584 operators to collections.ChainMap (bpo-18832)
    f393b2c

    @gvanrossum
    Copy link
    Member

    New changeset 25e580a by Curtis Bucher in branch 'master':
    bpo-36144: Add union operators to WeakKeyDictionary (bpo-19106)
    25e580a

    @gvanrossum
    Copy link
    Member

    New changeset 8f1ed21 by Curtis Bucher in branch 'master':
    bpo-36144: Add union operators to WeakValueDictionary584 (bpo-19127)
    8f1ed21

    @brandtbucher
    Copy link
    Member Author

    And... that's it! Big thanks to everybody who had a part in making this happen.

    @gvanrossum
    Copy link
    Member

    I'm guessing this can be closed?

    @gvanrossum
    Copy link
    Member

    I guess we should keep this open until Raymond Hettinger has given feedback on #18832 (where we have the option of changing to Brandt's proposal from #18832 (comment)).

    @justjais
    Copy link
    Mannequin

    justjais mannequin commented Jun 15, 2020

    First off, thanks for adding the feature, it's much appreciated.

    But it'd be great if you guys can enable list merge for the dict having list as its value, in current form I believe it's handling only "key: value" merge.

    for e.g.:
    >>> d1 = {'spam': [1, 2, 3]}
    >>> d2 = {'spam': [2, 3, 4]}
    >>> d1 | d2
    >>> {'spam': [1, 2, 3, 4]}

    @brandtbucher
    Copy link
    Member Author

    Similar behavior was considered and ultimately rejected by the PEP as being too specialized:

    https://www.python.org/dev/peps/pep-0584/#concatenate-values-in-a-list

    What you're asking for it subtly different, and even *more* specialized than that. I'd recommend just subclassing dict and overriding the operator, as the PEP suggests.

    @RapsUk
    Copy link
    Mannequin

    RapsUk mannequin commented Jul 7, 2020

    Taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

    >>> c = dict(a.items() | b.items())
    This example demonstrates what happens when values are unhashable:
    
    >>> x = {'a': []}
    >>> y = {'b': []}
    >>> dict(x.items() | y.items())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:
    
    >>> x = {'a': 2}
    >>> y = {'a': 1}
    >>> dict(x.items() | y.items())
    {'a': 2}

    http://net-informations.com/python/ds/merge.htm

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants