Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reversible dict #77643

Closed
selik mannequin opened this issue May 11, 2018 · 28 comments
Closed

reversible dict #77643

selik mannequin opened this issue May 11, 2018 · 28 comments
Labels
3.8 only security fixes type-feature A feature request or enhancement

Comments

@selik
Copy link
Mannequin

selik mannequin commented May 11, 2018

BPO 33462
Nosy @tim-one, @rhettinger, @methane, @ericsnowcurrently, @serhiy-storchaka, @selik, @remilapeyre
PRs
  • bpo-33462: Add __reversed__ to dict and dict views #6827
  • bpo-38525: Fix segfault when using reverse iterators of empty dict literals #16847
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-06.00:39:55.185>
    created_at = <Date 2018-05-11.03:30:18.300>
    labels = ['type-feature', '3.8']
    title = 'reversible dict'
    updated_at = <Date 2019-10-19.15:00:56.369>
    user = 'https://github.com/selik'

    bugs.python.org fields:

    activity = <Date 2019-10-19.15:00:56.369>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-06.00:39:55.185>
    closer = 'methane'
    components = []
    creation = <Date 2018-05-11.03:30:18.300>
    creator = 'selik'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33462
    keywords = ['patch']
    message_count = 28.0
    messages = ['316386', '316436', '316437', '316789', '316922', '317422', '317547', '317548', '317549', '317551', '317555', '317559', '317561', '317562', '317564', '319088', '319146', '319177', '319194', '319199', '319246', '319257', '319433', '319487', '319505', '320171', '322101', '329323']
    nosy_count = 7.0
    nosy_names = ['tim.peters', 'rhettinger', 'methane', 'eric.snow', 'serhiy.storchaka', 'selik', 'remi.lapeyre']
    pr_nums = ['6827', '16847']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33462'
    versions = ['Python 3.8']

    @selik
    Copy link
    Mannequin Author

    selik mannequin commented May 11, 2018

    Now that dicts are tracking insertion order, they can be made reversible via the built-in reversed, just like OrderedDict.

    @selik selik mannequin added the type-feature A feature request or enhancement label May 11, 2018
    @rhettinger rhettinger added the 3.8 only security fixes label May 11, 2018
    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 12, 2018

    Hi, I'm taking a look this issue, it look like a new type PyDictRevIterKey_Type needs to be defined with its associated methods. I will try to implement the modifications ; this is the first time i'm taking a dive in Python's internals so I can't guarantee results on this but will try nonetheless.

    @selik
    Copy link
    Mannequin Author

    selik mannequin commented May 12, 2018

    @serhiy-storchaka
    Copy link
    Member

    If implement __reduce__ in dict, it should be implemented in dict views too.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 17, 2018

    Hi Serhiy Storchaka, I will update the PR to implement this functionality in the views too

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 23, 2018

    I updated the pull request, now reversed work on the dict and dict views:

    ➜  cpython git:(master) ./python.exe 
    Python 3.8.0a0 (heads/master-dirty:128576b88c, May 23 2018, 16:33:46) 
    [Clang 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> d = dict(a=1, b=2)
    >>> list(reversed(d))
    ['b', 'a']
    >>> list(reversed(d.keys()))
    ['b', 'a']
    >>> list(reversed(d.values()))
    [2, 1]
    >>> list(reversed(d.items()))
    [('b', 2), ('a', 1)]

    reversed on dict and dict views is not symmetric and applying twice will raise TypeError which is also the behavior on list. I'm not sure why this behavior has been chosen but it's coherent.

    @methane
    Copy link
    Member

    methane commented May 24, 2018

    I'm not sure it's worth enough for adding more builtin classes.

    Adding builtin class means Python interpreter core makes more fat, slow to start, and hard to maintain.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 24, 2018

    This change does add built-in types but I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications. It seems inconsistent to have an order on dict, views and not have reversed work on them.

    It does increase fat in the interpreter, but the change is really simple and does not really increase its complexity or make it really harder to maintain.

    @methane
    Copy link
    Member

    methane commented May 24, 2018

    I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications.

    I agree about "reasonable expectation". But I'm interested in is it really useful in real world?

    It seems inconsistent to have an order on dict, views and not have reversed work on them.

    "Have an order" doesn't mean "reversible". For example, single linked list is ordered, but not reversible.

    While CPython implementation can provide efficient __reverse__, adding __reverse__ means **all** Python implementation is expected to provide it.
    For example, some Python implementation may be able to implement dict with hashmap + single linked list. If __reverse__ is added, it's not possible anymore.

    "Preserve insertion order" is very useful for many people. So it's guaranteed.
    Then how useful "reversible" in real world, for many people?

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 24, 2018

    > I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications.

    I agree about "reasonable expectation". But I'm interested in is it really useful in real world?

    I do agree it's certainly used than the conservation of order but it's not useless either. For example, it could help to get the latest section defined in a YAML or INI file once parsed.

    > It seems inconsistent to have an order on dict, views and not have reversed work on them.

    "Have an order" doesn't mean "reversible". For example, single linked list is ordered, but not reversible.

    While CPython implementation can provide efficient __reverse__, adding __reverse__ means **all** Python implementation is expected to provide it.
    For example, some Python implementation may be able to implement dict with hashmap + single linked list. If __reverse__ is added, it's not possible anymore.

    Indeed they would have to use a double-linked-list here.

    "Preserve insertion order" is very useful for many people. So it's guaranteed.
    Then how useful "reversible" in real world, for many people?

    While this is true, the same argument could be said about the dict views. Many many people don't know about them but they are still an interesting feature that has its place in the standard library.

    It definitely won't be the most used feature in Python nor a killer feature but it seemed important enough to be included in OrderedDict (https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L63) since 3.5 and a C implementation of OrderedDict has been added in the same release so it seems to have mattered at the time.

    Having this feature in the built-in dicts could actually help to simplify the implementation of the collections module in this case.

    @methane
    Copy link
    Member

    methane commented May 24, 2018

    Rémi Lapeyre <remi.lapeyre@henki.fr> added the comment:

    >> I think it's a reasonable expectation as a python user to be able to
    do reversed(dict(a=1, b=20) since the order is know defined in the
    specifications.

    > I agree about "reasonable expectation". But I'm interested in is it
    really useful in real world?

    I do agree it's certainly used than the conservation of order but it's
    not useless either. For example, it could help to get the latest section
    defined in a YAML or INI file once parsed.

    For rare cases, OrderedDict can be used.

    >> It seems inconsistent to have an order on dict, views and not have
    reversed work on them.

    > "Have an order" doesn't mean "reversible". For example, single linked
    list is ordered, but not reversible.

    > While CPython implementation can provide efficient __reverse__, adding
    __reverse__ means **all** Python implementation is expected to provide it.
    > For example, some Python implementation may be able to implement dict
    with hashmap + single linked list. If __reverse__ is added, it's not
    possible anymore.

    Indeed they would have to use a double-linked-list here.

    Doubly linked list is memory inefficient than singly linked list.
    And memory efficiency of builtin type is very important, than types in
    library like OrderedDict.

    > "Preserve insertion order" is very useful for many people. So it's
    guaranteed.
    > Then how useful "reversible" in real world, for many people?

    While this is true, the same argument could be said about the dict views.
    Many many people don't know about them but they are still an interesting
    feature that has its place in the standard library.

    It's different problem and out of scope of this discussion.

    It definitely won't be the most used feature in Python nor a killer
    feature but it seemed important enough to be included in OrderedDict (
    https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L63)
    since 3.5 and a C implementation of OrderedDict has been added in the same
    release so it seems to have mattered at the time.

    OrderedDict is more about "preserving insertion order". It provides O(1)
    popitem(last=False) and move_to_end(last=False).
    So "Reversible is essential for OrderedDict" is not enough reason for
    "Reversible is essential for dict".

    Having this feature in the built-in dicts could actually help to simplify
    the implementation of the collections module in this case.

    Would you elaborate more?

    @serhiy-storchaka
    Copy link
    Member

    I concur with Inada. It is a "nice to have" feature, but it has too high cost. PR 6827 adds around 300 lines of new code in dictobject.c. Too large for rarely used feature. And dictobject.c is critically important, adding new code here can make the compiler generating less optimal code for other parts. That is besides increasing the maintenance cost.

    I may have a case for iterating dict in the reversed order (see bpo-33331), but adding three new types requires adding too much boilerplate code. In the current form PR 6827 can't be merged, and I don't see a way how to make the code much shorter without impact on the current code. :-(

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented May 24, 2018

    Since there seems to be a consensus about this change being too much, should we go back to the first proposal to implement dict.__reversed__ only and not reversed for the views, this would greatly reduce the bload or dump the PR as a whole ?

    @serhiy-storchaka
    Copy link
    Member

    This would make the feature incomplete and will add a pressure of implementing support for dict views. And even one new type adds much bloat.

    @methane
    Copy link
    Member

    methane commented May 24, 2018

    Since there seems to be a consensus about this change being too much, should we go back to the first proposal to implement dict.__reversed__ only and not reversed for the views, this would greatly reduce the bload or dump the PR as a whole ?

    Since API of builtin type is part of language (not only CPython), I want agreement on python-dev before adding it.

    @selik
    Copy link
    Mannequin Author

    selik mannequin commented Jun 8, 2018

    It looks like there's general agreement on python-dev that this is appropriate for v3.8 (not v3.7).

    Guido van Rossum and Ramsey D'silva gave a +1. Raymond Hettinger noted some use cases. INADA Naoki raised a point about waiting for other implementations, which Guido agreed with, but after some research came around to OK'ing this for v3.8.

    Responding to Serhiy Storchaka's worry about adding new code possibly upsetting the compiler optimizations: The implementation seems straightforward and unlikely to confuse the compiler. Is there evidence that similar code has previously made CPython slower? If the compiler's black magic is too difficult to understand, it may be equally plausible that this code causes an improvement to the optimization of unrelated code.

    @methane
    Copy link
    Member

    methane commented Jun 9, 2018

    Would you try python_startup and python_startup_nosite benchmark with:

    • master branch
    • All reverse iteraters
    • All reverse iteraters + register them in _collections_abc module
      --
      INADA Naoki <songofacandy@gmail.com>

    @rhettinger
    Copy link
    Contributor

    Would you try python_startup and python_startup_nosite benchmark

    That seems entirely unnecessary. If adding __reversed__ has any effect on the rest of the build, that is pure random noise that can be ignored. It is normal to get small improvements or detriments that vary from compiler to compiler, os to os, etc. Our normal practice is to ignore those and not optimize for random or hard-to-control compiler idiosyncrasies which appear and disappear as we add methods or as compilers get updated.

    @methane
    Copy link
    Member

    methane commented Jun 10, 2018

    If adding __reversed__ has any effect on the rest of the build, that is pure random noise that can be ignored.

    Although initialization cost of each one type is small, time for _Py_Ready() is not negligible. And ABC.register() too. Import time for _collections_abc is not negligible too.

    I agree that cost of adding three builtin types and register them to ABC will be negligible or small enough.
    But I think it's good to know before merge.

    @methane
    Copy link
    Member

    methane commented Jun 10, 2018

    I confirmed the cost is negligible.

    python_startup_no_site
    ======================

    Mean +- std dev: [master] 7.31 ms +- 0.39 ms -> [reverse] 7.41 ms +- 0.44 ms: 1.01x slower (+1%)
    Mean +- std dev: [master] 7.31 ms +- 0.39 ms -> [register] 7.20 ms +- 0.28 ms: 1.01x faster (-1%)
    Benchmark hidden because not significant (1): python_startup

    "register" is "reverse" + following patch:

    diff --git a/Lib/_collections_abc.py b/Lib/_collections_abc.py
    index dbe30dff1f..28a7e2586c 100644
    --- a/Lib/_collections_abc.py
    +++ b/Lib/_collections_abc.py
    @@ -280,6 +280,9 @@ Iterator.register(bytearray_iterator)
     Iterator.register(dict_keyiterator)
     Iterator.register(dict_valueiterator)
     Iterator.register(dict_itemiterator)
    +Iterator.register(type(iter(reversed({}.keys()))))
    +Iterator.register(type(iter(reversed({}.values()))))
    +Iterator.register(type(iter(reversed({}.items()))))
     Iterator.register(list_iterator)
     Iterator.register(list_reverseiterator)
     Iterator.register(range_iterator)
    @@ -306,6 +309,12 @@ class Reversible(Iterable):
             return NotImplemented

    +Reversible.register(dict)
    +Reversible.register(type({}.keys()))
    +Reversible.register(type({}.values()))
    +Reversible.register(type({}.items()))
    +
    +
    class Generator(Iterator):

         __slots__ = ()

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Jun 10, 2018

    Hi INADA thanks for the benchmark, I did both of them too and got the same results (though I had to apply python/pyperformance#41 to get the performance module working).

    Should I apply your patch in PR 6827?

    @methane
    Copy link
    Member

    methane commented Jun 11, 2018

    My patch was quick and dirty.
    Please read _collections_abc module and follow the style. (you need to use
    temporary variables.)
    And new reviter types can be registered to Iterator ABC too.

    INADA Naoki <songofacandy@gmail.com>

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Jun 13, 2018

    Thanks INADA, I made the necessary changes to _collections_abc. Is there anything that I should change?

    @ericsnowcurrently
    Copy link
    Member

    Would it be possible to re-use the __reverse__() support that exists for OrderedDict? Doing so could help avoid adding a bunch of duplicated code.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Jun 14, 2018

    Hi, I took a look at the code of OrderedDict it's using the double linked-list to iterate through the items using _odictnode_PREV and _odictnode_NEXT.

    Since ordereddict needs to support move_to_end that will change the iterating order while dict does not, is it possible to share the code for __reversed__? As I understand it, the current code for OrderedDict and dict are not sharing code for the implementation of __iter__ and I'm not sure how it would be possible to do so.

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Jun 21, 2018

    Hi, is everything good with attached PR or should I refactor it further?

    @remilapeyre
    Copy link
    Mannequin

    remilapeyre mannequin commented Jul 21, 2018

    Hi Serhiy and Inada, is there a reason not to move forward with this patch?

    @methane
    Copy link
    Member

    methane commented Nov 6, 2018

    New changeset 6531bf6 by INADA Naoki (Rémi Lapeyre) in branch 'master':
    bpo-33462: Add __reversed__ to dict and dict views (GH-6827)
    6531bf6

    @methane methane closed this as completed Nov 6, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants