classification
Title: reversible dict
Type: enhancement Stage: patch review
Components: Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, inada.naoki, remi.lapeyre, rhettinger, selik, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2018-05-11 03:30 by selik, last changed 2018-07-21 14:49 by remi.lapeyre.

Pull Requests
URL Status Linked Edit
PR 6827 open python-dev, 2018-05-14 19:50
Messages (27)
msg316386 - (view) Author: Michael Selik (selik) * Date: 2018-05-11 03:30
Now that dicts are tracking insertion order, they can be made reversible via the built-in reversed, just like OrderedDict.
msg316436 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-12 20:47
Hi, I'm taking a look this issue, it look like a new type `PyDictRevIterKey_Type` needs to be defined with its associated methods. I will try to implement the modifications ; this is the first time i'm taking a dive in Python's internals so I can't guarantee results on this but will try nonetheless.
msg316437 - (view) Author: Michael Selik (selik) * Date: 2018-05-12 20:59
Right, a blend of the code from dictiterobject (https://github.com/python/cpython/blob/master/Objects/dictobject.c#L3309) and listreviterobject (https://github.com/python/cpython/blob/master/Objects/listobject.c#L3128).
msg316789 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-16 08:53
If implement __reduce__ in dict, it should be implemented in dict views too.
msg316922 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-17 12:34
Hi Serhiy Storchaka, I will update the PR to implement this functionality in the views too
msg317422 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-23 16:18
I updated the pull request, now reversed work on the dict and dict views:

➜  cpython git:(master) ./python.exe 
Python 3.8.0a0 (heads/master-dirty:128576b88c, May 23 2018, 16:33:46) 
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict(a=1, b=2)
>>> list(reversed(d))
['b', 'a']
>>> list(reversed(d.keys()))
['b', 'a']
>>> list(reversed(d.values()))
[2, 1]
>>> list(reversed(d.items()))
[('b', 2), ('a', 1)]

reversed on dict and dict views is not symmetric and applying twice will  raise TypeError which is also the behavior on list. I'm not sure why this behavior has been chosen but it's coherent.
msg317547 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 09:15
I'm not sure it's worth enough for adding more builtin classes.

Adding builtin class means Python interpreter core makes more fat, slow to start, and hard to maintain.
msg317548 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-24 10:03
This change does add built-in types but I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications. It seems inconsistent to have an order on dict, views and not have reversed work on them.

It does increase fat in the interpreter, but the change is really simple and does not really increase its complexity or make it really harder to maintain.
msg317549 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 10:17
> I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications.

I agree about "reasonable expectation".  But I'm interested in is it really useful in real world?

> It seems inconsistent to have an order on dict, views and not have reversed work on them.

"Have an order" doesn't mean "reversible".  For example, single linked list is ordered, but not reversible.

While CPython implementation can provide efficient __reverse__, adding __reverse__ means **all** Python implementation is expected to provide it.
For example, some Python implementation may be able to implement dict with hashmap + single linked list.  If __reverse__ is added, it's not possible anymore.

"Preserve insertion order" is very useful for many people.  So it's guaranteed.
Then how useful "reversible" in real world, for many people?
msg317551 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-24 10:54
>> I think it's a reasonable expectation as a python user to be able to do reversed(dict(a=1, b=20) since the order is know defined in the specifications.

> I agree about "reasonable expectation".  But I'm interested in is it really useful in real world?

I do agree it's certainly used than the conservation of order but it's not useless either. For example, it could help to get the latest section defined in a YAML or INI file once parsed. 

>> It seems inconsistent to have an order on dict, views and not have reversed work on them.

> "Have an order" doesn't mean "reversible".  For example, single linked list is ordered, but not reversible.

> While CPython implementation can provide efficient __reverse__, adding __reverse__ means **all** Python implementation is expected to provide it.
> For example, some Python implementation may be able to implement dict with hashmap + single linked list.  If __reverse__ is added, it's not possible anymore.

Indeed they would have to use a double-linked-list here.

> "Preserve insertion order" is very useful for many people.  So it's guaranteed.
> Then how useful "reversible" in real world, for many people?

While this is true, the same argument could be said about the dict views. Many many people don't know about them but they are still an interesting feature that has its place in the standard library.

It definitely won't be the most used feature in Python nor a killer feature but it seemed important enough to be included in OrderedDict (https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L63) since 3.5 and a C implementation of OrderedDict has been added in the same release so it seems to have mattered at the time.

Having this feature in the built-in dicts could actually help to simplify the implementation of the collections module in this case.
msg317555 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 11:06
> Rémi Lapeyre <remi.lapeyre@henki.fr> added the comment:

> >> I think it's a reasonable expectation as a python user to be able to
do reversed(dict(a=1, b=20) since the order is know defined in the
specifications.

> > I agree about "reasonable expectation".  But I'm interested in is it
really useful in real world?

> I do agree it's certainly used than the conservation of order but it's
not useless either. For example, it could help to get the latest section
defined in a YAML or INI file once parsed.

For rare cases, OrderedDict can be used.

> >> It seems inconsistent to have an order on dict, views and not have
reversed work on them.

> > "Have an order" doesn't mean "reversible".  For example, single linked
list is ordered, but not reversible.

> > While CPython implementation can provide efficient __reverse__, adding
__reverse__ means **all** Python implementation is expected to provide it.
> > For example, some Python implementation may be able to implement dict
with hashmap + single linked list.  If __reverse__ is added, it's not
possible anymore.

> Indeed they would have to use a double-linked-list here.

Doubly linked list is memory inefficient than singly linked list.
And memory efficiency of builtin type is very important, than types in
library like OrderedDict.

> > "Preserve insertion order" is very useful for many people.  So it's
guaranteed.
> > Then how useful "reversible" in real world, for many people?

> While this is true, the same argument could be said about the dict views.
Many many people don't know about them but they are still an interesting
feature that has its place in the standard library.

It's different problem and out of scope of this discussion.

> It definitely won't be the most used feature in Python nor a killer
feature but it seemed important enough to be included in OrderedDict (
https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L63)
since 3.5 and a C implementation of OrderedDict has been added in the same
release so it seems to have mattered at the time.

OrderedDict is more about "preserving insertion order".  It provides O(1)
`popitem(last=False)` and `move_to_end(last=False)`.
So "Reversible is essential for OrderedDict" is not enough reason for
"Reversible is essential for dict".

> Having this feature in the built-in dicts could actually help to simplify
the implementation of the collections module in this case.

Would you elaborate more?
msg317559 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-24 11:42
I concur with Inada. It is a "nice to have" feature, but it has too high cost. PR 6827 adds around 300 lines of new code in dictobject.c. Too large for rarely used feature. And dictobject.c is critically important, adding new code here can make the compiler generating less optimal code for other parts. That is besides increasing the maintenance cost.

I may have a case for iterating dict in the reversed order (see issue33331), but adding three new types requires adding too much boilerplate code. In the current form PR 6827 can't be merged, and I don't see a way how to make the code much shorter without impact on the current code. :-(
msg317561 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-05-24 11:46
Since there seems to be a consensus about this change being too much, should we go back to the first proposal to implement dict.__reversed__ only and not reversed for the views, this would greatly reduce the bload or dump the PR as a whole ?
msg317562 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-24 11:49
This would make the feature incomplete and will add a pressure of implementing support for dict views. And even one new type adds much bloat.
msg317564 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 11:56
> Since there seems to be a consensus about this change being too much, should we go back to the first proposal to implement dict.__reversed__ only and not reversed for the views, this would greatly reduce the bload or dump the PR as a whole ?

Since API of builtin type is part of language (not only CPython), I want agreement on python-dev before adding it.
msg319088 - (view) Author: Michael Selik (selik) * Date: 2018-06-08 17:54
It looks like there's general agreement on python-dev that this is appropriate for v3.8 (not v3.7).

Guido van Rossum and Ramsey D'silva gave a +1. Raymond Hettinger noted some use cases. INADA Naoki raised a point about waiting for other implementations, which Guido agreed with, but after some research came around to OK'ing this for v3.8.

Responding to Serhiy Storchaka's worry about adding new code possibly upsetting the compiler optimizations: The implementation seems straightforward and unlikely to confuse the compiler. Is there evidence that similar code has previously made CPython slower? If the compiler's black magic is too difficult to understand, it may be equally plausible that this code causes an improvement to the optimization of unrelated code.
msg319146 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-06-09 12:32
Would you try python_startup and python_startup_nosite benchmark with:

* master branch
* All reverse iteraters
* All reverse iteraters + register them in _collections_abc module
-- 
INADA Naoki  <songofacandy@gmail.com>
msg319177 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-09 20:00
> Would you try python_startup and python_startup_nosite benchmark

That seems entirely unnecessary.  If adding __reversed__ has any effect on the rest of the build, that is pure random noise that can be ignored. It is normal to get small improvements or detriments that vary from compiler to compiler, os to os, etc.  Our normal practice is to ignore those and not optimize for random or hard-to-control compiler idiosyncrasies which appear and disappear as we add methods or as compilers get updated.
msg319194 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-06-10 02:01
> If adding __reversed__ has any effect on the rest of the build, that is pure random noise that can be ignored.

Although initialization cost of each one type is small, time for _Py_Ready() is not negligible.  And ABC.register() too.  Import time for _collections_abc is not negligible too.

I agree that cost of adding three builtin types and register them to ABC will be negligible or small enough.
But I think it's good to know before merge.
msg319199 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-06-10 06:14
I confirmed the cost is negligible.

python_startup_no_site
======================

Mean +- std dev: [master] 7.31 ms +- 0.39 ms -> [reverse] 7.41 ms +- 0.44 ms: 1.01x slower (+1%)
Mean +- std dev: [master] 7.31 ms +- 0.39 ms -> [register] 7.20 ms +- 0.28 ms: 1.01x faster (-1%)
Benchmark hidden because not significant (1): python_startup

"register" is "reverse" + following patch:

diff --git a/Lib/_collections_abc.py b/Lib/_collections_abc.py
index dbe30dff1f..28a7e2586c 100644
--- a/Lib/_collections_abc.py
+++ b/Lib/_collections_abc.py
@@ -280,6 +280,9 @@ Iterator.register(bytearray_iterator)
 Iterator.register(dict_keyiterator)
 Iterator.register(dict_valueiterator)
 Iterator.register(dict_itemiterator)
+Iterator.register(type(iter(reversed({}.keys()))))
+Iterator.register(type(iter(reversed({}.values()))))
+Iterator.register(type(iter(reversed({}.items()))))
 Iterator.register(list_iterator)
 Iterator.register(list_reverseiterator)
 Iterator.register(range_iterator)
@@ -306,6 +309,12 @@ class Reversible(Iterable):
         return NotImplemented


+Reversible.register(dict)
+Reversible.register(type({}.keys()))
+Reversible.register(type({}.values()))
+Reversible.register(type({}.items()))
+
+
 class Generator(Iterator):

     __slots__ = ()
msg319246 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-06-10 21:04
Hi INADA thanks for the benchmark, I did both of them too and got the same results (though I had to apply https://github.com/python/performance/pull/41 to get the performance module working).

Should I apply your patch in PR 6827?
msg319257 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-06-11 00:17
My patch was quick and dirty.
Please read _collections_abc module and follow the style. (you need to use
temporary variables.)
And new reviter types can be registered to Iterator ABC too.
-- 
INADA Naoki  <songofacandy@gmail.com>
msg319433 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-06-13 06:38
Thanks INADA, I made the necessary changes to _collections_abc. Is there anything that I should change?
msg319487 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-06-13 22:49
Would it be possible to re-use the __reverse__() support that exists for OrderedDict?  Doing so could help avoid adding a bunch of duplicated code.
msg319505 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-06-14 10:15
Hi, I took a look at the code of OrderedDict it's using the double linked-list to iterate through the items using _odictnode_PREV and _odictnode_NEXT.

Since ordereddict needs to support move_to_end that will change the iterating order while dict does not, is it possible to share the code for __reversed__? As I understand it, the current code for OrderedDict and dict are not sharing code for the implementation of __iter__ and I'm not sure how it would be possible to do so.
msg320171 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-06-21 12:23
Hi, is everything good with attached PR or should I refactor it further?
msg322101 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-07-21 14:49
Hi Serhiy and Inada, is there a reason not to move forward with this patch?
History
Date User Action Args
2018-07-21 14:49:40remi.lapeyresetmessages: + msg322101
2018-06-21 12:23:54remi.lapeyresetmessages: + msg320171
2018-06-14 10:15:40remi.lapeyresetmessages: + msg319505
2018-06-13 22:49:23eric.snowsetnosy: + eric.snow
messages: + msg319487
2018-06-13 06:38:17remi.lapeyresetmessages: + msg319433
2018-06-11 00:17:06inada.naokisetmessages: + msg319257
2018-06-10 21:04:49remi.lapeyresetmessages: + msg319246
2018-06-10 06:14:15inada.naokisetmessages: + msg319199
2018-06-10 02:01:31inada.naokisetmessages: + msg319194
2018-06-09 20:00:54rhettingersetnosy: + tim.peters
messages: + msg319177
2018-06-09 12:32:57inada.naokisetmessages: + msg319146
2018-06-08 17:54:33seliksetmessages: + msg319088
2018-05-24 11:56:56inada.naokisetmessages: + msg317564
2018-05-24 11:49:14serhiy.storchakasetmessages: + msg317562
2018-05-24 11:46:26remi.lapeyresetmessages: + msg317561
2018-05-24 11:42:30serhiy.storchakasetmessages: + msg317559
2018-05-24 11:06:28inada.naokisetmessages: + msg317555
2018-05-24 10:54:16remi.lapeyresetmessages: + msg317551
2018-05-24 10:17:51inada.naokisetmessages: + msg317549
2018-05-24 10:03:16remi.lapeyresetmessages: + msg317548
2018-05-24 09:15:00inada.naokisetmessages: + msg317547
2018-05-23 16:18:18remi.lapeyresetmessages: + msg317422
2018-05-17 12:34:42remi.lapeyresetmessages: + msg316922
2018-05-16 08:53:32serhiy.storchakasetnosy: + serhiy.storchaka, inada.naoki, rhettinger
messages: + msg316789
2018-05-14 19:50:51python-devsetkeywords: + patch
stage: patch review
pull_requests: + pull_request6511
2018-05-12 20:59:49seliksetmessages: + msg316437
2018-05-12 20:47:38remi.lapeyresetnosy: + remi.lapeyre
messages: + msg316436
2018-05-11 04:11:39rhettingersetversions: + Python 3.8
2018-05-11 03:30:18selikcreate