Issue19414
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-10-27 03:27 by nikratio, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
issue19414.diff | nikratio, 2014-01-08 05:05 | |||
issue19414_r2.diff | nikratio, 2014-04-13 19:18 | review |
Messages (28) | |||
---|---|---|---|
msg201408 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-27 03:27 | |
The documentation says the following about modifying a dict while iterating through its view: | Iterating views while adding or deleting entries in the dictionary may | raise a RuntimeError or fail to iterate over all entries. (http://docs.python.org/3/library/stdtypes.html#dict-views) The OrderedDict documentation doesn't have anything to say on the subject. In practice, its implementation actually mostly correponds to naive expectations: >>> from collections import OrderedDict >>> d = OrderedDict() >>> for i in range(5): ... d[i] = i ... >>> i = iter(d.values()) >>> next(i) 0 >>> del d[0] >>> next(i) 1 >>> del d[2] >>> next(i) 3 >>> d.move_to_end(1) >>> next(i) 4 >>> next(i) 1 >>> etc. However, there is one case that results in a rather confusing error: >>> a = OrderedDict() >>> a[0] = 0 >>> a[1] = 1 >>> a[2] = 2 >>> i = iter(a.values()) >>> next(i) 0 >>> del a[0] >>> del a[1] >>> next(i) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python3.3/collections/abc.py", line 495, in __iter__ yield self._mapping[key] KeyError: 1 What happens here is that a[0] is returned from the linked list, but it still contains links to a[1]. When a[1] is deleted, the links of its predecessor and successor are updated, but a[0] still points to a[1]. The OrderedList then hands a non-existing key to the values() implementation in collections.abc. The result is not only mightily confusing, but it is also not covered by the documentation (KeyError isn't a RuntimeError). I would very much like to see this fixed, but I'm not sure if there's a good way to do that. I see the following options: (a) When deleting an element from an OrderedList, update the pointers in the corresponding linked list element to the end of the list. Then removing the currently "active" element during the iteration would automatically end the iteration. For that, we'd just have to add two lines to __delitem__: def __delitem__(self, key, dict_delitem=dict.__delitem__): dict_delitem(self, key) link = self.__map.pop(key) link_prev = link.prev link_next = link.next link_prev.next = link_next link_next.prev = link_prev link.prev = root # new link.next = root # new The new behavior is explicitly covered by the documentation (changing the dict may result in incomplete iteration), but it's a change to what happened before. (b) When iterating through an OrderedDict, check that the next element is actually still in the hash, i.e. change def __iter__(self): root = self.__root curr = root.next while curr is not root: yield curr.key curr = curr.next to def __iter__(self): root = self.__root curr = root.next while curr is not root and curr.key in self: yield curr.key curr = curr.next that would terminate the iteration only in the special case, but incur an extra dict lookup at every iteration. Alternatively, one could try very hard to not stop the iteration: while curr is not root: yield curr.key while curr is not root: curr = curr.next if curr.key in self: break (c) Catch the KeyError in values(), and re-raise the proper exception in class ValuesView: def __iter__(self): for key in self._mapping: try: yield self._mapping[key] except KeyError: raise RuntimeError("underlying Mapping instance modified during iteration") I consider this a bit ugly, because it may mask other problems in a custom Mapping implementation (that may raise a KeyError because of a bug in the Mapping implementation itself). (d) Extend the OrderedDict documentation to explicity document this case. This has the drawback that it would probably be nicer to just have the behavior be consistent and correspond to intuitive expectations. Would any of those fixes be acceptable? Or is there an even better option? |
|||
msg201409 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-27 03:31 | |
After thinking about this a bit more, I think this is actually a true bug in OrderedDict(), and only option (a) or be (b) really fix it. Rationale: I would expect that for any OrderedDict d, and any function modify_dict(d), the following assertion holds: for key in d: assert key in d modify_dict(d) ..but this actually breaks for def modify_dict(d): del d[0] del d[1] |
|||
msg201502 - (view) | Author: Armin Rigo (arigo) * ![]() |
Date: 2013-10-27 21:09 | |
Another option for the "try harder to raise RuntimeError" category (which I tend to like, because otherwise people are bound to write programs that rely on undocumented details): In __delitem__() set link.next = None. In the various iterators check explicitly if curr.next is None, and raise RuntimeError if it is. |
|||
msg201505 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-27 21:39 | |
Being able to modify an OrderedDict while iterating through it is pretty useful though. What about documenting that modifying an OrderedDict is allowed, but may cause iteration to skip or repeat items (but do not allow it to raise RuntimeError)? As far as I can tell, the current implementation will already guarantee that (as soon as this issue is fixed). |
|||
msg201506 - (view) | Author: Armin Rigo (arigo) * ![]() |
Date: 2013-10-27 21:58 | |
Modifying an ordered dict while iterating over it can be officially classified as an ok think to do, but then the precise behavior must described in the documentation in some details, with of course matching implementation(s) and tests. I mean by that that the behavior should not be "it might miss some elements", but rather something precise like: consider the ordered dict as simply a list of its elements, and consider __delitem__ as replacing an element with NULL. Then forward or backward iteration works like forward or backward iteration over this list, with the additional rule that NULL elements are skipped over. It would imho be a good possible solution, but it obviously requires more effort. |
|||
msg201511 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-28 05:05 | |
I'd be happy to provide a more extensive patch along the lines Armin suggested if that is considered a good idea (not sure who has to make that decision). |
|||
msg201513 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2013-10-28 05:50 | |
I'm inclined to document that "iterating views while adding or deleting entries an ordered dictionary is an undefined behavior". |
|||
msg201517 - (view) | Author: Armin Rigo (arigo) * ![]() |
Date: 2013-10-28 09:06 | |
Hi Raymond! Yes, I had the same reaction at first, but then it seemed to be possible to implement a reasonably good behavior with almost no performance hit. Plainly undefined behaviors are a mess for other implementations because in half of the big projects people don't realize they depend on it at some place --- particularly if the behavior is "mostly sane", as it seems to be right now for OrderedDict. For example, I believe the following code to always work: for key in ordered_dict: if some_condition: del ordered_dict[key] whereas it always raise RuntimeError with regular dicts, which are both ok choices. But silently not working would be much worse. |
|||
msg201739 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 14:28 | |
Do we currently have any data structures in Python, either built-in or in the stdlib, that aren't documented as raising RuntimeError if the size changes during iteration? list, dict, set, and defaultdict all behave this way. If not, I think OrderedDict should behave this way as well. |
|||
msg201742 - (view) | Author: Armin Rigo (arigo) * ![]() |
Date: 2013-10-30 16:05 | |
'list' doesn't, precisely. |
|||
msg201743 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 16:24 | |
Ah, right you are: list just acts wierd. ;) So the question then becomes is OrderedDict more like a list or more like a dict? It seems to me that OrderedDict is more like a dict. |
|||
msg201745 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-30 16:32 | |
I agree that OrderedDict is more a dict than a list, but it is not clear to me why this means that it cannot extend a dict's functionality in that respect. OrderedDict already adds functionality to dict (preserving the order), so why shouldn't it also allow changes during iteration? I think these two things actually come together quite naturally, since it is the existence of an ordering that makes the behavior under changes during iteration well defined. Is there really a danger that people will get confused because a previously undefined operation now becomes officially supported with a defined meaning? |
|||
msg201751 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 17:15 | |
The further from dict it goes, the more there is to remember. Considering the work around is so simple, I just don't think it's worth it: for key in list(ordered_dict): if some_condition: del ordered_dict[key] A simple list around the dict and we're good to go; and this trick works with dicts, defaultdicts, sets, lists (when you don't want the skipping behavior), etc. |
|||
msg201757 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-30 18:41 | |
The workaround is trivial, but there is no technical necessity for it, and it involves copying the entire dict into a list purely for.. what exactly? I guess I do not understand the drawback of allowing changes. What is wrong with for key in ordered_dict: if some_condition: del ordered_dict[key] to be working? Is it really just the fact that the above could does not work for regular dicts? |
|||
msg201759 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-30 18:45 | |
Ethan: when you say "..the more there is to remember", what exactly do you mean? I can see that it is important to remember that you're *not allowed* to make changes during iteration for a regular dict. But is there really a significant cognitive burden if it were allowed for ordered dicts? You're not forced to use the feature, since you can still safely continue to use ordered dicts like regular dicts. |
|||
msg201763 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 18:56 | |
Firstly, you're not copying the entire dict, just its keys. [1] Secondly, you're only copying the keys when you'll be adding/deleting keys from the dict. Thirdly, using the list idiom means you can use OrderedDicts, defaultdicts, dicts, sets, and most likely any other mapping type the same way, whereas if you make OrderedDict special in this area you've locked out the other types. In my opinion the differences in OrderedDict should be limited to what is necessary to make a dict ordered. The ability to insert/delete keys while iterating is not necessary to maintaining an order, and the break from other dicts doesn't add enough value to make it worth it. So, yes, the short answer is because Python's other similar types don't do it that way, OrderedDict shouldn't either. [1] If the dict is large, collect the to_be_deleted keys in a separate list and remove them after the loop. |
|||
msg201766 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 19:02 | |
Nikolaus, in reply to your question about "more to remember": Even though I may not use it myself, if it is allowed then at some point I will see it in code; when that happens the sequence of events is something like: 1) hey, that won't work 2) oh, wait, is this an OrderedDict? 3) (yes) oh, okay, it's fine then <done> 3) (no) hmm, well, it was supposed to be, but a regular dict was passed in 4) okay, we either fix the code here to handle regular dicts (use list idiom); or go back to calling code and make this an OrderedDict I see that as a lot of extra effort for a non-necessary change. |
|||
msg201775 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2013-10-30 19:43 | |
Hmm. I see your point. You might be right (I'm not fully convinced yet though), but this bug is probably not a good place to go into more detail about this. So what would be the best way to fix the immediate problem this was originally about? Raise a RuntimeError (as Armin suggested), or just end the iteration? Note that RuntimeError would only be raised when the current element is removed from the dict, and the ordered dict would still tolerate other kinds of modifications. Both variants would also be a change from previous behavior (were removing the current elements works as long as you do not remove the next one as well). The minimally intrusive change would be to skip over elements that have been removed, but that comes at the expense of an additional membership test in each iteration. |
|||
msg201777 - (view) | Author: Ethan Furman (ethan.furman) * ![]() |
Date: 2013-10-30 20:13 | |
Personally, I would rather see a RuntimeError raised. |
|||
msg201779 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2013-10-30 20:24 | |
See also issue19332. |
|||
msg207666 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2014-01-08 05:05 | |
I have attached a patch that fixes this issue. Looking at Raymonds comments in issue 19332, I have kept new code out of the critical path in __iter__ and instead modified the __delitem__ method (I assume that an element is removed only once, but may be iterated over many times). The updated __delitem__ now also updates the prev and next links of the removed item itself. When the current item is removed during an iteration, the iteration thus stops. I hope that's an acceptable solution. |
|||
msg215980 - (view) | Author: Chris Angelico (Rosuav) * | Date: 2014-04-12 19:12 | |
I agree that current behaviour is a bit confusing; also, the implication is that deleting from the dictionary while you have an iterator may leave some hanging references around the place, which raises a red flag in my mind (maybe something else might find the stale data, too). My inclination would be to Armin's idea of setting next to None. Seems simpler than trying too hard to patch around something that's a bad idea anyway. |
|||
msg216032 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2014-04-13 19:18 | |
The patch applies cleanly to 3.4 and 3.5, not sure why the review link does not show up. I'm attaching the file again, maybe that helps. |
|||
msg217857 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2014-05-04 04:58 | |
New changeset a3c345ba3563 by Raymond Hettinger in branch 'default': Issue #19414: Have the OrderedDict mark deleted links as unusable. http://hg.python.org/cpython/rev/a3c345ba3563 |
|||
msg217858 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2014-05-04 05:05 | |
To address Armin's concern, I'm triggering an early failure by setting the link fields to None. That will help prevent accidental reliance on non-guaranteed behaviors. |
|||
msg218095 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2014-05-08 03:36 | |
Raymond, I think your patch does not really address the issue reported here. The dict documentation still guarantees that mutating a dict during iteration will raise a RuntimeError or may skip elements. The OrderedDict documentation still does not point out that OrderedDicts behave differently, yet they still raise a different exception than a regular dict in the same situation. I believe it should be possible to pass an OrderedDict to any function expecting a regular dict. This is still not possible. But even if you think this is not desirable (or not worth the cost), could we at least *document* that OrderedDicts behave differently? |
|||
msg220584 - (view) | Author: Nikolaus Rath (nikratio) * | Date: 2014-06-14 21:52 | |
Raymond, it would be nice if you could respond to my last comment... |
|||
msg220600 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2014-06-15 01:00 | |
Sorry Nikolaus, I'm happy with the code and docs as they are. In general, you should assume that unless documented otherwise, any pure python container (stdlib or 3rd party) has undefined behavior if you mutate while iterating (like you should not assume thread-safety unless a container is documented as threadsafe). At your behest, I added extra code to trigger an earlier and more visible failure in some circumstances, but that is the limit. OrderedDicts have been around for a good while (not just the stdlib but also in other code predating the one in the stdlib), so we know that this hasn't been a problem in practice. Please declare victory, move on, and don't keep extending this closed thread. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:52 | admin | set | github: 63613 |
2014-06-15 01:00:31 | rhettinger | set | messages: + msg220600 |
2014-06-14 21:52:45 | nikratio | set | messages: + msg220584 |
2014-05-08 03:36:10 | nikratio | set | messages: + msg218095 |
2014-05-04 05:05:02 | rhettinger | set | status: open -> closed resolution: fixed messages: + msg217858 |
2014-05-04 04:58:54 | python-dev | set | nosy:
+ python-dev messages: + msg217857 |
2014-04-13 19:18:06 | nikratio | set | files:
+ issue19414_r2.diff messages: + msg216032 |
2014-04-12 19:19:32 | belopolsky | set | nosy:
+ belopolsky |
2014-04-12 19:12:26 | Rosuav | set | nosy:
+ Rosuav messages: + msg215980 |
2014-01-08 05:05:57 | nikratio | set | files:
+ issue19414.diff keywords: + patch messages: + msg207666 |
2013-10-30 23:31:05 | nikratio | set | title: OrderedDict.values() behavior for modified instance -> iter(ordered_dict) yields keys not in dict in some circumstances |
2013-10-30 20:24:42 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg201779 |
2013-10-30 20:13:36 | ethan.furman | set | messages: + msg201777 |
2013-10-30 19:43:18 | nikratio | set | messages: + msg201775 |
2013-10-30 19:02:47 | ethan.furman | set | messages: + msg201766 |
2013-10-30 18:56:06 | ethan.furman | set | messages: + msg201763 |
2013-10-30 18:45:52 | nikratio | set | messages: + msg201759 |
2013-10-30 18:41:58 | nikratio | set | messages: + msg201757 |
2013-10-30 17:15:29 | ethan.furman | set | messages: + msg201751 |
2013-10-30 16:32:19 | nikratio | set | messages: + msg201745 |
2013-10-30 16:24:45 | ethan.furman | set | messages: + msg201743 |
2013-10-30 16:05:00 | arigo | set | messages: + msg201742 |
2013-10-30 14:28:57 | ethan.furman | set | nosy:
+ ethan.furman messages: + msg201739 |
2013-10-28 09:06:17 | arigo | set | messages: + msg201517 |
2013-10-28 05:50:40 | rhettinger | set | messages: + msg201513 |
2013-10-28 05:31:37 | rhettinger | set | assignee: rhettinger |
2013-10-28 05:05:21 | nikratio | set | messages: + msg201511 |
2013-10-27 21:58:46 | arigo | set | messages: + msg201506 |
2013-10-27 21:39:08 | nikratio | set | messages: + msg201505 |
2013-10-27 21:09:06 | arigo | set | nosy:
+ arigo messages: + msg201502 |
2013-10-27 08:33:59 | ned.deily | set | nosy:
+ rhettinger |
2013-10-27 03:31:52 | nikratio | set | messages: + msg201409 |
2013-10-27 03:27:54 | nikratio | create |