classification
Title: Guard against changing dict during iteration
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: ethan.furman, pitrou, python-dev, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2013-10-21 14:02 by serhiy.storchaka, last changed 2016-01-10 23:43 by python-dev. This issue is now closed.

Files
File name Uploaded Description
dict_mutating_iteration.patch serhiy.storchaka, 2013-10-21 14:02
dict_mutating_iteration_2.patch serhiy.storchaka, 2013-10-23 19:46
Messages (9)
msg200784 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-21 14:02
Currently, dict iteration is guarded against changes to the dict's size. However, when a dict is changed during iteration in a way that leaves its size unchanged, the modification goes unnoticed.

>>> d = dict.fromkeys('abcd')
>>> for i in d:
...     print(i)
...     d[i + 'x'] = None
...     del d[i]
... 
d
a
dx
dxx
ax
c
b

In general, iterating over a dict while mutating it is considered a logical error. It is good to detect it as early as possible.

The proposed patch introduces a counter which is changed every time a key is added or removed. If an iterator detects that this counter has changed, it raises a runtime error.
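
The idea can be sketched in pure Python. This is only an illustration, not the attached patch (which works at the C level): `GuardedDict` and its `_version` attribute are hypothetical names, and only `__setitem__` and `__delitem__` are instrumented, so methods such as `pop()` or `update()` would bypass the counter here.

```python
class GuardedDict(dict):
    """Hypothetical dict subclass with a key-modification counter."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._version = 0  # bumped whenever a key is added or removed

    def __setitem__(self, key, value):
        if key not in self:
            self._version += 1  # new key; plain value updates are not counted
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1

    def __iter__(self):
        expected = self._version
        for key in super().__iter__():
            yield key
            # Runs when the consumer asks for the next key.
            if self._version != expected:
                raise RuntimeError("dict keys changed during iteration")


d = GuardedDict(dict.fromkeys('abcd'))
seen = []
try:
    for key in d:
        seen.append(key)
        d[key + 'x'] = None  # size goes to 5 ...
        del d[key]           # ... and back to 4
    raised = False
except RuntimeError:
    raised = True
```

With this sketch the size-preserving add-and-delete loop from the report stops after the first key instead of silently producing extra iterations.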
msg200995 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-23 04:20
The decision to not monitor adding or removing keys was intentional.  It is just not worth the cost in either time or space.
msg201062 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-23 19:46
In the first patch the counter was placed in the _dictkeysobject structure. In the second patch it is placed in the PyDictObject, so it now has no memory cost. Access time to the new counter for non-modifying operations is the same as in the current code. The only additional cost is the time cost for modifying operations. But modifying operations are usually much rarer than non-modifying operations, and incrementing one field takes only a small part of the time needed for the whole operation. I don't think this will affect the total performance of real programs.
msg201065 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-23 20:10
If there's no performance regression, then this sounds like a reasonable idea. The remaining question would be whether it can break existing code. Perhaps you should ask python-dev?
msg201156 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-24 16:34
I disagree with adding such unimportant code to the critical path.
msg201780 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013-10-30 21:47
Raymond, please don't be so concise.

Is the code unimportant because the scenario is so rare, or something else?
msg202262 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-11-06 12:32
Duplicate of this: http://bugs.python.org/issue6017
msg202287 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-11-06 20:56
A few thoughts:

* No existing, working code will benefit from this patch; however, almost all code will pay a price for it -- bigger size for an empty dict and a runtime cost (possibly very small) on the critical path (every time a value is stored in a dict).

* The sole benefit of the patch is to provide an earlier warning that someone is doing something weird.  For most people, this will never come up (we have 23 years of Python history indicating that there isn't a real problem that needs to be solved).

* The normal rule (not just for Python) is that data structures have undefined behavior for mutating while iterating, unless there is a specific guarantee (for example, we guarantee that dicts are allowed to mutate values but not keys during iteration and we guarantee the behavior of list mutation while iterating).

* It is not clear that other implementations such as IronPython and Jython would be able to implement this behavior (Jython wraps the Java ConcurrentHashMap).

* The current patch second-guesses a decision that was made long ago to only detect size changes (because it is cheap, doesn't take extra memory, isn't on the critical path, and handles the common case).

* The only case where we truly need stronger protection is when it is needed to defend against a segfault.  That is why collections.deque() implements a change counter.  It has a measurable cost that slows down deque operations (increasing the number of memory accesses per append, pop, or next) but it is needed to prevent the iterator from spilling into freed memory.
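
The existing behaviors mentioned in the points above can be checked with a short script. This is a sketch against current CPython; the helper function names are made up for illustration.

```python
from collections import deque


def dict_size_change_detected():
    """Adding keys while iterating changes the size, which dict detects."""
    d = dict.fromkeys('abcd')
    try:
        for key in d:
            d[key + 'x'] = None
    except RuntimeError:
        return True
    return False


def dict_value_updates_allowed():
    """Replacing values of existing keys during iteration is permitted."""
    d = dict.fromkeys('abcd', 0)
    for key in d:
        d[key] += 1
    return d


def deque_same_size_mutation_detected():
    """deque keeps a change counter, so even a size-preserving
    append + popleft is noticed by the iterator."""
    q = deque('abcd')
    try:
        for item in q:
            q.append(item)
            q.popleft()
    except RuntimeError:
        return True
    return False
```

The dict check only compares sizes, which is why the add-then-delete loop in the original report is not caught, while the deque counter catches any mutation.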
msg257946 - Author: Roundup Robot (python-dev) Date: 2016-01-10 23:43
New changeset a576199a5350 by Victor Stinner in branch 'default':
PEP 509
https://hg.python.org/peps/rev/a576199a5350
History
Date User Action Args
2017-02-02 14:38:17 r.david.murray link issue29420 superseder
2016-01-10 23:43:55 python-dev set nosy: + python-dev
messages: + msg257946
2013-11-06 20:56:30 rhettinger set messages: + msg202287
2013-11-06 12:32:40 steven.daprano set nosy: - steven.daprano
2013-11-06 12:32:11 steven.daprano set nosy: + steven.daprano
messages: + msg202262
2013-10-30 21:47:23 ethan.furman set nosy: + ethan.furman
messages: + msg201780
2013-10-28 06:15:54 rhettinger set status: open -> closed
resolution: rejected
2013-10-24 16:34:30 rhettinger set messages: + msg201156
2013-10-23 20:10:35 pitrou set messages: + msg201065
2013-10-23 19:58:35 pitrou set nosy: + tim.peters
2013-10-23 19:46:02 serhiy.storchaka set files: + dict_mutating_iteration_2.patch
messages: + msg201062
2013-10-23 04:20:22 rhettinger set assignee: rhettinger
messages: + msg200995
2013-10-21 14:02:54 serhiy.storchaka create