Issue 22084: Mutating while iterating

Hi, I asked about the inconsistency of the "RuntimeError" being raised when mutating a container while iterating over it here [1], "set and dict iteration" on Aug 16, 2012.

[1] http://www.gossamer-threads.com/lists/python/python/1004659

I posted a patch on the ML but never submitted it.  People's reaction seemed ambivalent.  Now I have an idea for a different implementation.  I'd like to take another shot at it.

It's one of the worst silent errors, since there's an error in the *iterator* when we call a *set* method.  We're going to add something to make it safer, at least in the sense of getting a clear failure, if the programmer does something that's always been ill-advised.

We have a number of options for the implementation.  We still have the option to introduce "IterationError", possibly a subclass of "RuntimeError".  These options are still applicable:

1) Collection of iterators
. Invalidate all "open" iterators on mutating
. a) Linked list
.. i) Uncounted references
.. ii) Counted references
.. iii) Weak references
. b) Weak set
2) Version index / timestamp / "memo"
. Iterators check whether the container has been mutated
    since they were created
. a) No overflow - Python longs
.. i) Reset index if no iterators left
. b) Overflow - C ints / shorts (silent error)
3) Iterator count
. Raise exception on mutation, not iteration

The new option is:

2d) Use a dedicated empty *object* for a timestamp or "memo".  A new memo is created on every mutation.  Before advancing, the iterator checks whether the current memo is a different object than it was when it was created.

Costs: The existing silent error is fairly rare.  The container gains a pointer to its current memo.  The iterator loses the cached length but gains a pointer to a memo.  The memos are blank objects: a "Py ssize t" and a pointer with certain flags at time of writing.  Speed is the same: comparing the lengths is replaced with comparing the memos.

Some caveats: The memory manager is used to obtain perpetually unique IDs.  A unique algorithm could be used instead of the memory manager, though the memo needs to contain a reference count more or less regardless.  There can at most be one memo per iterator.  The approach is outlined in pseudocode here [2]. Implementation could be optimized slightly by only creating new memos if iterators have been opened, shown here [3].

[2] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%20markup.html

[3] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%202%20markup.html

Title:	Mutating while iterating
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.5

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	castironpi, ncoghlan, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:

Messages (3)
msg224071 - (view)	Author: Aaron Brady (castironpi)	Date: 2014-07-26 15:52
Hi, I asked about the inconsistency of the "RuntimeError" being raised when mutating a container while iterating over it here [1], "set and dict iteration" on Aug 16, 2012. [1] http://www.gossamer-threads.com/lists/python/python/1004659 I posted a patch on the ML but never submitted it. People's reaction seemed ambivalent. Now I have an idea for a different implementation. I'd like to take another shot at it. It's one of the worst silent errors, since there's an error in the iterator when we call a set method. We're going to add something to make it safer, at least in the sense of getting a clear failure, if the programmer does something that's always been ill-advised. We have a number of options for the implementation. We still have the option to introduce "IterationError", possibly a subclass of "RuntimeError". These options are still applicable: 1) Collection of iterators . Invalidate all "open" iterators on mutating . a) Linked list .. i) Uncounted references .. ii) Counted references .. iii) Weak references . b) Weak set 2) Version index / timestamp / "memo" . Iterators check whether the container has been mutated since they were created . a) No overflow - Python longs .. i) Reset index if no iterators left . b) Overflow - C ints / shorts (silent error) 3) Iterator count . Raise exception on mutation, not iteration The new option is: 2d) Use a dedicated empty object for a timestamp or "memo". A new memo is created on every mutation. Before advancing, the iterator checks whether the current memo is a different object than it was when it was created. Costs: The existing silent error is fairly rare. The container gains a pointer to its current memo. The iterator loses the cached length but gains a pointer to a memo. The memos are blank objects: a "Py ssize t" and a pointer with certain flags at time of writing. Speed is the same: comparing the lengths is replaced with comparing the memos. Some caveats: The memory manager is used to obtain perpetually unique IDs. A unique algorithm could be used instead of the memory manager, though the memo needs to contain a reference count more or less regardless. There can at most be one memo per iterator. The approach is outlined in pseudocode here [2]. Implementation could be optimized slightly by only creating new memos if iterators have been opened, shown here [3]. [2] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%20markup.html [3] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%202%20markup.html
msg224073 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-07-26 16:31
It would be better discuss such ideas on python-ideas mailing list (http://mail.python.org/mailman/listinfo/python-ideas). Option 3 breaks existing code such as for k, v in d.items(): if pred(k, v): d[k] = newvalue break Option 1 is memory inefficient. It requires a list of iterators in every dict (well, in almost every dict). And it doesn't look more time efficient than option 2. Implementation of option 2 was provided and rejected in issue19332.
msg224100 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-07-27 03:33
Raymond's answer at http://bugs.python.org/issue19332#msg202287 still stands: the checks for mutation while iterating are there primarily to protect the interpreter itself, rather than to help detect bugs where the user code misses some items because it mutated the dict during iteration. The fact it helps detect user errors is a helpful side effect of the interpreter protecting itself, rather than the purpose of the change. Mutations that don't change the mapping size don't do the interpreter any harm, even if they're not what the user intended: >>> d = dict(a=1, b=2, c=3) >>> for k, v in d.items(): ... del d[k] ... d[v] = k ... >>> d {1: 'a', 2: 'b', 3: 'c'} >>> for k, v in d.items(): ... del d[k] ... d[v] = k ... >>> d {'a': 1, 'b': 2, 'c': 3} There are lots of ways to write buggy code that are significantly less obscure than this, and we don't change the way core data types work to prevent them. For example, list iterators don't care if the underlying list is mutated at all, as again, such mutation poses no threat to the interpreter: >>> seq = [x for x in range(10)] >>> for i, x in enumerate(seq): ... print(i, x) ... del seq[i] ... 0 0 1 2 2 4 3 6 4 8 This kind of quirky behaviour is why "be careful when mutating containers you're iterating over" is sound advice. It isn't a reason to change the behaviour of the language to make currently legal (albeit odd) code a runtime error.

History
Date	User	Action	Args
2022-04-11 14:58:06	admin	set	github: 66282
2014-07-27 03:33:49	ncoghlan	set	status: open -> closed versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4 nosy: + ncoghlan messages: + msg224100 resolution: wont fix stage: resolved
2014-07-26 16:32:00	serhiy.storchaka	set	nosy: + rhettinger, serhiy.storchaka messages: + msg224073
2014-07-26 15:52:21	castironpi	create