This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Mutating while iterating
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: castironpi, ncoghlan, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2014-07-26 15:52 by castironpi, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg224071 - (view) Author: Aaron Brady (castironpi) Date: 2014-07-26 15:52
Hi, I asked about the inconsistency of the "RuntimeError" being raised when mutating a container while iterating over it here [1], "set and dict iteration" on Aug 16, 2012.

[1] http://www.gossamer-threads.com/lists/python/python/1004659

I posted a patch on the ML but never submitted it.  People's reaction seemed ambivalent.  Now I have an idea for a different implementation.  I'd like to take another shot at it.

It's one of the worst silent errors, since there's an error in the *iterator* when we call a *set* method.  We're going to add something to make it safer, at least in the sense of getting a clear failure, if the programmer does something that's always been ill-advised.

We have a number of options for the implementation.  We still have the option to introduce "IterationError", possibly a subclass of "RuntimeError".  These options are still applicable:

1) Collection of iterators
. Invalidate all "open" iterators on mutating
. a) Linked list
.. i) Uncounted references
.. ii) Counted references
.. iii) Weak references
. b) Weak set
2) Version index / timestamp / "memo"
. Iterators check whether the container has been mutated
    since they were created
. a) No overflow - Python longs
.. i) Reset index if no iterators left
. b) Overflow - C ints / shorts (silent error)
3) Iterator count
. Raise exception on mutation, not iteration

The new option is:

2d) Use a dedicated empty *object* for a timestamp or "memo".  A new memo is created on every mutation.  Before advancing, the iterator checks whether the current memo is a different object than it was when it was created.

Costs: The existing silent error is fairly rare.  The container gains a pointer to its current memo.  The iterator loses the cached length but gains a pointer to a memo.  The memos are blank objects: a "Py ssize t" and a pointer with certain flags at time of writing.  Speed is the same: comparing the lengths is replaced with comparing the memos.

Some caveats: The memory manager is used to obtain perpetually unique IDs.  A unique algorithm could be used instead of the memory manager, though the memo needs to contain a reference count more or less regardless.  There can at most be one memo per iterator.  The approach is outlined in pseudocode here [2]. Implementation could be optimized slightly by only creating new memos if iterators have been opened, shown here [3].

[2] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%20markup.html

[3] http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%202%20markup.html
msg224073 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-26 16:31
It would be better discuss such ideas on python-ideas mailing list (http://mail.python.org/mailman/listinfo/python-ideas).

Option 3 breaks existing code such as

    for k, v in d.items():
        if pred(k, v):
            d[k] = newvalue
            break

Option 1 is memory inefficient. It requires a list of iterators in every dict (well, in almost every dict). And it doesn't look more time efficient than option 2.

Implementation of option 2 was provided and rejected in issue19332.
msg224100 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-07-27 03:33
Raymond's answer at http://bugs.python.org/issue19332#msg202287 still stands: the checks for mutation while iterating are there primarily to *protect the interpreter itself*, rather than to help detect bugs where the user code misses some items because it mutated the dict during iteration. The fact it helps detect user errors is a helpful side effect of the interpreter protecting itself, rather than the *purpose* of the change.

Mutations that don't change the mapping size don't do the interpreter any harm, even if they're not what the user intended:

>>> d = dict(a=1, b=2, c=3)
>>> for k, v in d.items():
...     del d[k]
...     d[v] = k
... 
>>> d
{1: 'a', 2: 'b', 3: 'c'}
>>> for k, v in d.items():
...     del d[k]
...     d[v] = k
...                                                                                                              
>>> d
{'a': 1, 'b': 2, 'c': 3}

There are *lots* of ways to write buggy code that are significantly less obscure than this, and we don't change the way core data types work to prevent them.

For example, list iterators don't care if the underlying list is mutated at all, as again, such mutation poses no threat to the interpreter:

>>> seq = [x for x in range(10)]
>>> for i, x in enumerate(seq):
...     print(i, x)
...     del seq[i]
...
0 0
1 2
2 4
3 6
4 8

This kind of quirky behaviour is why "be careful when mutating containers you're iterating over" is sound advice. It *isn't* a reason to change the behaviour of the language to make currently legal (albeit odd) code a runtime error.
History
Date User Action Args
2022-04-11 14:58:06adminsetgithub: 66282
2014-07-27 03:33:49ncoghlansetstatus: open -> closed

versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
nosy: + ncoghlan

messages: + msg224100
resolution: wont fix
stage: resolved
2014-07-26 16:32:00serhiy.storchakasetnosy: + rhettinger, serhiy.storchaka
messages: + msg224073
2014-07-26 15:52:21castironpicreate