KeyError thrown by optimised collections.OrderedDict.popitem() #71462
The C-based optimised version of collections.OrderedDict occasionally throws KeyErrors when deleting items. See mailgun/expiringdict#16 for an example of this regression. Backporting 3.6's patches to 3.5.1 does not resolve the issue. :( |
Could you please provide a short example? |
A frequent reproducer is to run the expiringdict tests on Python 3.5.1; unfortunately, I cannot come up with a minimal test case. Replacing the use of popitem() with "del self[next(OrderedDict.__iter__(self))]" removes the KeyErrors, and the structure otherwise works fine. |
I think your expiringdict does not work with the C version of OrderedDict; you may need to change your implementation or document that :(. The C version's OrderedDict.popitem may call your __getitem__, which then deletes the item and raises KeyError once the item has expired. I think the new OrderedDict may call your __getitem__ even during iteration, which leads to the 'RuntimeError: OrderedDict mutated during iteration'. I haven't checked that. So a simple example that works in Py3.4:

from time import sleep
from expiringdict import ExpiringDict

d = ExpiringDict(max_len=3, max_age_seconds=0.01)
d['a'] = 'z'
sleep(1)
d.popitem()

will fail in Py3.5+. |
I'm wondering if expiringdict [1] needs to have locked wrappers for the inherited methods:

def __delitem__(self, key):
    with self.lock:
        OrderedDict.__delitem__(self, key)

Otherwise, there is a risk that one thread is deleting a key with no lock held, while another thread is running expiringdict.popitem() which holds a lock while calling both __getitem__ and del. If the first thread runs between the two steps in the second, the race condition would cause a KeyError. This might explain why you've observed, '''Replacing use of popitem() with "del self[next(OrderedDict.__iter__(self))]" removes the KeyErrors and the structure otherwise works fine.'''

[1] https://github.com/mailgun/expiringdict/blob/master/expiringdict/__init__.py |
At least in my case, the application is single-threaded. I don't think this is a locking-related issue, because the expiringdict test case, which is also single-threaded, fails as well. |
Raymond, in the single-threaded case popitem may still fail. I want to correct my last message: popitem does not fail in this case because it calls __getitem__, but because it calls __contains__ [1]. In __contains__ the item is deleted since it has expired, and finally a KeyError is raised [2]. Even if it passes __contains__, it will then call __getitem__ [3]. [1] https://hg.python.org/cpython/file/tip/Objects/odictobject.c#l1115 |
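To make the sequence described above concrete, here is a minimal sketch. It does not use the real expiringdict or the real C code: `TinyExpiringDict`, its injected `clock`, and `popitem_first_like_old_c` are hypothetical names, and the helper only approximates the contains / get / delete order that the comments attribute to the C popitem() of that time.

```python
from collections import OrderedDict


class TinyExpiringDict(OrderedDict):
    """Hypothetical stand-in for expiringdict: lookups delete expired keys."""

    def __init__(self, max_age, clock):
        super().__init__()
        self.max_age = max_age
        self.clock = clock  # injected clock, so the example needs no sleep()

    def __setitem__(self, key, value):
        # Store the value together with the time it was inserted.
        OrderedDict.__setitem__(self, key, (value, self.clock()))

    def __getitem__(self, key):
        value, stored_at = OrderedDict.__getitem__(self, key)
        if self.clock() - stored_at > self.max_age:
            OrderedDict.__delitem__(self, key)  # the lookup mutates the dict
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True


def popitem_first_like_old_c(d):
    """Assumed approximation of the old C path: contains, get, then delete."""
    key = next(OrderedDict.__iter__(d))  # first key, no overrides involved
    if key in d:        # overridden __contains__ may delete the key
        value = d[key]  # overridden __getitem__
        del d[key]
        return key, value
    raise KeyError(key)


now = [0.0]
d = TinyExpiringDict(max_age=1.0, clock=lambda: now[0])
d["a"] = "z"
now[0] = 5.0  # "a" has now expired

try:
    popitem_first_like_old_c(d)
    outcome = "popped"
except KeyError:
    outcome = "KeyError"

# The workaround from the report only touches __iter__ and __delitem__,
# neither of which is overridden, so it works even for an expired key.
d["b"] = "y"
now[0] = 50.0
del d[next(OrderedDict.__iter__(d))]
print(outcome, len(d))
```

The helper fails because the membership test itself removes the expired key, so there is nothing left to fetch or delete; the `del`-based workaround never runs the expiring lookup.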
It seems to me that calling __contains__() (PySequence_Contains()) isn't necessary, as the first and last elements of the list are already known, and therefore known to be in the list. Revising the behaviour of popitem() to avoid calling _odict_popkey_hash() seems like it may provide a marginal performance benefit as well as fix the problem. Calling PyObject_DelItem() directly on the node should work fine I believe. |
There seem to be some behavioural differences between the C version and the pure Python version when it comes to subclasses. Besides popitem, the constructor also takes a different code path. There may be more. Should these differences be eliminated, or are they acceptable? |
Attaching test case from bpo-28014 here since this issue looks close enough to that one to be caused by the same thing. |
The proposed patch makes the pop() and popitem() methods of the C implementation of OrderedDict match the Python implementation. This fixes bpo-28014, and I suppose it fixes this issue too. |
lgtm |
Eric, could you please look at the patch? Maybe I missed some reason for the current implementation. |
New changeset 9f7505019767 by Serhiy Storchaka in branch '3.5':
New changeset 2def8a24c299 by Serhiy Storchaka in branch '3.6':
New changeset 19e199038704 by Serhiy Storchaka in branch 'default': |
Serhiy, doesn't this patch "fix" the issue by making subclasses with custom __getitem__/__delitem__ implementations not have them invoked by the superclass's pop/popitem? The old code meant that pop and popitem didn't need to be overridden even if you overrode __getitem__/__delitem__ in a way that differed from the default (e.g. __setitem__ might add some tracking data to the value that __getitem__ strips). Now they must be overridden.

The expiringdict's flaw seems to be that its __contains__ call and its __getitem__ are not idempotent, which the original code assumed (reasonably) they would be. The original code should probably be restored here. The general PyObject_GetItem/DelItem are needed to work with arbitrary subclasses correctly. The Sequence_Contains check is needed to avoid accidentally invoking __missing__ (though if __missing__ is not defined for the subclass, the Sequence_Contains check could be skipped).

The only reason OrderedDict has the problem and dict doesn't is that OrderedDict was trying to be subclassing friendly (perhaps to ensure it remains compatible with code that subclassed the old Python implementation), while dict makes no such efforts. dict happily bypasses custom __getitem__/__delitem__ calls when it uses pop/popitem. |
Explaining expiringdict's issue: It's two race conditions, with itself, not another thread. Example scenario (narrow race window):
An alternative scenario with a *huge* race window is:
expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while). The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue). Possible correct fixes:
TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame. |
Ah, so that is the reason for this code! But the Python implementation of popitem() doesn't call the overridden __getitem__/__delitem__. It uses dict.pop(). The simplified C implementation is closer to the Python implementation. expiringdict is not the only implementation broken by the accelerated OrderedDict. See another example in bpo-28014. |
The Python implementation of OrderedDict breaks for bpo-28014, at least on 3.4.3 (it doesn't raise KeyError, but if you check the repr, it only shows one of the two entries, because calling __getitem__ rearranges the OrderedDict).

>>> s = SimpleLRUCache(2)
>>> s['t1'] = 1
>>> s
SimpleLRUCache([('t1', 1)])
>>> s['t2'] = 2
>>> s
SimpleLRUCache([('t1', 1)])
>>> s
SimpleLRUCache([('t2', 2)])

Again, the OrderedDict code (in the Python case, __repr__; in the C case, popitem) assumes __getitem__ is idempotent, and again, the violation of that constraint makes things break. They break differently in the Python implementation and the C implementation, but they still break, because people are trying to force OrderedDict to do unnatural things without implementing their own logic to ensure their violations of the dict pseudo-contract actually work.

popitem happens to be a common cause of problems because it's logically a get and delete combined. People aren't using it for the get feature; it's just a convenient way to remove items from the end. If they bypassed getting and just deleted, it would work, but it's a more awkward construction, so they don't. If they implemented their own popitem that avoided their own non-idempotent __getitem__, that would also work.

I'd be perfectly happy with making popitem implemented in terms of pop on subclasses when pop is overridden (if pop isn't overridden, though, that's effectively what popitem already does). I just don't think we should be making the decision that popitem *requires* inheritance for all dict subclasses that have (normal) idempotent __contains__ and __getitem__ because classes that violate the usual expectations of __contains__ and __getitem__ have (non-segfaulting) problems.

Note: In the expiring case, the fix is still "wrong" if someone used popitem for the intended purpose (to get and delete). The item popped might have expired an hour ago, but because the fixed code bypasses __getitem__, it will happily return a long-expired item (having bypassed expiration checking). It also breaks encapsulation, returning the expiry time that is supposed to be stripped on pop. By fixing one logic flaw on behalf of a fundamentally broken subclass, we introduced another. |
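The "bypass getting and just delete" option mentioned above can be sketched as follows. This is not the SimpleLRUCache from bpo-28014; `EvictingCache` and `_evict_oldest` are hypothetical names for an LRU-style subclass whose __getitem__ has a side effect, and which therefore evicts through the base-class __iter__ and __delitem__ instead of popitem().

```python
from collections import OrderedDict


class EvictingCache(OrderedDict):
    """Hypothetical LRU-style cache whose __getitem__ reorders entries."""

    def __init__(self, size):
        super().__init__()
        self.size = size

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)  # the lookup has a side effect
        return value

    def __setitem__(self, key, value):
        # __contains__ and __len__ are not overridden, so these checks
        # do not trigger the reordering __getitem__.
        while key not in self and len(self) >= self.size:
            self._evict_oldest()
        super().__setitem__(key, value)
        self.move_to_end(key)

    def _evict_oldest(self):
        # Delete the first key directly, bypassing popitem() and the
        # overridden __getitem__.
        oldest = next(OrderedDict.__iter__(self))
        OrderedDict.__delitem__(self, oldest)


cache = EvictingCache(2)
cache["t1"] = 1
cache["t2"] = 2
cache["t1"]      # touch t1 so that t2 becomes the oldest entry
cache["t3"] = 3  # evicts t2
keys = list(OrderedDict.__iter__(cache))
print(keys)
```

Because eviction never goes through the subclass's own lookup, the result does not depend on whether the base class's popitem() calls overridden methods.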
In bpo-28014 __getitem__() is idempotent. Multiple calls of __getitem__() return the same result and keep the OrderedDict in the same state.
I like this idea.
Good catch! But the old implementation still looks doubtful to me. |
New changeset 3f816eecc53e by Serhiy Storchaka in branch '3.5': |
bpo-44782 was opened about the [...] I think Serhiy's patch here (before revert) may be a good idea (to re-apply).

def never_called(self, *args):
    print("Never called.")
    raise ZeroDivisionError

class MyList(list):
    __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

class MyDict(dict):
    __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

class MySet(set):
    __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

L = MyList([5, 4, 3, 2])
L.sort()
L.pop(1)
L.insert(0, 42)
L.pop()
L.reverse()
assert type(L) is MyList
D = MyDict({"a": 1, "b": 2, "c": 3})
assert D.get(0) is None
assert D.get("a") == 1
assert D.pop("b") == 2
assert D.popitem() == ("c", 3)
assert type(D) is MyDict
S = MySet({"a", "b", "c"})
S.discard("a")
S.remove("b")
S.isdisjoint(S)
S |= S
S &= S
S ^= S
assert type(S) is MySet |
That seems sensible to me as well. It keeps the C version in harmony with the pure Python version, and it follows how regular dicts are implemented. Serhiy, do you remember why your patch was reverted? |
It was reverted because it did not keep the C version in harmony with the pure Python version. In the pure Python version, pop() calls __getitem__ and __delitem__, which can be overridden in subclasses of OrderedDict. My patch always called dict.__getitem__ and dict.__delitem__. But I now see more clearly what the problem with the current C code is. It removes the key from the linked list before calling __delitem__, which itself removes the key from the linked list. Perhaps I can fix it correctly this time. |
It is complicated. The pure Python implementations of OrderedDict.popitem() and OrderedDict.pop() are not consistent. The former uses dict.pop(), which doesn't call __getitem__ and __delitem__. The latter calls __getitem__ and __delitem__. The C implementation shares code between popitem() and pop(), therefore it will differ from the pure Python implementation until we write separate code for popitem() and pop(). |
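The difference between the two strategies can be observed with a small recording subclass. This is only a sketch: `Recording`, `pop_through_overrides`, and `pop_bypassing_overrides` are hypothetical names, and the second helper calls OrderedDict.__getitem__ and OrderedDict.__delitem__ rather than the dict ones so that the ordering bookkeeping stays consistent; that is an adaptation, not the patch itself.

```python
from collections import OrderedDict


class Recording(OrderedDict):
    """Hypothetical subclass that logs which overridden hooks are called."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return super().__getitem__(key)

    def __delitem__(self, key):
        self.calls.append("__delitem__")
        super().__delitem__(key)


def pop_through_overrides(d, key):
    # Goes through the instance, so subclass hooks are invoked.
    value = d[key]
    del d[key]
    return value


def pop_bypassing_overrides(d, key):
    # Calls the base-class methods directly, so subclass hooks are skipped.
    value = OrderedDict.__getitem__(d, key)
    OrderedDict.__delitem__(d, key)
    return value


a = Recording()
a["k"] = 1
first = pop_through_overrides(a, "k")

b = Recording()
b["k"] = 1
second = pop_bypassing_overrides(b, "k")

print(a.calls, b.calls)
```

Both helpers return the same value and leave the mapping empty; they differ only in whether the subclass gets a chance to run its own __getitem__ and __delitem__, which is the compatibility question debated in this thread.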
Let's do the right thing and fix the pure Python OrderedDict.pop() method as well. |
PR 27528 makes the C implementation of OrderedDict.popitem() consistent with the Python implementation (it does not call overridden __getitem__ and __delitem__). PR 27530 also changes both implementations of OrderedDict.pop(). It simplifies the C code, but adds some code duplication in Python. I am not sure how far back we should backport these changes, if we backport them at all. |
We've had no reports of the current code causing problems for any existing applications (except the LRU recipe in the docs), so there is likely no value in making backports. Instead, we can clean it up so there won't be new issues going forward. |