msg267957 - (view) |
Author: William Pitcock (kaniini) |
Date: 2016-06-09 04:33 |
The C-based optimised version of collections.OrderedDict occasionally throws KeyErrors when deleting items.
See https://github.com/mailgun/expiringdict/issues/16 for an example of this regression.
Backporting 3.6's patches to 3.5.1 does not resolve the issue. :(
|
msg267960 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-09 04:57 |
Could you please provide a short example?
|
msg267984 - (view) |
Author: William Pitcock (kaniini) |
Date: 2016-06-09 08:13 |
Running the expiringdict tests on Python 3.5.1 reproduces it frequently; unfortunately, I cannot come up with a minimal test case.
Replacing use of popitem() with "del self[next(OrderedDict.__iter__(self))]" removes the KeyErrors and the structure otherwise works fine.
|
msg268122 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-06-10 14:05 |
I think your expiringdict does not work with the C version of OrderedDict; you may need to change your implementation or document that :(.
The C version's OrderedDict.popitem may call your __getitem__, which then deletes the item and raises KeyError once it has expired. I think the new OrderedDict may call your __getitem__ even during iteration, which leads to the 'RuntimeError: OrderedDict mutated during iteration'. I haven't checked that.
So a simple example that works in Py3.4:
    from time import sleep
    from expiringdict import ExpiringDict

    d = ExpiringDict(max_len=3, max_age_seconds=0.01)
    d['a'] = 'z'
    sleep(1)  # let the entry expire
    d.popitem()
will fail in Py3.5+.
|
msg268130 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-06-10 17:31 |
I'm wondering if the expiringdict(1) needs to have locked wrappers for the inherited methods:
    def __delitem__(self, key):
        with self.lock:
            OrderedDict.__delitem__(self, key)
Otherwise, there is a risk that one thread is deleting a key with no lock held, while another thread is running expiringdict.popitem() which holds a lock while calling both __getitem__ and del. If the first thread runs between the two steps in the second, the race condition would cause a KeyError.
This might explain why you've observed, '''Replacing use of popitem() with "del self[next(OrderedDict.__iter__(self))]" removes the KeyErrors and the structure otherwise works fine.'''
(1) https://github.com/mailgun/expiringdict/blob/master/expiringdict/__init__.py
|
msg268138 - (view) |
Author: William Pitcock (kaniini) |
Date: 2016-06-10 18:24 |
At least in my case, the application is single-threaded. I don't think this is a locking-related issue as the expiringdict test case itself fails which is also single-threaded.
|
msg268139 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-06-10 18:31 |
Raymond, popitem may still fail in the single-threaded case.
I want to correct my last message: popitem fails in this case not because it calls __getitem__ but because it calls __contains__ [1]. __contains__ deletes the item because it has expired, and popitem then raises KeyError [2]. Even if the __contains__ check passes, popitem goes on to call __getitem__ [3].
[1] https://hg.python.org/cpython/file/tip/Objects/odictobject.c#l1115
[2] https://hg.python.org/cpython/file/tip/Objects/odictobject.c#l1135
[3] https://hg.python.org/cpython/file/tip/Objects/odictobject.c#l1119
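The call order described above can be paraphrased in pure Python. This is only a sketch under stated assumptions: TinyExpiringDict is a hypothetical stand-in for expiringdict, and popitem_like_old_c_code is an illustrative helper that imitates the membership test followed by lookup and delete; it is not a transcription of odictobject.c.

```python
import time
from collections import OrderedDict


class TinyExpiringDict(OrderedDict):
    """Hypothetical stand-in for expiringdict: stores (value, deadline) pairs."""

    def __init__(self, max_age_seconds):
        super().__init__()
        self.max_age = max_age_seconds

    def __setitem__(self, key, value):
        deadline = time.monotonic() + self.max_age
        OrderedDict.__setitem__(self, key, (value, deadline))

    def __getitem__(self, key):
        value, deadline = OrderedDict.__getitem__(self, key)
        if time.monotonic() >= deadline:
            OrderedDict.__delitem__(self, key)  # the lookup mutates the mapping
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True


def popitem_like_old_c_code(od):
    """Illustrative helper: membership test first, then lookup and delete."""
    key = next(OrderedDict.__iter__(od))  # first key, no expiry check
    if key in od:                         # overridden __contains__ may delete it
        value = od[key]                   # overridden __getitem__
        del od[key]
        return key, value
    raise KeyError(key)


d = TinyExpiringDict(max_age_seconds=0.01)
d['a'] = 'z'
time.sleep(0.05)  # let the entry expire
try:
    popitem_like_old_c_code(d)
except KeyError as exc:
    outcome = 'KeyError: %r' % exc.args[0]
else:
    outcome = 'popped'
print(outcome)
```

The membership test itself removes the expired entry, so the helper reports a missing key even though the key was just read from the mapping's own ordering.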
|
msg268148 - (view) |
Author: William Pitcock (kaniini) |
Date: 2016-06-10 19:58 |
It seems to me that calling __contains__() (PySequence_Contains()) isn't necessary, as the first and last elements of the list are already known, and therefore known to be in the list. Revising the behaviour of popitem() to avoid calling _odict_popkey_hash() seems like it may provide a marginal performance benefit as well as fix the problem. Calling PyObject_DelItem() directly on the node should work fine I believe.
|
msg271733 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-07-31 09:20 |
There seem to be some behavioural differences between the C version and the pure Python version when it comes to subclasses. Besides popitem, the constructor also takes a different code path. There may be more. Should these differences be eliminated, or are they accepted?
|
msg274962 - (view) |
Author: Zachary Ware (zach.ware) * |
Date: 2016-09-08 03:40 |
Attaching test case from #28014 here since this issue looks close enough to that one to be caused by the same thing.
|
msg277513 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-27 11:51 |
The proposed patch makes the pop() and popitem() methods of the C implementation of OrderedDict match the Python implementation. This fixes issue28014, and I suppose it fixes this issue too.
|
msg277615 - (view) |
Author: Inada Naoki (methane) * |
Date: 2016-09-28 13:30 |
lgtm
|
msg277750 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-30 08:12 |
Eric, could you please look at the patch? Maybe I missed some reasons for the current implementation.
|
msg279401 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-10-25 12:38 |
New changeset 9f7505019767 by Serhiy Storchaka in branch '3.5':
Issue #27275: Fixed implementation of pop() and popitem() methods in
https://hg.python.org/cpython/rev/9f7505019767
New changeset 2def8a24c299 by Serhiy Storchaka in branch '3.6':
Issue #27275: Fixed implementation of pop() and popitem() methods in
https://hg.python.org/cpython/rev/2def8a24c299
New changeset 19e199038704 by Serhiy Storchaka in branch 'default':
Issue #27275: Fixed implementation of pop() and popitem() methods in
https://hg.python.org/cpython/rev/19e199038704
|
msg279405 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2016-10-25 13:38 |
Serhiy, doesn't this patch "fix" the issue by making subclasses with custom __getitem__/__delitem__ implementations not have them invoked by the superclass's pop/popitem?
The old code meant that pop and popitem didn't need to be overridden even if you overrode __getitem__/__delitem__ in a way that differed from the default (e.g. __setitem__ might add some tracking data to the value that __getitem__ strips). Now they must be overridden.
The expiringdict's flaw seems to be that its __contains__ call and its __getitem__ are not idempotent, which the original code assumed (reasonably) they would be.
The original code should probably be restored here. The general PyObject_GetItem/DelItem are needed to work with arbitrary subclasses correctly. The Sequence_Contains check is needed to avoid accidentally invoking __missing__ (though if __missing__ is not defined for the subclass, the Sequence_Contains check could be skipped).
The only reason OrderedDict has the problem and dict doesn't is that OrderedDict was trying to be subclassing friendly (perhaps to ensure it remains compatible with code that subclassed the old Python implementation), while dict makes no such efforts. dict happily bypasses custom __getitem__/__delitem__ calls when it uses pop/popitem.
|
msg279408 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2016-10-25 14:31 |
Explaining expiringdict's issue: there are two race conditions, and both are with itself, not with another thread.
Example scenario (narrow race window):
1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true)
3. Because contains returned True, at time X PyObject_GetItem is run, but because it's time X, expiringdict's __getitem__ deletes the entry and raises KeyError
An alternative scenario with a *huge* race window is:
1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. No lookups or membership tests or any other operations that implicitly clean the expiring dict occur for a while
3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the OrderedDict without invoking the __contains__ machinery that would delete the entry
4. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 (false), because the __contains__ machinery is invoked, and again, because no default is provided for popitem, a KeyError is raised (this time by the popitem machinery, not __getitem__)
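The narrow-window scenario can be made deterministic with a fake clock. The sketch below is an assumption-laden illustration: TickingExpiringDict and its now() method are hypothetical, and the clock advances by one tick on every read so that the membership test lands at "time X-1" and the lookup at "time X".

```python
from collections import OrderedDict
from itertools import count


class TickingExpiringDict(OrderedDict):
    """Hypothetical expiring mapping whose fake clock advances on every read."""

    def __init__(self, max_age):
        super().__init__()
        self.max_age = max_age
        self._ticks = count()  # fake clock: 0, 1, 2, ...

    def now(self):
        return next(self._ticks)

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, (value, self.now() + self.max_age))

    def _expired(self, key):
        _, deadline = OrderedDict.__getitem__(self, key)
        return self.now() >= deadline

    def __contains__(self, key):
        if not OrderedDict.__contains__(self, key):
            return False
        if self._expired(key):
            OrderedDict.__delitem__(self, key)
            return False
        return True

    def __getitem__(self, key):
        if self._expired(key):
            OrderedDict.__delitem__(self, key)
            raise KeyError(key)
        return OrderedDict.__getitem__(self, key)[0]


d = TickingExpiringDict(max_age=2)
d['a'] = 'z'            # tick 0, so the deadline ("time X") is tick 2
present = 'a' in d      # tick 1 ("time X-1"): the entry is still valid
try:
    d['a']              # tick 2 ("time X"): expired between the two calls
    raised = False
except KeyError:
    raised = True
print(present, raised)
```

Any caller that treats a successful membership test as a guarantee for the following lookup hits the same KeyError, whichever implementation of OrderedDict sits underneath.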
expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while).
The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue).
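A minimal sketch of that workaround, assuming a hypothetical BoundedDict class (the name and the lookups counter are illustrative only): eviction goes through the base-class iterator and __delitem__, so the overridden lookup methods are never reached.

```python
from collections import OrderedDict


class BoundedDict(OrderedDict):
    """Hypothetical size-limited mapping that counts overridden lookups."""

    def __init__(self, max_len):
        super().__init__()
        self.max_len = max_len
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return OrderedDict.__getitem__(self, key)

    def __contains__(self, key):
        self.lookups += 1
        return OrderedDict.__contains__(self, key)

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, value)
        if len(self) > self.max_len:
            # Evict the oldest key without calling popitem().
            del self[next(OrderedDict.__iter__(self))]


d = BoundedDict(2)
d['a'] = 1
d['b'] = 2
d['c'] = 3          # evicts 'a'
print(list(d), d.lookups)
```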
Possible correct fixes:
1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of `next(reversed(self))` and `next(iter(self))` respectively. This won't fix expiringdict as is (because it's broken by design; it doesn't override __iter__, so it will happily return long expired keys that disappear on access), but if we're going for subclass friendliness and allowing non-idempotent __contains__/__getitem__/__iter__ implementations, it's the right thing to do. If expiringdict implemented __iter__ to copy the keys, then loop over the copy, deleting expired values and yielding unexpired values, this would at least reduce the huge race window to a narrow window (because it could still yield a key that's almost expired)
2. Check for the presence of __missing__ on the type and only perform the PySequence_Contains check if __missing__ is defined (to avoid autovivification). This fixes the narrow race condition for subclasses without __missing__, but not the huge race condition
3. Explicitly document the idempotence assumptions made by OrderedDict (specifically, that all non-mutating methods of OrderedDict must not be made mutating in subclasses unless the caller also overrides all multistep operations, e.g. pop/popitem/setdefault).
TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame.
|
msg279433 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-10-25 17:59 |
Ah, so that is the reason for this code!
But the Python implementation of popitem() doesn't call overridden __getitem__/__delitem__. It uses dict.pop(). The simplified C implementation is closer to the Python implementation.
expiringdict is not the only implementation broken by the accelerated OrderedDict. See another example in issue28014.
|
msg279470 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2016-10-26 00:37 |
The Python implementation of OrderedDict breaks for issue28014, at least on 3.4.3 (it doesn't raise KeyError, but if you check the repr, it's only showing one of the two entries, because calling __getitem__ is rearranging the OrderedDict).
>>> s = SimpleLRUCache(2)
>>> s['t1'] = 1
>>> s
SimpleLRUCache([('t1', 1)])
>>> s['t2'] = 2
>>> s
SimpleLRUCache([('t1', 1)])
>>> s
SimpleLRUCache([('t2', 2)])
Again, the OrderedDict code (in the Python case, __repr__, in the C case, popitem) assumes __getitem__ is idempotent, and again, the violation of that constraint makes things break. They break differently in the Python implementation and the C implementation, but they still break, because people are trying to force OrderedDict to do unnatural things without implementing their own logic to ensure their violations of the dict pseudo-contract actually work.
popitem happens to be a common cause of problems because it's logically a get and delete combined. People aren't using it for the get feature, it's just a convenient way to remove items from the end; if they bypassed getting and just deleted it would work, but it's a more awkward construction, so they don't. If they implemented their own popitem that avoided their own non-idempotent __getitem__, that would also work.
I'd be perfectly happy with making popitem implemented in terms of pop on subclasses when pop is overridden (if pop isn't overridden though, that's effectively what popitem already does).
I just don't think we should be making the decision that popitem *requires* inheritance for all dict subclasses that have (normal) idempotent __contains__ and __getitem__ because classes that violate the usual expectations of __contains__ and __getitem__ have (non-segfaulting) problems.
Note: In the expiring case, the fix is still "wrong" if someone used popitem for the intended purpose (to get and delete). The item popped might have expired an hour ago, but because the fixed code bypasses __getitem__, it will happily return a long-expired item (having bypassed expiration checking). It also breaks encapsulation, returning the expiry time that is supposed to be stripped on pop. By fixing one logic flaw on behalf of a fundamentally broken subclass, we introduced another.
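The suggestion that a subclass supply its own popitem can be sketched as follows. RecencyDict is a hypothetical LRU-style class written for illustration, not the code attached to issue28014; its popitem uses only base-class primitives, so the reordering __getitem__ is never involved in removal.

```python
from collections import OrderedDict


class RecencyDict(OrderedDict):
    """Hypothetical LRU-style mapping: a lookup moves the key to the end."""

    def __getitem__(self, key):
        value = OrderedDict.__getitem__(self, key)
        self.move_to_end(key)  # the lookup reorders the mapping
        return value

    def popitem(self, last=True):
        # Use base-class primitives only, bypassing the override above.
        if not self:
            raise KeyError('dictionary is empty')
        if last:
            keys = OrderedDict.__reversed__(self)
        else:
            keys = OrderedDict.__iter__(self)
        key = next(keys)
        value = OrderedDict.__getitem__(self, key)
        OrderedDict.__delitem__(self, key)
        return key, value


cache = RecencyDict()
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3
cache['a']                          # 'a' becomes the most recently used key
oldest = cache.popitem(last=False)  # removes 'b', the least recently used
print(oldest, list(cache))
```

A subclass that wraps stored values (as an expiring mapping does) would also need to unwrap the value and apply its own expiry policy inside this popitem, which is the encapsulation concern raised in the note above.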
|
msg279530 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-10-27 08:26 |
In issue28014 __getitem__() is idempotent. Multiple calls of __getitem__() return the same result and keep the OrderedDict in the same state.
> I'd be perfectly happy with making popitem implemented in terms of pop on subclasses when pop is overridden (if pop isn't overridden though, that's effectively what popitem already does).
I like this idea.
> Note: In the expiring case, the fix is still "wrong" if someone used popitem for the intended purpose (to get and delete).
Good catch! But the old implementation still looks doubtful to me.
|
msg279727 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-10-30 15:28 |
New changeset 3f816eecc53e by Serhiy Storchaka in branch '3.5':
Backed out changeset 9f7505019767 (issue #27275).
https://hg.python.org/cpython/rev/3f816eecc53e
|
msg398631 - (view) |
Author: Dennis Sweeney (Dennis Sweeney) * |
Date: 2021-07-31 05:47 |
bpo-44782 was opened about the `class LRU(OrderedDict)` in the OrderedDict docs, and its pop() method failing.
I think Serhiy's patch here (before revert) may be a good idea (to re-apply).
I think it is reasonable to ignore user-implemented dunder methods from subclasses.
Concrete type implementations generally do not behave as mix-ins:
    def never_called(self, *args):
        print("Never called.")
        raise ZeroDivisionError

    class MyList(list):
        __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

    class MyDict(dict):
        __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

    class MySet(set):
        __setitem__ = __delitem__ = __getitem__ = __len__ = __iter__ = __contains__ = never_called

    L = MyList([5, 4, 3, 2])
    L.sort()
    L.pop(1)
    L.insert(0, 42)
    L.pop()
    L.reverse()
    assert type(L) is MyList

    D = MyDict({"a": 1, "b": 2, "c": 3})
    assert D.get(0) is None
    assert D.get("a") == 1
    assert D.pop("b") == 2
    assert D.popitem() == ("c", 3)
    assert type(D) is MyDict

    S = MySet({"a", "b", "c"})
    S.discard("a")
    S.remove("b")
    S.isdisjoint(S)
    S |= S
    S &= S
    S ^= S
    assert type(S) is MySet
|
msg398705 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2021-08-01 15:01 |
> I think Serhiy's patch here (before revert) may be a
> good idea (to re-apply).
That seems sensible to me as well. It keeps the C version in harmony with the pure Python version and it follows how regular dicts are implemented.
Serhiy, do you remember why your patch was reverted?
|
msg398708 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-08-01 15:40 |
It was reverted because it did not keep the C version in harmony with the pure Python version. In the pure Python version pop() calls __getitem__ and __delitem__, which can be overridden in subclasses of OrderedDict. My patch always called dict.__getitem__ and dict.__delitem__.
But I now see more clearly what the problem with the current C code is. It removes the key from the linked list before calling __delitem__, which itself removes the key from the linked list. Perhaps I can fix it correctly this time.
|
msg398713 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-08-01 16:51 |
It is complicated. The pure Python implementations of OrderedDict.popitem() and OrderedDict.pop() are not consistent. The former uses dict.pop(), which doesn't call __getitem__ and __delitem__. The latter calls __getitem__ and __delitem__. The C implementation shares code between popitem() and pop(), so it will differ from the pure Python implementation until we write separate code for popitem() and pop().
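The difference between the two styles can be seen on a plain dict subclass. This is a sketch only: BaseMap, Traced, pop_via_dunders and pop_via_dict are hypothetical names, and the methods merely imitate the two approaches described above (going through the item dunders, here __getitem__ and __delitem__, versus delegating to dict.pop()).

```python
class BaseMap(dict):
    """Two ways a base class might implement pop (illustrative names)."""

    def pop_via_dunders(self, key):
        value = self[key]  # dispatches to an overridden __getitem__
        del self[key]      # dispatches to an overridden __delitem__
        return value

    def pop_via_dict(self, key):
        return dict.pop(self, key)  # bypasses both overrides


class Traced(BaseMap):
    """Records which overridden dunder methods get called."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append('__getitem__')
        return super().__getitem__(key)

    def __delitem__(self, key):
        self.calls.append('__delitem__')
        super().__delitem__(key)


t = Traced(a=1, b=2)
t.pop_via_dunders('a')
first = list(t.calls)
t.calls.clear()
t.pop_via_dict('b')
second = list(t.calls)
print(first, second)
```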
|
msg398714 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2021-08-01 17:05 |
Let's do the right thing and fix the pure python OrderedDict.pop() method as well.
|
msg398720 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2021-08-01 19:52 |
PR 27528 makes the C implementation of OrderedDict.popitem() consistent with the Python implementation (it does not call overridden __getitem__ and __delitem__).
PR 27530 also changes both implementations of OrderedDict.pop(). It simplifies the C code, but adds some code duplication in Python.
I am not sure how far we should backport these changes, if we backport them at all.
|
msg398722 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2021-08-01 23:30 |
> I am not sure how far we should backport these changes,
> if we backport them at all.
We've had no reports of the current code causing problems for any existing applications (except the LRU recipe in the docs), so there is likely no value in making backports. Instead, we can clean it up so there won't be new issues going forward.
|
msg398819 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2021-08-03 11:01 |
New changeset 8c9f847997196aa76500d1ae104cbe7fe2a467ed by Serhiy Storchaka in branch 'main':
bpo-27275: Change popitem() and pop() methods of collections.OrderedDict (GH-27530)
https://github.com/python/cpython/commit/8c9f847997196aa76500d1ae104cbe7fe2a467ed
|
Date | User | Action | Args |
2022-04-11 14:58:32 | admin | set | github: 71462 |
2021-08-03 11:25:55 | lukasz.langa | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
2021-08-03 11:01:30 | lukasz.langa | set | nosy: + lukasz.langa; messages: + msg398819 |
2021-08-01 23:30:23 | rhettinger | set | messages: + msg398722; versions: - Python 3.9, Python 3.10 |
2021-08-01 19:52:12 | serhiy.storchaka | set | messages: + msg398720 |
2021-08-01 19:47:37 | serhiy.storchaka | set | pull_requests: + pull_request26045 |
2021-08-01 17:05:18 | rhettinger | set | messages: + msg398714 |
2021-08-01 16:53:08 | serhiy.storchaka | set | pull_requests: + pull_request26043 |
2021-08-01 16:51:37 | serhiy.storchaka | set | messages: + msg398713 |
2021-08-01 15:40:35 | serhiy.storchaka | set | versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.5, Python 3.6, Python 3.7 |
2021-08-01 15:40:20 | serhiy.storchaka | set | messages: + msg398708 |
2021-08-01 15:01:50 | rhettinger | set | messages: + msg398705 |
2021-07-31 05:47:59 | Dennis Sweeney | set | nosy: + Dennis Sweeney; messages: + msg398631 |
2016-10-30 15:28:13 | python-dev | set | messages: + msg279727 |
2016-10-27 08:26:18 | serhiy.storchaka | set | messages: + msg279530 |
2016-10-26 00:37:26 | josh.r | set | messages: + msg279470 |
2016-10-25 17:59:52 | serhiy.storchaka | set | messages: + msg279433 |
2016-10-25 14:31:17 | josh.r | set | messages: + msg279408 |
2016-10-25 13:38:08 | josh.r | set | nosy: + josh.r; messages: + msg279405 |
2016-10-25 12:38:50 | python-dev | set | nosy: + python-dev; messages: + msg279401 |
2016-10-25 12:32:37 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2016-09-30 08:12:46 | serhiy.storchaka | set | messages: + msg277750 |
2016-09-28 13:30:34 | methane | set | nosy: + methane; messages: + msg277615 |
2016-09-27 11:51:47 | serhiy.storchaka | set | files: + ordered_dict_subclass_pop.patch; versions: + Python 3.7; messages: + msg277513; keywords: + patch; stage: test needed -> patch review |
2016-09-08 03:41:20 | zach.ware | link | issue28014 superseder |
2016-09-08 03:40:58 | zach.ware | set | files: + simple_lru.py; nosy: + zach.ware; messages: + msg274962 |
2016-07-31 09:20:56 | xiang.zhang | set | messages: + msg271733 |
2016-06-10 19:58:07 | kaniini | set | messages: + msg268148 |
2016-06-10 18:31:55 | xiang.zhang | set | messages: + msg268139 |
2016-06-10 18:28:12 | serhiy.storchaka | set | nosy: + eric.snow |
2016-06-10 18:24:22 | kaniini | set | messages: + msg268138 |
2016-06-10 17:31:59 | rhettinger | set | nosy: + rhettinger; messages: + msg268130 |
2016-06-10 14:05:11 | xiang.zhang | set | nosy: + xiang.zhang; messages: + msg268122 |
2016-06-09 08:13:13 | kaniini | set | messages: + msg267984 |
2016-06-09 04:57:45 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg267960; stage: test needed |
2016-06-09 04:33:55 | kaniini | create | |