Message279408
Explaining expiringdict's issue: It's two race conditions, with itself, not another thread.
Example scenario (narrow race window):
1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true)
3. Because contains returned True, at time X PyObject_GetItem is run, but because it's time X, expiringdict's __getitem__ deletes the entry and raises KeyError
An alternative scenario with a *huge* race window is:
1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. No lookups or membership tests or any other operations that implicitly clean the expiring dict occur for a while
3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the OrderedDict without invoking the __contains__ machinery that would delete the entry
3. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 (false), because the __contains__ machinery is invoked, and again, because no default is provided for popitem, a KeyError is raised (this time by the popitem machinery, not __getitem__)
expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while).
The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue).
Possible correct fixes:
1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of `next(reversed(self))` and `next(iter(self))` respectively. This won't fix expiringdict as is (because it's broken by design; it doesn't override __iter__, so it will happily return long expired keys that disappear on access), but if we're going for subclass friendliness and allowing non-idempotent __contains__/__getitem__/__iter__ implementations, it's the right thing to do. If expiringdict implemented __iter__ to copy the keys, then loop over the copy, deleting expired values and yielding unexpired values, this would at least reduce the huge race window to a narrow window (because it could still yield a key that's almost expired)
2. Check for the presence of __missing__ on the type and only perform the PySequence_Contains check if __missing__ is defined (to avoid autovivification). This fixes the narrow race condition for subclasses without __missing__, but not the huge race condition
3. Explicitly document the idempotence assumptions made by OrderedDict (specifically, that all non-mutating methods of OrderedDict must not be made mutating in subclasses unless the caller also overrides all multistep operations, e.g. pop/popitem/setdefault).
TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame. |
|
Date |
User |
Action |
Args |
2016-10-25 14:31:17 | josh.r | set | recipients:
+ josh.r, rhettinger, methane, python-dev, eric.snow, zach.ware, serhiy.storchaka, xiang.zhang, kaniini |
2016-10-25 14:31:17 | josh.r | set | messageid: <1477405877.21.0.834421053007.issue27275@psf.upfronthosting.co.za> |
2016-10-25 14:31:17 | josh.r | link | issue27275 messages |
2016-10-25 14:31:16 | josh.r | create | |
|